Thursday, June 19, 2025

Recent partial openQA outage (now fixed, apologies)

Hey folks! You may have noticed a substantial number of spurious test
failures in openQA over the last 20 hours or so. I've now rebooted the
broken system and rescheduled all the failed tests, but it will take a
few hours for them to work through the system. A few may flake, too - I
will catch these and restart them during the day.

Details for anyone interested: since updating to the latest openQA
upstream code and Fedora 42, it seems like the worker hosts that run
multi-worker networked jobs sometimes hit some kind of issue between
openvswitch and dbus-broker, which causes all such jobs run after that
point to fail at startup with "could not configure /dev/net/tun
(tap19): Device or resource busy" errors. Other jobs sometimes fail
too, I think because the broken tap jobs leave VNC ports blocked.
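
A minimal sketch of how one might spot this state from worker log text,
assuming the log is available as a string. The function name
busy_tap_devices and the idea of scanning logs this way are illustrative
assumptions, not part of openQA; only the error message itself comes
from the failures described above.

```python
import re

# Error signature quoted above; the tap device number varies per job.
# \s+ tolerates the message being wrapped across lines.
TAP_BUSY = re.compile(
    r"could not configure /dev/net/tun\s+\((tap\d+)\):\s+Device or resource busy"
)


def busy_tap_devices(log_text: str) -> list[str]:
    """Return the distinct tap devices named in 'busy' errors, first seen first."""
    seen: list[str] = []
    for match in TAP_BUSY.finditer(log_text):
        device = match.group(1)
        if device not in seen:
            seen.append(device)
    return seen
```

A non-empty result would suggest the host has reached the state where
tap jobs fail at startup, which so far has only been cleared by a reboot.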

Another openQA user saw something similar and we're tracking it
upstream at https://progress.opensuse.org/issues/183833 , but we
haven't entirely got to the root cause yet. For now the only thing I
can do is reboot the worker hosts when they get in this state. I didn't
check openQA before going to bed last night so I let this drag out
longer than necessary - sorry about that!
--
Adam Williamson (he/him/his)
Fedora QA
Fedora Chat: @adamwill:fedora.im | Mastodon: @adamw@fosstodon.org
https://www.happyassassin.net
