2017-10-14T16:57:31 *** heroes-bot has joined #opensuse-admin
2017-10-14T17:06:12 *** dddh has quit IRC
2017-10-14T17:12:48 PROBLEM: NRPE on provo-mirror.opensuse.org - CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds. ; See https://monitor.opensuse.org/icinga/cgi-bin/extinfo.cgi?type=2&host=provo-mirror.opensuse.org&service=NRPE
2017-10-14T17:17:40 PROBLEM: rsync on pontifex3.infra.opensuse.org - rsync: failed to connect to stage.opensuse.org (130.57.72.10): Connection refused (111) ; See https://monitor.opensuse.org/icinga/cgi-bin/extinfo.cgi?type=2&host=pontifex3.infra.opensuse.org&service=rsync
2017-10-14T17:33:34 tampakrap: what was simba set up for?
2017-10-14T17:34:15 darix: gitlab-ci-runner
2017-10-14T17:35:44 tampakrap: can you please provide some documentation about that machine?
2017-10-14T17:36:01 tampakrap: I did not find anything about the host in progress nor in racktables ...
2017-10-14T17:36:29 sure
2017-10-14T17:36:30 * cboltz would prefer that documentation in a *.sls file ;-)
2017-10-14T17:37:43 RECOVERY: rsync on pontifex3.infra.opensuse.org - OK: Rsync is up ; See https://monitor.opensuse.org/icinga/cgi-bin/extinfo.cgi?type=2&host=pontifex3.infra.opensuse.org&service=rsync
2017-10-14T17:43:16 the monitoring looks quite green given that everything was rebooted :-)
2017-10-14T17:43:35 the most serious issue it lists is that monitor.infra.o.o can't monitor itself ;-)
2017-10-14T17:44:46 oh, and it seems we miss a check for the gitlab web interface - it's still down, but there's no warning in the monitoring
2017-10-14T17:47:11 yes I think we have a ticket about it already
2017-10-14T17:49:00 speaking of tickets - can you please block duonimhan@yandex.com in progress (repeated spammer)
2017-10-14T17:49:33 sure
2017-10-14T17:52:11 *** Ada_Lovelace has joined #opensuse-admin
2017-10-14T17:53:33 for the missing monitoring - we have tickets for progress, osc-collab, dale and conference - but not for gitlab
2017-10-14T17:55:47 cboltz: gitlab is up, but it doesn't have https
2017-10-14T17:55:48 we disabled it two days ago that we were trying to clone the repos from nuremberg to provo
2017-10-14T17:56:56 cboltz: did you get my messages?
2017-10-14T17:58:43 RECOVERY: Updates on status2.opensuse.org - Updates OK : no updates available ; See https://monitor.opensuse.org/icinga/cgi-bin/extinfo.cgi?type=2&host=status2.opensuse.org&service=Updates
2017-10-14T17:59:10 indeed, http works - but all redirects it does (hint: login, and after login) go to https, so I have to change the URL manually each time
2017-10-14T18:00:33 please review and merge http://gitlab.infra.opensuse.org/infra/salt/merge_requests/67
2017-10-14T18:02:01 cboltz: maybe wait until all services are moved back and actually tested that they are working?
2017-10-14T18:02:04 just a thought
2017-10-14T18:02:40 no worries, accepting it in gitlab doesn't mean I'll run highstate instantly ;-)
2017-10-14T18:03:45 yeah
2017-10-14T18:03:45 I'll enable it back
2017-10-14T18:06:55 tampakrap: systemctl enable logrotate.timer
2017-10-14T18:06:58 via salt
2017-10-14T18:07:00 pretty please
2017-10-14T18:07:13 *fixing anna + elsa atm*
2017-10-14T18:07:44 just wondering - does 'service.running' also work for timers? If yes, please also add that to make it permanent
2017-10-14T18:08:15 (only for 42.3 - older versions use cron.daily, and it's a known bug that enabling logrotate.timer on upgrade is broken)
2017-10-14T18:08:24 darix: for sle12 and leap or only for leap?
2017-10-14T18:11:16 all
2017-10-14T18:11:59 okay
2017-10-14T18:13:28 just wondering - why is paste.o.o still marked as down? My testing shows that it works
2017-10-14T18:13:43 paste.o.o never was down
2017-10-14T18:13:51 as is not hosted in nbg
2017-10-14T18:14:29 ok, so I'll mark it as working
2017-10-14T18:15:36 *** kl_eisbaer has quit IRC
2017-10-14T18:16:37 software.o.o is also back and working (including the package search) :-)
2017-10-14T18:17:19 so the only remaining service is OBS - the website works, but I guess it needs some time to initialize the schedulers?
2017-10-14T18:19:09 *** Ada_Lovelace has quit IRC
2017-10-14T18:21:36 maybe
2017-10-14T18:23:56 the OBS status page shows that most workers are up, and the first ~200 are already building packages
2017-10-14T18:29:28 cboltz: would you like to review? http://gitlab.infra.opensuse.org/infra/salt/merge_requests/68
2017-10-14T18:29:33 *** kl_eisbaer has joined #opensuse-admin
2017-10-14T18:31:31 I'm not sure if osmajorrelease is the correct selector - logrotate.timer only exists since 42.3 (and probably SLE12 SP3, but that's just my guess)
2017-10-14T18:33:03 so if we have any machine that still runs 42.[12] or SLE12 SP[0-2], you'll see an error
2017-10-14T18:34:09 any machine that isn't sp3 or 42.3 yet has other problems
2017-10-14T18:34:14 like: it should be updated
2017-10-14T18:34:18 i saw 42.1 machines!
2017-10-14T18:35:29 poke their admins to upgrade them ;-)
2017-10-14T18:36:04 and just in case you are the admin of one of those machines, find someone to poke you ;-)
2017-10-14T18:40:03 cboltz: i am not doing opensuse stuff anymore
2017-10-14T18:40:46 I know, but if you see a 42.1 machine, it shouldn't be too hard to poke its admin ;-)
2017-10-14T18:41:40 BTW: does this also mean you don't handle "create OBS repo" requests anymore?
2017-10-14T18:46:27 cboltz: those i still do. but i was a tad busy
2017-10-14T18:47:47 yeah, I guessed so - there's a ticket queue waiting for you ;-) (some of them assigned directly to you, some of them assigned to opensuse-admin-obs)
2017-10-14T18:48:10 i saw
2017-10-14T18:48:23 hopefully when we are done with the post mortem of this fun i will have time
2017-10-14T18:49:09 I hope the post mortem will be short as in "power is back, everything works again" ;-)
2017-10-14T18:49:30 at least clicking through all services listed on status.o.o didn't show any problem
2017-10-14T18:49:53 cboltz: you can still learn things to improve the procedure next time
2017-10-14T18:50:16 yes, of course
2017-10-14T18:56:40 cboltz, maybe update the status page past incidents?
2017-10-14T18:58:41 a few OBS workers are still down, but everything else is up and running again
2017-10-14T18:59:21 so if nobody objects in the next 5 minutes, I'll update status.o.o to say "power is back, and all services are up and running again"
2017-10-14T19:00:32 cboltz, arm workers won't come back until next week according to the arm ML post
2017-10-14T19:01:05 cboltz, when you do that update, I will remove the forum notice ;)
2017-10-14T19:02:13 cboltz, https://lists.opensuse.org/opensuse-arm/2017-10/msg00031.html
2017-10-14T19:03:21 cboltz: object given
2017-10-14T19:03:54 cboltz: we are still in the "power up mode". If something breaks in between, your "everything green" might turn red in seconds
2017-10-14T19:04:26 cboltz: so please simply be happy that everything looks good so far and wait for the go from those people who are on the machines ... ;-)
2017-10-14T19:05:38 ok
2017-10-14T19:44:05 cboltz: other than missing build archs for arm and s390, everything else looks green
2017-10-14T19:44:32 :-)
2017-10-14T19:47:52 https://build.opensuse.org/monitor also shows that some x86_64 workers are still down - but that's "just" reduced OBS performance
2017-10-14T19:49:01 so - can/should we remove the downtime notice from the wiki and forum now?
2017-10-14T19:49:09 * cboltz votes "yes"
2017-10-14T19:49:52 cboltz: fine with me
2017-10-14T19:50:12 then I need someone to merge http://gitlab.infra.opensuse.org/infra/salt/merge_requests/67 ;-)
2017-10-14T19:51:32 merged
2017-10-14T19:51:51 I'll need to adjust the master
2017-10-14T19:52:35 cboltz, done on the Forum
2017-10-14T19:54:01 cboltz: give it a try, it should be fine now
2017-10-14T19:55:00 yes, the downtime notice is gone in the wikis :-)
2017-10-14T20:03:02 PROBLEM: Updates on mirrordb4.infra.opensuse.org - Updates CRITICAL : 1 security update(s): 1 recommended update(s): 4 package update(s): ; See https://monitor.opensuse.org/icinga/cgi-bin/extinfo.cgi?type=2&host=mirrordb4.infra.opensuse.org&service=Updates
2017-10-14T20:03:02 *** kl_eisbaer has left #opensuse-admin
2017-10-14T20:03:03 PROBLEM: Updates on mirrordb3.infra.opensuse.org - Updates CRITICAL : 1 security update(s): 13 package update(s): ; See https://monitor.opensuse.org/icinga/cgi-bin/extinfo.cgi?type=2&host=mirrordb3.infra.opensuse.org&service=Updates
2017-10-14T20:03:04 PROBLEM: Updates on narwal.infra.opensuse.org - Updates CRITICAL : 1 security update(s): 3 package update(s): ; See https://monitor.opensuse.org/icinga/cgi-bin/extinfo.cgi?type=2&host=narwal.infra.opensuse.org&service=Updates
2017-10-14T20:03:05 PROBLEM: Updates on community.infra.opensuse.org - Updates CRITICAL : 1 security update(s): 3 package update(s): ; See https://monitor.opensuse.org/icinga/cgi-bin/extinfo.cgi?type=2&host=community.infra.opensuse.org&service=Updates
2017-10-14T20:03:06 PROBLEM: Updates on icc.infra.opensuse.org - Updates CRITICAL : 1 security update(s): 3 package update(s): ; See https://monitor.opensuse.org/icinga/cgi-bin/extinfo.cgi?type=2&host=icc.infra.opensuse.org&service=Updates
2017-10-14T20:03:07 PROBLEM: Updates on narwal2.infra.opensuse.org - Updates CRITICAL : 1 security update(s): 3 package update(s): ; See https://monitor.opensuse.org/icinga/cgi-bin/extinfo.cgi?type=2&host=narwal2.infra.opensuse.org&service=Updates
2017-10-14T20:03:08 PROBLEM: Updates on boosters.infra.opensuse.org - Updates CRITICAL : 1 security update(s): 3 package update(s): ; See https://monitor.opensuse.org/icinga/cgi-bin/extinfo.cgi?type=2&host=boosters.infra.opensuse.org&service=Updates
2017-10-14T20:03:09 PROBLEM: Updates on pontifex3.infra.opensuse.org - Updates CRITICAL : 1 security update(s): 8 package update(s): ; See https://monitor.opensuse.org/icinga/cgi-bin/extinfo.cgi?type=2&host=pontifex3.infra.opensuse.org&service=Updates
2017-10-14T20:03:10 PROBLEM: Updates on freeipa.infra.opensuse.org - CHECK_UPDATES CRITICAL - 10 non-critical updates available ; See https://monitor.opensuse.org/icinga/cgi-bin/extinfo.cgi?type=2&host=freeipa.infra.opensuse.org&service=Updates
2017-10-14T20:14:44 *** dddh has joined #opensuse-admin
2017-10-14T20:34:00 tampakrap, was packagehub.suse.com part of the outage?
2017-10-14T20:35:26 malcolmlewis: yes
2017-10-14T20:35:52 tampakrap, that hasn't come back yet then....
2017-10-14T20:37:05 sure, people are still working on bringing services back
2017-10-14T20:39:07 tampakrap, ok, thanks :)
2017-10-14T20:43:56 tampakrap: please open a ticket about it
2017-10-14T20:44:02 lars and me just headed home
2017-10-14T20:44:39 okay
2017-10-14T21:12:05 Team, I would just like to say - THANK YOU for doing such an awesome job
2017-10-14T21:41:51 *** Son_Goku has quit IRC
2017-10-14T21:48:16 *** cboltz has quit IRC
2017-10-14T21:48:47 *** ldevulder has quit IRC
2017-10-14T21:49:14 *** ldevulder has joined #opensuse-admin
2017-10-14T22:01:55 *** Son_Goku has joined #opensuse-admin
2017-10-14T23:23:35 *** petracvv has quit IRC