2017-10-03T00:05:10 *** ChrisWi has quit IRC 2017-10-03T00:13:55 RECOVERY: HTTP keyserver 11371 on keyserver.infra.opensuse.org - HTTP OK: HTTP/1.1 200 OK - 3979 bytes in 0.001 second response time ; See https://monitor.opensuse.org/icinga/cgi-bin/extinfo.cgi?type=2&host=keyserver.infra.opensuse.org&service=HTTP%20keyserver%2011371 2017-10-03T00:23:22 RECOVERY: HAProxy on anna.infra.opensuse.org - HAPROXY OK - static (Active: 3/3) mysql-int (Active: 1/1) smt (Active: 1/1) redmine (Active: 1/1) tarzan (Active: 1/1) conference (Active: 1/1) mickey (Active: 1/1) connect (Active: 1/1) studioexpress (Active: 1/1) etherpad (Active: 1/1) rpmlint (Active: 1/1) riesling (Active: 1/1) lists (Active: 1/1) mirrorlist (Active: 1/1) kernel-git-in (Active: 0/1) crashdb (Active: 1/1) nuka (Active: 1/1) keyserver-recon (Active: 0/1) kruemel (Active: 1/1) osccollab (Active: 1/1) gccstats (Active: 1/1) monitor (Active: 1/1) keyserver (Active: 1/1) download (Active: 1/1) community (Active: 1/1) icc (Active: 1/1) keyserver-db (Active: 0/1) status (Active: 1/1) freeipa (Active: 1/1) ; See https://monitor.opensuse.org/icinga/cgi-bin/extinfo.cgi?type=2&host=anna.infra.opensuse.org&service=HAProxy 2017-10-03T00:24:22 RECOVERY: HAProxy on elsa.infra.opensuse.org - HAPROXY OK - download (Active: 1/1) keyserver-recon (Active: 0/1) mickey (Active: 1/1) static (Active: 3/3) crashdb (Active: 1/1) mirrorlist (Active: 1/1) riesling (Active: 1/1) lists (Active: 1/1) monitor (Active: 1/1) nuka (Active: 1/1) mysql-int (Active: 1/1) keyserver (Active: 1/1) conference (Active: 1/1) kernel-git-in (Active: 0/1) community (Active: 1/1) etherpad (Active: 1/1) studioexpress (Active: 1/1) tarzan (Active: 1/1) status (Active: 1/1) smt (Active: 1/1) freeipa (Active: 1/1) gccstats (Active: 1/1) rpmlint (Active: 1/1) keyserver-db (Active: 0/1) redmine (Active: 1/1) connect (Active: 1/1) osccollab (Active: 1/1) kruemel (Active: 1/1) icc (Active: 1/1) ; See 
https://monitor.opensuse.org/icinga/cgi-bin/extinfo.cgi?type=2&host=elsa.infra.opensuse.org&service=HAProxy 2017-10-03T00:40:27 PROBLEM: Updates on chip.infra.opensuse.org - Updates CRITICAL : 1 security update(s): 2 package update(s): ; See https://monitor.opensuse.org/icinga/cgi-bin/extinfo.cgi?type=2&host=chip.infra.opensuse.org&service=Updates 2017-10-03T00:44:58 PROBLEM: Updates on etherpad.infra.opensuse.org - Updates CRITICAL : 1 security update(s): 5 package update(s): ; See https://monitor.opensuse.org/icinga/cgi-bin/extinfo.cgi?type=2&host=etherpad.infra.opensuse.org&service=Updates 2017-10-03T00:46:45 PROBLEM: Updates on mickey.infra.opensuse.org - Updates CRITICAL : 1 security update(s): 24 package update(s): ; See https://monitor.opensuse.org/icinga/cgi-bin/extinfo.cgi?type=2&host=mickey.infra.opensuse.org&service=Updates 2017-10-03T00:47:20 PROBLEM: Updates on minnie.infra.opensuse.org - Updates CRITICAL : 1 security update(s): 2 package update(s): ; See https://monitor.opensuse.org/icinga/cgi-bin/extinfo.cgi?type=2&host=minnie.infra.opensuse.org&service=Updates 2017-10-03T00:53:31 *** cboltz has quit IRC 2017-10-03T00:58:52 RECOVERY: Updates on chip.infra.opensuse.org - Updates OK : no updates available ; See https://monitor.opensuse.org/icinga/cgi-bin/extinfo.cgi?type=2&host=chip.infra.opensuse.org&service=Updates 2017-10-03T00:59:12 RECOVERY: Updates on mickey.infra.opensuse.org - Updates OK : no updates available ; See https://monitor.opensuse.org/icinga/cgi-bin/extinfo.cgi?type=2&host=mickey.infra.opensuse.org&service=Updates 2017-10-03T00:59:33 RECOVERY: Updates on minnie.infra.opensuse.org - Updates OK : no updates available ; See https://monitor.opensuse.org/icinga/cgi-bin/extinfo.cgi?type=2&host=minnie.infra.opensuse.org&service=Updates 2017-10-03T01:00:26 RECOVERY: Updates on etherpad.infra.opensuse.org - Updates OK : no updates available ; See 
https://monitor.opensuse.org/icinga/cgi-bin/extinfo.cgi?type=2&host=etherpad.infra.opensuse.org&service=Updates 2017-10-03T01:03:19 RECOVERY: rsync on community.infra.opensuse.org - OK: Rsync is up with 2 modules tested ; See https://monitor.opensuse.org/icinga/cgi-bin/extinfo.cgi?type=2&host=community.infra.opensuse.org&service=rsync 2017-10-03T01:52:02 *** dddh_ has quit IRC 2017-10-03T02:22:57 *** okurz has quit IRC 2017-10-03T02:24:06 *** okurz has joined #opensuse-admin 2017-10-03T02:45:10 *** Son_Goku has quit IRC 2017-10-03T04:09:35 *** Fraser_Bell has joined #opensuse-admin 2017-10-03T04:38:39 *** Fraser_Bell has quit IRC 2017-10-03T05:09:41 *** a-865-kx has quit IRC 2017-10-03T05:22:14 *** a-865-kx has joined #opensuse-admin 2017-10-03T06:54:21 *** okurz has quit IRC 2017-10-03T06:55:05 *** okurz has joined #opensuse-admin 2017-10-03T08:13:00 *** dddh__ has quit IRC 2017-10-03T09:22:38 DimStar: ping 2017-10-03T09:23:53 tampakrap: pong 2017-10-03T09:24:22 DimStar: conncheck-test.o.o should work now, can you give it a try please? 2017-10-03T09:25:18 tampakrap: only ipv4 (conncheck.o.o is also v6); but other than that, it replies valid 2017-10-03T09:26:28 * tampakrap checks if we have ipv6 2017-10-03T10:36:51 I broke weblate https://status.opensuse.org/incidents/25 2017-10-03T10:47:12 *** fvogt has joined #opensuse-admin 2017-10-03T10:57:08 DimStar: do you happen to know where are the db settings on osc-collab? 2017-10-03T10:59:17 it uses a local sqlite DB on the collab server directly 2017-10-03T11:00:35 ah nice 2017-10-03T11:00:39 can you check if it works please?
2017-10-03T11:02:26 meh - again fighting with my hero vpn :( 2017-10-03T11:02:50 just check the service 2017-10-03T11:02:57 osc collab todo <- this works for me 2017-10-03T11:03:00 it connected all seems fine 2017-10-03T11:03:05 good 2017-10-03T11:03:06 collab service from external seems fine 2017-10-03T11:03:30 just can't connect to the machine again for some reason using ssh 2017-10-03T11:04:35 try now 2017-10-03T11:05:41 debug1: Connecting to 192.168.254.28 [192.168.254.28] port 22. 2017-10-03T11:05:41 debug1: connect to address 192.168.254.28 port 22: No route to host 2017-10-03T11:05:41 ssh: connect to host 192.168.254.28 port 22: No route to host 2017-10-03T11:05:58 okay wait a couple of minutes for dns to update 2017-10-03T11:06:11 oh - the IP moved? 2017-10-03T11:06:12 it updated already here actually 2017-10-03T11:06:24 yes I am renumbering all of the VMs to 192.168.47.0/24 2017-10-03T11:07:10 ok, that worked 2017-10-03T11:09:47 *** nicolasbock_ has joined #opensuse-admin 2017-10-03T11:18:11 *** nicolasbock_ has quit IRC 2017-10-03T11:23:29 *** Son_Goku has joined #opensuse-admin 2017-10-03T11:31:00 *** nicolasbock_ has joined #opensuse-admin 2017-10-03T11:41:22 PROBLEM: NRPE on narwal3.infra.opensuse.org - CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds. ; See https://monitor.opensuse.org/icinga/cgi-bin/extinfo.cgi?type=2&host=narwal3.infra.opensuse.org&service=NRPE 2017-10-03T12:15:31 *** cboltz has joined #opensuse-admin 2017-10-03T12:20:36 RECOVERY: DNS on freeipa.infra.opensuse.org - DNS OK: 0.015 seconds response time. www.opensuse.org returns 130.57.66.6 ; See https://monitor.opensuse.org/icinga/cgi-bin/extinfo.cgi?type=2&host=freeipa.infra.opensuse.org&service=DNS 2017-10-03T12:54:33 *** fvogt has quit IRC 2017-10-03T13:37:50 DimStar: yet another question, sorry for bugging you so much today 2017-10-03T13:37:58 do you know how rpmlint works?
2017-10-03T13:38:18 I am about to change its IP and I don't know how to see that it will be working fine afterwords 2017-10-03T13:38:31 afterwards* 2017-10-03T13:38:55 hmm.. I only know the frontend at rpmlint.opensuse.org 2017-10-03T13:39:52 who would know more and is not on public holiday today? 2017-10-03T13:42:24 hm.. the 'not on pub holiday' is kinda limiting it 2017-10-03T13:43:35 it probably has a local sqlite 2017-10-03T13:43:46 I'll renumber it and see what I broke 2017-10-03T13:43:52 if it doesn't work I'll revert the IP 2017-10-03T13:44:48 btw, this seems like an obs related service, shouldn't we move it to the suse-dmz vlan maybe? 2017-10-03T13:45:12 it does read-only actions against OBS 2017-10-03T13:45:32 so not business critical 2017-10-03T13:45:45 not really 2017-10-03T13:47:01 cool 2017-10-03T13:49:02 DimStar: it seems to work after the renumbering, can you keep an eye on it please if it updates info etc? 2017-10-03T13:49:12 sure thing 2017-10-03T14:08:18 cboltz: I just renumbered riesling/water, can you check please that they are fine? 2017-10-03T14:11:03 cboltz: the wiki seems up, but in water I don't know what to check 2017-10-03T14:13:56 oh, that's easy - just use the wiki search to check if water is working ;-) 2017-10-03T14:14:32 seems to work 2017-10-03T14:14:55 bonus points if you search for something mistyped ("opensus") to check that the "did you mean ..." feature also works 2017-10-03T14:15:38 yep it does 2017-10-03T14:15:55 I use hostnames in the config (no hardcoded IP, so it shouldn't care about the IP ;-) 2017-10-03T14:16:32 BTW: what's the reason for the IP changes?
2017-10-03T14:16:47 in your mail you only wrote that you'll do it, but not why 2017-10-03T14:17:02 * cboltz hopes there's more than "our monitoring was too green" ;-) 2017-10-03T14:17:06 we discussed it in some team meeting in the past that's why I didn't elaborate 2017-10-03T14:17:32 192.168.254.0/24 is the range we had on the old vlan (vlan42, currently used by obs and other suse-dmz services) 2017-10-03T14:17:58 so having the same range on two vlans is confusing and it causes trouble in eg dns entries 2017-10-03T14:18:16 ah, ok 2017-10-03T14:18:18 makes sense 2017-10-03T14:18:19 next step would be to stop using i.o.o suffix for the suse-dmz machines 2017-10-03T14:19:30 sounds like another chance to make the monitoring less green ;-) 2017-10-03T14:20:07 those machines are on the internal suse monitoring, nothing for you to worry about :) 2017-10-03T14:20:16 ;-) 2017-10-03T14:20:53 BTW: it seems the monitoring has a terribly long DNS timeout - for riesling, it says it's down since nearly two hours - and still shows the 192.168.254.7 IP 2017-10-03T14:21:22 I just fixed those entries, and I am now triggering it again 2017-10-03T14:22:15 better :-) 2017-10-03T14:22:30 I have two issues I can't solve myself though 2017-10-03T14:23:00 one is that I renamed gcc-stats.i.o.o (from gccstats.i.o.o) to reflect the machine's fqdn, and now all the checks are broken 2017-10-03T14:23:07 PROBLEM: NRPE on rpmlint.infra.opensuse.org - connect to address 192.168.47.53 port 5666: Connection refused ; See https://monitor.opensuse.org/icinga/cgi-bin/extinfo.cgi?type=2&host=rpmlint.infra.opensuse.org&service=NRPE 2017-10-03T14:23:20 and also the multiple entries like this CRIT - Cannot get data from TCP port 192.168.254.101:6556: [Errno 111] Connection refused, execution time 0.0 sec 2017-10-03T14:23:51 I changed the configs (triple checked) on both the agent and the server, but there is some cache on the server that I don't know how to clear 2017-10-03T14:24:33 PROBLEM: NRPE on 
riesling.infra.opensuse.org - connect to address 192.168.47.42 port 5666: Connection refused ; See https://monitor.opensuse.org/icinga/cgi-bin/extinfo.cgi?type=2&host=riesling.infra.opensuse.org&service=NRPE 2017-10-03T14:26:47 With our TTL of 5 minutes, I'd argue that a cache that lasts for two hours is clearly a bug 2017-10-03T14:29:17 PROBLEM: NRPE on water.infra.opensuse.org - connect to address 192.168.47.41 port 5666: Connection refused ; See https://monitor.opensuse.org/icinga/cgi-bin/extinfo.cgi?type=2&host=water.infra.opensuse.org&service=NRPE 2017-10-03T14:33:10 there is a cache file that I found 2017-10-03T14:33:35 I even tried to manually change it without results 2017-10-03T14:33:53 feel free to take a look, I spent quite some time to try to fix it without luck 2017-10-03T14:35:31 I'm afraid I wouldn't even know where to look ;-) 2017-10-03T14:36:12 let's wait for lars to reply to my mail then 2017-10-03T14:39:38 connect to address 192.168.47.42 port 5666: Connection refused 2017-10-03T14:39:57 this is correct, there is nothing running on 5666 2017-10-03T14:39:59 and i have no idea why 2017-10-03T14:40:39 okay fixed with restart instead of reload 2017-10-03T14:41:25 RECOVERY: NRPE on riesling.infra.opensuse.org - NRPE v2.15 ; See https://monitor.opensuse.org/icinga/cgi-bin/extinfo.cgi?type=2&host=riesling.infra.opensuse.org&service=NRPE 2017-10-03T14:41:42 yeah, sometimes simple things help a lot ;-) 2017-10-03T14:43:06 water also shows funny xinetd[2701]: bind failed (Cannot assign requested address (errno = 99)). 
service = nrpe messages 2017-10-03T14:43:18 I just restarted xinetd 2017-10-03T14:43:45 RECOVERY: NRPE on water.infra.opensuse.org - NRPE v2.15 ; See https://monitor.opensuse.org/icinga/cgi-bin/extinfo.cgi?type=2&host=water.infra.opensuse.org&service=NRPE 2017-10-03T14:44:01 ^^ I triggered the check as well 2017-10-03T14:44:44 RECOVERY: NRPE on rpmlint.infra.opensuse.org - NRPE v3.1.1 ; See https://monitor.opensuse.org/icinga/cgi-bin/extinfo.cgi?type=2&host=rpmlint.infra.opensuse.org&service=NRPE 2017-10-03T14:44:49 *** Ada_Lovelace has joined #opensuse-admin 2017-10-03T14:44:54 nice :-) 2017-10-03T14:45:12 Hi Sarah! 2017-10-03T14:45:18 worked for rpmlint as well, but not for narwal3 2017-10-03T14:45:25 totally confused 2017-10-03T14:45:25 Hi Christian! :) 2017-10-03T14:45:55 cboltz: Updates CRITICAL : At least one of your Repositories might be out of date. Please run "zypper refresh" as root to update it. 2017-10-03T14:45:59 on water 2017-10-03T14:46:07 6 days 2017-10-03T14:46:48 that's openSUSE:infrastructure:wiki - it didn't change since months, and now looks old 2017-10-03T14:47:17 I could trigger a package rebuild, but that sounds more like a workaround than like a real solution 2017-10-03T14:47:39 file a ticket then 2017-10-03T14:47:44 another possible workaround would be a dummy package (and to rebuild that every few months) 2017-10-03T14:48:22 I'd say this is an OBS issue - the metadata (which also includes the "expires" date) doesn't get updated if the repo content doesn't change 2017-10-03T14:49:06 what does the nagios check do? 
2017-10-03T14:49:31 i would expect it to compare the repo metadata with the local one 2017-10-03T14:49:51 not just to check if the metadata is like two weeks old 2017-10-03T14:50:27 I don't know exactly 2017-10-03T14:50:43 but a simple "zypper lu" also shows a warning that the repo seems to be outdated 2017-10-03T14:50:59 so I hesitate to blame the monitoring check ;-) 2017-10-03T14:51:42 so the check just replicates this warning 2017-10-03T14:51:57 looks so, yes 2017-10-03T14:55:19 okay fair enough 2017-10-03T15:00:59 *** Son_Goku has quit IRC 2017-10-03T15:01:22 PROBLEM: Updates on freeipa.infra.opensuse.org - CHECK_UPDATES CRITICAL - 1 non-critical update available ; See https://monitor.opensuse.org/icinga/cgi-bin/extinfo.cgi?type=2&host=freeipa.infra.opensuse.org&service=Updates 2017-10-03T15:02:39 Ada_Lovelace: around to help me with monitoring? 2017-10-03T15:08:56 cboltz: same update issue should be also on riesling right? 2017-10-03T15:09:59 yes, it uses the same additional repo 2017-10-03T15:09:59 it is not for some reason 2017-10-03T15:10:07 the check is missing? 2017-10-03T15:10:40 for some reason I don't know, the repo was disabled until maybe two hours ago 2017-10-03T15:10:49 maybe the zypper check doesn't run too often? 2017-10-03T15:11:03 (zypper lu shows the "outdated" warning as expected) 2017-10-03T15:22:50 RECOVERY: NRPE on narwal3.infra.opensuse.org - NRPE v3.1.1 ; See https://monitor.opensuse.org/icinga/cgi-bin/extinfo.cgi?type=2&host=narwal3.infra.opensuse.org&service=NRPE 2017-10-03T15:23:49 okay fixed, mcaj helped! 
2017-10-03T15:24:03 so finally the monitoring shows legit issues 2017-10-03T15:25:51 looks like my rescheduling of all checks on riesling made the outdated repo visible 2017-10-03T15:26:07 so now we see the expected warning 2017-10-03T15:26:09 nope, there was a config typo that was blocking updating the cache 2017-10-03T15:31:09 I still wonder if I should create a dummy package and trigger a rebuild of that every few months 2017-10-03T15:31:22 it's "not nice", but a warning in the monitoring also isn't nice ;-) 2017-10-03T15:31:36 I would say to raise a ticket about it so we can discuss it there 2017-10-03T15:32:12 *** Ada_Lovelace has quit IRC 2017-10-03T15:32:52 I can still open a ticket - but I'm quite sure it will take some time until the OBS team fixes this (by refreshing the metadata before it expires) 2017-10-03T15:33:01 do you want to keep the warnings until then? 2017-10-03T15:33:03 *** dddh_ has joined #opensuse-admin 2017-10-03T15:33:09 *** Ada_Lovelace has joined #opensuse-admin 2017-10-03T15:33:48 hello no 2017-10-03T15:33:58 hell* 2017-10-03T15:34:15 sounds like I should create that dummy package now ;-) 2017-10-03T15:35:59 cboltz: ssh daffy works for you?
2017-10-03T15:36:38 it asks for my password, so fetching the SSH key seems to be broken 2017-10-03T15:37:27 it doesn't ask my password here either 2017-10-03T15:37:30 it times out 2017-10-03T15:40:21 okay now it works 2017-10-03T15:40:36 cboltz: check please if login works for the wiki 2017-10-03T15:40:50 now I get "no route to host" for daffy :-/ 2017-10-03T15:41:09 IP changed to .47.x, wait a couple of mins for dns to catch up 2017-10-03T15:41:15 ah, ok 2017-10-03T15:41:18 ssh to riesling and water works 2017-10-03T15:41:35 the login to en.o.o check please 2017-10-03T15:42:07 works 2017-10-03T15:47:14 DimStar: confirmed, no ipv6 at provo 2017-10-03T16:25:04 *** Ada_Lovelace has quit IRC 2017-10-03T16:25:32 *** Ada_Lovelace has joined #opensuse-admin 2017-10-03T17:33:32 *** Son_Goku has joined #opensuse-admin 2017-10-03T17:38:28 *** Son_Goku has quit IRC 2017-10-03T17:42:38 *** Son_Goku has joined #opensuse-admin 2017-10-03T17:45:13 *** Ada_Lovelace has quit IRC 2017-10-03T17:48:17 PROBLEM: NRPE on status2.opensuse.org - CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds. ; See https://monitor.opensuse.org/icinga/cgi-bin/extinfo.cgi?type=2&host=status2.opensuse.org&service=NRPE 2017-10-03T17:52:54 *** Ada_Lovelace has joined #opensuse-admin 2017-10-03T17:53:28 RECOVERY: NRPE on status2.opensuse.org - NRPE v3.1.1 ; See https://monitor.opensuse.org/icinga/cgi-bin/extinfo.cgi?type=2&host=status2.opensuse.org&service=NRPE 2017-10-03T17:53:59 *** kl_eisbaer has joined #opensuse-admin 2017-10-03T17:58:12 RECOVERY: Updates on water.infra.opensuse.org - Updates OK : 1 package update(s): ; See https://monitor.opensuse.org/icinga/cgi-bin/extinfo.cgi?type=2&host=water.infra.opensuse.org&service=Updates 2017-10-03T17:59:42 *** Son_Goku has quit IRC 2017-10-03T18:01:44 tampakrap: I get a "bad gateway" error for progress.o.o - known issue? 
2017-10-03T18:01:55 nope, checking now 2017-10-03T18:02:18 the error page says Reason: DNS lookup failure for: proxy-opensuse-ha.login.infra.opensuse.org 2017-10-03T18:02:24 yep I see it 2017-10-03T18:03:59 Do we have our meeting today? 2017-10-03T18:04:12 I hope so - good evening ;-) 2017-10-03T18:04:22 Hi! :) 2017-10-03T18:04:24 yes, that's the plan ;-) 2017-10-03T18:04:32 damn, it was all working when I left the office 2017-10-03T18:04:51 All shit can happen... 2017-10-03T18:05:17 I can dig out our meeting topics from the mail notification, so that we can declare the progress outage as less urgent ;-) 2017-10-03T18:05:35 tampakrap: JFYI: monitoring works again - but you might want to have a look at the NTP configuration 2017-10-03T18:05:51 kl_eisbaer: yes, mcaj helped me fix it, it was a wrong configuration 2017-10-03T18:05:57 I saw that ntp is broken 2017-10-03T18:06:05 and now I need to fix freeipa sorry 2017-10-03T18:06:15 Today we had a lot of monitoring messages... 2017-10-03T18:06:53 might be because someone decided to migrate all machines to a new IP range without informing anyone first 2017-10-03T18:07:24 PROBLEM: HTTP wiki on riesling.infra.opensuse.org - HTTP CRITICAL: HTTP/1.1 500 Internal Server Error - 1371 bytes in 0.052 second response time ; See https://monitor.opensuse.org/icinga/cgi-bin/extinfo.cgi?type=2&host=riesling.infra.opensuse.org&service=HTTP%20wiki 2017-10-03T18:07:27 http://paste.opensuse.org/31536330 is our temporary meeting ticket ;-) 2017-10-03T18:07:38 tampakrap: please note that this is the 2nd time now that you do something on machines that do not belong to you 2017-10-03T18:07:59 nice, "bad gateway" on the wiki :-( (also with a DNS failure) 2017-10-03T18:08:11 tampakrap: next time, I will refuse to work on any openSUSE stuff 2017-10-03T18:11:48 tampakrap: Can we start or do you need time for the other stuff? 
2017-10-03T18:12:33 we can at least ask for community questions, so I'd say let's start 2017-10-03T18:12:45 (we can do a break afterwards if needed) 2017-10-03T18:12:47 Great! :) 2017-10-03T18:13:19 ok, so let me officially start the meeting ;-) 2017-10-03T18:13:23 welcome everybody! 2017-10-03T18:13:40 kl_eisbaer: apologies, that was not my intention at all 2017-10-03T18:13:42 the meeting topics are on http://paste.opensuse.org/31536330 2017-10-03T18:14:03 first - Questions and answers from the community 2017-10-03T18:14:03 I will spend some time to fix this, please start without me 2017-10-03T18:14:05 sorry :( 2017-10-03T18:14:22 is someone here who has questions? 2017-10-03T18:15:31 tampakrap: we are very forgiving if you invite us to dinner at the next heroes offsite meeting (whenever that is) *eg* 2017-10-03T18:17:12 doesn't look like someone has questions 2017-10-03T18:17:25 next topic - status reports about everything 2017-10-03T18:17:34 who wants to start? 2017-10-03T18:18:26 I can give some status information, if you like, but I'm not well prepared ;-) 2017-10-03T18:19:03 please start ;-) 2017-10-03T18:19:26 *** Ada_Lovelace has quit IRC 2017-10-03T18:19:45 Monitoring: we monitor 35 hosts now with > 700 services; sadly, we've 25 services we should look at 2017-10-03T18:19:49 *** Ada_Lovelace has joined #opensuse-admin 2017-10-03T18:20:15 https://software.opensuse.org is offline? status.opensuse.org doesn't have any information about why 2017-10-03T18:20:19 most of those are current NTP problems, after the IP migration - IMHO this is something that should be fixable 2017-10-03T18:20:26 (sorry to interrupt the meeting, oops) 2017-10-03T18:20:40 sysrich: IMHO tampakrap is on this 2017-10-03T18:20:50 sysrich: I'm updating now status.o.o 2017-10-03T18:20:52 thanks!
2017-10-03T18:20:57 * sysrich returns to the chaos of his inbox 2017-10-03T18:21:02 our keyserver has currently a corrupted database - needs a check 2017-10-03T18:21:22 the 2nd status (status2.opensuse.org) is now in monitoring as well as the provo mirror 2017-10-03T18:21:37 any idea what broke the keyserver? 2017-10-03T18:21:39 the provo mirror is currently catching up with content from stage.o.o 2017-10-03T18:21:51 Thanks for fixing the provo mirror so fast. 2017-10-03T18:21:55 I guess one of the latest outages might be the cause 2017-10-03T18:22:04 Jürgen asked me because of it. 2017-10-03T18:22:15 as "removing power" (or simply destroying a VM) is never good for a database 2017-10-03T18:22:51 I'm currently planning to prepare the provo mirror as new stage.o.o in the next days. 2017-10-03T18:23:17 For that I need to install a complete mirrorbrain service on it - and get it completely synced 2017-10-03T18:23:48 next steps for status1 and status2 are synchronized (master -> master) databases 2017-10-03T18:24:11 I'm currently not sure if that will work out, but at least want to give it a try 2017-10-03T18:24:28 HA status pages? 2017-10-03T18:24:33 also the postfix on anna and elsa will be limited in the next days 2017-10-03T18:24:40 Ada_Lovelace: yes, that's the plan 2017-10-03T18:25:19 there is a lot of stuff to do for the downtime, stuff that belongs to a HA setup of everything that is currently running in Nuremberg and should be available in Provo 2017-10-03T18:25:35 another point here is a freeipa instance in Provo 2017-10-03T18:26:09 How fast do we receive the hardware setup in Provo? 2017-10-03T18:26:12 but the good thing is: I'm currently using this to enhance our monitoring ;-) 2017-10-03T18:26:42 Ada_Lovelace: the hardware is already there - it's just that we need to find someone in SUSE-IT who can fire up some VMs 2017-10-03T18:27:00 even MF-IT is currently very fast in changing firewall entries for us 2017-10-03T18:27:07 Really? That sounds good. 
2017-10-03T18:27:29 Ada_Lovelace: status2, provo-mirror and one haproxy are running there already 2017-10-03T18:27:35 did you bring them cookies when you were in Provo last time? ;-) 2017-10-03T18:27:47 might be that tampakrap set up some additional machines already, I just did not check that 2017-10-03T18:27:55 cboltz: not only ;-) 2017-10-03T18:28:12 Who was able to change their processes? 2017-10-03T18:28:18 cboltz: btw: riesling reports a broken http server ... 2017-10-03T18:28:37 cboltz: ...and a critical pending security update :-) 2017-10-03T18:28:53 from the outside, I get the same "bad gateway" we see for lots of services 2017-10-03T18:29:03 and the security update is probably just waiting for the cronjob to run ;-) 2017-10-03T18:29:14 cboltz: that's why the monitoring is checking your server directly .... 2017-10-03T18:29:25 an rcapache2 restart might help ;-) 2017-10-03T18:29:44 but I'm not on the VM to verify that 2017-10-03T18:30:47 one problem with the IP range migration is - btw - that the database server permissions (which are based on IPs, as well as the ones for the mail servers and so on) are wrong 2017-10-03T18:31:27 I'm sorry, but I'm totally pissed off - again - that someone changes important settings on the machines that I maintain without informing me in advance 2017-10-03T18:31:47 I can understand. 2017-10-03T18:32:08 That's a lot of configuration work... 2017-10-03T18:32:12 tampakrap: really, if you want to do all that stuff on your own, just tell me and I will do something else. 
I've enough to do other than bringing openSUSE infra back to life every two weeks 2017-10-03T18:33:16 tampakrap: ...and JFYI, the monitoring checks are also depending on IP addresses: on the monitoring server AND the clients 2017-10-03T18:34:24 kl_eisbaer: no, I don't want to do everything by myself 2017-10-03T18:34:38 again, apologies for the mess and for the additional work you have to do because of my mistakes 2017-10-03T18:34:40 I found the reason for the HTTP error on riesling: "Cannot access the database" :-( 2017-10-03T18:34:56 tampakrap: ...and 195.135.221.150 is now missing in the keepalived.conf - why ? 2017-10-03T18:35:06 so this is another side effect of the IP changes 2017-10-03T18:35:06 cboltz: as I told you 2017-10-03T18:35:39 guys, I'm really totally p**ssed off - your behavior is really not professional 2017-10-03T18:35:58 You have to edit the ip address or use the hostname of the database server/ proxy. 2017-10-03T18:36:17 Ada_Lovelace: that's sadly just one side of the story 2017-10-03T18:36:20 kl_eisbaer: .150 is hydra, and I'm pretty sure it was never there 2017-10-03T18:36:30 don't forget firewall entries, server ACLs and other stuff 2017-10-03T18:36:35 I didn't remove any public IPs from keepalived 2017-10-03T18:37:31 tampakrap: right, so this might become a problem we should talk about 2017-10-03T18:37:50 but I assume, at the moment you might be busy to fix all the stuff you broke 2017-10-03T18:38:31 so for me, there is just one question left for the meeting: what is the status of the power outage preparation ? 2017-10-03T18:39:00 I just saw the mail to announce - what happened with the news article on news.o.o ?
2017-10-03T18:39:10 Ada_Lovelace: I'll see if I have to change the config - at the moment the main problem is that "host mysql.infra.opensuse.org" gives a SERVFAIL :-( 2017-10-03T18:39:49 kl_eisbaer: I sent three mails to them, no reply yet; tomorrow, which is not a public holiday in Nuremberg, I am escalating that to my manager 2017-10-03T18:39:52 Then there can be a firewall problem with ips... 2017-10-03T18:39:57 kl_eisbaer: also wrote that on the etherpad 2017-10-03T18:40:20 tampakrap: ok, thanks 2017-10-03T18:40:46 cboltz: If you really use the hostname 2017-10-03T18:40:56 cboltz: Ada_Lovelace is right: the firewall needs to be handled as well as the ACL inside the database 2017-10-03T18:41:13 because your machine is now asking from a different IP ... 2017-10-03T18:41:17 yes, I use the hostname 2017-10-03T18:41:29 but I'll leave that up to our new database admin 2017-10-03T18:42:03 If someone wonders why he did not get sooo many mails from the monitoring at the moment: "Host or domain name not found. Name service error for name=relay.infra.opensuse.org" 2017-10-03T18:42:13 cboltz can't change database configs on the database server 2017-10-03T18:42:28 PROBLEM: HTTP on conference.infra.opensuse.org - HTTP CRITICAL: HTTP/1.1 500 Internal Server Error - 922 bytes in 0.025 second response time ; See https://monitor.opensuse.org/icinga/cgi-bin/extinfo.cgi?type=2&host=conference.infra.opensuse.org&service=HTTP 2017-10-03T18:42:38 I could, but with the current situation, it won't help much 2017-10-03T18:42:42 ...and indeed: host relay.infra.opensuse.org 2017-10-03T18:42:42 Host relay.infra.opensuse.org not found: 2(SERVFAIL) 2017-10-03T18:43:03 someone messed up the DNS 2017-10-03T18:43:08 we'll still see "bad gateway" with the DNS failure - and when this gets fixed, mysql.infra.o.o should hopefully resolve again 2017-10-03T18:43:25 ok, thanks for your time.
2017-10-03T18:43:41 I've to leave now, my family is asking 2017-10-03T18:44:09 no problem, thanks for joining the meeting! 2017-10-03T18:44:19 Thanks for joining! 2017-10-03T18:44:30 Have fun with your children. 2017-10-03T18:44:39 maybe someone should think about why communication is important ... 2017-10-03T18:44:43 bye 2017-10-03T18:44:50 *** kl_eisbaer has left #opensuse-admin 2017-10-03T18:48:14 cboltz: Do we want to leave the meeting, too? We can't do anything during ip problems... 2017-10-03T18:50:07 I can point out some missing monitoring checks 2017-10-03T18:50:43 while lots of services are broken right now, the monitoring only complains on some minor issues like NTP issues 2017-10-03T18:50:54 the "outside view" is completely missing 2017-10-03T18:51:43 (for example a check if https://en.opensuse.org/special:version is reachable) 2017-10-03T18:51:56 is this something you could add? 2017-10-03T19:08:28 *** solevi|2 has joined #opensuse-admin 2017-10-03T19:23:32 *** fvogt has joined #opensuse-admin 2017-10-03T19:40:40 *** Ada_Lovelace has quit IRC 2017-10-03T20:00:04 RECOVERY: Updates on freeipa.infra.opensuse.org - CHECK_UPDATES OK - no updates available ; See https://monitor.opensuse.org/icinga/cgi-bin/extinfo.cgi?type=2&host=freeipa.infra.opensuse.org&service=Updates 2017-10-03T20:15:29 PROBLEM: HTTP freeIPA on freeipa.infra.opensuse.org - connect to address 192.168.47.65 and port 443: Connection refused ; See https://monitor.opensuse.org/icinga/cgi-bin/extinfo.cgi?type=2&host=freeipa.infra.opensuse.org&service=HTTP%20freeIPA 2017-10-03T20:15:30 PROBLEM: LDAP on freeipa.infra.opensuse.org - Could not bind to the LDAP server ; See https://monitor.opensuse.org/icinga/cgi-bin/extinfo.cgi?type=2&host=freeipa.infra.opensuse.org&service=LDAP 2017-10-03T20:25:30 RECOVERY: HTTP freeIPA on freeipa.infra.opensuse.org - OK - Certificate freeipa.infra.opensuse.org will expire on Sun 10 Feb 2019 03:05:24 PM GMT +0000. 
; See https://monitor.opensuse.org/icinga/cgi-bin/extinfo.cgi?type=2&host=freeipa.infra.opensuse.org&service=HTTP%20freeIPA 2017-10-03T20:25:31 RECOVERY: LDAP on freeipa.infra.opensuse.org - LDAP OK - 0.008 seconds response time ; See https://monitor.opensuse.org/icinga/cgi-bin/extinfo.cgi?type=2&host=freeipa.infra.opensuse.org&service=LDAP 2017-10-03T20:27:23 RECOVERY: HTTP wiki on riesling.infra.opensuse.org - HTTP OK: HTTP/1.1 301 Moved Permanently - 401 bytes in 0.074 second response time ; See https://monitor.opensuse.org/icinga/cgi-bin/extinfo.cgi?type=2&host=riesling.infra.opensuse.org&service=HTTP%20wiki 2017-10-03T20:32:29 RECOVERY: HTTP on conference.infra.opensuse.org - HTTP OK: HTTP/1.1 200 OK - 23180 bytes in 0.200 second response time ; See https://monitor.opensuse.org/icinga/cgi-bin/extinfo.cgi?type=2&host=conference.infra.opensuse.org&service=HTTP 2017-10-03T20:41:05 PROBLEM: Updates on riesling.infra.opensuse.org - Updates CRITICAL : 1 security update(s): 1 package update(s): ; See https://monitor.opensuse.org/icinga/cgi-bin/extinfo.cgi?type=2&host=riesling.infra.opensuse.org&service=Updates 2017-10-03T20:42:14 *** Fraser_Bell has joined #opensuse-admin 2017-10-03T20:43:01 PROBLEM: HAProxy on elsa.infra.opensuse.org - HAPROXY CRITICAL - Active service freeipa is DOWN on freeipa proxy ! ; See https://monitor.opensuse.org/icinga/cgi-bin/extinfo.cgi?type=2&host=elsa.infra.opensuse.org&service=HAProxy 2017-10-03T20:43:02 PROBLEM: HAProxy on anna.infra.opensuse.org - HAPROXY CRITICAL - Active service freeipa is DOWN on freeipa proxy ! ; See https://monitor.opensuse.org/icinga/cgi-bin/extinfo.cgi?type=2&host=anna.infra.opensuse.org&service=HAProxy 2017-10-03T20:43:06 PROBLEM: Updates on monitor.infra.opensuse.org - Updates CRITICAL : At least one of your Repositories might be out of date. Please run zypper refresh as root to update it.
1 security update(s): 1 package update(s): ; See https://monitor.opensuse.org/icinga/cgi-bin/extinfo.cgi?type=2&host=monitor.infra.opensuse.org&service=Updates 2017-10-03T20:43:07 PROBLEM: Updates on minnie.infra.opensuse.org - Updates CRITICAL : 1 security update(s): 1 package update(s): ; See https://monitor.opensuse.org/icinga/cgi-bin/extinfo.cgi?type=2&host=minnie.infra.opensuse.org&service=Updates 2017-10-03T20:43:08 PROBLEM: Updates on etherpad.infra.opensuse.org - Updates CRITICAL : 1 security update(s): 1 package update(s): ; See https://monitor.opensuse.org/icinga/cgi-bin/extinfo.cgi?type=2&host=etherpad.infra.opensuse.org&service=Updates 2017-10-03T20:43:09 PROBLEM: Updates on keyserver.infra.opensuse.org - Updates CRITICAL : 1 security update(s): 1 package update(s): ; See https://monitor.opensuse.org/icinga/cgi-bin/extinfo.cgi?type=2&host=keyserver.infra.opensuse.org&service=Updates 2017-10-03T20:43:10 PROBLEM: Updates on anna.infra.opensuse.org - Updates CRITICAL : 1 security update(s): 1 package update(s): ; See https://monitor.opensuse.org/icinga/cgi-bin/extinfo.cgi?type=2&host=anna.infra.opensuse.org&service=Updates 2017-10-03T20:43:11 PROBLEM: Updates on mickey.infra.opensuse.org - Updates CRITICAL : 1 security update(s): 1 package update(s): ; See https://monitor.opensuse.org/icinga/cgi-bin/extinfo.cgi?type=2&host=mickey.infra.opensuse.org&service=Updates 2017-10-03T20:43:12 PROBLEM: Updates on nuka.infra.opensuse.org - Updates CRITICAL : 1 security update(s): 2 package update(s): ; See https://monitor.opensuse.org/icinga/cgi-bin/extinfo.cgi?type=2&host=nuka.infra.opensuse.org&service=Updates 2017-10-03T20:47:13 *** fvogt has quit IRC 2017-10-03T21:02:13 *** Fraser_Bell has quit IRC 2017-10-03T21:47:37 *** dddh_ has quit IRC 2017-10-03T22:14:51 *** dddh_ has joined #opensuse-admin 2017-10-03T22:49:38 *** cboltz has quit IRC 2017-10-03T23:44:12 *** solevi|2 has quit IRC 2017-10-03T23:50:44 
*** Son_Goku has joined #opensuse-admin 2017-10-03T23:52:27 *** Son_Goku has quit IRC
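
Note on the renumbering discussed in this log: several of the failures (monitoring still showing 192.168.254.7 for riesling, "Cannot get data from TCP port 192.168.254.101:6556", IP-based database permissions) come down to addresses from the old 192.168.254.0/24 range surviving in configs after the move to 192.168.47.0/24. The following is only a minimal sketch of how such leftovers could be flagged in a piece of config text. It is not something used in the channel; the function name `find_stale_addresses` and the sample config are hypothetical, and only the two network ranges and the two old addresses are taken from the log.

```python
import ipaddress
import re

# Old vlan range mentioned in the log; the new range is 192.168.47.0/24.
OLD_RANGE = ipaddress.ip_network("192.168.254.0/24")

# Loose pattern for dotted-quad candidates; real validation happens below.
IPV4_RE = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")


def find_stale_addresses(text, old_range=OLD_RANGE):
    """Return (line_number, address) pairs for IPv4 literals inside old_range."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for candidate in IPV4_RE.findall(line):
            try:
                addr = ipaddress.ip_address(candidate)
            except ValueError:
                # Matches the pattern but is not a valid address (e.g. 999.1.1.1).
                continue
            if addr in old_range:
                hits.append((lineno, candidate))
    return hits


# Hypothetical config snippet for illustration, not a real file from the infra.
sample = """\
# monitoring / ACL entries
riesling.infra.opensuse.org 192.168.47.42
some-agent 192.168.254.101:6556
allow_from 192.168.254.7 192.168.47.41
"""

for lineno, addr in find_stale_addresses(sample):
    print(f"line {lineno}: {addr} is still in {OLD_RANGE}")
```

A scan like this only finds literal addresses in text it is given; caches, firewall rules and database ACLs held elsewhere would still need to be checked by whoever maintains them.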