2025-11-21T06:33:06 thank you sir!
2025-11-21T06:33:06 the where/what was about metrics.opensuse.org
2025-11-21T10:48:53 *** teepee_ is now known as teepee
2025-11-21T10:55:48 LubosKocman: metrics.o.o points to prg2 like the others you already identified
2025-11-21T11:28:54 can someone make me admin of https://lists.opensuse.org/manage/lists/obs-devel.lists.opensuse.org ?
2025-11-21T11:28:59 I want to close it down...
2025-11-21T12:14:19 henne: is your mailman account with your suse.de or opensuse.org email address? (I find both in users)
2025-11-21T12:16:00 opensuse.org
2025-11-21T12:52:03 have fun
2025-11-21T17:56:03 Seeing some potential issues with id.o.o - had a report of some general slowness across multiple o.o sites that seemed related to something at the IdP, so I tried logging in there, and I got a gateway timeout after entering my username/password.
2025-11-21T17:57:16 https://id.opensuse.org/login/ldap is the URL that's giving the timeout message.
2025-11-21T18:01:01 Seeing this now
2025-11-21T18:32:33 Yeah, seems to be working again
2025-11-21T21:24:01 we often have monitoring complain that mysql replication for mybackup.i.o.o is broken/down "for more than 2 minutes", and 10 minutes later all is good again
2025-11-21T21:24:48 do we really need to have the alert after 2 minutes, or would it make sense to only alert if it's down for > 10 minutes so that we only get alerted about serious issues?
2025-11-21T21:38:19 cboltz: this is because replication is stopped by mysql-backupscript during backups (by design) and resumed afterwards. it confused me already too. we already have time_intervals with inhibit_rules for our automatic maintenance window. I suggest adding another window for backups, inhibiting the replication alert during that time
2025-11-21T21:39:06 if it's frequently down outside of the intended backup period, it should be investigated
2025-11-21T21:41:46 I didn't keep the alert history - the last alert was a downtime around 6:00 AM, fixed at 6:12 AM, which IIRC is quite typical
2025-11-21T21:46:52 you could check if mysql-backupscript.service was active on mybackup during that time
2025-11-21T21:47:06 well or if the machine was rebooted at that time
2025-11-21T21:48:29 that would mean lots of reboots ;-)
2025-11-21T21:50:29 backupscript starts at 03:00 and finishes between 4:43 and 5:30 (with an exception at 7:25)
2025-11-21T21:50:51 note: these are UTC times, while the alert mails are in CET
2025-11-21T21:51:26 Nov 21 03:00:00 mybackup systemd[1]: Starting Non-interactive MySQL backup...
2025-11-21T21:51:27 Nov 21 05:07:45 mybackup systemd[1]: Finished Non-interactive MySQL backup.
2025-11-21T21:53:30 5:07 UTC is 6:07 CET, so the "alert resolved" mail came at the right time
2025-11-21T21:53:44 matches IRC as well
2025-11-21T21:53:45 03:01 -- Notice(heroes-monitor): Alert: MySQL replication on instance mybackup.infra.opensuse.org is not running
2025-11-21T21:53:47 05:11 -- Notice(heroes-monitor): Resolved: MySQL replication on instance mybackup.infra.opensuse.org is not running
2025-11-21T21:56:41 so I guess we should silence the alert between 2:55 and 6:00?
2025-11-21T21:58:39 sounds reasonable (probably 03:00 or 02:59 would be sufficient too)
2025-11-21T21:59:35 *** teepee_ is now known as teepee
2025-11-21T22:00:38 we can either adjust the alerting rule to make it not fire at all during that time, or adjust the inhibition rules to keep it firing but mute communication
2025-11-21T22:01:34 for updates I think we only mute email and keep dashboard+irc
2025-11-21T22:05:01 since it's intentionally down during the backup run, I'd tend to adjust the alerting rule to silence it everywhere
2025-11-21T22:05:53 but if you prefer to keep it on IRC - fine for me, I'm typically sleeping around the time it alerts on IRC, so it doesn't annoy me ;-)
2025-11-21T22:10:37 fair. nice instead of the time conditional would be to add some metrics to backupscript itself, so we could (besides monitoring the backups) just have the application alert be something like "if mysql_slave_status_slave_io_running == 0 and backup_is_running == 0"
2025-11-21T22:11:40 nice idea, but it's too perfect - if the backup script runs forever, we'll never notice
2025-11-21T22:13:42 ok, guess you would need a second alert about runtime being too high :P
2025-11-21T22:14:26 if you want it complicated ;-)
2025-11-21T22:30:07 https://gitlab.infra.opensuse.org/infra/salt/-/merge_requests/2668
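
Note: the two-alert idea from the end of the discussion (alert on stopped replication only while no backup is running, plus a second alert when the backup runs too long) could look roughly like the Python sketch below. It only models the decision logic and is not the actual rule from the merge request. `backup_is_running` is the metric proposed in the chat and does not exist yet; `backup_started`, `Sample`, `firing_alerts` and the 5 hour limit are hypothetical names and values picked for illustration. The limit is merely chosen to sit above the longest run mentioned above (03:00 to 07:25 UTC).

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import List, Optional

# Assumed limit, not an agreed value: the longest run mentioned in the log
# went from 03:00 to 07:25 UTC, so 5 hours leaves a little headroom.
MAX_BACKUP_RUNTIME = timedelta(hours=5)


@dataclass
class Sample:
    """One evaluation of the metrics at time `now` (all times in UTC)."""
    now: datetime
    slave_io_running: int                # mysql_slave_status_slave_io_running, 0 or 1
    backup_started: Optional[datetime]   # hypothetical: None while no backup is running


def firing_alerts(s: Sample) -> List[str]:
    """Return the names of the alerts that would fire for this sample."""
    backup_is_running = s.backup_started is not None
    alerts = []
    # Alert 1: replication is stopped although no backup explains it.
    if s.slave_io_running == 0 and not backup_is_running:
        alerts.append("replication_down")
    # Alert 2: catches the "backup script runs forever" case, which
    # alert 1 alone would hide.
    if backup_is_running and s.now - s.backup_started > MAX_BACKUP_RUNTIME:
        alerts.append("backup_running_too_long")
    return alerts
```

With this split, the 03:01 UTC alert from the log would stay quiet because the backup is running, while a backup still active at 09:00 UTC would trigger the runtime alert instead of going unnoticed.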