2020-03-06T03:09:46 *** okurz_ is now known as okurz
2020-03-06T08:16:28 i think my vbulletin upgrade may have crashed mysql
2020-03-06T08:18:34 galera1 reports 'impossible to select state transfer donor. resource temp unavailable
2020-03-06T08:27:01 -heroes-bot- PROBLEM: HTTP wiki on riesling.infra.opensuse.org - HTTP CRITICAL: HTTP/1.1 500 Internal Server Error - 1381 bytes in 6.051 second response time ; See https://monitor.opensuse.org/icinga/cgi-bin/extinfo.cgi?type=2&host=riesling.infra.opensuse.org&service=HTTP%20wiki
2020-03-06T08:40:29 pjessen: yup, wiki is down ;)
2020-03-06T08:41:53 -heroes-bot- PROBLEM: MySQL WSREP recv on galera3.infra.opensuse.org - CRIT wsrep_local_recv_queue_avg = 163.928994 ; See https://monitor.opensuse.org/icinga/cgi-bin/extinfo.cgi?type=2&host=galera3.infra.opensuse.org&service=MySQL%20WSREP%20recv
2020-03-06T08:54:22 I think I'll ask Lars to allocate a separate disk for my VM, then I can use that for mysql.
2020-03-06T08:56:54 -heroes-bot- RECOVERY: HTTP wiki on riesling.infra.opensuse.org - HTTP OK: HTTP/1.1 301 Moved Permanently - 401 bytes in 0.051 second response time ; See https://monitor.opensuse.org/icinga/cgi-bin/extinfo.cgi?type=2&host=riesling.infra.opensuse.org&service=HTTP%20wiki
2020-03-06T09:04:07 -heroes-bot- PROBLEM: MySQL WSREP recv on galera1.infra.opensuse.org - CRIT wsrep_local_recv_queue_avg = 511.226936 ; See https://monitor.opensuse.org/icinga/cgi-bin/extinfo.cgi?type=2&host=galera1.infra.opensuse.org&service=MySQL%20WSREP%20recv
2020-03-06T09:04:52 seems it has recovered now
2020-03-06T09:08:44 kl_eisbaer: speaking of additional space, the space on mailman3.i.o.o will be really only useful for migration of many files from mlmmj, I hope to use pgsql server in the end if that's ok
2020-03-06T09:22:09 lcp: currently on baloo, the lists take up 14Gb, the archive is about 50Gb
2020-03-06T09:23:41 pjessen: alright, that's quite a lot
2020-03-06T09:24:08 twenty years worth of archive.
2020-03-06T09:24:42 isn't it more? there are some emails from the 90s too
2020-03-06T09:25:33 yeah, a few from 1998 maybe.
2020-03-06T09:26:06 the 14Gb for the list can be reduced to less than 10G - I see we have some unwanted archives in the wrong place
2020-03-06T09:28:16 I will need to have a look how mlmmj keeps the lists and archives at some point
2020-03-06T09:32:25 archiving is with mhonarc, the native mlmmj archiving was dropped
2020-03-06T09:48:19 kl_eisbaer: do we know what caused the galera issue? I hesitate to restart the vB upgrade, I don't want to cause another crash :-)
2020-03-06T09:57:49 pjessen: ah, alright, so mlmmj is only used for lists data?
2020-03-06T09:58:51 lcp: yup.
2020-03-06T09:59:36 lcp: the 14Gb I mentioned above is wrong, they include a lot of left-over archives. mlmmj really only takes up maybe 1-2Gb
2020-03-06T10:01:00 even better 😛
2020-03-06T10:14:21 pjessen: looking at mailman's support for mhonarc, we could use it with mailman as an archiver, although I would much prefer to use hyperkitty
2020-03-06T10:14:37 it is interesting to me that it is still supported tho
2020-03-06T11:15:49 oh yeah, we have mbox for everything, can just use that to import to hyperkitty 😛
2020-03-06T11:46:53 Interesting stats from our Galera Cluster...
2020-03-06T11:47:10 Data in MEMORY tables: 39.2M (Tables: 25)
2020-03-06T11:47:10 Data in MyISAM tables: 24.5M (Tables: 1)
2020-03-06T11:47:16 Data in InnoDB tables: 25.6G (Tables: 3180)
2020-03-06T11:47:28 Reads / Writes: 94% / 6%
2020-03-06T11:47:39 Total buffers: 10.6G global + 150.3M per thread (100 max threads)
2020-03-06T11:47:58 Slow queries (over 10s): 0% (1K/229M)
2020-03-06T11:48:12 Aborted connections: 0.51% (60657/11856376)
2020-03-06T11:48:33 Joins performed without indexes: 3408236 -> bad, needs investigation
2020-03-06T11:48:47 Thread cache hit rate: 99% (121 created / 11M connections)
2020-03-06T11:49:11 Read Key buffer hit rate: 100.0% (5M cached / 96 reads)
2020-03-06T11:49:19 25G for a cluster of databases for the entire project? rookie numbers, we need to quickly fill it more
2020-03-06T11:49:36 InnoDB Read buffer efficiency: 100.00% (71792941432 hits/ 71794062838 total)
2020-03-06T11:49:41 InnoDB Write log efficiency: 98.06% (146926453 hits/ 149830198 total)
2020-03-06T11:49:47 InnoDB log waits: 0.00% (0 waits / 2903745 writes)
2020-03-06T11:50:24 Some of the new VB forum tables don't have a primary key - something to look at
2020-03-06T11:50:45 ...and sadly a lot of wikis as well :-/
2020-03-06T11:52:15 and 151 certification failures since 29 days :-/
2020-03-06T11:53:22 so the overall performance is good - but we need to start digging into the wiki and forum databases, to check some smaller issues
2020-03-06T11:54:52 once the VB migration is done, I want to start tuning some general parameters as well, but these are just for tuning again (which brings not much more performance, as far as I can see)
2020-03-06T11:55:33 lcp: 25G for databases that contain mainly text as data (and no binary blobs) is not that bad, IMHO
2020-03-06T11:55:58 lcp: you might need some time to read all of it ;-)
2020-03-06T11:57:06 well, matrix will be a way to help with growing the database size ;)
2020-03-06T11:57:15 and mailman
2020-03-06T11:57:26 lcp: definitively. Do you want to use mysql for matrix ?
2020-03-06T11:57:43 pgsql, it doesn't support mysql
2020-03-06T11:58:01 puh! :-) I just checked mysql (galera) here :-)
2020-03-06T11:58:17 yeah, yeah, I know
2020-03-06T11:58:40 I don't have any services that would work with mysql planned
2020-03-06T11:58:55 I'll run an additional check and move the output somewhere, this might help in the future to detect analyze possible problems
2020-03-06T11:59:15 lcp: ...now we are getting friends :-)
2020-03-06T11:59:37 (and there is a pr that makes matrix use pgsql server too on gitlab)
2020-03-06T12:00:02 lcp: ah, gitlab, ... There was something with the runners, right?
2020-03-06T12:00:16 lcp: I still hope that our Gitlab gurus find the time to have a look
2020-03-06T12:00:32 yeah, something sudo related
2020-03-06T12:00:33 otherwise I will try my best to get some docker containers up and running
2020-03-06T12:01:07 but I'm not really a gitlab guru and could not find any documentation how the runners in our instance were setup
2020-03-06T12:01:33 yeah. that's one area where the docs are certainly missing
2020-03-06T12:02:44 there are a few more things that could use improvement, like the things that scripts otherwise do, but wouldn't do on an OS that's not supported by said scripts ;)
2020-03-06T12:03:13 I will write that up if I get the time to do that
2020-03-06T12:04:01 lcp: just start with tracking it somewhere, I would say. If you have it just in mind, you need to do it alone .... ;-)
2020-03-06T12:06:56 that's true
2020-03-06T12:07:46 lcp: at least it's a mistake I'm often doing ...
2020-03-06T12:08:00 ...and it partly relates to the "no documentation" problem
2020-03-06T12:09:39 it would be good if there was just documentation for everything that has to be done to setup everything, but that's not particularly possible
2020-03-06T12:11:01 lcp: agree. Especially as we often have stuff (like osc-collab, software.o.o) up and running that is running under DevOPS principles: everyone is playing along until it's somehow working...
2020-03-06T12:12:29 the unfortunate reality is that we can't salt everything, because obviously not everybody is willing to work in that environment
2020-03-06T12:12:53 agree
2020-03-06T12:15:11 My main concern is currently the setup of the salt-master. I am used to "play along" with small changes, that normally should not really harm. If I am forced to setup an own test-environment for this (with salt-master and test-machines), without getting much help/documentation, my motivation is not that high...
2020-03-06T12:15:55 we could have a vagrant setup for this, to make this easier for people
2020-03-06T12:16:17 setting up a VM for testing is not the problem, but there is no information how a "test salt-master" should look like. cboltz started with a documentation in the wiki, but this is not complete, yet.
2020-03-06T12:16:26 I am saying this without actually having any experience with vagrant outside of actually enjoying using it
2020-03-06T12:16:32 yes, this would definitely help
2020-03-06T12:25:18 *** Martchus_ is now known as Martchus
2020-03-06T14:01:52 -heroes-bot- PROBLEM: MySQL WSREP recv on galera3.infra.opensuse.org - CRIT wsrep_local_recv_queue_avg = 945.833285 ; See https://monitor.opensuse.org/icinga/cgi-bin/extinfo.cgi?type=2&host=galera3.infra.opensuse.org&service=MySQL%20WSREP%20recv
2020-03-06T14:44:17 i cannot believe how long this vb upgrade takes .... hours and hours and hours.
2020-03-06T14:45:16 this old of a version? thousands of migrations?
2020-03-06T14:46:35 possibly, yes. I started over again this morning at 0800. Still running.
2020-03-06T14:47:32 8ish hours? when did it fail last time?
2020-03-06T14:55:25 *** ldevulder__ is now known as ldevulder
2020-03-06T14:58:05 well, it was running last night until 2100, then failed. then I decided to run it only internally. last crash was when galera gave up this morning.
2020-03-06T15:00:32 what a nightmare, jeez
2020-03-06T15:08:26 pjessen, :( so will it take long with a final db dump? Just thinking of a timeline for the forum to be off-line for the final backup from MF, transfer and import?
2020-03-06T15:09:10 8h maintenance window, eh?
2020-03-06T15:09:41 lcp whatever it takes... 1 day. 2 days etc...
2020-03-06T15:09:55 I can prepare for you a nice html thing to show for that time on the site when that's happening
2020-03-06T15:10:13 lcp, that would be perfect
2020-03-06T15:11:10 lcp, pjessen just that it's probably time to start putting up a notice of the pending move
2020-03-06T15:11:33 sure, give me half an hour 😛
2020-03-06T15:11:37 or an hour
2020-03-06T15:12:06 lcp, no rush, will the URL stay the same?
2020-03-06T15:13:06 as in https://forums.opensuse.org or something new?
2020-03-06T15:13:49 that's the plan atm afaik
2020-03-06T15:51:58 kl_eisbaer: hm, I wonder if after that database issue, maybe we could introduce some testing infra in salt, so there would be test profiles and every vm id could opt to have itself tested based on those profiles (the testing would be done in the status vm tho)
2020-03-06T15:58:08 lcp: sure, just go ahead, I would say.
2020-03-06T15:59:09 now I have to think where to begin, because it sounds great, implementation is a little harder 😉
2020-03-06T15:59:16 * kl_eisbaer is just waiting to get some dedicated openSUSE heroes hardware and AWS account, which allows every Hero to fire up new machines on demand...
2020-03-06T15:59:55 lcp: this is like every thing that has the potential to become a positive burner :-)
2020-03-06T16:05:38 malcolmlewis: the upgrade finished just before 1700 - yet, the upgrade with the final export will take just as long.
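The percentages in the Galera statistics posted at 11:47-11:49 are plain ratios of the raw counters printed next to them. The Python sketch below recomputes three of them from the numbers in this log. The formulas are inferred from the printed output, not taken from the reporting tool, so treat them as an assumption. The sketch also shows that the write-log "misses" (total minus hits) equal the 2903745 writes on the "InnoDB log waits" line.

```python
# Recompute ratios from the Galera stats posted in this log.
# Assumption: each percentage is simply part / total; the reporting tool's
# exact formulas are not shown in the log.


def pct(part: int, total: int) -> float:
    """Return part / total as a percentage."""
    return 100.0 * part / total


# "Aborted connections: 0.51% (60657/11856376)"
aborted = pct(60657, 11856376)

# "InnoDB Read buffer efficiency: 100.00% (71792941432 hits/ 71794062838 total)"
read_eff = pct(71792941432, 71794062838)

# "InnoDB Write log efficiency: 98.06% (146926453 hits/ 149830198 total)"
write_eff = pct(146926453, 149830198)

# total - hits of the write-log ratio; matches the 2903745 writes
# on the "InnoDB log waits" line
write_misses = 149830198 - 146926453

if __name__ == "__main__":
    print(f"aborted connections: {aborted:.2f}%")    # 0.51%
    print(f"InnoDB read buffer:  {read_eff:.2f}%")   # 100.00%
    print(f"InnoDB write log:    {write_eff:.2f}%")  # 98.06%
    print(f"write-log misses:    {write_misses}")    # 2903745
```

The 100.00% read-buffer figure is rounding: 1121406 of the roughly 71.8 billion requests were not counted as hits.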
2020-03-06T16:06:38 9 hours
2020-03-06T16:06:57 why vb, why
2020-03-06T16:07:33 peek previews at http://forum.infra.o.o
2020-03-06T16:07:36 kl_eisbaer: for services like matrix and obs, we can probably just test if apis are responding to the request every 10 minutes?
2020-03-06T16:07:52 still one error "fetch forums".
2020-03-06T16:08:47 I have to report a very important bug
2020-03-06T16:08:58 forumsadmin uses the old openSUSE logo
2020-03-06T16:09:04 the one from 15 years ago
2020-03-06T16:09:32 before my time :-)
2020-03-06T16:12:55 lcp: testing APIs can be done by haproxy and/or the monitoring
2020-03-06T16:13:50 Topics: 0; Posts: 0; Members: 40,501; Active Members: 1,116 => really no posts?
2020-03-06T16:14:55 -heroes-bot- PROBLEM: PSQL locks on mirrordb1.infra.opensuse.org - POSTGRES_LOCKS CRITICAL: DB postgres total locks: 63 ; See https://monitor.opensuse.org/icinga/cgi-bin/extinfo.cgi?type=2&host=mirrordb1.infra.opensuse.org&service=PSQL%20locks
2020-03-06T16:17:29 kl_eisbaer: haproxy?
2020-03-06T16:19:46 (I'm not asking what haproxy is, but how would haproxy do that)
2020-03-06T16:21:06 well: haproxy has some simple check commands you might customize
2020-03-06T16:22:12 but I have to admit that this highly depends on what you want to achieve. Just to check if an API responds can be done via a simple http request - like we do already for all webservers
2020-03-06T16:22:34 for MySQL there are some checks that even allow to execute SQL queries
2020-03-06T16:23:17 the simplest test is obviously a tcp test, which just tests for a TCP response ....
2020-03-06T16:24:00 all this is just done inside haproxy to allow to switch to the maintenance mode if a backend fails
2020-03-06T16:28:13 kl_eisbaer: well, frontend might load, that doesn't really say anything about its actual performance
2020-03-06T16:28:31 kl_eisbaer: not quite there yet.
2020-03-06T16:29:11 it's very slow, and still some long running query active in the background.
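kl_eisbaer describes two levels of backend check that haproxy can run: a bare TCP connect and a simple HTTP request. The Python sketch below imitates those two ideas for illustration only; it is not haproxy's implementation or configuration, and the function names, default timeout and accepted status codes are hypothetical. 301 is accepted because the wiki check in this log reports "HTTP OK" for a 301 response. Timing the request partly answers lcp's point that a page which loads says little about performance.

```python
import http.client
import socket
import time

# Hypothetical set of statuses treated as healthy.
OK_STATUSES = (200, 301, 302)


def tcp_check(host: str, port: int, timeout: float = 3.0) -> bool:
    """Simplest check: does anything accept a TCP connection on host:port?"""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def http_check(host: str, port: int, path: str = "/",
               timeout: float = 3.0) -> tuple[bool, int, float]:
    """Send one GET request and judge the backend by its status code.

    Returns (healthy, status, seconds); status 0 means no valid response.
    The elapsed time is a crude performance hint that a TCP check cannot give.
    """
    start = time.monotonic()
    conn = http.client.HTTPConnection(host, port, timeout=timeout)
    try:
        conn.request("GET", path)
        response = conn.getresponse()
        response.read()
        status = response.status
    except (OSError, http.client.HTTPException):
        status = 0
    finally:
        conn.close()
    return status in OK_STATUSES, status, time.monotonic() - start
```

A TCP check passes as soon as something listens on the port, even if the application behind it answers every request with a 500; the HTTP variant catches that case.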
2020-03-06T16:29:54 and whilst I have been busy pulling my hair out, the next version is available .....
2020-03-06T16:33:04 was, now it fails to load
2020-03-06T16:36:22 yeah I know.
2020-03-06T16:36:46 and it is slooooooooow
2020-03-06T16:37:13 I think it will be easy to convince you to move to something else 😉
2020-03-06T16:37:36 oh yes, I'm not volunteering for this circus again
2020-03-06T16:38:31 the worst is really these 8-9 hours to upgrade, you make one mistake and you have lost a full day
2020-03-06T16:39:21 yeah, I feel you
2020-03-06T16:40:20 hi malcolm
2020-03-06T16:41:00 -heroes-bot- PROBLEM: HTTP wiki on riesling.infra.opensuse.org - HTTP CRITICAL: HTTP/1.1 500 Internal Server Error - 1381 bytes in 6.049 second response time ; See https://monitor.opensuse.org/icinga/cgi-bin/extinfo.cgi?type=2&host=riesling.infra.opensuse.org&service=HTTP%20wiki
2020-03-06T16:50:32 umm, galera giving trouble again?
2020-03-06T16:50:54 -heroes-bot- RECOVERY: HTTP wiki on riesling.infra.opensuse.org - HTTP OK: HTTP/1.1 301 Moved Permanently - 401 bytes in 0.056 second response time ; See https://monitor.opensuse.org/icinga/cgi-bin/extinfo.cgi?type=2&host=riesling.infra.opensuse.org&service=HTTP%20wiki
2020-03-06T17:09:01 -heroes-bot- PROBLEM: MySQL WSREP recv on galera2.infra.opensuse.org - CRIT wsrep_local_recv_queue_avg = 1.726594 ; See https://monitor.opensuse.org/icinga/cgi-bin/extinfo.cgi?type=2&host=galera2.infra.opensuse.org&service=MySQL%20WSREP%20recv
2020-03-06T17:17:02 kl_eisbaer, should be Threads 143,956, Posts 1,068,594, Members 40,558 and Active Members 1,130
2020-03-06T17:18:48 pjessen, what about getting the dump to you, sounds like it might be one or two days?
2020-03-06T17:21:11 I am afraid forums will have to be closed for a day if this is the situation though
2020-03-06T17:29:07 malcolmlewis: I think the database export ought to work fine the next time. Anyway, authentication has yet to be resolved.
2020-03-06T17:34:39 * lcp uploaded an image: Screenshot from 2020-03-06 18-34-20.png (6KB) < https://matrix.org/_matrix/media/r0/download/matrix.org/ygyADoxZrBDhddXYnbPzVblu >
2020-03-06T17:34:42 this is perfect
2020-03-06T17:34:56 -heroes-bot- RECOVERY: PSQL locks on mirrordb1.infra.opensuse.org - POSTGRES_LOCKS OK: DB postgres total=37 ; See https://monitor.opensuse.org/icinga/cgi-bin/extinfo.cgi?type=2&host=mirrordb1.infra.opensuse.org&service=PSQL%20locks
2020-03-06T17:42:07 lcp, that's not an issue, folks will survive ;) ML's and IRC are still there...
2020-03-06T17:43:14 pjessen, yes, once you have the test system up and running, that will be the next hurdle...
2020-03-06T17:46:38 malcolmlewis: I think the same day I can ddos irc and replace mlmmj with mailman3, just so people don't have a place >:D
2020-03-06T17:48:15 lcp: evil
2020-03-06T17:48:36 malcolmlewis: do you have access to infra.o.o ?
2020-03-06T17:49:51 it looks like forum.infra.o.o might be running, with proper (albeit outdated) numbers, forums, post etc.
2020-03-06T17:50:25 encoding is just broken, but it sure works
2020-03-06T17:50:40 lcp: YES!
2020-03-06T17:51:30 I had to sacrifice two goats though. floor is still a bit sticky here.
2020-03-06T17:52:10 encoding is bizarre tbh, it works fine in some places
2020-03-06T17:52:54 cyrillic works in headers but not in forum titles?
2020-03-06T17:52:57 lcp: probably an issue with charset in the database. if it keeps running, I might do some conversions.
2020-03-06T17:53:24 the database is utf8mb4, recommended by vb, but our old tables are still latin1
2020-03-06T17:54:40 anyway, I am deliriously happy you can browse and find problems!
2020-03-06T17:55:06 ah yeah
2020-03-06T17:56:19 also a problem with encoding for German.
2020-03-06T17:58:09 btw, I managed to sneak in the latest upgrade to 5.6.
2020-03-06T18:02:30 nice
2020-03-06T18:02:43 I assume it takes less than 9 hours to complete? >:D
2020-03-06T18:22:08 lcp: it was FAST ... only crashed galera once, very well behaved.
2020-03-06T18:24:58 you know, discourse is ror with pgsql and redis backend
2020-03-06T18:25:13 and the frontend is static js
2020-03-06T18:26:10 actually don't quote me on that frontend, it might be more than just static js
2020-03-06T18:57:01 -heroes-bot- PROBLEM: HTTP wiki on riesling.infra.opensuse.org - HTTP CRITICAL: HTTP/1.1 500 Internal Server Error - 1381 bytes in 6.041 second response time ; See https://monitor.opensuse.org/icinga/cgi-bin/extinfo.cgi?type=2&host=riesling.infra.opensuse.org&service=HTTP%20wiki
2020-03-06T18:59:21 i hate this sh .... now it's stopped working.
2020-03-06T19:00:34 lcp: main only concern is the continuity of the forums, once we've migrated to nuernberg, anyone is free to migrate to whatever.
2020-03-06T19:01:04 oh, and it's working again. an amazing piece of software.
2020-03-06T19:01:08 yeah, I know
2020-03-06T19:01:15 I'm just joking
2020-03-06T19:01:48 lcp: well, you're right too, but it's just not my itch to scratch :-)
2020-03-06T19:02:24 I know how much of a job switching around accounts system will be, or mailing lists
2020-03-06T19:02:36 so like, the things I am currently doing >:D
2020-03-06T19:07:00 * lcp posted a file: maintenance.html (5KB) < https://matrix.org/_matrix/media/r0/download/matrix.org/TkPPWAXVCzCXebcRsKvQhmHT >
2020-03-06T19:07:17 there you go, nice screen for the time of maintenance >:D
2020-03-06T19:08:13 lcp: :-)
2020-03-06T19:14:42 pjessen, lcp yes continuation of the forums is a must, once up and running then can look for alternatives and just archive it off as read only
2020-03-06T19:16:35 pjessen, nope no access to infra.o.o
2020-03-06T19:16:53 malcolmlewis: won't need to archive anything https://github.com/discourse/discourse/blob/master/script/import_scripts/vbulletin5.rb ;)
2020-03-06T19:17:02 malcolmlewis: sorry, I don't dare switch it over atm
2020-03-06T19:17:08 but another migration obviously >:D
2020-03-06T19:18:19 pjessen, no for sure.... no switching, need to look at the authentication side once you have the test system running...
2020-03-06T19:19:49 lcp, pjessen so set up a discourse instance and test? Does it offer private forums for admins?
2020-03-06T19:20:44 I might do that, let's finish having vbulletin one set up first
2020-03-06T19:20:49 malcolmlewis: I think we already have a test instance, courtesy of darix.
2020-03-06T19:20:50 and I will look up private forums
2020-03-06T19:20:59 ah do we?
2020-03-06T19:21:40 all, I have just sent out a migration update for today by email, I think we have some progress. it's slow, but moving.
2020-03-06T19:22:00 tomorrow I'll put together a proposal for the move to production.
2020-03-06T19:22:17 at this rate it will be ready by tomorrow 😛
2020-03-06T19:23:04 pjessen, sweet :)
2020-03-06T19:23:24 guys, I'm signing off - gotta go see if I can grow my hair back.
2020-03-06T19:23:36 pjessen, thanks :)
2020-03-06T19:28:09 malcolmlewis: https://meta.discourse.org/t/invite-only-closed-groups/78120
2020-03-06T19:29:09 pjessen: I'm happy to see the progress with the forums :-)
2020-03-06T19:29:36 one thing I just noticed is which means it looks, well, minimalistic for people without heroes VPN access ;-)
2020-03-06T19:30:13 yeah, probably still didn't find the setting for setting the address to the correct one
2020-03-06T19:30:16 >:D
2020-03-06T19:30:51 I have a feeling that it will be one of the easier problems to fix ;-)
2020-03-06T19:31:49 hi, can somebody here give some hints if SAP knowledge are really mandatory for this job? https://jobs.suse.com/us/en/job/7014102/Solution-Architect-SAP-Linux-and-Automation
2020-03-06T19:32:38 hm, I want to switch over our services to using libravatar instead of gravatar, seems like a lot of work tho
2020-03-06T19:33:29 IonutVan_: I would assume so, it's in the name
2020-03-06T19:42:45 lcp, I am asking because they say: "Technical understanding of at least one of the following technologies required, additional technologies are a plus:"
2020-03-06T19:44:59 I mean, it doesn't hurt you to try and apply anyway
2020-03-06T19:47:05 I am also looking at this one: https://jobs.suse.com/us/en/job/7013602/Cloud-Engineer-SRE but I am not sure if could be any discussion to work from home and travel ocassionaly to Prague :D
2020-03-06T19:47:30 occasionally*
2020-03-06T19:48:33 looks like galera is having trouble again :-( - wikis are down with database problems
2020-03-06T20:00:03 -heroes-bot- PROBLEM: HAProxy on anna.infra.opensuse.org - HAPROXY CRITICAL - Backup service galera1 is DOWN on galera proxy ! Active service galera2 is DOWN on galera proxy ! ; See https://monitor.opensuse.org/icinga/cgi-bin/extinfo.cgi?type=2&host=anna.infra.opensuse.org&service=HAProxy
2020-03-06T20:00:06 -heroes-bot- PROBLEM: HAProxy on elsa.infra.opensuse.org - HAPROXY CRITICAL - Active service galera2 is DOWN on galera proxy ! Backup service galera1 is DOWN on galera proxy ! ; See https://monitor.opensuse.org/icinga/cgi-bin/extinfo.cgi?type=2&host=elsa.infra.opensuse.org&service=HAProxy
2020-03-06T20:41:03 IonutVan_: that cloud sre position is for my team and we would prefer someone who can chill with us in Prague daily
2020-03-06T20:41:42 theluckymike, thanks a lot
2020-03-06T20:41:55 but it seems that is not only cloud related, right
2020-03-06T20:41:56 ?
2020-03-06T20:42:02 just curious - how many hours of chilling does the job include, and how many hours of actual work? ;-)
2020-03-06T20:42:13 right now about 0% chilling :D
2020-03-06T20:42:27 mention "replace faulty hardware"
2020-03-06T20:42:48 theluckymike, I will be interested if is remote :)
2020-03-06T20:43:01 I am not too far from Prague :D but anyway :)
2020-03-06T20:43:32 it's not remote tho :/
2020-03-06T20:43:53 not sure of replacing faulty hw, most of job would be in cloud indeed, our team has very little involvement with actual hw
2020-03-06T20:48:59 theluckymike, thanks
2020-03-06T20:54:06 ARGS: did anyone restart two galera servers at once?
2020-03-06T20:54:37 * cboltz only looked at them (systemctl status mysql), but doesn't dare to restart them
2020-03-06T20:54:43 * lcp didn't touch a single galera server
2020-03-06T20:55:19 (besides that, sudo doesn't work for me on galera[23])
2020-03-06T20:55:43 cboltz: maybe because there is no role for it?
2020-03-06T20:55:54 - sorry, looking into what happened right now
2020-03-06T20:56:27 ok: 2nd machine is back
2020-03-06T20:56:45 still need to check the sync status of the 3rd one...
2020-03-06T20:57:39 eh, galera is doing great today, ngl
2020-03-06T20:58:43 galera2 is currently donor for galera1 and galera3 - this might need some time: 46G to transfer ...
2020-03-06T20:58:53 2 times ...
2020-03-06T20:59:05 nice...
2020-03-06T20:59:05 while per's VB is still hammering
2020-03-06T20:59:55 pgsql backed discourse sure is looking more appealing by the second
2020-03-06T21:00:06 am I the only one who thinks we should set up a separate mysql server only for VB?
2020-03-06T21:00:13 => "Slave SQL: Error 'Duplicate column name" for many tables from the VB
2020-03-06T21:00:30 no, it was suggested by per today
2020-03-06T21:00:51 lcp: have a look at the vacuum problems of mirrordb1 and mirrordb2. PostgreSQL is not much better in performance ;-)
2020-03-06T21:00:51 actually, I don't remember what happened after that
2020-03-06T21:01:02 kl_eisbaer: shhhh
2020-03-06T21:01:07 way better
2020-03-06T21:01:37 Internal MariaDB error code: 1146 and Internal MariaDB error code: 1060
2020-03-06T21:01:48 spamming the logs...
2020-03-06T21:01:51 performance? at least it doesn't crash
2020-03-06T21:02:02 all related to some alter table commands from the VB migration
2020-03-06T21:02:27 lcp: the master (galera2) did also not crash - just the slaves did not catch up in time with all the errors
2020-03-06T21:03:07 2020-03-06 16:12:22 43 [ERROR] WSREP: Failed to apply trx: source: 1611474b-4843-11ea-9772-3a02cf11044f version: 4 local: 0 state: APPLYING flags: 1 conn_id: 12017513 trx_id: 50024938 seqnos (l: 233429, g: 8600854, s: 8600853, d: 8600754, ts: 7006597023902198)
2020-03-06T21:03:07 2020-03-06 16:12:22 43 [ERROR] WSREP: Failed to apply trx 8600854 4 times
2020-03-06T21:03:07 2020-03-06 16:12:22 43 [ERROR] WSREP: Node consistency compromised, aborting...
2020-03-06T21:03:07 2020-03-06 16:12:22 43 [Note] WSREP: Closing send monitor...
2020-03-06T21:03:07 2020-03-06 16:12:22 43 [Note] WSREP: Closed send monitor.
2020-03-06T21:03:08 2020-03-06 16:12:22 43 [Note] WSREP: gcomm: terminating thread
2020-03-06T21:03:08 2020-03-06 16:12:22 43 [Note] WSREP: gcomm: joining thread
2020-03-06T21:03:09 2020-03-06 16:12:22 43 [Note] WSREP: gcomm: closing backend
2020-03-06T21:03:19 ^^ this is from galera1
2020-03-06T21:03:25 well then, I would like to publicly apologize to mysql for misjudging its capabilities of not crashing
2020-03-06T21:03:54 and behind galera1 is the backup machine, which is a slave of galera1
2020-03-06T21:04:27 lcp: just provide a working multi-master PostgreSQL setup and we can start to migrate
2020-03-06T21:04:48 I guess most, if not all, current apps can also use PostgreSQL
2020-03-06T21:05:03 sure...
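Earlier, pjessen suspected a charset mismatch behind the broken Cyrillic and German text: the database defaults to utf8mb4, but the old tables are still latin1. One common way this goes wrong (not confirmed for this forum) is that UTF-8 bytes get read back as Latin-1, which produces "mojibake". The Python sketch below shows the effect and how reversing the step repairs it. It uses ISO-8859-1 because that round-trips every byte; MySQL's `latin1` is really cp1252, so converting actual tables needs more care than this.

```python
# Simulate and undo one common charset mix-up: UTF-8 bytes read as Latin-1.
# Assumption: ISO-8859-1 ("latin-1" in Python) stands in for MySQL's latin1,
# which is really cp1252 and differs in the 0x80-0x9F range.


def to_mojibake(text: str) -> str:
    """Encode as UTF-8, then wrongly decode those bytes as Latin-1."""
    return text.encode("utf-8").decode("latin-1")


def repair(garbled: str) -> str:
    """Reverse one round of the mix-up: back to bytes, then decode as UTF-8."""
    return garbled.encode("latin-1").decode("utf-8")


if __name__ == "__main__":
    for original in ("Grüße aus Nürnberg", "Привет"):
        garbled = to_mojibake(original)
        print(repr(garbled), "->", repair(garbled))
```

ASCII-only text passes through unchanged, which may be one reason the damage looks patchy: only the non-ASCII characters break.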
2020-03-06T21:05:07 lcp: but please a less error prone and stable one than galera :-)
2020-03-06T21:05:53 do I get a vm for that?
2020-03-06T21:06:04 lcp: really just one ?
2020-03-06T21:06:15 for me, HA setups normally start with 3 nodes ...
2020-03-06T21:06:22 you are right, but also I want it on some other exotic OS
2020-03-06T21:06:42 we have sle, leap and centos already
2020-03-06T21:06:50 lcp: hey: we should improve openSUSE - and not misuse broken stuff to migrate everything away from it ;-)
2020-03-06T21:07:23 I am one of Leap devs, but I only officially do branding stuff for it ;)
2020-03-06T21:08:46 looks like the systemd start timeout hit galera1 ....
2020-03-06T21:08:59 the node already noticed that it was too much behind galera2
2020-03-06T21:09:11 ...and decided to restart and do a fresh sync
2020-03-06T21:09:53 but as the data transfer took too long, systemd interrupted the transfer ...
2020-03-06T21:10:07 * kl_eisbaer is setting TimeoutSec=7200 now
2020-03-06T21:11:31 I should probably see if that patch I did to discourse works
2020-03-06T21:15:25 wait, production setup is docker based, but development is based entirely on system packaging? what is this
2020-03-06T21:16:26 lcp: ?
2020-03-06T21:16:58 * kl_eisbaer don't think lcp is speaking about the DB clusters....
2020-03-06T21:18:24 ok: back to wsrep_cluster_size=3
2020-03-06T21:18:50 still waiting to get all nodes in sync again
2020-03-06T21:19:13 nope, I'm trying to run a development environment for discourse the way you would typically run production >:T
2020-03-06T21:19:35 lcp: ah, ok :-)
2020-03-06T21:19:42 and I fundamentally disagree with that being a sane way to do things
2020-03-06T21:20:13 their production environment is setup the way you would run development environment on the other hand, sooo
2020-03-06T21:20:26 come on, it would be boring if you do development and testing in the same way you run production ;-)
2020-03-06T21:21:04 one, what about this position? somebody here knows something about it? I guess that's not remote either :) https://jobs.suse.com/us/en/job/7014304/Manager-Quality-Engineering
2020-03-06T21:21:06 right. This is something for newbies. Only real admins know how to run production.
2020-03-06T21:21:54 you mean, for example, how to run a galera cluster? *g,d&r*
2020-03-06T21:22:42 IonutVan_: SUSE is getting more flexible when it comes to local vs remote positions these days. But this depends more on the hiring manager than the need for the position. So the best might be to ask the hiring crew directly.
2020-03-06T21:22:58 cboltz: somewhat of :-)
2020-03-06T21:23:32 cboltz: looking at software-o-o you can
2020-03-06T21:23:55 but software-o-o is a simple application, I would expect it to be fairly simple to develop and deploy
2020-03-06T21:24:07 -heroes-bot- PROBLEM: MySQL WSREP recv on galera1.infra.opensuse.org - CRIT wsrep_local_recv_queue_avg = 14.459459 ; See https://monitor.opensuse.org/icinga/cgi-bin/extinfo.cgi?type=2&host=galera1.infra.opensuse.org&service=MySQL%20WSREP%20recv
2020-03-06T21:24:18 however discourse's setup is way easier for production than for development, and that annoys me
2020-03-06T21:25:25 as the greatest Steve Ballmer once said: DEVELOPERS DEVELOPERS DEVELOPERS DEVELOPERS
2020-03-06T21:26:00 kl_eisbaer, I will, thanks a lot
2020-03-06T21:26:05 https://www.youtube.com/watch?v=KMU0tzLwhbE
2020-03-06T21:26:47 lcp: I agree it sounds annoying, but having it the other way round would be worse (poor admins! ;-)
2020-03-06T21:27:09 no, I already expect myself to suffer when deploying anything
2020-03-06T21:27:38 I might develop something 5 times a week, I will be deploying software once no matter what
2020-03-06T21:28:10 that's not always true, but you know what I mean
2020-03-06T21:28:10 ..and pontifex aka download.o.o running full ....
2020-03-06T21:28:52 hmpf: did I tell you that I wanted to enjoy my weekend since 1 hour?
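At 21:09 galera1's full resync was cut off by the systemd start timeout, and kl_eisbaer raised TimeoutSec to 7200 (in systemd, TimeoutSec= sets both TimeoutStartSec= and TimeoutStopSec=). A quick calculation shows what sustained throughput the donor needs for a transfer of the size mentioned at 20:58 to finish inside that window. The sketch assumes that "46G" means 46 GiB and that galera2 sends a full copy to both joiners at the same time; the log states neither precisely.

```python
# Back-of-the-envelope: minimum donor throughput for a state transfer to
# finish before systemd gives up on the service start.
# Assumptions: "46G" = 46 GiB; both joiners pull a full copy concurrently.

GIB = 1024 ** 3
MIB = 1024 ** 2


def required_mib_per_s(size_gib: float, timeout_s: int, copies: int = 1) -> float:
    """Sustained MiB/s needed to move `copies` x `size_gib` within `timeout_s`."""
    return size_gib * GIB * copies / timeout_s / MIB


if __name__ == "__main__":
    timeout = 7200  # TimeoutSec=7200, as set in this log
    one = required_mib_per_s(46, timeout)
    both = required_mib_per_s(46, timeout, copies=2)
    print(f"one joiner:  {one:.1f} MiB/s")   # 6.5 MiB/s
    print(f"two joiners: {both:.1f} MiB/s")  # 13.1 MiB/s
```

These are lower bounds for the transfer alone; disk I/O on the donor and the load from the VB migration running at the same time would eat into the margin.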
2020-03-06T21:29:00 Now I see: "WSREP: Failed to report last committed 8605065, -77 (File descriptor in bad state)"
2020-03-06T21:29:33 great :-/
2020-03-06T21:31:06 I'll reset galera2 - looks like the machine did not like that I restarted mysql so early (after 30min sync issues)
2020-03-06T21:32:31 at least everything switched over to galera3 already
2020-03-06T21:33:55 in case it helps - setting up a separate mysql server for the forums shouldn't be hard (using mysql-formula it's actually boring - been there, done that (it's one of my test VMs). Obviously we'll still need to do some config tuning)
2020-03-06T21:34:34 cboltz: I think: using a dedicated mysql server during the migration might be the best idea.
2020-03-06T21:34:47 once the migration is done, we could transfer the DB over to the production cluster
2020-03-06T21:35:26 sounds like a good idea
2020-03-06T21:35:27 but currently I see so many broken "ALTER TABLE" and other stuff coming either from the migration script or VB itself - that I would not recommend to do the migration on the production cluster
2020-03-06T21:36:05 if you give me an empty VM, I can handle/salt the mysql setup
2020-03-06T21:36:25 nobody wants me - anybody just wants my VMs ...
2020-03-06T21:36:48 how much size?
2020-03-06T21:37:07 good question, how big are the VB databases on galera?
2020-03-06T21:37:19 * kl_eisbaer did not check
2020-03-06T21:37:24 give me a second...
2020-03-06T21:37:26 I just created a 30G rootfs
2020-03-06T21:38:15 du -h /var/lib/mysql/webforums on galera1 says 3.7G, so 30G should be more than enough
2020-03-06T21:43:07 forumsoo_temp.infra.opensuse.org
2020-03-06T21:43:17 cboltz: where can I find your ssh key ?
2020-03-06T21:43:25 in freeipa
2020-03-06T21:43:35 merci
2020-03-06T21:43:57 ed25519 is ok? or do you want both?
2020-03-06T21:44:13 yes
2020-03-06T21:44:23 silly question from me
2020-03-06T21:44:36 no, not a silly question
2020-03-06T21:44:55 but maybe I should delete the other one - 15.x VMs won't accept it anymore because of the old format
2020-03-06T21:45:35 (the "yes" was meant for ed25519)
2020-03-06T21:46:57 ^^ and this is why my question was silly: your yes is valid for both options ;-)
2020-03-06T21:47:28 cboltz: please try "ssh root@forumsoo_temp.infra.opensuse.org"
2020-03-06T21:47:41 works :-)
2020-03-06T21:48:11 ok. The machine just needs one reboot and after that it's ready. (I was too lazy to use the latest image from OBS and missed the systemd update...)
2020-03-06T21:48:33 no problem, I'll do the reboot
2020-03-06T21:48:43 hehe too late ;-)
2020-03-06T21:48:45 actually not, you were faster ;-)
2020-03-06T21:48:50 back online
2020-03-06T21:49:08 enjoy and remember to have a lot of fun :-)
2020-03-06T21:49:27 I will, thanks for creating the VM quickly!
2020-03-06T21:50:06 well, I'm no cloud robot - and had to fight with a wrong DNS entry (in the opensuse.org zone instead of infra.opensuse.org) - but I hope the service is ok for you ;-)
2020-03-06T21:50:26 more than ok ;-)
2020-03-06T21:52:48 -heroes-bot- RECOVERY: MySQL WSREP recv on galera1.infra.opensuse.org - OK wsrep_local_recv_queue_avg = 0.342622 ; See https://monitor.opensuse.org/icinga/cgi-bin/extinfo.cgi?type=2&host=galera1.infra.opensuse.org&service=MySQL%20WSREP%20recv
2020-03-06T21:53:23 -heroes-bot- PROBLEM: MySQL WSREP recv on galera2.infra.opensuse.org - CRIT wsrep_local_recv_queue_avg = 720.284175 ; See https://monitor.opensuse.org/icinga/cgi-bin/extinfo.cgi?type=2&host=galera2.infra.opensuse.org&service=MySQL%20WSREP%20recv
2020-03-06T21:54:22 ok: cluster seems to be back completely
2020-03-06T21:54:37 will need some time to get fully resynced - but this should not harm
2020-03-06T21:54:47 :-)
2020-03-06T21:55:45 have a good night!
2020-03-06T22:05:26 night
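The bot's "MySQL WSREP recv" alerts throughout this log report wsrep_local_recv_queue_avg, the average length of a node's receive queue. Values well above 0 mean the node cannot apply write-sets as fast as it receives them. The Python sketch below shows how a Nagios-style check could classify such a value. The real thresholds of the openSUSE monitoring are not in this log: the WARN/CRIT limits here are hypothetical, picked only so the sketch agrees with the alerts above (1.726594 is CRIT, 0.342622 is OK). The function names and the WARN message wording are hypothetical as well.

```python
# Hypothetical Nagios-style check for wsrep_local_recv_queue_avg.
# The thresholds are assumptions, not the values used by the real monitoring.

WARN_THRESHOLD = 0.5  # hypothetical
CRIT_THRESHOLD = 1.0  # hypothetical

# Standard Nagios plugin exit codes
OK, WARNING, CRITICAL = 0, 1, 2


def parse_status(line: str) -> float:
    """Parse one row as printed by `mysql -N -B` for SHOW GLOBAL STATUS (tab-separated)."""
    name, value = line.strip().split("\t")
    if name != "wsrep_local_recv_queue_avg":
        raise ValueError(f"unexpected status variable: {name}")
    return float(value)


def check_recv_queue(avg: float) -> tuple[int, str]:
    """Map the averaged receive-queue length to an exit code and a status line."""
    if avg >= CRIT_THRESHOLD:
        return CRITICAL, f"CRIT wsrep_local_recv_queue_avg = {avg:f}"
    if avg >= WARN_THRESHOLD:
        return WARNING, f"WARN wsrep_local_recv_queue_avg = {avg:f}"
    return OK, f"OK wsrep_local_recv_queue_avg = {avg:f}"


if __name__ == "__main__":
    for sample in (163.928994, 1.726594, 0.342622):
        print(check_recv_queue(sample))
```

Because the status variable is an average over an interval, a node can stay CRIT for a while after it has caught up, as galera1 did between 21:24 and 21:52.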