2026-06-15T19:02:57 synapse repeatedly runs into oom killer since some days
2026-06-15T19:11:09 i see nothing about that in journalctl
2026-06-15T19:15:37 last one is Jun 15 18:53:25 matrix kernel: Out of memory: Killed process 3245146 (python3)
2026-06-15T19:16:21 you can filter to kernel with journalctl -k if too much output
2026-06-15T19:17:07 running btop now
2026-06-15T19:17:08 can confirm that memory usage is all out of whack
2026-06-15T19:17:09 it just locked up on me lol
2026-06-15T19:18:19 heh my first htop was a coredump upon trying to sort
2026-06-15T19:18:40 btop seems to work when the system doesnt run out of memory, a synapse process just OOMed giving me another 4.5G of ram
2026-06-15T19:18:47 https://paste.opensuse.org/pastes/2e7807b8adf0 if you're able to see private posts
2026-06-15T19:19:05 it was at ~250M free when it froze
2026-06-15T19:19:23 (also had 2 cores pinned)
2026-06-15T19:20:08 there's also https://monitor.opensuse.org/grafana/goto/CNzlWDavR?orgId=1, cannot quite make out a trend
2026-06-15T19:21:29 hm, it seems to correlate with the memory graph under process info
2026-06-15T19:21:38 noticing that the GC goes absolute haywire
2026-06-15T19:39:24 hm, having a chat about this with a friend, their suggestion seems to be to try _increasing_ the in-memory caches
2026-06-15T19:39:25 theory being that its OOMing due to requests getting backlogged, causing extra mem bloat and then dying
2026-06-15T19:39:27 also, another OOM incoming, btop froze showing 188MiB available
2026-06-15T19:40:49 okay, seems to _particularly_ be "synapse_federation_request 1" that keeps OOMing
2026-06-15T19:45:58 acidsys: hm, how would i find the reverse proxy logs?
2026-06-15T19:46:10 im logged into logger-prg but im not sure what host im looking for
2026-06-15T19:50:06 maybe a better idea to hop on irc for once :D
2026-06-15T19:54:03 acidsys: how would i go about finding the haproxy logs?
just gotta rule out some foul play :)
2026-06-15T20:33:04 what a rare sight, rorysys :P
2026-06-15T20:33:24 check /var/log/remote/atlas{1,2}/haproxy
2026-06-15T20:36:41 yeah i figured registering a libera account might help given the bridge relies on synapse :D
2026-06-15T20:37:11 good thinking
2026-06-15T20:49:34 hm
2026-06-15T20:50:03 wish the logs were more verbose (particularly, including the requester user agent), but im seeing some very particular and interesting patterns that match a known bug in a non-Element implementation
2026-06-15T20:51:11 acidsys: how much work is it to block 2 IP addresses on the reverse proxy for a week or so to see if the issue goes away?
2026-06-15T21:01:10 rorysys: good point regarding useragent missing in haproxy logs, but you can find the information in nginx, unfortunately not yet in logger, but on matrix.i.o.o itself: /var/log/nginx/matrix.access.log.gz (or one of the dated archives in the same directory)
2026-06-15T21:03:20 rorysys: work effort near zero
2026-06-15T21:04:26 I suggest to reject with http 429 if you find them to send abnormal amounts of requests
2026-06-15T21:08:48 iirc the implementation doesnt honor 429's on that endpoint...
2026-06-15T21:09:51 semi un-/related, im doing some unplanned maintenance: purging and blocking #community:matrix.org (old useless room, 2 local members) - its one of the affected rooms and i cant presume it's in good health
2026-06-15T21:10:02 so we may see elevated memory usage on main for a bit
2026-06-15T21:13:13 that room is on the scale of old matrix HQ and not a good idea to keep around :)
2026-06-15T21:13:26 stopped draupnir-bot to give us a small amount more memory headroom just in case
2026-06-15T21:13:55 there is not really anything to honor with a http response, how you handle it does not change what the server returns
2026-06-15T21:14:24 sounds good
2026-06-15T21:14:28 it doesnt but it just keeps retrying then - the implementation is known to be broken and spamming requests regardless
2026-06-15T21:14:43 anyways, purging the matrix community space should give more headroom at least by not filling memory with junk
2026-06-15T21:15:10 ah right, but those would be caught on the reverse proxy and not sent to the right
2026-06-15T21:15:15 to the backend*
2026-06-15T21:17:50 we can of course also block it on a network level, but I prefer not to, as it does not give any indication to the client what the problem is. and it's not really necessary unless they were to target multiple protocols. generating reject responses is relatively cheap in HAProxy
2026-06-15T21:20:10 true
2026-06-15T21:20:28 semirelated, found an old libera room with no local members :)
2026-06-15T21:21:25 ah, an old ubuntu room i can purge :)
2026-06-15T21:41:39 purging the old neovim room too since it was tombstoned, smaller database always good :D
2026-06-15T22:18:02 comraderachel: oh hi lol
2026-06-15T22:23:21 lol hi
2026-06-15T22:31:41 acidsys: looks like memory usage is down by 5 gigs?
2026-06-15T22:32:04 i did restart federation_requests{1,2} an hour or so ago and im still purging tombstoned rooms
2026-06-15T22:34:26 neat
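
Note on the log analysis discussed above: spotting clients that "send abnormal amounts of requests" in the nginx access log, together with their user agent, could be sketched roughly as follows. This is only a sketch under assumptions: it assumes the log uses nginx's default "combined" format, which the conversation does not confirm, and the names `LINE_RE`, `count_clients` and `count_clients_in_file` are made up for illustration.

```python
import gzip
import re
from collections import Counter

# Assumed layout (nginx "combined" format, not confirmed by the log above):
# ip - user [time] "request" status bytes "referer" "user-agent"
LINE_RE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]*\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) \S+ "[^"]*" "(?P<agent>[^"]*)"'
)


def count_clients(lines):
    """Count requests per (client IP, user agent); lines that do not match are skipped."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.match(line)
        if match:
            counts[(match["ip"], match["agent"])] += 1
    return counts


def count_clients_in_file(path):
    """Same tally for a plain or gzip-compressed access log file."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt", errors="replace") as handle:
        return count_clients(handle)


# Hypothetical usage on the host itself:
# top = count_clients_in_file("/var/log/nginx/matrix.access.log.gz").most_common(10)
# for (ip, agent), hits in top:
#     print(hits, ip, agent)
```

The tally is keyed on the (IP, user agent) pair rather than the IP alone, since the user agent is the detail that was missing from the haproxy logs.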