Well hello again, I have just learned that the host that recently had both NVMe drives fail upon drive replacement now has new problems: the filesystem reports permanent data errors affecting the databases of both the Matrix server and the Telegram bridge.
I have just rented a new machine and am about to restore the database snapshot from July 26, just in case. All the troubleshooting over the past few days was very exhausting; however, I will try to do or at least prepare this within the upcoming hours.
Update
After a rescan the errors have gone away; however, the drives logged errors too. The question now is whether the data integrity can be trusted.
Status August 1st
Well … good question… optimizations were made last night, the restore was successful and … we are back to debugging outgoing federation :(
The new hardware will also be a bit more powerful… and yes, I have not forgotten that I wanted to update that database. It’s just that I was busy debugging federation problems.
References
- federation issues after restore: https://github.com/matrix-org/synapse/issues/16025
- why we had to restore initially: https://text.tchncs.de/tchncs/about-the-matrix-incident-on-july-26-2023
I am a bit confused now… the spare was at 98%, as you can read in my snippet above … where does it say “no spare available”? I think it is on me to request a swap, and that's what I did, as even the one with slightly less wear reported 255% used, which afaik is an approx. estimate of how much of the drive's lifetime is used up, based on r/w cycles (not sure about all factors).
The one the hoster left in for me to play with, said no:
[Wed Jul 26 19:19:10 2023] nvme nvme1: I/O 9 QID 0 timeout, disable controller
[Wed Jul 26 19:19:10 2023] nvme nvme1: Device shutdown incomplete; abort shutdown
[Wed Jul 26 19:19:10 2023] nvme nvme1: Removing after probe failure status: -4
Tried multiple kernel flags and other things but couldn’t get past that error. It would have been interesting to have the hoster ship the thing to me (and maybe that would have been a long enough cooldown to get the thing working again), but I assume that would have been expensive from Helsinki.
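As a side note on the two SMART values mentioned above (98% spare, 255% used), here is a minimal Python sketch of how they could be read together. It assumes the usual NVMe health log semantics: available spare is the share of reserve blocks still left, and percentage used is a vendor estimate of consumed endurance that may exceed 100 and saturates at 255. The `NvmeWear` class, the `assess` function and the spare threshold of 10 are hypothetical and only for illustration; the threshold is an assumed value, not one read from the drive.

```python
from dataclasses import dataclass

# Assumption: the "percentage used" field cannot report more than 255.
PERCENTAGE_USED_CAP = 255


@dataclass
class NvmeWear:
    available_spare: int   # percent of spare capacity remaining (0-100)
    spare_threshold: int   # vendor-set warning threshold in percent
    percentage_used: int   # vendor estimate of endurance consumed (0-255)


def assess(wear: NvmeWear) -> list[str]:
    """Return human-readable notes about the wear values."""
    notes = []
    if wear.available_spare < wear.spare_threshold:
        notes.append("spare below threshold")
    if wear.percentage_used >= PERCENTAGE_USED_CAP:
        notes.append("percentage used saturated at 255")
    elif wear.percentage_used >= 100:
        notes.append("rated endurance exceeded")
    return notes


# Values from the thread, plus a hypothetical threshold of 10.
drive = NvmeWear(available_spare=98, spare_threshold=10, percentage_used=255)
print(assess(drive))
```

Read this way, a high spare value and a maxed-out percentage used do not contradict each other: the first counts remaining reserve blocks, the second estimates wear against the rated endurance.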
My bad. I must have misread. Sorry.
Yes, shipping it to you would probably have been a good idea. Does it cost a lot less to use the Helsinki location? Otherwise Falkenstein would be a pretty good alternative I guess.