OS Version: TrueNAS-SCALE-22.02.4
Motherboard: Supermicro X10SL7-F
CPU: Intel(R) Xeon(R) CPU E3-1230 v3 @ 3.30GHz
RAM: 4x8GB Crucial DDR3 ECC RAM 1600MHz
Controller Hardware:
8x SAS2 (6Gbps) via Broadcom 2308 flashed to IT mode, 2x SATA (6Gbps), 4x SATA (3Gbps)
HDD Hardware/Layout:
5x3TB SAS HGST HUS724030ALS640 (plugged into the SAS2 ports)
3x3TB SAS WD Enterprise-class WD3001FYYG (plugged into the SAS2 ports)
2x128GB PNY SSDs for boot (plugged into the 2x SATA 6Gbps ports)
I had 10 of the HGST drives in total, so 2 are spare for replacements.
The WD HDDs came out of an old server I had, and I kept them as temporary replacement drives (10 in total).
All drives were tested and fine before being put to use.
RAID: RAIDZ2 using all 8x3TB SAS HDDs in a pool; the boot SSDs are just mirrored.
So this morning I had an email notification from my server telling me 2 HDDs had faulted. One was practically dead and the other had about 1k write errors. The one thing that confused me was that this happened on 12th December but I had only just received the email, so I'm a bit unsure what happened there.
Anyhow, I followed the usual procedure for replacing an HDD. The first replacement went completely fine and only took around an hour to resilver. I then started the 2nd one and had to go to work. Once I got home I checked on it and, to my surprise, it was still going, with an INSANE number of errors. I've never seen anything remotely as bad! My heart just dropped. My pool was completely gone and I legit don't know where to go from here. I can't even use the shell because it just hangs when I first open it.
Sadly, I'm presuming I've lost all, or at least most, of the data (roughly 6TB). I suspect there's either been a power issue or the HBA somehow crapped itself during the 2nd resilver.
The resilvering is still going, and all HDDs now show as degraded, aside from the 2 replaced ones, which are online, and another 1 that is faulted. (see images)
I do have a backup on another pool (an external HDD) that I plug in and import once every 1-2 weeks for my important data like password vaults and such. I wasn't able to do a full replication because I was limited by ports and by how much storage I needed, so I opted to back up just the important data to an external HDD until I got another capable server.
I have to be the unluckiest guy in the world though. I literally ordered a 2nd-hand server from eBay yesterday (PowerEdge T430, really good price) to treat myself a bit for Christmas and was planning on doing a full migration to it, replacing all drives one by one with new ones over a couple of days, and then using this current server as a backup server.
Okay, back to the point: I've practically not had any major issues in the 3 years I've been running this server, aside from the odd HDD starting to get errors and then being replaced.
My main question is: where would you go from here if you were in my shoes, and what would be the best way to MAYBE salvage some data?
Please let me know if I have missed any information in the specs or elsewhere. This is my first post and I'll happily provide whatever I can if needed (and if I'm able to). Thank you.