Multiple degraded drives at once & several unwanted reboots

Rayxcer

Cadet
Joined
Jun 28, 2020
Messages
6
Hello fellow FreeNAS enthusiasts,
I've got a problem with my installation.
At first, it started to reboot once per day on its own, and I couldn't find the cause (I'm not very good with FreeBSD-based OSes, though).

Today I saw that not only did it reboot twice, but also that (according to FreeNAS) 6 out of 9 drives are marked as degraded.
While it is certainly possible for multiple drives to fail at once, it seems unlikely to me, and furthermore the data seems to be alright (judging by a random selection of files I tested).

Hardware:
AMD R7 2700X
Asus Prime X370 Pro
64GB 3000MHz RAM
9x 4TB Seagate HDDs
Zotac GeForce GT 710


Thanks a lot in advance.


PS: I know that I certainly missed some information or logs that you guys will need to help me, so please let me know what you need and I'll add it as soon as possible
PS2: My English is horrible and I'm very sorry for that (I'm working on it). But please ignore that for now.
 

Attachments

  • 1.png (69.3 KB)
  • 2.png (55.3 KB)

Samuel Tai

Never underestimate your own stupidity
Moderator
Joined
Apr 24, 2020
Messages
5,399
Are you auto-overclocking this gaming motherboard? This is a very bad idea with FreeNAS.
 

Rayxcer

Cadet
Joined
Jun 28, 2020
Messages
6
@Samuel Tai Yes, sadly. I knew there was something I forgot when I switched that board a couple of weeks ago to RMA the original one...

I have deactivated it now (I hope; X370 BIOSes aren't exactly user-friendly or good). But this surely isn't the (only) cause of my issues, is it?
I knew I should deactivate it but I honestly didn't know specifically why.
 

Samuel Tai

Never underestimate your own stupidity
Moderator
Joined
Apr 24, 2020
Messages
5,399
With overclocking, you're pushing RAM and PCI beyond their rated speed. Motherboard traces also radiate RF outside the motherboard's tolerances, increasing data line crosstalk. Basically, overclocking means you're making a conscious tradeoff between processing speed and data integrity. For a 3D RPG, that usually means a corrupt pixel here or there. For a storage application, where you want every bit to be exactly transported and stored, it's a recipe for disaster. Your pool is in pretty bad shape, and you may need to destroy it and recreate it, restoring from backup afterwards to get it back to a clean state.
 

Rayxcer

Cadet
Joined
Jun 28, 2020
Messages
6
@Samuel Tai I see what you mean, yeah.

But just for my sanity:
Are you (within a reasonable margin of error, of course) sure that my drives are not physically damaged? (I am aware that it is basically impossible for you to know with 100% certainty, but a tendency would be nice.)

Since it's basically 80% Blu-ray rips (I have all the original discs stored in a closet, but for Plex and for convenience they're all on my NAS), I'd have to go through the painstakingly long process of ripping them again if I erase the whole storage pool.
 

Rayxcer

Cadet
Joined
Jun 28, 2020
Messages
6
Sorry for double-posting,
but there is a new development.
I tried to swap in the RMA-ed board (Asus Prime X470 Pro) and now the system is freaking out like crazy.
But if I'm not mistaken, it seems that it has something to do with my CPU rather than my board, storage, RAM or anything else (of course, feel free to tell me I'm wrong, since I'm really not good with FreeBSD-based OSes).
Error.JPG
 

Samuel Tai

Never underestimate your own stupidity
Moderator
Joined
Apr 24, 2020
Messages
5,399
"Page not present" means the CPU tried to read from memory, and the RAM couldn't present that page. That's a memory error.
 

Rayxcer

Cadet
Joined
Jun 28, 2020
Messages
6
So my RAM is faulty?
Well, that's really annoying. I've got 4 new sticks in there, which came as two sets.
I will go and try to get the faulty one(s) then.

But that would explain the errors and reboots as well as the degraded pools.

Thanks a lot @Samuel Tai. :)
 

Heracles

Wizard
Joined
Feb 2, 2018
Messages
1,401
There is no way to be sure whether the data is good or not. It is very possible that your data was corrupted before ZFS saved it to the drives. In that case, whatever is on the drives is corrupted data, corrupted in memory.

You gambled your data with your gaming NAS. So far, we know that you lost certainty and security. How much integrity you lost, there is no easy way to find out. What you should do now is empty that pool and rebuild it, using a professional NAS instead of a gaming one.
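
To illustrate why data corrupted in memory before ZFS saves it cannot be caught afterwards, here is a minimal Python sketch. It is only a toy model, not how ZFS is implemented: SHA-256 stands in for whatever checksum the pool uses, and the helper names (write_block, verify_block, flip_bit) are made up for this example.

```python
import hashlib
from typing import Tuple


def write_block(data: bytes) -> Tuple[bytes, str]:
    """Toy 'filesystem write': store the data together with its checksum."""
    return data, hashlib.sha256(data).hexdigest()


def verify_block(stored: bytes, checksum: str) -> bool:
    """Toy 'scrub': recompute the checksum and compare it to the stored one."""
    return hashlib.sha256(stored).hexdigest() == checksum


def flip_bit(data: bytes, bit_index: int) -> bytes:
    """Return a copy of data with a single bit flipped."""
    buf = bytearray(data)
    buf[bit_index // 8] ^= 1 << (bit_index % 8)
    return bytes(buf)


original = b"blu-ray rip, frame 1"

# Case 1: the bit flips in RAM BEFORE the checksum is computed.
# The checksum is calculated over already-bad data, so it verifies fine.
corrupted_in_ram = flip_bit(original, 3)
stored, checksum = write_block(corrupted_in_ram)
ram_case_verifies = verify_block(stored, checksum)
ram_case_matches_original = stored == original

# Case 2: the bit flips on disk AFTER a clean write.
# The stored checksum no longer matches, so the damage is detected.
stored, checksum = write_block(original)
damaged_on_disk = flip_bit(stored, 3)
disk_case_verifies = verify_block(damaged_on_disk, checksum)

print("RAM corruption passes verification:", ram_case_verifies)
print("RAM-corrupted block equals original:", ram_case_matches_original)
print("Disk corruption passes verification:", disk_case_verifies)
```

In the first case the block verifies even though it no longer matches the original, which is why a clean check of a few files does not prove the data is intact.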
 

subhuman

Contributor
Joined
Nov 21, 2019
Messages
121
Rayxcer said:
So my RAM is faulty?
The maximum officially supported memory speed for Zen+ is DDR4-2933. Anything above that is an overclock. If you are running your RAM at its rated speed of 3000, you are overclocking the memory controller.
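
As a rough back-of-the-envelope check of how far 3000 is above 2933, here is a small Python sketch. The two numbers come from this thread; the function name and the MT/s unit label are just choices made for the example.

```python
# Officially supported memory speed for Zen+ as quoted above (MT/s).
RATED_CONTROLLER_SPEED = 2933
# Speed of the RAM kit listed in the opening post (MT/s).
KIT_SPEED = 3000


def overclock_percent(actual: int, rated: int) -> float:
    """Return how far `actual` is above `rated`, in percent (negative if below)."""
    return (actual - rated) / rated * 100.0


if __name__ == "__main__":
    pct = overclock_percent(KIT_SPEED, RATED_CONTROLLER_SPEED)
    print(f"{KIT_SPEED} MT/s is {pct:.1f}% above the {RATED_CONTROLLER_SPEED} MT/s spec")
```

The margin is small (a bit over 2%), but it is still outside what the memory controller is specified for.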
 