ZFS Pool Degraded (too many errors)

Status
Not open for further replies.

Stekelenburg

Dabbler
Joined
Jul 3, 2013
Messages
14
Dear all,

I recently upgraded my FreeNAS server by replacing some of the old hardware and installing the latest version of FreeNAS (9.3):
- replaced the old 2 x 4 GB RAM modules with 2 new 8 GB modules
- replaced the old HDDs with one brand-new 3 TB WD Red

Right after everything was up and running, I started receiving critical alerts: the ZFS volume status is DEGRADED, and "One or more devices has experienced an error resulting in data corruption. Applications may be affected."

I first tried to solve this by manually replacing the affected files, but not long after I received the same error messages again. The total number of errors has been increasing ever since, leaving a lot of damaged files.

My question is: can anyone tell me what causes this problem? Is it most likely a hardware issue with the brand-new 3 TB WD Red, which might be defective (e.g. bad sectors), in which case I should take it back to the store and ask for a replacement? Or could it be caused by my RAM, or by something totally different?

Thanks in advance; your help will be much appreciated.

Systeeminformatie.png

ZFS Pool Degraded.png
 

danb35

Hall of Famer
Joined
Aug 16, 2011
Messages
15,504
Did you do any kind of burn-in or testing on the new disks before installing them in your server? It's certainly possible that one or both of them is DOA. The fact that you're seeing errors with both disks, though, makes me suspicious that something else is going on. Check your data and power cable connections, and try replacing your SATA cables if you have any spares. Also, post the output of 'smartctl -a /dev/ada*' for each of your disks.
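A minimal sketch of collecting those reports one disk at a time, written for /bin/sh (if your login shell is csh, type sh first). The device names /dev/ada0 and /dev/ada1 are assumptions, and print_smart_reports is a hypothetical helper, not part of FreeNAS. Listing each disk explicitly avoids a glob like /dev/ada* also matching partition nodes such as /dev/ada0p1.

```shell
#!/bin/sh
# Hypothetical helper: print the full SMART report of each disk passed in,
# with a header line so the reports are easy to tell apart.
# Pass whole-disk device nodes (e.g. /dev/ada0), not partitions (e.g. /dev/ada0p1).
print_smart_reports() {
    for dev in "$@"; do
        echo "===== $dev ====="
        smartctl -a "$dev"
    done
}

# Usage (as root; device names are assumptions):
#   print_smart_reports /dev/ada0 /dev/ada1
```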

What's the rest of your hardware, specifically your motherboard? How are your drives attached?

You have set up your pool with no redundancy, so any errors will result in data loss. If you have backups, you're going to need them. If not, start backing up your data as quickly as you can.
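Without redundancy ZFS can detect corrupted blocks through their checksums but cannot repair them, so it helps to know exactly which files need restoring. `zpool status -v` lists them under a "Permanent errors have been detected" heading. The sketch below filters that list out and, once files have been restored, resets the error counters and starts a scrub so every block is re-verified. It assumes /bin/sh and a pool named tank (hypothetical; use your own pool name), and the function names are made up for illustration. Matching on the heading text is an assumption about the output format, so check the raw `zpool status -v` output if the list comes back empty.

```shell
#!/bin/sh
# Hypothetical helper: print only the paths that `zpool status -v` lists
# under "Permanent errors have been detected in the following files:".
# $1 is the pool name (e.g. "tank", an assumed name).
list_damaged_files() {
    zpool status -v "$1" | awk '
        /Permanent errors have been detected/ { found = 1; next }  # start after the heading
        found && NF { sub(/^[ \t]+/, ""); print }                  # strip indentation, skip blank lines
    '
}

# Hypothetical helper: after restoring damaged files from backup, reset the
# pool's error counters and re-read every block so ZFS re-checks all checksums.
clear_and_scrub() {
    zpool clear "$1" && zpool scrub "$1"
}

# Usage (as root):
#   list_damaged_files tank > damaged_files.txt
#   clear_and_scrub tank
```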
 

Stekelenburg

Dabbler
Joined
Jul 3, 2013
Messages
14
A small note: I currently have two HDDs in my FreeNAS, but the problem started immediately after I removed my old HDDs, added a brand-new 3 TB WD Red and expanded the RAM (replaced the old 2 x 4 GB with brand-new 2 x 8 GB modules). I added the second HDD (a 1 TB Hitachi) later on, just to find out whether it would have any effect on those error messages (which, of course, it didn't).

- No, I didn't do any burn-in or testing before installing the new HDD (the second, 1 TB HDD was already in use in my old setup and never gave any issues)
- I will check the power and data cables and replace both SATA cables (I do have some spares) later today
- Please see the 'smartctl' output below

Motherboard: Intel Desktop Board DB75EN
Processor: Intel Pentium G2020 (55W)
RAM: 2 x Corsair Vengeance 8 GB DDR3-1600 (CMZ16GX3M2A1600C9)

NOTE: apart from replacing my old HDDs with one new one and replacing the RAM, I made no further hardware changes (and I didn't have any issues with my former configuration).

smartctl.png


** Both drives are connected directly to the motherboard with SATA cables **

Thanks !!!
 

danb35

Hall of Famer
Joined
Aug 16, 2011
Messages
15,504
OK, I wasn't specific enough about the smartctl command: what I intended was for you to replace ada* with each individual device name. In your case, you should run 'smartctl -a /dev/ada0' and 'smartctl -a /dev/ada1'. The output will scroll off the screen, so to capture it all, you'll probably want to use an ssh client to connect to your server (rather than the shell in the web GUI).
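Another way to capture everything is to run smartctl over ssh from a desktop machine and redirect each report into a local text file that can then be attached to a post. A minimal sketch, assuming ssh is enabled on the FreeNAS box and reachable as root@freenas.local (a hypothetical host name), with sh or bash on the client; fetch_smart_reports is a hypothetical helper. If smartctl is not found in a non-interactive ssh session, use its full path (run `which smartctl` in an interactive shell on the server to find it).

```shell
#!/bin/sh
# Hypothetical helper: run `smartctl -a` remotely for each disk and save
# each report locally as <disk>.smart.txt (e.g. ada0.smart.txt).
# $1 is the ssh destination; the remaining arguments are disk device nodes.
fetch_smart_reports() {
    host=$1
    shift
    for dev in "$@"; do
        out="$(basename "$dev").smart.txt"
        ssh "$host" smartctl -a "$dev" > "$out"
        echo "saved $out"
    done
}

# Usage (host name and device names are assumptions):
#   fetch_smart_reports root@freenas.local /dev/ada0 /dev/ada1
```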

You have a desktop board and non-ECC RAM, and you recently changed the RAM, so the RAM is one suspect. A few passes through memtest86+ would help clear up whether your RAM is a problem.
 

Stekelenburg

Dabbler
Joined
Jul 3, 2013
Messages
14
Please find the smartctl output for both ada0 and ada1 in the attached file, and let me know what you think.
Following up on your remark that the new RAM modules could also be the cause of the problem: let me double-check (I'm not 100 percent sure I entered the correct product code/type number). You also mention that a few passes through memtest86+ would help clear this up. Can you be more specific? Is this a tool or command I can run in the FreeNAS GUI, and if so, could you please provide the exact command?
Thanks.
 

Attachments

  • smartctl ada0 ada1.pdf (37.6 KB)

danb35

Hall of Famer
Joined
Aug 16, 2011
Messages
15,504
When troubleshooting, a logical place to start is any recent changes you made. You recently installed new RAM, and bad RAM definitely can cause problems with ZFS, so testing that would be a good step. Memtest86+ is a popular, free memory test utility that can be downloaded from its own site, and is often included with Linux distributions and utility CDs. I've seen some suggestions to include it as a boot option in FreeNAS or on the installer ISO, but it isn't there at this point. So, you'll need to download either Memtest86+ itself, or something like the Ultimate Boot CD which includes it (along with a bunch of other utilities), and boot into it. I'd suggest letting it run for at least 24 hours unless it starts spitting out errors before that.

Your SMART data looks pretty decent. ada0 has never had a SMART test run, so you should schedule regular SMART tests for it right away. In the meantime, you can kick off a long test by running 'smartctl -t long /dev/ada0'. Still, no errors are apparent, and the temperature is reasonable. There are no significant errors on ada1 either, and its temperature is currently reasonable, though it has a history of running far too hot (you can literally cook an egg at less than 65 °C). It does show SMART tests having been run, but irregularly and with long gaps between them.
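The checks above can be wrapped in a few small shell functions. A minimal sketch, assuming /bin/sh and the device names from this thread: start_long_test starts the extended self-test (the drive runs it in the background, which can take several hours on a 3 TB disk), show_test_log prints the self-test results once it has finished, and current_temp pulls the raw value of SMART attribute 194 (Temperature_Celsius) out of 'smartctl -A'. The function names are hypothetical, and not every drive reports its temperature as attribute 194.

```shell
#!/bin/sh
# Hypothetical helpers around smartctl; pass a whole-disk device such as /dev/ada0.

# Start the extended (long) self-test. The drive runs it in the background;
# smartctl returns right away and prints an estimated completion time.
start_long_test() {
    smartctl -t long "$1"
}

# Print the drive's self-test log (results of completed or aborted tests).
show_test_log() {
    smartctl -l selftest "$1"
}

# Print the current temperature in Celsius: the first word of RAW_VALUE
# (column 10) on the attribute 194 line. Drives that append extra text,
# e.g. "32 (Min/Max 20/45)", still yield just the leading number.
current_temp() {
    smartctl -A "$1" | awk '$1 == 194 { print $10; exit }'
}

# Usage (as root; device names are assumptions):
#   start_long_test /dev/ada0
#   show_test_log /dev/ada0     # after the estimated completion time
#   current_temp /dev/ada1
```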
 