The volume VOLUME1 (ZFS) status is UNKNOWN

Status
Not open for further replies.

non-serviam

Cadet
Joined
Jul 20, 2013
Messages
6
My PC:
-----------------------------------------------------------------------------------------------------------------------------------------
Motherboard: Asrock Z77 Pro4-M (8 HDDs connected)
CPU: Intel Core i5 3570K
CPU Cooler: H80i CPU
RAM: 16 GB Corsair Vengeance 1866 MHz
Case: Arc Mini with 3x120mm + 2x140mm fans
SATA Cables: SATA3 Cables from ebay (2 of them are from the Asrock motherboard)
PSU: Corsair HX750
NIC: TP-Link TG-3468 Gigabit PCI Card (my primary NIC)
GPU: Club 3D Radeon HD 7750
Controller: Transcend SATA II/USB 3 combo (1 ST2000DM001 connected)
HDD: 5x Seagate ST2000DL003 Barracuda Green 3.5-inch 2TB SATA 6 Gb/s Drive (64MB Buffer,5900RPM) and 4x Seagate ST2000DM001 Barracuda 3.5 inch 2TB 7200 RPM 64MB 6GB/S Internal SATA Drive
-----------------------------------------------------------------------------------------------------------------------------------------

My pools keep giving me errors. I first noticed this more than a week ago, when I started using my NAS server with FreeNAS 9.1.0 and all nine 2 TB HDDs in a single RAID-Z1. I ran a long SMART test on every drive with Parted Magic, and all of them came back healthy. I then reinstalled FreeNAS 9.1.0 and reformatted the HDDs as a 6-drive RAID-Z2 and a 3-drive RAID-Z1 (with the encryption and initialization options). After I moved many files (over CIFS) to the larger pool and ran a scrub, it reported more than 10 errors on each drive (unfortunately I forgot to save the status report). I then did a zpool clear and scrubbed again, and I still got errors:
Code:
[root@freenas ~]# zpool status VOLUME1                                                                                             
  pool: VOLUME1                                                                                                                   
state: ONLINE                                                                                                                     
status: One or more devices has experienced an unrecoverable error.  An                                                           
        attempt was made to correct the error.  Applications are unaffected.                                                       
action: Determine if the device needs to be replaced, and clear the errors                                                         
        using 'zpool clear' or replace the device with 'zpool replace'.                                                           
  see: http://illumos.org/msg/ZFS-8000-9P                                                                                         
  scan: scrub repaired 96K in 0h59m with 0 errors on Fri Aug 30 05:41:29 2013                                                     
config:                                                                                                                           
                                                                                                                                   
        NAME                                                STATE    READ WRITE CKSUM                                             
        VOLUME1                                            ONLINE      0    0    0                                             
          raidz2-0                                          ONLINE      0    0    0                                             
            gptid/3b97d721-0b4d-11e3-b660-f8d111b562f1.eli  ONLINE      0    0    0                                             
            gptid/3c6095be-0b4d-11e3-b660-f8d111b562f1.eli  ONLINE      0    0    1                                             
            gptid/3d4226f5-0b4d-11e3-b660-f8d111b562f1.eli  ONLINE      0    0    1                                             
            gptid/3e1f7628-0b4d-11e3-b660-f8d111b562f1.eli  ONLINE      0    0    0                                             
            gptid/3f07c106-0b4d-11e3-b660-f8d111b562f1.eli  ONLINE      0    0    0                                             
            gptid/3feeb788-0b4d-11e3-b660-f8d111b562f1.eli  ONLINE      0    0    1                                             
                                                                                                                                   
errors: No known data errors  


After moving some more files to the pool:
Code:
  pool: VOLUME1                                                                                                                   
state: ONLINE                                                                                                                     
status: One or more devices has experienced an unrecoverable error.  An                                                           
        attempt was made to correct the error.  Applications are unaffected.                                                       
action: Determine if the device needs to be replaced, and clear the errors                                                         
        using 'zpool clear' or replace the device with 'zpool replace'.                                                           
  see: http://illumos.org/msg/ZFS-8000-9P                                                                                         
  scan: scrub repaired 352K in 1h2m with 0 errors on Fri Aug 30 17:40:11 2013                                                     
config:                                                                                                                           
                                                                                                                                   
        NAME                                                STATE    READ WRITE CKSUM                                             
        VOLUME1                                            ONLINE      0    0    0                                             
          raidz2-0                                          ONLINE      0    0    0                                             
            gptid/3b97d721-0b4d-11e3-b660-f8d111b562f1.eli  ONLINE      0    0    1                                             
            gptid/3c6095be-0b4d-11e3-b660-f8d111b562f1.eli  ONLINE      0    0    1                                             
            gptid/3d4226f5-0b4d-11e3-b660-f8d111b562f1.eli  ONLINE      0    0    3                                             
            gptid/3e1f7628-0b4d-11e3-b660-f8d111b562f1.eli  ONLINE      0    0    0                                             
            gptid/3f07c106-0b4d-11e3-b660-f8d111b562f1.eli  ONLINE      0    0    4                                             
            gptid/3feeb788-0b4d-11e3-b660-f8d111b562f1.eli  ONLINE      0    0    5                                             
                                                                                                                                   
errors: No known data errors  


and then some more files:
Code:
[root@freenas ~]# zpool status VOLUME1                                                                                             
  pool: VOLUME1                                                                                                                   
state: ONLINE                                                                                                                     
status: One or more devices has experienced an unrecoverable error.  An                                                           
        attempt was made to correct the error.  Applications are unaffected.                                                       
action: Determine if the device needs to be replaced, and clear the errors                                                         
        using 'zpool clear' or replace the device with 'zpool replace'.                                                           
  see: http://illumos.org/msg/ZFS-8000-9P                                                                                         
  scan: scrub repaired 256K in 1h4m with 0 errors on Fri Aug 30 23:33:27 2013                                                     
config:                                                                                                                           
                                                                                                                                   
        NAME                                                STATE    READ WRITE CKSUM                                             
        VOLUME1                                            ONLINE      0    0    0                                             
          raidz2-0                                          ONLINE      0    0    0                                             
            gptid/3b97d721-0b4d-11e3-b660-f8d111b562f1.eli  ONLINE      0    0    3                                             
            gptid/3c6095be-0b4d-11e3-b660-f8d111b562f1.eli  ONLINE      0    0    4                                             
            gptid/3d4226f5-0b4d-11e3-b660-f8d111b562f1.eli  ONLINE      0    0    5                                             
            gptid/3e1f7628-0b4d-11e3-b660-f8d111b562f1.eli  ONLINE      0    0    2                                             
            gptid/3f07c106-0b4d-11e3-b660-f8d111b562f1.eli  ONLINE      0    0    5                                             
            gptid/3feeb788-0b4d-11e3-b660-f8d111b562f1.eli  ONLINE      0    0    8                                             
                                                                                                                                   
errors: No known data errors  
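
To compare the per-device CKSUM counts from one scrub to the next, the listing above can be tallied with a short script. This is only a minimal sketch in Python and not something from the original post: the function name cksum_by_device and the regular expression are illustrative assumptions, and it assumes the counters are printed as plain integers, as they are in the listings here.

```python
import re

# Device rows copied from the third `zpool status VOLUME1` listing above.
SAMPLE = """\
        NAME                                                STATE    READ WRITE CKSUM
        VOLUME1                                            ONLINE      0    0    0
          raidz2-0                                          ONLINE      0    0    0
            gptid/3b97d721-0b4d-11e3-b660-f8d111b562f1.eli  ONLINE      0    0    3
            gptid/3c6095be-0b4d-11e3-b660-f8d111b562f1.eli  ONLINE      0    0    4
            gptid/3d4226f5-0b4d-11e3-b660-f8d111b562f1.eli  ONLINE      0    0    5
            gptid/3e1f7628-0b4d-11e3-b660-f8d111b562f1.eli  ONLINE      0    0    2
            gptid/3f07c106-0b4d-11e3-b660-f8d111b562f1.eli  ONLINE      0    0    5
            gptid/3feeb788-0b4d-11e3-b660-f8d111b562f1.eli  ONLINE      0    0    8
"""

# One config row: name, state, then the READ, WRITE and CKSUM counters.
# Assumption: counters are plain integers (true for the output shown here).
ROW = re.compile(r"^\s*(\S+)\s+(\S+)\s+(\d+)\s+(\d+)\s+(\d+)\s*$")


def cksum_by_device(status_text):
    """Return {device name: CKSUM count} for the gptid/... rows only."""
    counts = {}
    for line in status_text.splitlines():
        match = ROW.match(line)
        # Skip the header, the pool row and the raidz row; keep leaf devices.
        if match and match.group(1).startswith("gptid/"):
            counts[match.group(1)] = int(match.group(5))
    return counts


counts = cksum_by_device(SAMPLE)
for device, cksum in counts.items():
    print(device, cksum)
print("total CKSUM errors:", sum(counts.values()))
```

For the third listing this gives a total of 27 checksum errors spread over all six devices, which is the pattern that points away from any single drive.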


So I mounted my other pool (NFS), and while it was empty everything was OK:
Code:
[root@freenas ~]# zpool status VOLUME2                                                                                             
  pool: VOLUME2                                                                                                                   
state: ONLINE                                                                                                                     
  scan: scrub repaired 0 in 0h0m with 0 errors on Fri Aug 30 16:38:03 2013                                                         
config:                                                                                                                           
                                                                                                                                   
        NAME                                                STATE    READ WRITE CKSUM                                             
        VOLUME2                                            ONLINE      0    0    0                                             
          raidz1-0                                          ONLINE      0    0    0                                             
            gptid/5b263d8d-0ca9-11e3-b660-f8d111b562f1.eli  ONLINE      0    0    0                                             
            gptid/5bb13698-0ca9-11e3-b660-f8d111b562f1.eli  ONLINE      0    0    0                                             
            gptid/5c360898-0ca9-11e3-b660-f8d111b562f1.eli  ONLINE      0    0    0                                             
                                                                                                                                   
errors: No known data errors 


but when I moved some files to it, I got errors there as well:

Code:
  pool: VOLUME2                                                                                                                   
state: ONLINE                                                                                                                     
status: One or more devices has experienced an unrecoverable error.  An                                                           
        attempt was made to correct the error.  Applications are unaffected.                                                       
action: Determine if the device needs to be replaced, and clear the errors                                                         
        using 'zpool clear' or replace the device with 'zpool replace'.                                                           
  see: http://illumos.org/msg/ZFS-8000-9P                                                                                         
  scan: scrub repaired 448K in 0h17m with 0 errors on Sat Aug 31 03:44:12 2013                                                     
config:                                                                                                                           
                                                                                                                                   
        NAME                                                STATE    READ WRITE CKSUM                                             
        VOLUME2                                            ONLINE      0    0    0                                             
          raidz1-0                                          ONLINE      0    0    0                                             
            gptid/5b263d8d-0ca9-11e3-b660-f8d111b562f1.eli  ONLINE      0    0    5                                             
            gptid/5bb13698-0ca9-11e3-b660-f8d111b562f1.eli  ONLINE      0    0    6                                             
            gptid/5c360898-0ca9-11e3-b660-f8d111b562f1.eli  ONLINE      0    0    4                                             
                                                                                                                                   
errors: No known data errors  


Room temperature is about 30-35 °C.
Disk temperatures:
Code:
ada8 38C Z1E3DEV5 ST2000DM001-1E6164                                                                                          
ada7 38C Z1E3DT2M ST2000DM001-1E6164                                                                                             
ada6 41C Z1E36BCP ST2000DM001-1E6164                                                                                               
ada5 44C 5YD3E176 ST2000DL003-9VT166                                                                                               
ada4 44C 5YD3E8E5 ST2000DL003-9VT166                                                                                               
ada3 43C 5YD3JLCY ST2000DL003-9VT166                                                                                               
ada2 44C 5YD3E8YS ST2000DL003-9VT166                                                                                               
ada1 43C 5YD3GLGP ST2000DL003-9VT166                                                                                             
ada0 38C Z1E3B1J5 ST2000DM001-1CH164 


CPU temp:
Code:
dev.cpu.0.temperature: 54.0C                                                                                                       
dev.cpu.1.temperature: 59.0C                                                                                                       
dev.cpu.2.temperature: 55.0C                                                                                                       
dev.cpu.3.temperature: 50.0C


The motherboard only recognizes the RAM as 1600 MHz, but through the BIOS I "overclocked" it to the RAM's stock 1866 MHz.
I also checked the SATA cables; they appear to be fine and properly connected.

Any suggestions???
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
First, turn off your overclock. Overclocking a server is just flat-out dumb (I'd use more colorful vocabulary, but I think you get the point). I wouldn't be surprised if your overclock was the problem. And since you aren't using ECC RAM, you may have permanently damaged your pool. RAM speed really has no value for FreeNAS; quantity is so much more important that it isn't even a comparison.

Second, your hard drives should be below 40C at all times. That is, if hard drive lifespan is important.

Third, you should probably run a RAM test after you get rid of your overclock. Bad RAM or improperly clocked RAM can damage your pool. The fact that virtually every pool and every drive is having problems means your issue is probably system wide.
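
The 40C guideline in the second point can be checked against the temperature listing posted above. What follows is a rough sketch in Python rather than an official tool: the helper name drives_at_or_over and the LIMIT_C constant are made up for illustration, and it assumes each line has the device name first and the temperature second, as in that listing.

```python
# Temperature listing copied from the first post.
LISTING = """\
ada8 38C Z1E3DEV5 ST2000DM001-1E6164
ada7 38C Z1E3DT2M ST2000DM001-1E6164
ada6 41C Z1E36BCP ST2000DM001-1E6164
ada5 44C 5YD3E176 ST2000DL003-9VT166
ada4 44C 5YD3E8E5 ST2000DL003-9VT166
ada3 43C 5YD3JLCY ST2000DL003-9VT166
ada2 44C 5YD3E8YS ST2000DL003-9VT166
ada1 43C 5YD3GLGP ST2000DL003-9VT166
ada0 38C Z1E3B1J5 ST2000DM001-1CH164
"""

LIMIT_C = 40  # guideline from this post: drives should stay below 40C


def drives_at_or_over(listing, limit_c=LIMIT_C):
    """Return [(device, temperature)] for drives at or above limit_c."""
    hot = []
    for line in listing.splitlines():
        fields = line.split()
        # Expect "<device> <temp>C ..."; skip anything that does not fit.
        if len(fields) < 2 or not fields[1].endswith("C"):
            continue
        temperature = int(fields[1][:-1])
        if temperature >= limit_c:
            hot.append((fields[0], temperature))
    return hot


hot_drives = drives_at_or_over(LISTING)
for device, temperature in hot_drives:
    print(f"{device}: {temperature}C")
```

With the posted listing, six of the nine drives (ada1 through ada6) come out at or above the limit, while ada0, ada7 and ada8 sit at 38C.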
 

non-serviam

Cadet
Joined
Jul 20, 2013
Messages
6
So I did what you told me and set the RAM speed to auto. Then I deleted my second volume, recreated it, moved some files onto it, and ran a scrub on both volumes aaaaaaaaaaaaand:
Code:
[root@freenas ~]# zpool status                                                                                                 
  pool: VOLUME1                                                                                                               
state: ONLINE                                                                                                                 
  scan: scrub repaired 0 in 1h10m with 0 errors on Sun Sep  1 04:23:19 2013                                                   
config:                                                                                                                       
                                                                                                                               
        NAME                                                STATE    READ WRITE CKSUM                                         
        VOLUME1                                            ONLINE      0    0    0                                         
          raidz2-0                                          ONLINE      0    0    0                                         
            gptid/3b97d721-0b4d-11e3-b660-f8d111b562f1.eli  ONLINE      0    0    0                                         
            gptid/3c6095be-0b4d-11e3-b660-f8d111b562f1.eli  ONLINE      0    0    0                                         
            gptid/3d4226f5-0b4d-11e3-b660-f8d111b562f1.eli  ONLINE      0    0    0                                         
            gptid/3e1f7628-0b4d-11e3-b660-f8d111b562f1.eli  ONLINE      0    0    0                                         
            gptid/3f07c106-0b4d-11e3-b660-f8d111b562f1.eli  ONLINE      0    0    0                                         
            gptid/3feeb788-0b4d-11e3-b660-f8d111b562f1.eli  ONLINE      0    0    0                                         
                                                                                                                               
errors: No known data errors                                                                                                   
                                                                                                                               
  pool: VOLUME2                                                                                                               
state: ONLINE                                                                                                                 
  scan: scrub repaired 0 in 0h11m with 0 errors on Sun Sep  1 03:59:44 2013                                                   
config:                                                                                                                       
                                                                                                                               
        NAME                                                STATE    READ WRITE CKSUM                                         
        VOLUME2                                            ONLINE      0    0    0                                         
          raidz1-0                                          ONLINE      0    0    0                                         
            gptid/7b3c8086-1287-11e3-baf1-f8d111b562f1.eli  ONLINE      0    0    0                                         
            gptid/7bcd0e98-1287-11e3-baf1-f8d111b562f1.eli  ONLINE      0    0    0                                         
            gptid/7c523a16-1287-11e3-baf1-f8d111b562f1.eli  ONLINE      0    0    0                                         
                                                                                                                               
errors: No known data errors    


WHAT THE HELL!!!!! How is this possible? Not only did the new volume have no errors, but the errors on the old one have disappeared too!!! Also, the motherboard set the RAM speed to 1333 MHz even though it supports 1600 MHz. And finally, how is it possible for all this to happen when my RAM's stock speed IS 1866 MHz?

Thanks by the way. :D


//edit: How do I run a memory test?
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
So let's make some assumptions real fast... You changed your RAM speed to Auto, and the problem appears to have gone away.

So here's what is probably going on now:

Your data on the disk may or may not have been correct, but because of the improper RAM clocks your data in RAM was being corrupted. This does two things in RAM: it corrupts data that is good, and it corrupts data that is bad. More than likely, ZFS saw good data as bad and then "repaired" it, making it permanently bad. Because of how the checksumming works, you now have corrupted files that you might never be able to prove are bad. Now you know why ECC RAM is pretty much needed for ZFS.
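
To make the checksum point concrete, here is a toy illustration in Python. It is a simplified sketch, not how ZFS is actually implemented: SHA-256 stands in for whatever checksum the pool uses, and the helper names checksum and flip_bit are invented for the example. It shows two cases: a bit flipped in RAM after a good block is read (a checksum error gets reported although the disk copy is fine), and a bit flipped in RAM before the checksum is computed on write (the bad data verifies cleanly afterwards).

```python
import hashlib


def checksum(block: bytes) -> str:
    # Assumption: SHA-256 as a stand-in for the pool's real checksum algorithm.
    return hashlib.sha256(block).hexdigest()


def flip_bit(block: bytes, bit: int) -> bytes:
    """Return a copy of block with one bit inverted, simulating a RAM error."""
    data = bytearray(block)
    data[bit // 8] ^= 1 << (bit % 8)
    return bytes(data)


original = b"important file contents"

# Case 1: the block and its checksum were written correctly,
# but a bit flips in RAM after the block is read back.
disk_block, disk_sum = original, checksum(original)
in_ram = flip_bit(disk_block, 5)
read_looks_bad = checksum(in_ram) != disk_sum

# Case 2: a bit flips in RAM before the checksum is computed on write,
# so the checksum is calculated over data that is already wrong.
corrupted = flip_bit(original, 5)
disk_block2, disk_sum2 = corrupted, checksum(corrupted)
write_verifies = checksum(disk_block2) == disk_sum2
silently_wrong = disk_block2 != original

print("case 1, good disk data flagged as bad:", read_looks_bad)
print("case 2, checksum matches:", write_verifies)
print("case 2, data differs from what was intended:", silently_wrong)
```

The second case is the one that cannot be found later: the stored block and its checksum agree with each other, so a scrub has nothing to complain about.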

Unfortunately there's no way to identify what is good and what isn't, because your RAM was effectively unreliable during the time it wasn't at auto. Your pools appear to be healthy, but they may not be. And someday you may find that your pool has serious metadata corruption that you can't identify right now, but you will find out about later when your pool won't mount. :(

So whether you want to rebuild your pools from scratch or not is your choice, your data and your risk.
 