One or more devices has experienced an error

Status
Not open for further replies.

ives31

Dabbler
Joined
Oct 17, 2014
Messages
33
  • WARNING: The volume vol1 (ZFS) status is ONLINE: One or more devices has experienced an error resulting in data corruption. Applications may be affected. Restore the file in question if possible. Otherwise restore the entire pool from backup.
Hi,
I've had my FreeNAS running about 3 days now. I've just about loaded all my data onto it (3x 2 TB HDDs) and yesterday I got the above alert.
The GUI says my drives are all healthy. It's taken me forever to load all my data from USB 2.0 hard drives. I don't really want to have to start all over again.
Presumably this isn't the drives failing (they were new WD Green drives and the whole thing has only been running for about 12 hours all told).
What are my options here?
 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194
You're not running ECC, so it could've been a stray cosmic ray.

Sounds like three striped disks, I don't imagine a RAIDZ1 failing so quickly.

Specs are in signature.
What is zpool status -v?

For those on mobile devices:


Please open a terminal session (WebGUI console, SSH, locally - take your pick) and input what anodos gave you.
Post that output with [ CODE ] tags or in Pastebin! (the formatting can be rather important).
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
This is gonna be messy. :(
 

Fraoch

Patron
Joined
Aug 14, 2014
Messages
395
Presumably this isn't the drives failing (they were new WD Green drives and the whole thing has only been running for about 12 hours all told).

It could very well be - hard drives suffer infant mortality for sure. If they're going to fail, they often do so in the first few months. If they survive that, they can run for years and years.

I've had my FreeNAS running about 3 days now. I've just about loaded all my data onto it (3x 2 TB HDDs) and yesterday I got the above alert.
The GUI says my drives are all healthy. It's taken me forever to load all my data from USB 2.0 hard drives. I don't really want to have to start all over again.

It's for this very reason (drive infant mortality) that you shouldn't trust FreeNAS with your data just yet. Do lots of tests, stress the drives, and if they survive that, then you can start putting real data on them. Consider the data disposable and don't rely on it until you can be sure the drives are reliable.

Do you have any SMART tests configured? You might want to post the output of:

Code:
sudo smartctl -a /dev/adaX


where X is 0 to 2 (i.e. /dev/ada0, /dev/ada1, /dev/ada2).
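Since the full `smartctl -a` output is long, it can help to pull out just the attributes people usually look at first. The following is a minimal sketch, assuming a POSIX `sh` (the default FreeNAS root shell may be csh, so run it via `sh`) and assuming the attribute table has the ten-column layout that smartctl prints for these drives. The function name `smart_key_attrs` is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: reads `smartctl -a` output on stdin and prints
# "ATTRIBUTE_NAME RAW_VALUE" for a few attributes that often hint at media
# or cabling trouble:
#   5   Reallocated_Sector_Ct
#   197 Current_Pending_Sector
#   198 Offline_Uncorrectable
#   199 UDMA_CRC_Error_Count
# Assumes the ten-column attribute table; lines with fewer fields (such as
# the selective self-test table) are ignored.
smart_key_attrs() {
    awk '$1 ~ /^(5|197|198|199)$/ && NF >= 10 { print $2, $10 }'
}

# Example use on the three disks named above (needs root):
#   for d in ada0 ada1 ada2; do
#       echo "== $d"
#       smartctl -a "/dev/$d" | smart_key_attrs
#   done
```

Non-zero raw values here are worth a closer look, but zeros do not prove a drive is healthy, especially when no self-tests have been run yet.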

Unfortunately, since you are not using ECC, any error checking or correction you do from this point is suspect. FreeNAS may be able to correct the errors or it might not; we can't be sure. If the memory has flipped a bit and written it to the disks, you won't be able to find it until you try to use that particular file.
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
Unfortunately, since you are not using ECC, any error checking or correction you do from this point is suspect. FreeNAS may be able to correct the errors or it might not; we can't be sure. If the memory has flipped a bit and written it to the disks, you won't be able to find it until you try to use that particular file.

Yep. That's why I said "this is gonna be messy". It's also possible his RAM *is* to blame for the problem. Which also means he'd better have backups because "it's about to get messy".
 

Fraoch

Patron
Joined
Aug 14, 2014
Messages
395
Indeed. :(
 

Mlovelace

Guru
Joined
Aug 19, 2014
Messages
1,111
Specs are in signature.
What is zpool status -v?

Did you memtest your system? If you've got a bad DIMM, you're going to write all kinds of crazy sh@t to your pool. First test your system, then test your drives, then write your data to the pool.
 

ives31

Dabbler
Joined
Oct 17, 2014
Messages
33
ok, this is the result of zpool status -v
Code:
[root@freenas ~]# zpool status -v                                                                                                
  pool: vol1                                                                                                                     
 state: ONLINE
status: One or more devices has experienced an error resulting in data                                                           
        corruption.  Applications may be affected.                                                                               
action: Restore the file in question if possible.  Otherwise restore the                                                         
        entire pool from backup.                                                                                                 
   see: http://illumos.org/msg/ZFS-8000-8A                                                                                       
  scan: none requested                                                                                                           
config:                                                                                                                          
                                                                                                                                 
        NAME                                            STATE     READ WRITE CKSUM                                               
        vol1                                            ONLINE       0     0     0                                               
          raidz1-0                                      ONLINE       0     0     0                                               
            gptid/0bcf83d0-5baa-11e4-96d2-10c37b4efe84  ONLINE       0     0     0                                               
            gptid/0c9f8ae9-5baa-11e4-96d2-10c37b4efe84  ONLINE       0     0     0                                               
            gptid/0d784f1f-5baa-11e4-96d2-10c37b4efe84  ONLINE       0     0     0                                               
                                                                                                                                 
errors: Permanent errors have been detected in the following files:                                                              
                                                                                                                                 
        vol1/dataset1:<0x7af3>                                                                                                   
[root@freenas ~]#                


I'm wondering whether the error might have come from the data I transferred to the pool rather than from the drives or memory?
I ask because I kept my data on an old USB HDD that I think is about to die. The data transfer was pitifully slow at some points (KB/s).
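For anyone following along, the part of the output that matters most is the list under `errors:`. As far as I understand it, an entry such as `vol1/dataset1:<0x7af3>` is a dataset name plus an object number, which ZFS shows when it cannot translate the damaged object into a file path. Below is a minimal sketch for extracting just those entries from `zpool status -v` output; it assumes POSIX `sh` and `awk`, and the function name `zpool_error_entries` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: reads `zpool status -v` output on stdin and prints one
# damaged entry per line (a file path, or dataset:<object number>).
# Prints nothing when the pool reports "errors: No known data errors".
zpool_error_entries() {
    awk '
        /^[ \t]*pool:/ { in_errors = 0 }          # a new pool section starts
        /^errors:/     { in_errors = 1; next }    # entries follow this line
        in_errors && NF > 0 {
            sub(/^[ \t]+/, "")                    # trim leading padding
            sub(/[ \t]+$/, "")                    # trim trailing padding
            print
        }
    '
}

# Example use (needs root):
#   zpool status -v vol1 | zpool_error_entries
```

Saving that list before any further work on the pool gives a record of what was reported as damaged at this point.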
 

ives31

Dabbler
Joined
Oct 17, 2014
Messages
33
this is smartctl -a /dev/ada0
Code:
Short self-test routine                                                                                                            
recommended polling time:        (   2) minutes.                                                                                   
Extended self-test routine                                                                                                         
recommended polling time:        ( 265) minutes.                                                                                   
Conveyance self-test routine                                                                                                       
recommended polling time:        (   5) minutes.                                                                                   
SCT capabilities:              (0x7035) SCT Status supported.                                                                      
                                        SCT Feature Control supported.                                                             
                                        SCT Data Table supported.                                                                  
                                                                                                                                   
SMART Attributes Data Structure revision number: 16                                                                                
Vendor Specific SMART Attributes with Thresholds:                                                                                  
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE                                   
  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       0                                           
  3 Spin_Up_Time            0x0027   167   166   021    Pre-fail  Always       -       4625                                        
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       43                                          
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0                                           
  7 Seek_Error_Rate         0x002e   200   200   000    Old_age   Always       -       0                                           
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       85                                          
 10 Spin_Retry_Count        0x0032   100   253   000    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   100   253   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       42
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       23                                          
193 Load_Cycle_Count        0x0032   199   199   000    Old_age   Always       -       3100                                        
194 Temperature_Celsius     0x0022   125   120   000    Old_age   Always       -       22                                          
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0                                           
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0                                           
198 Offline_Uncorrectable   0x0030   100   253   000    Old_age   Offline      -       0                                           
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0                                           
200 Multi_Zone_Error_Rate   0x0008   100   253   000    Old_age   Offline      -       0                                           
                                                                                                                                   
SMART Error Log Version: 1                                                                                                         
No Errors Logged                                                                                                                   
                                                                                                                                   
SMART Self-test log structure revision number 1                                                                                    
No self-tests have been logged.  [To run self-tests, use: smartctl -t]                                                             
                                                                                                                                   
                                                                                                                                   
SMART Selective self-test log data structure revision number 1                                                                     
SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS                                                                                       
    1        0        0  Not_testing                                                                                               
    2        0        0  Not_testing                                                                                               
    3        0        0  Not_testing                                                                                               
    4        0        0  Not_testing                                                                                               
    5        0        0  Not_testing                                                                                               
Selective self-test flags (0x0):                                                                                                   
  After scanning selected spans, do NOT read-scan remainder of disk.                                                               
If Selective self-test is pending on power-up, resume after 0 minute delay.                                                        
                                                                                                                                   
[root@freenas ~]#                                                                                                                  
 

ives31

Dabbler
Joined
Oct 17, 2014
Messages
33
this is smartctl -a /dev/ada1
Code:
Short self-test routine                                                                                                            
recommended polling time:        (   2) minutes.                                                                                   
Extended self-test routine                                                                                                         
recommended polling time:        ( 263) minutes.                                                                                   
Conveyance self-test routine                                                                                                       
recommended polling time:        (   5) minutes.                                                                                   
SCT capabilities:              (0x7035) SCT Status supported.                                                                      
                                        SCT Feature Control supported.                                                             
                                        SCT Data Table supported.                                                                  
                                                                                                                                   
SMART Attributes Data Structure revision number: 16                                                                                
Vendor Specific SMART Attributes with Thresholds:                                                                                  
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE                                   
  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       0                                           
  3 Spin_Up_Time            0x0027   168   167   021    Pre-fail  Always       -       4566                                        
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       43                                          
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0                                           
  7 Seek_Error_Rate         0x002e   200   200   000    Old_age   Always       -       0                                           
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       85                                          
 10 Spin_Retry_Count        0x0032   100   253   000    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   100   253   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       42
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       21                                          
193 Load_Cycle_Count        0x0032   199   199   000    Old_age   Always       -       3088                                        
194 Temperature_Celsius     0x0022   125   120   000    Old_age   Always       -       22                                          
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0                                           
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0                                           
198 Offline_Uncorrectable   0x0030   100   253   000    Old_age   Offline      -       0                                           
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0                                           
200 Multi_Zone_Error_Rate   0x0008   100   253   000    Old_age   Offline      -       0                                           
                                                                                                                                   
SMART Error Log Version: 1                                                                                                         
No Errors Logged                                                                                                                   
                                                                                                                                   
SMART Self-test log structure revision number 1                                                                                    
No self-tests have been logged.  [To run self-tests, use: smartctl -t]                                                             
                                                                                                                                   
                                                                                                                                   
SMART Selective self-test log data structure revision number 1                                                                     
SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS                                                                                       
    1        0        0  Not_testing                                                                                               
    2        0        0  Not_testing                                                                                               
    3        0        0  Not_testing                                                                                               
    4        0        0  Not_testing                                                                                               
    5        0        0  Not_testing                                                                                               
Selective self-test flags (0x0):                                                                                                   
  After scanning selected spans, do NOT read-scan remainder of disk.                                                               
If Selective self-test is pending on power-up, resume after 0 minute delay.                                                        
                                                                                                                                   
[root@freenas ~]#                                       
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
Ugg.. looks like it might be a RAM issue.

I'd back up your data now before you do anything else. Keep in mind that data inside /vol1/dataset1 might be corrupt. :/

Do *not* do a scrub of your pool.

I'd definitely make it a very high priority to do a RAM test.
 

ives31

Dabbler
Joined
Oct 17, 2014
Messages
33
this is smartctl -a /dev/ada2
Code:
Short self-test routine                                                                                                            
recommended polling time:        (   2) minutes.                                                                                   
Extended self-test routine                                                                                                         
recommended polling time:        ( 270) minutes.                                                                                   
Conveyance self-test routine                                                                                                       
recommended polling time:        (   5) minutes.                                                                                   
SCT capabilities:              (0x7035) SCT Status supported.                                                                      
                                        SCT Feature Control supported.                                                             
                                        SCT Data Table supported.                                                                  
                                                                                                                                   
SMART Attributes Data Structure revision number: 16                                                                                
Vendor Specific SMART Attributes with Thresholds:                                                                                  
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE                                   
  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       0                                           
  3 Spin_Up_Time            0x0027   168   167   021    Pre-fail  Always       -       4575                                        
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       43                                          
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0                                           
  7 Seek_Error_Rate         0x002e   200   200   000    Old_age   Always       -       0                                           
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       85                                          
 10 Spin_Retry_Count        0x0032   100   253   000    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   100   253   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       42
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       20                                          
193 Load_Cycle_Count        0x0032   200   200   000    Old_age   Always       -       2963                                        
194 Temperature_Celsius     0x0022   125   119   000    Old_age   Always       -       22                                          
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0                                           
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0                                           
198 Offline_Uncorrectable   0x0030   100   253   000    Old_age   Offline      -       0                                           
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0                                           
200 Multi_Zone_Error_Rate   0x0008   100   253   000    Old_age   Offline      -       0                                           
                                                                                                                                   
SMART Error Log Version: 1                                                                                                         
No Errors Logged                                                                                                                   
                                                                                                                                   
SMART Self-test log structure revision number 1                                                                                    
No self-tests have been logged.  [To run self-tests, use: smartctl -t]                                                             
                                                                                                                                   
                                                                                                                                   
SMART Selective self-test log data structure revision number 1                                                                     
SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS                                                                                       
    1        0        0  Not_testing                                                                                               
    2        0        0  Not_testing                                                                                               
    3        0        0  Not_testing                                                                                               
    4        0        0  Not_testing                                                                                               
    5        0        0  Not_testing                                                                                               
Selective self-test flags (0x0):                                                                                                   
  After scanning selected spans, do NOT read-scan remainder of disk.                                                               
If Selective self-test is pending on power-up, resume after 0 minute delay.                                                        
                                                                                                                                   
[root@freenas ~]#                           
 

ives31

Dabbler
Joined
Oct 17, 2014
Messages
33
Ugg.. looks like it might be a RAM issue.

I'd back up your data now before you do anything else. Keep in mind that data inside /vol1/dataset1 might be corrupt. :/

Do *not* do a scrub of your pool.

I'd definitely make it a very high priority to do a RAM test.
Hi,
Can you explain what a scrub is?
Also, how do I do a RAM test?

thanks
 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194
Hi,
Can you explain what a scrub is?
Also, how do I do a RAM test?

thanks

As soon as the backups are in place, I urge you to do a lot of reading.

Those are basic questions you should not be asking.

For information on scrubs, read Cyberjock's guide (link is in my sig).
For RAM tests, read up on Memtest86+.
 

Starpulkka

Contributor
Joined
Apr 9, 2013
Messages
179
Am I the only one who noticed that the Green drives show 85 power-on hours and a load cycle count over 3000? If I were deep in the same lagoon, I would check zpool history to see whether any HDD dropped out suddenly. As for the Intel hardware stuff, I'll leave that as someone else's mess.
 

Mlovelace

Guru
Joined
Aug 19, 2014
Messages
1,111
Am I the only one who noticed that the Green drives show 85 power-on hours and a load cycle count over 3000? If I were deep in the same lagoon, I would check zpool history to see whether any HDD dropped out suddenly. As for the Intel hardware stuff, I'll leave that as someone else's mess.

You can thank IntelliPark on those Green drives for that... ah, feel the power savings ;)
 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194
Am I the only one who noticed that the Green drives show 85 power-on hours and a load cycle count over 3000? If I were deep in the same lagoon, I would check zpool history to see whether any HDD dropped out suddenly. As for the Intel hardware stuff, I'll leave that as someone else's mess.

Good catch, the Load Cycle Count is way too high for such a new drive and it does need fixing. It is not the first priority, though.
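To put a number on it: the ada0 output above shows 3100 load cycles in 85 power-on hours, which works out to roughly 36 head parks per hour. A minimal sketch for computing that ratio from `smartctl -a` output follows. It assumes POSIX `sh` and `awk`, and that both raw values are plain integers, as they are in the outputs posted here (some drives report hours in other formats). The function name `load_cycles_per_hour` is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: reads `smartctl -a` output on stdin and prints
# Load_Cycle_Count (ID 193) divided by Power_On_Hours (ID 9), to one decimal.
# Prints "n/a" if the power-on hours are missing or zero.
load_cycles_per_hour() {
    awk '
        $1 == 9   && NF >= 10 { hours  = $10 }   # Power_On_Hours raw value
        $1 == 193 && NF >= 10 { cycles = $10 }   # Load_Cycle_Count raw value
        END {
            if (hours > 0) printf "%.1f\n", cycles / hours
            else print "n/a"
        }
    '
}

# Example use (needs root):
#   smartctl -a /dev/ada0 | load_cycles_per_hour
```

Re-running this after a few days shows whether the rate is still climbing at the same pace.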
 

ives31

Dabbler
Joined
Oct 17, 2014
Messages
33
So, bearing in mind that I understand about a quarter of what everyone has said in this thread, what's the prognosis here?
Is my data ruined?
What should I do next?
 