(ZFS) status is UNKNOWN (data corruption)

Status
Not open for further replies.

ktr

Cadet
Joined
Dec 13, 2013
Messages
5
Hi - I recently started using FreeNAS (about 3 months ago), and when I logged in today I noticed this error:

Code:
WARNING: The volume ryans_hds (ZFS) status is UNKNOWN: One or more devices has experienced an error resulting in data corruption. Applications may be affected. Restore the file in question if possible. Otherwise restore the entire pool from backup.


After searching a bit, I came across http://forums.freenas.org/threads/zfs-status-unknown.16104/, but it didn't seem to help me much. I ran zpool status -x and got:

Code:
[root@freenas ~]# zpool status -x
  pool: ryans_hds
 state: DEGRADED
status: One or more devices has experienced an error resulting in data
        corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
        entire pool from backup.
   see: http://illumos.org/msg/ZFS-8000-8A
  scan: scrub repaired 0 in 13h35m with 1227 errors on Sun Dec  8 13:35:57 2013
config:

        NAME                                            STATE     READ WRITE CKSUM
        ryans_hds                                       DEGRADED     0     0     0
          mirror-0                                      DEGRADED     0     0     0
            gptid/ecc7beeb-0172-11e3-b2a9-d43d7eb33430  ONLINE       0     0     0
            15311977958075100947                        UNAVAIL      0     0     0  was /dev/gptid/ed3a1e65-0172-11e3-b2a9-d43d7eb33430

errors: 1227 data errors, use '-v' for a list


So it appears to me that something happened to one of my devices and it is now "unavailable"? Is there a way to fix this error? The drives are also only about 3 months old (I bought them specifically to build this box).

Also, regarding the actual files that were corrupted ... how do I go about trying to restore the "file in question"? Should I just do
Code:
zfs rollback <file in question>
?

Sorry if these are basic questions - thanks in advance!
 

joeschmuck

Old Man
Moderator
Joined
May 28, 2011
Messages
10,994
Sorry, I cannot give advice on how to treat your pool, but back up any files you want to retain, assuming you can access them.

Next, set up email on FreeNAS to send you daily status messages. The scrub happened on 8 Dec and it's now 13 Dec; you should hear about serious problems sooner than that.

Do you have your system on a UPS? I'm curious as to why you have this type of failure. Is it just a hard drive failure?

From the SSH shell, could you post the output of the following commands (assuming your drives are ada0 and ada1)? I'm looking for signs of a drive failure:
smartctl -A /dev/ada0
smartctl -A /dev/ada1
 

Dusan

Guru
Joined
Jan 29, 2013
Messages
1,165
Can you also please post the output of these commands (in CODE tags):
camcontrol devlist
gpart show
 

ktr

Cadet
Joined
Dec 13, 2013
Messages
5
I will get the output of these commands tonight when I get home (don't have access from work). @joeschmuck: I am not using a UPS - is that a huge mistake? Thanks!
 

joeschmuck

Old Man
Moderator
Joined
May 28, 2011
Messages
10,994
Yes, you truly need a UPS that has a USB status cable. For one, it will keep your NAS from corrupting its data if you have a power failure, and it will automatically shut down the NAS during a prolonged power outage. The alternative is no UPS, where you could have all kinds of data corruption and even mess up your boot flash drive image. Recovering from this kind of failure sucks because it can take from hours to days. My first thought was that a power glitch might have caused your problem, but then again your problem may have nothing to do with power.

One other thought is the SATA cables. You might need to replace the one going to the failed drive. Many folks have SATA cable issues, and since your system is new, that could be it. Of course, let's first see what the commands we asked for tell us.
 

ktr

Cadet
Joined
Dec 13, 2013
Messages
5
Ok, back again. Here are the results:

Code:
[root@freenas ~]# smartctl /dev/ada0                                        
smartctl 6.1 2013-03-16 r3800 [FreeBSD 9.1-STABLE amd64] (local build)       
Copyright (C) 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org 
                                                                             
ATA device successfully opened
 
[root@freenas ~]# smartctl /dev/ada1                                         
smartctl 6.1 2013-03-16 r3800 [FreeBSD 9.1-STABLE amd64] (local build)       
Copyright (C) 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org 
                                                                             
ATA device successfully opened
 
[root@freenas ~]# camcontrol devlist
<WDC WD20EFRX-68AX9N0 80.00A80> at scbus0 target 0 lun 0 (ada0,pass0)
<HITACHI HTS723216L9SA60 FC2ZC50B> at scbus4 target 1 lun 0 (ada1,pass1)
<SanDisk Cruzer Fit 2.01> at scbus6 target 0 lun 0 (pass2,da0)
 
[root@freenas ~]# gpart show
=>     63  7821249  da0  MBR  (3.7G)
       63  1930257    1  freebsd  [active]  (942M)
  1930320       63       - free -  (31k)
  1930383  1930257    2  freebsd  (942M)
  3860640     3024    3  freebsd  (1.5M)
  3863664    41328    4  freebsd  (20M)
  3904992  3916320       - free -  (1.9G)

=>      0  1930257  da0s1  BSD  (942M)
        0       16         - free -  (8.0k)
       16  1930241      1  !0  (942M)

=>        34  3907029101  ada0  GPT  (1.8T)
          34          94        - free -  (47k)
         128     4194304     1  freebsd-swap  (2.0G)
     4194432  3902834696     2  freebsd-zfs  (1.8T)
  3907029128           7        - free -  (3.5k)

=>       63  312581745  ada1  MBR  (149G)
         63  301417137     1  ntfs  [active]  (143G)
  301417200       1296        - free -  (648k)
  301418496   11159552     2  !18  (5.3G)
  312578048       3760        - free -  (1.9M)


Regarding a power issue, we do have outages every now and then. We're in a somewhat rural area (not rural rural, but more so than a lot of places).

So, any ideas given the output above? Also, does this mean that my disk is hosed or just that the "scrub" failed (and assuming it just means it failed, am I able to recover)? Thanks for all your time so far - I'm ordering an uninterruptible power supply this weekend as a result!
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
Your smartctl commands are wrong.. you forgot the "-a". What you provided is useless. :P
 

DrKK

FreeNAS Generalissimo
Joined
Oct 15, 2013
Messages
3,630
Dude.

The second drive listed is a Windows drive. It's got an NTFS partition and an MBR. So I have no idea WTF you're doing.
 

ktr

Cadet
Joined
Dec 13, 2013
Messages
5
Sorry, the NTFS drive is unrelated to this issue ... it's from a laptop that I needed to get the data off of. It's been a long day :/ Let's try again:

Code:
[root@freenas ~]# smartctl -A /dev/ada0
smartctl 6.1 2013-03-16 r3800 [FreeBSD 9.1-STABLE amd64] (local build)
Copyright (C) 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org
 
=== START OF READ SMART DATA SECTION ===
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   001   001   051    Pre-fail  Always   FAILING_NOW 48769
  3 Spin_Up_Time            0x0027   182   177   021    Pre-fail  Always       -       3891
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       18
  5 Reallocated_Sector_Ct   0x0033   179   179   140    Pre-fail  Always       -       639
  7 Seek_Error_Rate         0x002e   200   190   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   097   097   000    Old_age   Always       -       2598
 10 Spin_Retry_Count        0x0032   100   253   000    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   100   253   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       18
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       14
193 Load_Cycle_Count        0x0032   200   200   000    Old_age   Always       -       3
194 Temperature_Celsius     0x0022   119   115   000    Old_age   Always       -       28
196 Reallocated_Event_Count 0x0032   187   187   000    Old_age   Always       -       13
197 Current_Pending_Sector  0x0032   197   197   000    Old_age   Always       -       1180
198 Offline_Uncorrectable   0x0030   100   253   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0008   100   253   000    Old_age   Offline      -       0
 
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
and .. ada0 fail! Current_Pending_Sector=1180 is a bad thing.

So copy your data off of ada0 and hope you get the important stuff. If you have backups you should be fine, since you will need to restore from them. Either way, you will need to recreate your pool from scratch.
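
Before copying, it helps to know which files the scrub actually flagged. The pool status output says to use '-v' for a list, so here is a minimal sketch that pulls just the file paths out of that listing. It assumes the usual "errors: Permanent errors have been detected in the following files:" header followed by one entry per line. The helper name list_damaged is made up for illustration.

```shell
#!/bin/sh
# Sketch: print the entries that `zpool status -v` lists as damaged.
# Reads the command's output on stdin. list_damaged is a hypothetical
# helper name, not part of ZFS.
list_damaged() {
    awk '
        # A new pool section ends any previous error list.
        /^[ \t]*pool:/ { in_list = 0 }
        # Header that introduces the list of affected files.
        /^errors: Permanent errors/ { in_list = 1; next }
        # Inside the list: strip the indentation, skip blank lines.
        in_list && NF { sub(/^[ \t]+/, ""); print }
    '
}

# Typical use (pool name taken from this thread):
#   zpool status -v ryans_hds | list_damaged > damaged_files.txt
```

Anything not on that list should copy off normally. As for the earlier question: zfs rollback takes a snapshot name and reverts a whole dataset to it, so it is not a per-file restore.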
 

DrKK

FreeNAS Generalissimo
Joined
Oct 15, 2013
Messages
3,630
Just to add to what cyberjock said:

The correct number of Current Pending Sectors is 0. Any number above 0 is what is known as "very bad".

You can see your number is 1180. This means your drive is "hosed" and cannot be salvaged for use. Get what you can off of it, and get a new drive.
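
If you want to check this without eyeballing the table, here is a rough sketch that reads smartctl -A output and complains when the sector-health attributes (IDs 5, 197 and 198 in the table posted above) have a non-zero raw value. It assumes the 10-column layout shown in this thread, with RAW_VALUE in the last column. Treating any non-zero value as a warning is this sketch's assumption, and check_smart is a made-up name.

```shell
#!/bin/sh
# Sketch: warn on non-zero raw values for Reallocated_Sector_Ct (5),
# Current_Pending_Sector (197) and Offline_Uncorrectable (198).
# Reads `smartctl -A` output on stdin; exits non-zero if anything is flagged.
# check_smart is a hypothetical helper name.
check_smart() {
    awk '
        $1 == 5 || $1 == 197 || $1 == 198 {
            # Column 10 is RAW_VALUE in the attribute table layout.
            if ($10 + 0 > 0) {
                printf "WARN %s raw=%s\n", $2, $10
                bad = 1
            }
        }
        END { exit bad + 0 }
    '
}

# Typical use (device name is an assumption; adjust for your system):
#   smartctl -A /dev/ada0 | check_smart || echo "ada0 needs attention"
```

Run against the ada0 output in this thread, it would flag both the reallocated and the pending sector counts.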
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
Also, you failed to set up emailing in FreeNAS, failed to set up the SMART service and turn it on, or both. If you had, you'd have gotten warnings LONG before you got the streetlight warning like you did.
 

joeschmuck

Old Man
Moderator
Joined
May 28, 2011
Messages
10,994
And you should also do a "smartctl -A /dev/ada1" so we can see if you have another drive about to go, hopefully not but you never know.
 

Dusan

Guru
Joined
Jan 29, 2013
Messages
1,165
I don't care much about the ada1 smartctl output, as that is an unrelated NTFS drive. What I'd like to know is what happened to the other drive from the mirror. From the information provided so far, I see a mirror with one drive missing, and the remaining drive has unrecoverable errors resulting in corrupted files. However, if the other drive just disconnected due to faulty power or a bad SATA cable, it can still hold a good copy of the data. @ktr, you need to figure out what happened to that other drive. The OS does not see it: it is missing from the camcontrol devlist output.
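
To make the "OS does not see it" check quick to repeat after each cable or port swap, here is a small sketch that boils camcontrol devlist output down to the ATA disks only. It assumes the line format shown earlier in the thread (model in angle brackets, device names in parentheses). list_ata_disks is a made-up helper name.

```shell
#!/bin/sh
# Sketch: print "adaN model" for each ATA disk in `camcontrol devlist`
# output read from stdin. USB sticks (daN) and other devices are skipped.
# list_ata_disks is a hypothetical helper name.
list_ata_disks() {
    sed -n 's/^<\(.*\)>.*[(,]\(ada[0-9][0-9]*\)[,)].*$/\2 \1/p'
}

# Typical use:
#   camcontrol devlist | list_ata_disks
```

With both halves of the mirror attached you would expect to see both pool disks in that list. In the output posted above, only one of them shows up next to the unrelated laptop drive.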
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
Eh.. If the other drive was disconnected you should make sure that the currently installed 1/2 of the mirror is disconnected. Basically treat each disk like a separate pool that should never be connected at the same time.
 

joeschmuck

Old Man
Moderator
Joined
May 28, 2011
Messages
10,994
Yea, I didn't catch that the NTFS was now the ada1 drive and that the other drive must be disconnected since it doesn't show up. My bad.
 

ktr

Cadet
Joined
Dec 13, 2013
Messages
5
I can't seem to get the other disk to show up in FreeNAS ... tried switching SATA cables, switching power input, changing which port I plug it into ... it never shows up. Does that mean the 2nd disk is dead? I'm wondering how this all could have come about - I literally just built everything 3 months ago and thought I was generally pretty safe by having the two disks mirror each other. Now it seems that one is bad / going bad and I can't even see the other one. I'm also a little nervous as we have all our family videos on the one remaining disk :(

Any suggestions would be most welcome. Thanks again for everything so far.
 

joeschmuck

Old Man
Moderator
Joined
May 28, 2011
Messages
10,994
There is a thing called infant mortality, where things die well before their designed end of life. Most of the time these things die rather quickly, but for some reason yours hung in there a bit longer. Of course, we are assuming that the drives are the cause and not the motherboard or possibly the power supply. To verify that your drives are the issue, you could place them into another computer and test them with disk testing software. I would disconnect from that machine any drives you want to keep intact; it's easy to wipe a drive by accident. This is just something you could try if you want. I know I would, just to make sure I had all the facts before filling out an RMA.
 