WARNING: The volume volume1 (ZFS) status is UNKNOWN

Status
Not open for further replies.

MadSaid

Dabbler
Joined
Aug 30, 2012
Messages
12
Hi All,

When I go to the web interface I see a yellow flashing light. When I click on it, I get the message below:

WARNING: The volume volume1 (ZFS) status is UNKNOWN: One or more devices has experienced an unrecoverable error. An attempt was made to correct the error. Applications are unaffected. Determine if the device needs to be replaced, and clear the errors using 'zpool clear' or replace the device with 'zpool replace'.

I have searched the forum and found a few threads, including this one: http://forums.freenas.org/showthread.php?9306-ZFS-Status-Unknown&highlight=volume+volume1+%28ZFS%29+status+UNKNOWN

The 1st link in that thread is not working. Anyway, can someone please help me?

Additional info:
This is my 1st FreeNAS / Linux experience. I have version FreeNAS-8.3.0-RELEASE-x64 (r12701M) installed.
I sometimes get an error when I try to rename a folder on the NAS: "folder in use by other program" (this is not the case, btw).

thanks in advance,

MadSaid
 

MadSaid

Dabbler
Joined
Aug 30, 2012
Messages
12
I don't have this issue any more. Strange... The behaviour is not consistent. Is there a way to capture / check the system / boot log file? I am not at ease with this.

thanks in advance,

MadSaid
 

MadSaid

Dabbler
Joined
Aug 30, 2012
Messages
12
I have the error again. See below result of "zpool status":

Code:
pool: volume1
state: ONLINE
status: One or more devices has experienced an unrecoverable error.  An
        attempt was made to correct the error.  Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
        using 'zpool clear' or replace the device with 'zpool replace'.
   see: http://www.sun.com/msg/ZFS-8000-9P
  scan: none requested
config:

        NAME                                            STATE     READ WRITE CKSUM
        volume1                                         ONLINE       0     0     0
          raidz1-0                                      ONLINE       0     0     0
            gptid/315fdd47-342e-11e2-8df0-e840f205bc31  ONLINE       0     0     0
            gptid/321ad0bf-342e-11e2-8df0-e840f205bc31  ONLINE       0     0     5
            gptid/329235e7-342e-11e2-8df0-e840f205bc31  ONLINE       0     0     0

errors: No known data errors


I fixed the checksum error with "zpool clear volume1".

How can I determine and fix the root cause of this? E.g., how do I determine whether the device needs to be replaced?
 

warri

Guru
Joined
Jun 6, 2011
Messages
1,193
Run a SMART test on the device in question. Also, if errors consistently turn up on one device, this is a good indicator of a bad drive.
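
The pool lists its members by gptid label, while smartctl wants a device node such as /dev/ada1, so the label has to be mapped to a disk first. Here is a minimal sketch of that step, assuming `glabel status` prints the label in the first column and the partition in the third. The helper name `gptid_to_disk` is made up for illustration, and the script is meant to be run with sh.

```shell
#!/bin/sh
# Hypothetical helper: read `glabel status` output on stdin and print the
# disk behind a given gptid label (partition suffix such as "p2" removed).
gptid_to_disk() {
    awk -v label="$1" '$1 == label { sub(/p[0-9]+$/, "", $3); print $3 }'
}

# On the NAS itself (not executed here):
#   disk=$(glabel status | gptid_to_disk gptid/321ad0bf-342e-11e2-8df0-e840f205bc31)
#   smartctl -t long "/dev/$disk"   # start the long self-test
#   smartctl -a "/dev/$disk"        # read attributes and the self-test log afterwards
```

The long test runs in the background on the drive, so the `smartctl -a` output is only meaningful once the test has finished.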
 

Stephens

Patron
Joined
Jun 19, 2012
Messages
496
CHKDSK c:\
{blahblahblah}
{errors, segments, filenames, unmapped extents, etc}
"hmm"
CLS
"ah, that's better"

OK, now that I've had my moment, my ideas...

- If your data is critical, make sure you have a backup.
- Stop clearing the error counters so we can see which drives are giving you problems. Repost results of zpool status after you've had problems.
- Make sure drive cables are not broken and are seated properly. They are a big cause of CRC errors.
- Run a SMART long test against the drive(s) in question. (You can google the commands/programs to do that).
- I believe that thread doesn't address your issue. I only quickly looked and it seems to be related to people who upgrade to 8.3 who previously used a lower version of ZFS. The system then would essentially give a yellow-alert warning that it wasn't using the latest ZFS version (28). It's just a link to the README for that version, which I believe is also here.
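
To keep an eye on which drives are collecting errors without clearing anything, the device lines of `zpool status` can be filtered for non-zero counters. This is only a sketch, assuming the five-column layout (NAME STATE READ WRITE CKSUM) shown earlier in this thread. The helper name `devices_with_errors` is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: read `zpool status` output on stdin and print every
# config line whose READ, WRITE or CKSUM counter is non-zero.
devices_with_errors() {
    awk 'NF == 5 && $2 ~ /^(ONLINE|DEGRADED|FAULTED|OFFLINE|UNAVAIL|REMOVED)$/ &&
         ($3 != 0 || $4 != 0 || $5 != 0) {
        print $1, "read=" $3, "write=" $4, "cksum=" $5
    }'
}

# On the NAS itself (not executed here):
#   zpool status volume1 | devices_with_errors
```

Fed the output posted above, this would print only the second gptid device with cksum=5. Saving that line with a date each time makes it easier to see whether the same drive keeps showing up.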
 

MadSaid

Dabbler
Joined
Aug 30, 2012
Messages
12
- If your data is critical, make sure you have a backup.
- Stop clearing the error counters so we can see which drives are giving you problems. Repost results of zpool status after you've had problems.
- Make sure drive cables are not broken and are seated properly. They are a big cause of CRC errors.
- Run a SMART long test against the drive(s) in question. (You can google the commands/programs to do that).

- backup -> done
- cables checked and re-checked, seems OK. I can try to switch cables between drives and check if error occurs on other drive.
- done, please see the result of the 3 tests here. View attachment smart test ada0.txt View attachment smart test ada1.txt View attachment smart test ada2.txt
It seems that ada1 had the issues, but the long test has passed without errors. What is the conclusion of the health of the disk?

I will run memtest today, just to be sure about the memory.

Suggestions on how to proceed are welcome.
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
Few comments in case it helps:

- Running zpool clear doesn't fix anything. All it does is clear the error count, which gives the impression nothing is wrong again. This would normally be done only after the admin has fixed what he thinks is the problem and wants to see if the errors are still occurring.
- How often do you run scrubs? If you don't, you should read the manual... cover to cover. If you aren't running scrubs I know you didn't read the manual :P
- You may want to do a scrub (but not yet). If the second hard drive with the errors is going bad, I can almost guarantee you that you'll rack up boatloads of errors (assuming it can even finish). I would do the memtest first, but I'm not expecting you'll find any errors.
- As Stephens said, if you have no backup, better think about doing one. Since you have a RAIDZ1, if the one disk with the errors fails and even 1 sector goes bad on another disk, you will see data corruption, anything from 1 corrupted file to the loss of large amounts of data. That's why RAIDZ2 is strongly recommended over RAIDZ1.
- At the end of the ada1 log are the 5 most recent errors. I'm not familiar with the error, but someone else said that he had those errors when his BIOS settings for the SATA controller were set to IDE. Per the manual they should be set to AHCI. I would check your BIOS and see if it is set to AHCI. If it is not, I would set it and see what happens. If it is set to IDE and you change it to AHCI, monitor the zpool status for more errors. If they appear, please provide a new SMART output for the applicable drive (or all drives if you wish) so we can see if the error has changed.

Also, please provide the hardware you are using. Some people have problems with certain low quality SATA controllers and I'd like to see what you are plugging into.
 

MadSaid

Dabbler
Joined
Aug 30, 2012
Messages
12
- How often do you run scrubs? If you don't, you should read the manual... cover to cover. If you aren't running scrubs I know you didn't read the manual :P
I assume this user guide. Guilty as charged :o. In my defense, I read some parts of it in Nov last year, before and during the install of FreeNAS (for the 1st time). There is a monthly scrub configured by default when creating a ZFS volume, but I don't think it ran, because I turn off the NAS when I don't use it.

I would do the memtest first, but I'm not expecting you'll find any errors.
It ran for 9 hours without any errors.

- As Stephens said, if you have no backup, better think about doing one.
The nas is the backup of my most important stuff (doc, pictures etc.), so no issues there. I will look into the other stuff on the nas and make a backup if needed.

- At the end of the ada1 log is the most recent 5 errors, I'm not familiar with the error, but someone else said that he had those errors when his BIOS settings for the SATA controller were set to IDE. Per the manual they should be set to AHCI. I would check your BIOS and see if it is set to AHCI.
I checked BIOS, it is AHCI.

Also, please provide the hardware you are using. Some people have problems with certain low quality SATA controllers and I'd like to see what you are plugging into.
Hardware
Intel Desktop Board DH61DLB3 (with 3 SATA interfaces through the Intel H61 Express Chipset)
Intel Pentium G630T Boxed
Corsair XMS CMX8GX3M2A1333C9 (2 dimms, total 8GB RAM)
Hitachi Deskstar 5K3000, 2TB (2 disks)
Seagate Barracuda Green ST2000DL001-9VT156, 2TB (1 disk)
be quiet! Pure Power L7 300W
Sandisk Cruzer Fit 4GB (FreeNas boot disk)
 

MadSaid

Dabbler
Joined
Aug 30, 2012
Messages
12
- You may want to do a scrub (but not yet). If the second hard drive with the errors is going bad, I can almost guarantee you that you'll rack up boatloads of errors (assuming it can even finish).
OK, I ran zpool scrub. See the result below:

Code:
[root@freenas] ~# zpool status -v volume1
  pool: volume1
 state: ONLINE
  scan: scrub repaired 0 in 0h6m with 0 errors on Mon Mar 18 07:02:39 2013
config:

        NAME                                            STATE     READ WRITE CKSUM
        volume1                                         ONLINE       0     0     0
          raidz1-0                                      ONLINE       0     0     0
            gptid/315fdd47-342e-11e2-8df0-e840f205bc31  ONLINE       0     0     0
            gptid/321ad0bf-342e-11e2-8df0-e840f205bc31  ONLINE       0     0     0
            gptid/329235e7-342e-11e2-8df0-e840f205bc31  ONLINE       0     0     0

errors: No known data errors

I only have 1 volume. It only took 6 min. to complete the scrub (892 GB data). Is this normal?
Conclusion: no errors, nothing to repair.

I guess there is nothing to worry about for now, and I should just monitor the disks (weekly scrubs for now)?
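
As a rough sanity check on the 6 minute figure, the implied read rate can be worked out from the two numbers above. This is a sketch only: the function name `scrub_rate_mb_s` is made up, the result is rounded down, and 1 GB is taken as 1024 MB.

```shell
#!/bin/sh
# Implied scrub throughput in MB/s (integer arithmetic, 1 GB taken as 1024 MB).
# $1 = amount of data in GB, $2 = scrub duration in minutes.
scrub_rate_mb_s() {
    echo $(( $1 * 1024 / ($2 * 60) ))
}

scrub_rate_mb_s 892 6   # prints 2537
```

That is about 2.5 GB/s, far more than three consumer SATA disks can sustain, so the scrub most likely read much less than 892 GB. The numbers alone do not say why.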
 

MadSaid

Dabbler
Joined
Aug 30, 2012
Messages
12
OK, I got some checksum errors again. So I did:
  1. Ran smartctl -t long for all 3 devices. No errors were encountered. I have the logs, in case they are needed.
  2. Scrubbed multiple times. In this case the scrub repaired data. I found it strange that the 1st run after I got checksum errors again finished in less than 1 min. The next 2 runs took about 1 hr and 15 mins. See attachment for result. Makes me wonder if I should have run it more than once the 1st time I had checksum errors.
What can we conclude from all this? Is the issue fixed now? Why did it occur? Is there a HW issue? How can I isolate the issue? How to proceed?

Thanks in advance,

MadSaid
 

Attachments

  • scrub 2.txt
    3.8 KB · Views: 323

non-serviam

Cadet
Joined
Jul 20, 2013
Messages
6
I have the same problem but with 6 2TB disks (I am using 9.1.0). I am getting errors on all of the drives. Many, many errors. The long SMART test didn't show anything. I have also tried everything else in this thread. Some of my drives are 7200 rpm and some 5400 rpm. Does this have anything to do with the errors? Any help? My drives are new and I don't want to send them back...
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
Well, you should have made your own thread with your issues. But more than likely you have a failed disk. Don't let the fact that they are new fool you. Infant mortality is real.

In any case, you definitely have a hardware issue, now you just have to narrow down the problem.

If you have any further questions please make a new thread and include your FreeNAS version, your hardware specs, and any error messages, etc. with your problem.
 