How do I run zpool clear?

Status
Not open for further replies.

Clinderw

Explorer
Joined
Aug 11, 2013
Messages
96
I was tempted to just run it in the cmd but don't want to screw anything up. How do I run a zpool clear?

WARNING: The volume pool02 (ZFS) status is ONLINE: One or more devices has experienced an unrecoverable error. An attempt was made to correct the error. Applications are unaffected. Determine if the device needs to be replaced, and clear the errors using 'zpool clear' or replace the device with 'zpool replace'.

Thanks,
Chris
 

alexg

Contributor
Joined
Nov 29, 2013
Messages
197
You do not want to clear the error until you determine why. What does "zpool status" say through shell?
 

gpsguy

Active Member
Joined
Jan 22, 2012
Messages
4,472
Chris, when you post the results, please put the output in code tags so that the formatting is preserved.
 

Clinderw

Explorer
Joined
Aug 11, 2013
Messages
96
Thanks for the responses. It appears I actually might have an issue on my hands, so clearing the error isn't the right thing to do at this point. I just shut down my box because I keep getting the following issue.

Basically, I don't have any issues, and then a few hours after the server boots up I get this critical error. It goes away after I reboot but then comes back again. Do I need to replace the drives already, or does anyone have a good idea of what is causing this? Screenshot below.

@gpsguy, how do I do that?
 

Attachments

  • freenas errr.png
    14.8 KB · Views: 2,526

warri

Guru
Joined
Jun 6, 2011
Messages
1,193
It looks like two of your drives keep dropping out. That's a bad sign, especially because it leaves you without redundancy. So you need to find out why this happens:
- Post the output of "zpool status" here.
- Have a look into the system log (issue "dmesg" in the shell, or look into the .system dataset if you are running a 9.2.1.x version) and see if you see some errors about the disks. Alternatively just attach the whole log here for us to have a look.
- Check the SMART values of the disks: Issue "smartctl -a -q noserial /dev/adaX" for each of the two affected drives. You can find the number (X) by looking at the status just after the NAS has started. Also post that output here.

Then we need to know which version of FreeNAS you are running and on which version you created the pool. Looks like it's not using GUIDs for the disks, which were introduced somewhere in 8.0.x or 8.1, I think.

Regarding the output: You can log into your server via SSH and issue the commands there (or use the Web Shell). Then copy the output (Web Shell might destroy the formatting, though) and put it between [code][/code] tags.
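If it helps, the three steps above could be wrapped up roughly like this. It's only a sketch: the function name collect_diag, the output directory and the file names are made up for illustration, and it assumes you pass in the disk names (ada0, ada1, ...) yourself.

```shell
#!/bin/sh
# Hypothetical helper (name and file layout are made up for this sketch):
# save pool status, kernel messages and SMART data as text files for posting.
# Usage: collect_diag OUTDIR DISK...
collect_diag() {
    outdir=$1
    shift
    mkdir -p "$outdir" || return 1

    # Pool state and kernel log, exactly as the commands print them.
    zpool status > "$outdir/zpool_status.txt" 2>&1
    dmesg        > "$outdir/dmesg.txt" 2>&1

    # One SMART report per disk; -q noserial keeps serial numbers out of the post.
    for disk in "$@"; do
        smartctl -a -q noserial "/dev/$disk" > "$outdir/smart_$disk.txt" 2>&1
    done
}
```

Run it as root from an SSH session, for example `collect_diag /tmp/diag ada0 ada1`, then attach the resulting text files.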
 

gpsguy

Active Member
Joined
Jan 22, 2012
Messages
4,472
Chris, please tell us about your hardware. Are you using USB connected hard disks?
 

Clinderw

Explorer
Joined
Aug 11, 2013
Messages
96
Sorry gang, just got home from a long one. Here are some deets:
  • FreeNAS-9.2.1.3-RELEASE-x64
  • Using 10 hard drives with 2 RaidZ2 vDevs: 4x3TB RaidZ2, 6x3TB RaidZ2
    • 5 drives are connected to SATA motherboard ports, 5 drives are connected via an IBM M1015
    • 4 or 5 of the drives are WD Reds and the others are desktop 3TB drives (I know I need to swap these out sooner rather than later)
  • I just upgraded from 8.3.1
  • I also just redid my volume by backing up all essential data, erasing the old volume, and creating a new one (wanted to take everything from RaidZ1 to RaidZ2). I can't remember if I created the new pool/volume before or after the update
  • Only thing hooked up via USB is a UPS (my driver wasn't listed so I picked the next closest; I have an open thread asking what others did for this model), and an Apple keyboard
  • Attached all the files @warri asked for. I wasn't sure exactly which drives were the problems in the vDev, so I did a SMART output for each.
Thank you all for the help and specific instructions on how to get the necessary data to you all.
 

Attachments

  • zpool status.txt
    1.5 KB · Views: 788
  • dmsg.txt
    10.5 KB · Views: 420
  • SMART info - Drive 1.txt
    5.1 KB · Views: 487
  • SMART info - Drive 2.txt
    5.1 KB · Views: 316
  • SMART info - Drive 3.txt
    5.2 KB · Views: 313
  • SMART info - Drive 4.txt
    5.4 KB · Views: 317
  • SMART info - Drive 5.txt
    5.1 KB · Views: 286
  • SMART info - Drive 6.txt
    4.5 KB · Views: 277

Clinderw

Explorer
Joined
Aug 11, 2013
Messages
96
Alright, new alert just came through my email:

This message was generated by the smartd daemon running on:

host name: freenas
DNS domain: local

The following warning/error was logged by the smartd daemon:

Device: /dev/ada1, 8 Currently unreadable (pending) sectors

Device info:
ST3000DM001-9YN166, S/N:W1F19NPX, WWN:5-000c50-053917dee, FW:CC9F, 3.00 TB

For details see host's SYSLOG.

AND

This message was generated by the smartd daemon running on:

host name: freenas
DNS domain: local

The following warning/error was logged by the smartd daemon:

Device: /dev/ada1, 8 Offline uncorrectable sectors

Device info:
ST3000DM001-9YN166, S/N:W1F19NPX, WWN:5-000c50-053917dee, FW:CC9F, 3.00 TB

For details see host's SYSLOG.

You can also use the smartctl utility for further investigation.
No additional messages about this problem will be sent.
 

alexg

Contributor
Joined
Nov 29, 2013
Messages
197
Looks like you never ran disk tests. All your drives show that no self-testing has been scheduled. I would suggest that you start with smartctl -t long /dev/adaX for each of your drives. You should also check your SATA cables.
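To save some typing, a loop along these lines would start the long test on each disk you name. Treat it as a sketch: start_long_tests is a made-up function name, and the disk names are whatever your system actually shows.

```shell
#!/bin/sh
# Hypothetical helper: kick off a long SMART self-test on each disk named.
# Usage: start_long_tests ada0 ada1 ...
start_long_tests() {
    for disk in "$@"; do
        echo "starting long self-test on /dev/$disk"
        # Report a failure to start, but carry on with the remaining disks.
        smartctl -t long "/dev/$disk" || echo "could not start test on /dev/$disk" >&2
    done
}

# The test runs inside the drive itself; read the result later with:
#   smartctl -l selftest /dev/adaX
```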
 

Clinderw

Explorer
Joined
Aug 11, 2013
Messages
96
"Never ran disk checks": did I skip that in the FreeNAS manual, or where could I read up on how to do that? Is there an automated function I can set up to do this?

Running these checks now.

Thanks,
Chris
 

Clinderw

Explorer
Joined
Aug 11, 2013
Messages
96
Is there any problem with testing all disks at once?

thanks,
Chris
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
Here's my review of all that stuff:

zpool status shows 2 drives with CKSUM errors. That's not a good thing (obviously).

Drive 1 has some Current_Pending_Sector errors. Offline_Uncorrectable is also 8. That drive has problems. If you do a SMART long test it will fail. At that point it will qualify for an RMA if in warranty. That's the only drive showing any problems. If that disk is attached to the same SATA controller as the other disk with CKSUM errors, the controller may not like the errors. This may mean that the controller can't handle errors from one disk without affecting other disks. If this is the case, it may be a bad choice for a server. If it's the M1015 it could be something else (probably user-error related, such as a bad firmware match for the driver or a setting change). I know the M1015 can handle disk problems without issue (unless you want to believe you are the first one to ever have this problem on the forums, which I dismiss as unlikely).

Overall, I'd replace "Drive 1" and see what happens. I'd do that sooner rather than later too, since your vdev may be having problems.

Noteworthy but unrelated are your hard drive temps. For drives that are probably near-idle those temps are a little high. Almost certainly they go over the famed 40C line (which is where drive life can be shortened). Not necessarily a bad thing, but if you can increase the cooling I'd absolutely do that.
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
Oh, and you are running zero SMART tests... so you should start doing that...
 

alexg

Contributor
Joined
Nov 29, 2013
Messages
197

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
SMART tests won't make them go bad. They may fail the test but they won't go offline when they fail their test. Frankly, if a SMART test is causing more than 1 disk to indicate problems at the same time you've made more than 1 grave mistake with your server design as that's extremely unlikely if you've done everything right.
 

Clinderw

Explorer
Joined
Aug 11, 2013
Messages
96
Great stuff, many takeaways. Thank you CyberJock for the comprehensive feedback.

I just took the box offline in the meantime to prevent any accidents. Ordered a hard drive, and will set up SMART tests later this week. Will also try for the warranty on the one drive but I might be out of luck on that one. We'll see...

Thanks,
Chris
 

Clinderw

Explorer
Joined
Aug 11, 2013
Messages
96
I replaced the hard drive on Friday and everything seems to be running as usual again. I do have two follow-up questions:
  1. When I run zpool clear it essentially says it needs more information than just typing 'zpool clear' in the cmd line. What is the proper command to run?
  2. The second hard drive that was throwing issues before still has a CKSUM value of 42. Is that something to be concerned about?
Thanks,
Chris
 

warri

Guru
Joined
Jun 6, 2011
Messages
1,193
1. zpool clear <poolname>
2. No indications of failure in the SMART tests and logs? If you find something, replace the disk. If not, clear the pool and scrub the volume to see if the checksum errors persist.

EDIT: Oh wait, by replacing one drive and resilvering the volume, a scrub was already performed. If the number of checksum errors didn't increase, you are good to clear for now without scrubbing.
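For checking whether the error counters have moved before clearing, something like the following can pull the non-zero rows out of the zpool status output. A rough sketch only: nonzero_errors is a made-up helper name, pool02 is the pool name from the alert earlier in the thread, and it assumes the usual NAME/STATE/READ/WRITE/CKSUM column layout.

```shell
#!/bin/sh
# Print devices from `zpool status` output (read on stdin) whose READ, WRITE
# or CKSUM counter is not zero. The function name is made up for this sketch.
nonzero_errors() {
    awk '
        # Header row of the config table: start looking at device rows.
        $1 == "NAME" && $NF == "CKSUM" { in_cfg = 1; next }
        # A blank line ends the config table.
        in_cfg && NF == 0 { in_cfg = 0 }
        # Device rows have at least five columns: NAME STATE READ WRITE CKSUM.
        in_cfg && NF >= 5 && ($3 != "0" || $4 != "0" || $5 != "0") {
            print $1, "read=" $3, "write=" $4, "cksum=" $5
        }
    '
}

# Usage on the box itself:
#   zpool status pool02 | nonzero_errors
#   zpool clear pool02        # only once the cause is understood
```

Running the first command before and after a scrub shows whether the numbers are still growing.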
 

Clinderw

Explorer
Joined
Aug 11, 2013
Messages
96
Great - just cleared it.

Everything is back to operational / green.
 