How do I run zpool clear?

Clinderw · Apr 12, 2014

I was tempted to just run it from the command line, but I don't want to screw anything up. How do I run a zpool clear?

WARNING: The volume pool02 (ZFS) status is ONLINE: One or more devices has experienced an unrecoverable error. An attempt was made to correct the error. Applications are unaffected. Determine if the device needs to be replaced, and clear the errors using 'zpool clear' or replace the device with 'zpool replace'.

Thanks,
Chris

alexg · Apr 12, 2014

You do not want to clear the error until you determine why it happened. What does "zpool status" say in the shell?

gpsguy · Apr 12, 2014

Chris, when you post the results, please put the output in code tags so that the formatting is preserved.

Clinderw · Apr 13, 2014

Thanks for the responses. It appears I might actually have an issue on my hands, so clearing the error isn't the right thing to do at this point. I just shut down my box because I keep getting the following issue.

Basically, I don't have any issues, and then a few hours after the server boots up I get this critical error. It goes away after I reboot but then comes back again. Do I need to replace the drives already, or does anyone have a good idea what is causing this? Screenshot attached.

@gpsguy, how do I do that?

warri · Apr 14, 2014

It looks like two of your drives keep dropping out. That's a bad sign, especially because it leaves you without redundancy. So you need to find out why this happens:
- Post the output of "zpool status" here.
- Look at the system log (run "dmesg" in the shell, or look in the .system dataset if you are running a 9.2.1.x version) and check for any errors about the disks. Alternatively, just attach the whole log here for us to have a look.
- Check the SMART values of the disks: run "smartctl -a -q noserial /dev/adaX" for each of the two affected drives. You can find the number (X) by looking at the status just after the NAS has started. Post this information here as well.

We also need to know which version of FreeNAS you are running and on which version you created the pool. It looks like the pool is not using GUIDs for the disks, which were introduced somewhere in 8.0.x or 8.1, I think.

Regarding the output: you can log into your server via SSH and run the commands there (or use the Web Shell). Then copy the output (the Web Shell might destroy the formatting, though) and put it between [code][/code] tags.
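The commands warri lists above can be bundled into one file to attach to the thread. This is a minimal sketch, assuming a POSIX sh session as root (type "sh" first if your login shell is csh) and that ada1/ada2 are placeholders for the affected disks; collect_diag is a hypothetical helper name, not a FreeNAS tool.

```sh
# Hypothetical helper: write zpool status, the kernel message buffer and the
# SMART report of each named disk into one file that can be attached.
# Usage: collect_diag OUTFILE DISK...   (DISK is a name such as ada1)
collect_diag() {
    out="$1"
    shift
    {
        echo "=== zpool status -v ==="
        zpool status -v
        echo "=== dmesg ==="
        dmesg
        for disk in "$@"; do
            echo "=== smartctl -a -q noserial /dev/$disk ==="
            # -q noserial keeps the drive serial numbers out of the report
            smartctl -a -q noserial "/dev/$disk"
        done
    } > "$out" 2>&1
}

# Example (placeholder disk names):
#   collect_diag /tmp/diag.txt ada1 ada2
```

The output file still goes inside [code][/code] tags when posted, so the column layout survives.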

gpsguy · Apr 14, 2014

Chris, please tell us about your hardware. Are you using USB connected hard disks?

Clinderw · Apr 14, 2014

Sorry gang, just got home from a long one. Here are some deets:

FreeNAS-9.2.1.3-RELEASE-x64
Using 10 hard drives in 2 RAIDZ2 vdevs: 4x 3 TB RAIDZ2 and 6x 3 TB RAIDZ2
- 5 drives are connected to motherboard SATA ports, 5 drives are connected via an IBM M1015
- 4 or 5 of the drives are WD Reds and the others are 3 TB desktop drives (I know I need to swap these out sooner rather than later)
I just upgraded from 8.3.1.
I also just redid my volume: I backed up all essential data, erased the old volume, and created a new one (I wanted to move everything from RAIDZ1 to RAIDZ2). I can't remember whether I created the new pool/volume before or after the update.
The only things connected via USB are a UPS (my driver wasn't listed, so I picked the next closest; I have an open thread asking what others did for this model) and an Apple keyboard.
Attached are all the files @warri asked for. I wasn't sure exactly which drives in the vdev were the problem, so I did a SMART output for each.

Thank you all for the help and specific instructions on how to get the necessary data to you all.

Clinderw · Apr 14, 2014

Alright, new alert just came through my email:

This message was generated by the smartd daemon running on:

host name: freenas
DNS domain: local

The following warning/error was logged by the smartd daemon:

Device: /dev/ada1, 8 Currently unreadable (pending) sectors

Device info:
ST3000DM001-9YN166, S/N:W1F19NPX, WWN:5-000c50-053917dee, FW:CC9F, 3.00 TB

For details see host's SYSLOG.

AND

This message was generated by the smartd daemon running on:

host name: freenas
DNS domain: local

The following warning/error was logged by the smartd daemon:

Device: /dev/ada1, 8 Offline uncorrectable sectors

Device info:
ST3000DM001-9YN166, S/N:W1F19NPX, WWN:5-000c50-053917dee, FW:CC9F, 3.00 TB

For details see host's SYSLOG.

You can also use the smartctl utility for further investigation.
No additional messages about this problem will be sent.

alexg · Apr 14, 2014

Looks like you never ran disk tests. All of your drives show that no self-testing has been scheduled. I would suggest that you start with "smartctl -t long /dev/adaX" for each of your drives. You should also check your SATA cables.
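alexg's suggestion, repeated for every drive, can be sketched as two small loops: one starts the long (extended) self-test on each disk, the other prints each disk's self-test log afterwards. A sketch assuming POSIX sh and placeholder disk names; start_long_tests and show_selftest_logs are hypothetical names. smartctl prints an estimate of how long the test will take, and on a 3 TB drive that is several hours, so check the log later rather than right away.

```sh
# Hypothetical helpers; DISK arguments are names such as ada0 ada1 ...

# Start an extended (long) SMART self-test on every disk given.
# The test runs inside the drive firmware, so each command returns at once.
start_long_tests() {
    for disk in "$@"; do
        smartctl -t long "/dev/$disk"
    done
}

# Print the self-test log of every disk given, to check the results later.
show_selftest_logs() {
    for disk in "$@"; do
        echo "=== /dev/$disk ==="
        smartctl -l selftest "/dev/$disk"
    done
}

# Example (placeholder disk names):
#   start_long_tests ada0 ada1 ada2
#   ...hours later...
#   show_selftest_logs ada0 ada1 ada2
```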

Clinderw · Apr 14, 2014

"Never ran disk checks": did I skip that in the FreeNAS manual, or where would I read about doing that? Is there an automated function I can set up to do this?

Running these checks now.

Thanks,
Chris

Clinderw · Apr 14, 2014

Is there any problem with testing all disks at once?

thanks,
Chris

cyberjock · Apr 14, 2014

Here's my review of all that stuff:

zpool status shows 2 drives with CKSUM errors. That's not a good thing (obviously).

Drive 1 has some Current_Pending_Sector errors, and Offline_Uncorrectable is also 8. That drive has problems. If you do a SMART long test it will fail, and at that point it will qualify for an RMA if it is in warranty. That's the only drive showing any problems. If that disk is attached to the same SATA controller as the other disk with those CKSUM errors, the controller may not like the errors. That would mean the controller can't handle errors from one disk without affecting other disks, and if so, it may be a bad choice for a server. If it's the M1015, it could be something else (probably user-error related, such as a firmware version that doesn't match the driver, or a settings change). I know the M1015 can handle disk problems without trouble (unless you want to believe you are the first one on the forums to ever have this problem, which I dismiss as unlikely).
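The diagnosis of Drive 1 above rests on two raw SMART counters: Current_Pending_Sector (ID 197) and Offline_Uncorrectable (ID 198). A sketch that flags either one when its raw value is above zero, assuming it is fed the attribute table printed by "smartctl -A" and that the raw value is the tenth column, as in the usual smartctl layout; check_bad_sectors is a hypothetical name.

```sh
# Hypothetical helper: reads "smartctl -A" output on stdin, prints
# NAME=RAW for Current_Pending_Sector (197) and Offline_Uncorrectable (198)
# when the raw value (10th column) is above zero, and exits 1 if any was.
check_bad_sectors() {
    awk '
        ($1 == "197" || $1 == "198") && $10 + 0 > 0 {
            print $2 "=" $10
            bad = 1
        }
        END { exit (bad ? 1 : 0) }
    '
}

# Example (placeholder disk name):
#   smartctl -A /dev/ada1 | check_bad_sectors || echo "ada1 needs attention"
```

It only reads the attribute table (-A), not the full -a report, so lines from the error log cannot be mistaken for attribute rows.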

Overall, I'd replace "Drive 1" and see what happens. I'd do that sooner rather than later, too, since your vdev may be having problems.

Noteworthy but unrelated are your hard drive temps. For drives that are probably near idle, those temps are a little high. Almost certainly they go over the famed 40C line (which is where drive life can be shortened). Not necessarily a bad thing, but if you can increase the cooling I'd absolutely do that.
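To see which drives cross the 40C line mentioned above, a sketch that reads "smartctl -A" output and prints the raw Temperature_Celsius (ID 194) value, exiting non-zero when it is above a limit (40 by default). It assumes the drive reports ID 194 with the temperature as the first token of the raw value, which is common but not universal; check_temp is a hypothetical name.

```sh
# Hypothetical helper: reads "smartctl -A" output on stdin, prints the raw
# Temperature_Celsius (ID 194) value in degrees C and exits 1 if it is above
# LIMIT (first argument, default 40).
check_temp() {
    awk -v limit="${1:-40}" '
        $1 == "194" {
            t = $10 + 0
            print t
            if (t > limit + 0) hot = 1
        }
        END { exit (hot ? 1 : 0) }
    '
}

# Example (placeholder disk names):
#   for d in ada0 ada1; do printf "%s: " "$d"; smartctl -A "/dev/$d" | check_temp; done
```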

cyberjock · Apr 14, 2014

Oh, and you are running zero SMART tests... so you should start doing that...

alexg · Apr 14, 2014

I would suggest reading an excellent post by cyberjock, which also answered my question about running SMART tests concurrently. I personally decided to stagger them just in case a SMART test causes all of them to go bad at once, but it is probably unnecessary.

http://forums.freenas.org/index.php?threads/scrub-and-smart-testing-schedules.20108/

cyberjock · Apr 14, 2014

SMART tests won't make them go bad. They may fail the test but they won't go offline when they fail their test. Frankly, if a SMART test is causing more than 1 disk to indicate problems at the same time you've made more than 1 grave mistake with your server design as that's extremely unlikely if you've done everything right.

Clinderw · Apr 15, 2014

Great stuff, many takeaways. Thank you CyberJock for the comprehensive feedback.

I just took the box offline in the meantime to prevent any accidents. I ordered a hard drive and will set up SMART tests later this week. I will also try for a warranty replacement on the one drive, but I might be out of luck on that one. We'll see...

Thanks,
Chris

Clinderw · Apr 21, 2014

I replaced the hard drive on Friday and everything seems to be running as usual again. I do have two follow-up questions:

When I run zpool clear, it essentially says it needs more information than just typing 'zpool clear' on the command line. What is the proper command to run?
The second hard drive that was throwing errors before still has a CKSUM value of 42. Is that something to be concerned about?

Thanks,
Chris

warri · Apr 21, 2014

1. zpool clear <poolname>
2. No indications of failure in the SMART tests and logs? If you find something, replace the disk. If not, clear the pool and scrub the volume to see whether the checksum errors persist.

EDIT: Oh wait, by replacing one drive and resilvering the volume, a scrub was already performed. If the number of checksum errors didn't increase, you are good to clear for now without scrubbing.
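warri's answer applied to the pool from the original alert (pool02): list any devices whose CKSUM counter is non-zero, clear the counters, and optionally start a scrub to see whether the errors return. A sketch assuming POSIX sh, root, and the "NAME STATE READ WRITE CKSUM" column layout that zpool status prints; cksum_errors and clear_pool are hypothetical names.

```sh
# Hypothetical helper: reads "zpool status" output on stdin and prints
# "DEVICE COUNT" for every pool, vdev or disk whose CKSUM column is not 0.
# Exits 1 if any was found.
cksum_errors() {
    awk '
        $1 == "NAME" && $5 == "CKSUM" { inconf = 1; next }
        $1 == "errors:" { inconf = 0 }
        inconf && NF >= 5 && $5 != "0" { print $1, $5; found = 1 }
        END { exit (found ? 1 : 0) }
    '
}

# Hypothetical helper: reset the pool error counters with "zpool clear",
# then start a scrub only when the second argument is "scrub".
clear_pool() {
    zpool clear "$1" || return
    if [ "$2" = "scrub" ]; then
        zpool scrub "$1"
    fi
}

# Example:
#   zpool status pool02 | cksum_errors
#   clear_pool pool02          # clear only (the resilver already read everything)
#   clear_pool pool02 scrub    # clear, then scrub to see if CKSUM errors return
```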

Clinderw · Apr 22, 2014

Great - just cleared it.

Everything is back to operational / green.
