I am running a ZFS array in an HP N40L MicroServer that has been 99.9% reliable over the past 3 years or so. Yesterday, I got an error resulting in data corruption. The volume still reports as online, but with the warning that "applications may be affected".
The first thing I did was a zpool status -v to see which file(s) were affected, and I got the following:
Code:
[root@nas2] /# zpool status -v
  pool: freenas-boot
 state: ONLINE
status: One or more devices has experienced an error resulting in data corruption. Applications may be affected.
action: Restore the file in question if possible. Otherwise restore the entire pool from backup.
   see: http://illumos.org/msg/ZFS-8000-8A
  scan: scrub repaired 3K in 0h4m with 1 errors on Tue Mar 3 03:50:52 2015
config:

        NAME          STATE     READ WRITE CKSUM
        freenas-boot  ONLINE       0     0     1
          da0p2       ONLINE       0     0     4

errors: Permanent errors have been detected in the following files:

        //usr/local/lib/libaria2.a

  pool: vol0
 state: ONLINE
  scan: scrub repaired 0 in 8h19m with 0 errors on Sun Feb 1 08:19:44 2015
config:

        NAME                                            STATE     READ WRITE CKSUM
        vol0                                            ONLINE       0     0     0
          raidz1-0                                      ONLINE       0     0     0
            gptid/5dfc45be-8946-11e4-9b25-00151776e8ba  ONLINE       0     0     0
            gptid/5e866d36-8946-11e4-9b25-00151776e8ba  ONLINE       0     0     0
            gptid/5f11ec5e-8946-11e4-9b25-00151776e8ba  ONLINE       0     0     0
        cache
          gptid/5f5faaee-8946-11e4-9b25-00151776e8ba    ONLINE       0     0     0

errors: No known data errors
I do not consider myself a FreeNAS or BSD expert, so my ability to troubleshoot this is a little limited. From what I can tell, a single file (libaria2.a) on the flash drive (the freenas-boot pool) is corrupt. The primary data volume on the server (vol0) seems unaffected. It hosts roughly 10 VMs connected via iSCSI to a small Xen cluster. All the VMs have been running happily, but I obviously want to fix the error.
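To sanity-check that reading of the output, here is a minimal Python sketch that pulls out the devices with non-zero error counters and the files listed under "Permanent errors" for each pool. The `summarize` function and the trimmed `SAMPLE` text are made up for illustration and are not part of FreeNAS or ZFS. It also assumes the counters are plain integers, which is true for the output above but may not hold for very large counts.

```python
import re

# Hypothetical sample: a trimmed excerpt of the `zpool status -v` output above.
SAMPLE = """\
  pool: freenas-boot
 state: ONLINE
config:

        NAME          STATE     READ WRITE CKSUM
        freenas-boot  ONLINE       0     0     1
          da0p2       ONLINE       0     0     4

errors: Permanent errors have been detected in the following files:

        //usr/local/lib/libaria2.a

  pool: vol0
 state: ONLINE
config:

        NAME        STATE     READ WRITE CKSUM
        vol0        ONLINE       0     0     0
          raidz1-0  ONLINE       0     0     0

errors: No known data errors
"""


def summarize(status_text):
    """Map each pool to its devices with non-zero counters and its damaged files."""
    pools = {}
    current = None      # entry for the pool currently being parsed
    in_files = False    # True while reading the "Permanent errors" file list
    for line in status_text.splitlines():
        stripped = line.strip()
        m = re.match(r"pool:\s+(\S+)", stripped)
        if m:
            current = pools.setdefault(m.group(1), {"devices": {}, "files": []})
            in_files = False
            continue
        if current is None:
            continue
        if stripped.startswith("errors:"):
            in_files = "Permanent errors" in stripped
            continue
        if in_files:
            if stripped:
                current["files"].append(stripped)
            continue
        # Device rows look like: NAME STATE READ WRITE CKSUM (integers assumed).
        m = re.match(r"(\S+)\s+[A-Z]+\s+(\d+)\s+(\d+)\s+(\d+)$", stripped)
        if m:
            counts = tuple(int(x) for x in m.groups()[1:])
            if any(counts):
                current["devices"][m.group(1)] = counts
    return pools


if __name__ == "__main__":
    for pool, info in summarize(SAMPLE).items():
        print(pool, info)
```

Run against the sample, only freenas-boot should show non-zero checksum counters and a damaged file, while vol0 comes back with nothing to report.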
Should I schedule some downtime and pull the thumb drive? Should I just plan on replacing it altogether? I have plenty lying around, but if I'm going to do that, I'll probably start looking for a no-kidding, known-for-reliability thumb drive. To be honest, the fact that it's been working perfectly for so long is pretty impressive to me. :)