Hey all -
First-time poster. I've been running FreeNAS 8.3.0 for close to a year now; more precisely, I've had it set up and working with no problems for almost a year. (That includes a stretch of 180+ days of continuous uptime with almost no maintenance. I just checked the drives now and then and was good to go.)
In the last 2-3 weeks, I started having problems. I'm just looking for some "you read it right, you still need to do this" or "totally wrong, what were you thinking" sort of feedback from everyone.
I have a RAIDZ2 array of 7x2TB drives. (I know, it works better with 6, but I had 7, so that's what I used when I built it.) They're connected to a Highpoint 27xx controller. (Yeah, I know, they're crap and don't handle SMART worth a darn, etc., but as I said, this is the first issue I've had in a year of use, so I'm not complaining.)
I came home to find that the GUI had completely stopped responding. I couldn't SSH in, couldn't connect over the web... nothing. I thought all was lost! I rebooted the machine and everything came back up, but with the yellow "Alert" indicator. I scrubbed the volume and found that one drive had some issues, but the scrub cleaned them up. I identified it as drive 1 and got its serial number. My console monitor also shows "1 pending sector..." for the 1/1/1 drive, which sort of confirmed to me that drive 1 had an issue.
Once the scrub was complete, the "Alert" indicator went back to green. The drive obviously had a read error; the SMART log (which I run manually, since the drivers are crap, see above) shows that. But after the scrub, everything went green.
Question 1: Does this mean the system is just working around the read error on the drive? I should replace the drive, but does it not yet count as one of the two "failures" my array can absorb before it has issues?
Things went okay for about 4-5 days, and then a second drive started having issues. I'm in the middle of another scrub now to get things cleared up, and the "Alert" icon is currently yellow. (The Volume Status shows checksum errors on drive 4, as my SMART logs indicated it would.)
Question 2: Does this count as my second drive out of action, meaning I need to start swapping drives immediately? Or, if the scrub that's currently running fixes things, will I be back to normal? I realize that once a drive starts throwing read errors they just keep piling up, so it's best to replace them. I'm just curious whether I need to swap both now, or whether I can swap one now and the other when the warranty replacement arrives.
I manually started the scrub to see what it cleans up before doing a replacement. (I've read it's best to have the data in good condition before starting a replacement/resilver.)
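For reference, the "two failures" budget mentioned in the questions comes straight from the RAIDZ2 parity level. A rough Python sketch of that arithmetic for this pool, assuming equal-size disks and ignoring ZFS metadata, padding and reserved space (the helper name `raidz_summary` is hypothetical). It only counts whole-device losses; it says nothing about whether a drive with a pending sector or checksum errors should be treated as "failed".

```python
# Rough capacity / redundancy summary for a single RAIDZ vdev.
# Assumptions: all member disks are the same size; ZFS metadata, padding
# and reserved space are ignored. Helper name is hypothetical.

TIB = 2 ** 40  # bytes in one tebibyte


def raidz_summary(disks, parity, disk_tb):
    """Return data-disk count, approximate data capacity and parity tolerance."""
    if parity not in (1, 2, 3):
        raise ValueError("RAIDZ parity must be 1, 2 or 3")
    if disks <= parity:
        raise ValueError("need more disks than parity devices")
    data_disks = disks - parity
    data_bytes = data_disks * disk_tb * 10 ** 12  # drive vendors use decimal TB
    return {
        "data_disks": data_disks,
        "raw_data_tb": data_disks * disk_tb,
        "raw_data_tib": round(data_bytes / TIB, 2),
        # RAIDZ-N keeps the vdev readable after losing any N whole devices.
        "tolerated_device_losses": parity,
    }


if __name__ == "__main__":
    # The pool from the post: RAIDZ2 with 7 x 2 TB drives.
    print(raidz_summary(disks=7, parity=2, disk_tb=2.0))
```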
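Since SMART is being checked by hand, a small Python sketch can pull the sector-related attributes out of `smartctl -A` text so a nonzero pending or reallocated count stands out. It assumes the standard ATA attribute table printed by smartmontools (RAW_VALUE in the 10th column) and that smartctl can reach the drive through the controller at all. The function names and the sample rows are illustrative only, not output captured from these drives.

```python
import re

# ATA attribute IDs that track bad or suspect sectors.
WATCHED = {
    5: "Reallocated_Sector_Ct",
    197: "Current_Pending_Sector",
    198: "Offline_Uncorrectable",
}


def sector_health(smartctl_a_text):
    """Map each watched attribute name to its integer raw value."""
    found = {}
    for line in smartctl_a_text.splitlines():
        fields = line.split()
        # Attribute rows start with a numeric ID and have 10+ columns;
        # RAW_VALUE is the 10th column.
        if len(fields) < 10 or not fields[0].isdigit():
            continue
        attr_id = int(fields[0])
        if attr_id in WATCHED:
            m = re.match(r"\d+", fields[9])
            found[WATCHED[attr_id]] = int(m.group()) if m else 0
    return found


def needs_attention(health):
    """Keep only the attributes whose raw value is nonzero."""
    return {name: raw for name, raw in health.items() if raw > 0}


# Illustrative rows in smartctl's layout, NOT real output from the poster's drives.
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       1
198 Offline_Uncorrectable   0x0030   100   253   000    Old_age   Offline      -       0
"""

if __name__ == "__main__":
    print(needs_attention(sector_health(SAMPLE)))
```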
Code:
  pool: HD
 state: ONLINE
status: One or more devices has experienced an unrecoverable error. An
        attempt was made to correct the error. Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
        using 'zpool clear' or replace the device with 'zpool replace'.
   see: http://www.sun.com/msg/ZFS-8000-9P
  scan: scrub in progress since Wed Jan 22 19:55:50 2014
        452G scanned out of 8.65T at 239M/s, 9h59m to go
        242K repaired, 5.10% done
config:

        NAME                                            STATE     READ WRITE CKSUM
        HD                                              ONLINE       0     0     0
          raidz2-0                                      ONLINE       0     0     0
            gptid/b9c89418-414c-11e2-b2e1-902b34adb688  ONLINE       0     0     0
            gptid/ba04b1d7-414c-11e2-b2e1-902b34adb688  ONLINE       0     0     0
            gptid/ba3f6ae5-414c-11e2-b2e1-902b34adb688  ONLINE       0     0     0
            gptid/ba7079df-414c-11e2-b2e1-902b34adb688  ONLINE       0     0    25  (repairing)
            gptid/baaaf5bb-414c-11e2-b2e1-902b34adb688  ONLINE       0     0     0
            gptid/bb31273a-414c-11e2-b2e1-902b34adb688  ONLINE       0     0     0
            gptid/bb8063bd-414c-11e2-b2e1-902b34adb688  ONLINE       0     0     0

errors: No known data errors
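To make the status output easier to read at a glance, here is a minimal Python sketch that pulls two things out of `zpool status` text: scrub progress from the `scan:` line, and any config row with a nonzero READ/WRITE/CKSUM counter. It assumes the layout shown in the output above and that zpool's K/M/G/T suffixes are base-1024; the function names are hypothetical. The sample string is the output from this post, so the recomputed figures can be checked against the "5.10% done" and "9h59m to go" that ZFS itself reported.

```python
import re

# zpool prints sizes with base-1024 suffixes (K, M, G, T, P).
UNITS = {"K": 2**10, "M": 2**20, "G": 2**30, "T": 2**40, "P": 2**50}
SCAN_RE = re.compile(
    r"([\d.]+)([KMGTP]) scanned out of ([\d.]+)([KMGTP]) at ([\d.]+)([KMGTP])/s"
)
STATES = {"ONLINE", "DEGRADED", "FAULTED", "OFFLINE", "UNAVAIL", "REMOVED"}


def scrub_progress(status_text):
    """Return (percent_done, hours_left) from the 'scan:' section, or None."""
    m = SCAN_RE.search(status_text)
    if m is None:
        return None
    done = float(m.group(1)) * UNITS[m.group(2)]
    total = float(m.group(3)) * UNITS[m.group(4)]
    rate = float(m.group(5)) * UNITS[m.group(6)]
    percent = 100 * done / total
    hours_left = (total - done) / rate / 3600
    return round(percent, 2), round(hours_left, 1)


def devices_with_errors(status_text):
    """List (name, read, write, cksum) for config rows with any nonzero counter."""
    flagged = []
    for line in status_text.splitlines():
        fields = line.split()
        # Config rows look like: NAME STATE READ WRITE CKSUM [note]
        if len(fields) < 5 or fields[1] not in STATES:
            continue
        read, write, cksum = fields[2:5]
        if (read, write, cksum) != ("0", "0", "0"):
            flagged.append((fields[0], read, write, cksum))
    return flagged


# The scan and config sections of the status output from this post.
SAMPLE = """\
  scan: scrub in progress since Wed Jan 22 19:55:50 2014
        452G scanned out of 8.65T at 239M/s, 9h59m to go
        242K repaired, 5.10% done
config:

        NAME                                            STATE     READ WRITE CKSUM
        HD                                              ONLINE       0     0     0
          raidz2-0                                      ONLINE       0     0     0
            gptid/b9c89418-414c-11e2-b2e1-902b34adb688  ONLINE       0     0     0
            gptid/ba04b1d7-414c-11e2-b2e1-902b34adb688  ONLINE       0     0     0
            gptid/ba3f6ae5-414c-11e2-b2e1-902b34adb688  ONLINE       0     0     0
            gptid/ba7079df-414c-11e2-b2e1-902b34adb688  ONLINE       0     0    25  (repairing)
            gptid/baaaf5bb-414c-11e2-b2e1-902b34adb688  ONLINE       0     0     0
            gptid/bb31273a-414c-11e2-b2e1-902b34adb688  ONLINE       0     0     0
            gptid/bb8063bd-414c-11e2-b2e1-902b34adb688  ONLINE       0     0     0

errors: No known data errors
"""

if __name__ == "__main__":
    print(scrub_progress(SAMPLE))
    print(devices_with_errors(SAMPLE))
```

On a live box the same functions could be fed the text of `zpool status HD`; the only device flagged here is the one showing 25 checksum errors.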
I've got the manual open and I see what I need to do for the replacement. I'm just looking for a little guidance on whether I'm reading the errors correctly and whether the actions I've taken so far will keep things operational.
Thanks guys!
Rich