Parity/CRC Errors When Accessing Drive During Scrub

Status
Not open for further replies.

bbddpp

Explorer
Hi folks,

I am running FreeNAS 9.1.1 on an old PC with 6 internal drives, plus one external enclosure that can hold up to 4 drives (I only have one drive in there right now), connected via a port multiplier card, which I know FreeNAS LOVES (sarcasm).

Before 9.1.1, the port-multiplier-connected box would not even survive a reboot without a hard power cycle of its own... Since 9.1.1 it has been behaving a lot better. However, last night it looks like I hit the wall again.

It appears that at midnight an automated scrub started on the drive in the enclosure (it resumes after a reboot and is currently at around 25%). It turns out I was trying to access a file on that drive at the same time (watching a movie). The movie would run for 30 minutes or so and then eventually die completely and take the NAS down with it, with this error:

ahcich1: Timeout on slot 24 port 0
ahcich1: (bunch of numbers)
(ada0:ahci1:0:0:0): READ_FPDMA_QUEUED. (bunch of numbers)
(ada0:ahci1:0:0:0): CAM status: Command timeout
(ada0:ahci1:0:0:0): Retrying command

The entire system then froze.

Each drive is currently its own zpool; I have no RAID at all. This is a JBOD setup full of media files. I have not yet upgraded any of the individual ZFS pools to the new ZFS version since the FreeNAS upgrade. Could that have anything to do with it, or is this more likely the enclosure/port multiplier card combination rearing its ugly head again?

Please let me know if I can provide any more detail, boot logs, etc. And thanks!!! I'd very much like to get this resolved so I can use my enclosure and continue to expand my media server.

B.
 

warri

Guru
The ZFS version is irrelevant here; I'd guess it's the port multiplier/cable.
To rule out the drive itself, look at the SMART output (smartctl -a -q noserial /dev/adaX) and the pool status (zpool status). You can also post them here if you want somebody else to look at them.
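Something like this from the shell would pull the interesting SMART counters for each drive plus the pool status in one go -- the ada0 through ada6 device names are only an example, adjust them to whatever your drives actually show up as:

for d in ada0 ada1 ada2 ada3 ada4 ada5 ada6; do
  echo "=== /dev/$d ==="
  # show reallocated/pending/CRC/uncorrectable counters from the full SMART report
  smartctl -a -q noserial /dev/$d | grep -iE "reallocated|pending|crc|uncorrect"
done
zpool status -v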

With your JBOD setup and external enclosure you should be aware that you might lose data unexpectedly, since you have a shaky connection and no redundancy.
 

bbddpp

Explorer
Thanks so much for the reply. Your suggestion to change out the port multiplier card was sound. I had a spare around, swapped it in, rebooted, and things seem much more stable now. Looking at the logs, the port multiplier card I had was unsupported by FreeNAS. Now, when booting, the system tells me this card IS supported. Good stuff. Definitely a lesson in making sure your hardware is supported.
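In case it helps anyone else, what I did was grep the boot messages for the controller's port multiplier line and for the timeout errors -- the exact driver and device names depend on your card, so treat these patterns only as examples of the idea:

dmesg | grep -i "port multiplier"
dmesg | grep -i ahcich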

As for the backups, thanks also for the concern -- I have been weighing my options as well. I currently have around 15 TB of data spread across the 7 drives and around 2 TB of free space. I'm trying to decide whether to spend the money on another 9 TB or so of space just to provide the redundancy and create a proper pool. As it stands now, I realize that if a drive fails I lose the entire contents of that drive, but only that drive, which is why I didn't set up one big pool (because I understand that without redundancy I'd basically lose it all if one drive failed). Still, losing even one drive isn't good; my hope is that, for now, FreeNAS can at least alert me if it sees one of the drives starting to fail.
 

warri

Guru
Make sure to configure SMART reporting via email and regular scrubs; those will notify you in case of problems.
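If you want to sanity-check a drive by hand in the meantime, you could kick off a short SMART self-test from the shell and read the result a few minutes later -- just an example, substitute your actual device name for ada0:

smartctl -t short /dev/ada0     # start a short self-test (takes a couple of minutes)
smartctl -l selftest /dev/ada0  # show the self-test log once it has finished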

Also, your understanding of a pool might not be complete: if you set up a pool with redundancy (mirror, RAID-Z), you are protected against one or multiple drive failures. Of course this reduces the amount of space available in the pool - with 7 disks people would usually use a RAID-Z2, so you get the capacity of 5 drives and protection against two concurrent drive failures.
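Just to illustrate what that looks like (don't run this against your current disks -- creating a pool wipes them, and the pool name and device names here are only placeholders):

zpool create tank raidz2 ada0 ada1 ada2 ada3 ada4 ada5 ada6
zpool status tank

On FreeNAS you would normally build the pool through the Volume Manager in the GUI rather than from the shell, but the resulting layout is the same.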
 

bbddpp

Explorer
Warri, you're right; I'm far from an expert on pools and ZFS.

When I built the FreeNAS box, the basic idea was just to have a centralized place to store all my media. I built it with so many leftover drives of different sizes and brands that I didn't think it was even possible to run a true RAID setup anyway, and I figured that if a drive was failing I'd at least get enough notice to move the files off of it before it died.

My ZFS scrubs all seem to be set at the default of the 1st Sunday of every month for each drive. Is that enough, or should I be staggering them or running them more often? I will say that the drives don't change much... once they're full, I pretty much move on to the next drive.

SMART tests also seem to be set at the default of quick self-tests every 3 days for each drive.
 

warri

Guru
Some people recommend scrubbing consumer drives every two weeks. I personally use the default of 30 days, though.
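Either way, you can always start a scrub by hand and check its progress from the shell (the pool name here is just a placeholder -- use one of yours):

zpool scrub mypool    # start a scrub of the pool
zpool status mypool   # shows scrub progress and any errors found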

Make sure your SMART reporting is working correctly (i.e. you actually get an email in case of a problem). You can test your email settings in the Settings tab. Also, I think SMART reports are automatically sent to the root user, so make sure the email address is set correctly on that account.
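A quick way to confirm that mail really leaves the box is to send a test message to root from the shell (this assumes the stock mail command and that root's email address is set):

echo "FreeNAS mail test" | mail -s "FreeNAS mail test" root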

If you have the chance to rebuild the system with redundancy in mind, I'd do it. There are a lot fewer things to worry about if it's set up properly, and you don't have to rush to save data from a failing drive (which might not always work anyway).
 

bbddpp

Explorer
Thanks again for all your help with this. I stuck with the 30 days for now. I had also forgotten to set up the email notifications, which are now being sent, so I'm golden.

I'd love to build redundancy in, but I'd really have to downsize my media collection. I wish I weren't such a pack rat, but I just like having all my purchased media digitally as well for some reason.
 