Failed Drive

DarthMuppet · Aug 9, 2021

I have a Gen 8 microserver with a battery-backed HP SmartArray controller, 4 x Seagate Constellation 3TB SAS drives in RAID 5. One of the drives failed this morning (bay 3), and the single-drive zfs pool is marked as DEGRADED which, as far as I can tell, it actually isn't, as the SmartArray is doing its job. I have been able to verify the last backup with no apparent errors, and the SmartArray seems to be quick enough that I can saturate the gigabit network connection whilst doing so.

I am assuming that the zpool is marked as degraded because the cciss driver can see the failed drive (well, it *can't* see it, which is probably the problem!), and also that zfs just sees this as a single physical drive (which is how the SA is presenting it). So if zfs thinks it's failed, is it doing anything to deal with that (even though there is no need to, because the RAID controller is doing its job)? is zfs going to take any action when I replace the drive, aside from (hopefully) tell me it is no longer degraded? I expect the RAID controller to just rebuild automatically when I replace the drive, and carry on as normal.

TIA

DM

Heracles · Aug 9, 2021

Hey @DarthMuppet,

DarthMuppet said:
with a battery-backed HP SmartArray controller

No Go here...

DarthMuppet said:
drives in RAID 5

Double No-Go...

DarthMuppet said:
as the SmartArray is doing its job.

Tripple No-Go

DarthMuppet said:
I have been able to verify the last backup

Good to have backup but just to be sure, you are referring to a copy of your data that is completely outside this system right ? Snapshots are no backups.

DarthMuppet said:
zfs just sees this as a single physical drive (which is how the SA is presenting it)

You should read this post, but right after you exported all your data out of that system first. You are very high risk to loosing it all as of now...

DarthMuppet · Aug 9, 2021

Not sure any of of that article applies to my scenario, except maybe #2. I am using RAID5 managed on the SmartArray, that article is about (not) using RAID cards to present multiple drives as single JBOD-style logical drives, and having ZFS own them, which I fully agree is not a good idea. The only thing I miss out on is TrueNAS not automating the SMART tests, but I have a cron job to do that

My backups are on another device. Strictly speaking most of my data are backups anyway (DVDs / blurays / CDs, all of which I still own, aside from Rock of Ages and Looper, I got rid of them, they were rubbish). The really important stuff (photos) goes to multiple clouds.

Do people really use snapshots as backups? Wow.

Spearfoot · Aug 9, 2021

DarthMuppet said:
Not sure any of of that article applies to my scenario, except maybe #2. I am using RAID5 managed on the SmartArray, that article is about (not) using RAID cards to present multiple drives as single JBOD-style logical drives, and having ZFS own them, which I fully agree is not a good idea. The only thing I miss out on is TrueNAS not automating the SMART tests, but I have a cron job to do that

My backups are on another device. Strictly speaking most of my data are backups anyway (DVDs / blurays / CDs, all of which I still own, aside from Rock of Ages and Looper, I got rid of them, they were rubbish). The really important stuff (photos) goes to multiple clouds.

Do people really use snapshots as backups? Wow.

What @Heracles is saying, is that it's a very bad idea to pass a RAID array to TrueNAS; ZFS needs direct access to the disks. What you're doing is even worse than passing JBOD-style drives from a RAID adapter -- your passing the whole array instead! For best results, you really need to use a Host Bus Adapter (HBA).

Also, RAID5 has been deprecated for a long time, at least for 'large' disks, the reason being that replacing a single failed disk puts you at risk of losing the entire RAID5 array -- along with all of your data!. For the same reason, ZFS RAIDZ1 arrays, being 'sort of' similar to RAID5, aren't recommended either.

ChrisRJ · Aug 9, 2021

@DarthMuppet ,

DarthMuppet said:
Not sure any of of that article applies to my scenario, except maybe #2.

Hm, what does bring you that conclusion?

DarthMuppet · Aug 9, 2021

Spearfoot said:
ZFS needs direct access to the disks.

Obviously it doesn't. If it did, my system would not work. What you probably mean is that ZFS needs direct access to the individual disks if you want to get the most out of it, and that's not the same thing at all. In practice, it does have direct access to the single logical disk my SmartArray presents to it, but I get little benefit as there is only one (logical) disk. ZFS tells me of errors (six corrupt files so far, and as far as I can tell they are actually OK), but it can't fix them. I'm fine with that, as I have secure backups.

The article explains why JBOD on a RAID controller is worse than JBOD on an HBA, and I have no reason to disagree with that. I pass a single fault tolerant disk to it, with a write cache, relieving ZFS and my CPU from any bother during a physical drive recovery. How can that be worse than 4 drives? I don't use an HBA, because that device didn't have one. I have an LSI 9211-8i spare now, so I would use it in the unlikely (see below) event that I lose my data.

Also, RAID5 has been deprecated for a long time, at least for 'large' disks, the reason being that replacing a single failed disk puts you at risk of losing the entire RAID5 array -- along with all of your data!. For the same reason, ZFS RAIDZ1 arrays, being 'sort of' similar to RAID5, aren't recommended either.

It's funny you say that about deprecation of RAID5. I have 5 of these drives in total. The one with the least amount of reads has done 1960099.042 GB, according to the SMART data (that's a paste BTW, not a typo). That's 2PB. Each. Let's make it worse. SAS drives do background scans. They read every sector, and re-write it if they feel they need to. Mine run weekly, and I can't turn them off. The URE on these drives is supposedly 1 bit in 10^15 (or is it....more in a minute!), which is 125TB. 125 TB / 3TB (disk size) is ~42, the number of weeks before the drive will have read itself enough to meet that threshold.. With an 8TB drive, your drive will likely have a URE in just over 3 months. Oh, and that's without putting any data on it. My 5 drives have over 10PB (maybe 13PB, not sure if the scans are included in the SMART data or not) of reading under their collective belts, with without a URE (4 show zero, the failed one is not accessible yet). Based on my experience, 1 in 10^15 is extremely conservative (as I've shown, I have two orders of magnitude higher than that). Clearly it's a small sample, and YMMV.

Here's the 'more' I referred to (trying to keep a straight face here). Seagate's spec for my drives actually says that the read error rate (Nonrecoverable Read Errors per Bits Read) is 1 sector in 10^15. Not one *bit* in 10^15, but one *sector* in 10^15. Grammatically, that means one sector in 10^15 sectors (if the subject is missing from a dependent clause, the subject from the parent clause is inherited). That's 454 PB. I think they'd be boasting about that if it were true

ChrisRJ said:
@DarthMuppet ,

Hm, what does bring you that conclusion?

1) An HBA is a Host Bus Adapter. I'm not using an HBA.
2) FreeBSD has incredibly robust support for the LSI HBA's. I'm sure it does, but see above.
3) You must crossflash to IT/IR firmware - see above
4) FreeBSD may or may not have good support for other HBA's/RAID controllers. The ciss driver was introduced (I think) in FreeBSD 10.3, which was released in 2014, so it's got 7 years of availability. The bulk of this bullet though is about high loads placed on the HBA by ZFS, and there won't be any in my config and use case (see #6)
5) A RAID controller that supports "JBOD" or "HBA mode" isn't the same. I'm not using this configuration
6) A RAID controller with write cache is particularly bad. "A RAID controller with a write cache is likely to get swamped by the massive I/O ZFS is pushing." Again, it's a single SCSI disk. There's no writing during a scrub as there's nothing to repair the file from, and there's no IO hitting the write cache during a resilver as the RAID controller is doing it (probably quite slowly, granted).

I don't disagree with the content, but I don't see how it applies to me. Quite comfortable being wrong though.

Heracles · Aug 9, 2021

Indeed, ZFS does need direct access to the disk. It needs it not to do basic stuff, it needs it to ensure your data will be safe. Without such a direct access, your data is at jeopardy at each and every write access. And because ZFS is copy-on-write, that means basically every access.

As of why you can not do smart test is directly related to this also : your TrueNAS server does not see any of these drives. As such, it can not do smart test.

DarthMuppet said:
1) An HBA is a Host Bus Adapter. I'm not using an HBA.

And know that you have to use one for your data to be safe.

What you are doing is one of the worst thing possible and one of the highest risk situation.

This is what will happen to you, as it did to so many others....
Loosing it all to a Raid controler...

Spearfoot · Aug 9, 2021

DarthMuppet said:
...snip... If it did, my system would not work. ...snip...

Ah, but does your system actually 'work'?

You're here on the forum, asking for help with a degraded pool... perhaps your system isn't working so well?

Just because you can use a RAID controller with ZFS, doesn't mean that you should. You're finding that out now, with your degraded pool. The only way I imagine you'll be able to fix it is to tinker with your HP SmartArray, replacing the faulty drive. Running a scrub afterwards may repair the pool. I've no idea how successful this will be.

With freedom comes responsibility. You are certainly free to use ZFS in a way the experts have repeatedly advised against, but the responsibility for any disaster that ensues will lie entirely with you.

I'll leave you with this wise quote from @jgreco, responding to a forum user who asked about using an HP Smart Array controller with TrueNAS:

You need to make some minor adjustments to the Smart Array controllers. Remove them from the machine, place them in ESD bags, and sell them on eBay. Then order yourself a nice pair of LSI HBA's...

https://www.truenas.com/community/t...s-and-why-cant-i-use-a-raid-controller.81931/

Do not attempt to use the P420i for your ZFS array. Much sadness may eventually result.

verbum sapienti

jgreco · Aug 10, 2021

DarthMuppet said:
1) An HBA is a Host Bus Adapter. I'm not using an HBA.

Bad.

4) FreeBSD may or may not have good support for other HBA's/RAID controllers. The ciss driver was introduced (I think) in FreeBSD 10.3, which was released in 2014, so it's got 7 years of availability.

Incorrect history of the CISS driver. CISS was introduced in FreeBSD 4.8R, released April 2003. Allowing some liberty to keep things simple, CISS was an idea that host-side interfacing for RAID controllers could be standardized, and hoping that additional cards would jump on that bandwagon.

The bulk of this bullet though is about high loads placed on the HBA by ZFS, and there won't be any in my config and use case (see #6)

Thank you for explaining to us what you think the bulk of the bullet is about. In my opinion, you're wrong. The bulk of the bullet is about "FreeBSD may or may not have good support for other HBA's/RAID controllers." As author of that line, I think I get to define what my bullets are actually about. And these guys here on the forum have similar experiences or have been listening to me kvetch on about this stuff for years, so many of them have a pretty good idea about it too.

I have spent years working with these things, not necessarily in a ZFS context. I have a bunch of Proliants that have CISS-based controllers in them and were used for years in high traffic contexts. The CISS stuff isn't horrible if you use the controllers for RAID1 or RAID5; they cleanly hide problems with failed sectors and if you can afford a bay for a warm spare, will swap out a failed disk. However, array I/O may still lock up while the controller is waiting for a disk to time out. More or less "the expected stuff."

Because in the USENET business we normally did redundant servers rather than relying on on-host RAID, and the storage needs were immense, we did NOT use these in RAID1/5, but rather in a JBOD configuration. And there, they sucked. They often hid what the exact failure was behind the scenes, would return zero-filled blocks without reporting an actual error when a sector failed to read, and had a variety of other foibles once things started to go wrong. Foibles that made it hard to spot things going wrong, and hard to identify WHAT was going wrong.

It is actually that which shares a lot in common with ZFS, because ZFS uses drives in a similar manner.

I don't disagree with the content, but I don't see how it applies to me. Quite comfortable being wrong though.

I'll concede the quoted article doesn't specifically discuss your RAID5 case, because the quoted article is primarily trying to scare people off using RAID card JBOD modes, but a lot of the article absolutely does apply to you and you are absolutely wrong.

The discussed bits include needing to use a driver that is known to be as close to 100% as possible (CISS isn't), just for example. That absolutely applies to you.

The undiscussed (i.e. your) problem is that ZFS relies on its redundancy to maintain the integrity of the pool. When ZFS encounters an error, let's say for example a block checksum fails, it rebuilds the block from redundancy, writes it back out, and then gets on with life, happy once again.

Having redundancy with ZFS is pretty much not optional -- it's required. If you do not have redundancy, a corruption introduced into the pool is uncorrectable. ZFS does not have tools like a "fsck" tool, which would require immense amounts of memory and pool I/O to do, so ZFS *relies* on its amazing ability to have a reliable and correct pool.

Once you introduce an error into the pool, that's bad. If it is merely file data, that's "recoverable" in that you can remove the file and the pool itself is not permanently damaged. However, if pool metadata gets hosed, that may well be unfixable, and at that point, your options quickly dwindle down to the dreaded "back up whatever you can and then create a fresh ZFS pool."

All these people here, myself included, are trying to help you get to a point where you have a more reliable system. Arguing to the contrary is pointless, except insofar as it educates you as to the realities, but that is already largely done in stickies so that you have a high quality answer to refer to.

DarthMuppet · Aug 10, 2021

Spearfoot said:
Ah, but does your system actually 'work'?

You're here on the forum, asking for help with a degraded pool... perhaps your system isn't working so well?

Just because you can use a RAID controller with ZFS, doesn't mean that you should. You're finding that out now, with your degraded pool. The only way I imagine you'll be able to fix it is to tinker with your HP SmartArray, replacing the faulty drive. Running a scrub afterwards may repair the pool. I've no idea how successful this will be.

With freedom comes responsibility. You are certainly free to use ZFS in a way the experts have repeatedly advised against, but the responsibility for any disaster that ensues will lie entirely with you.

I'll leave you with this wise quote from @jgreco, responding to a forum user who asked about using an HP Smart Array controller with TrueNAS:

verbum sapienti

I came here with a very specific question, which nobody has answered. All I've had is 'you're doing it all wrong, stupid'. I have not disagreed with any of the given advice about how I should run things (which I already knew), and I even said that if it all dies, I will actually go with the guidance as I am now in a position to do so. My current strategy suits my needs, my budget and my equipment. My data is safe. As I have described above, I understand and accept the risks.

I have never lost a RAID5 set in production. I have overseen the deployment of literally thousands. maybe even tens of thousands, mainly before 2010, when we stopped using it. I would never use RAID 5 in production today, but that's not the point. It's perfectly serviceable for a tech savvy domestic user who is aware of the risks and able to mitigate them as I have. The only thing wrong with my system today is a dead disk, and 6 files that the most recent scrub says have permanent errors.

What I want to know is whether ZFS is going to get in the way of my RAID controller, and why TrueNAS thinks the zpool is degraded. if is just because the scrub found errors, or is something else going on? So far, I've been told 'what'. I'd much rather be told 'why', because that is *always* more important. Be as technical as you wish. None of the links anyone has posted have in any way explained why, or applied to my configuration.

Heracles said:
Indeed, ZFS does need direct access to the disk.

Indeed, it has it. One SCSI disk, presented by my RAID controller. As far as I can tell, it sees it and treats it as a 9TB SCSI disk.

It needs it not to do basic stuff, it needs it to ensure your data will be safe. Without such a direct access, your data is at jeopardy at each and every write access. And because ZFS is copy-on-write, that means basically every access.

How, in my specific scenario, is ZFS going to keep my data safe? It isn't. I know that. I accept that. At scrub time it will tell me if it has been damaged, but can't repair it. I explained that. I understand it. I am perfectly capable of conducting a risk analysis and coming up with a mitigation plan (it's actually my day-job) and I accept the risk. The impact is lower for me as a domestic user versus me as an employee managing my company's data, which is why I accept it. I could not at the time implement things in line with best practice, and there is no point doing so now without due cause (which I *may* now have). I am asking for explanations and assistance, and what I am getting is not in any way helpful.

As of why you can not do smart test is directly related to this also : your TrueNAS server does not see any of these drives. As such, it can not do smart test.

Not true. I know why TrueNAS *doesn't* do SMART tests on the *physical* disks, but that doesn't mean it can't. It can (should / shouldn't is not relevant here BTW). The only downside is that the physical disks aren't necessarily described in the same way as the controller maps them. It does however list the serial numbers, so I can be sure to remove the right one (although my ILO tells me that anyway)

Code:

root@nas:~ # smartctl -d cciss,2 -a /dev/ciss0
smartctl 7.2 2020-12-30 r5155 [FreeBSD 12.2-RELEASE-p6 amd64] (local build)
Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Vendor:               SEAGATE
Product:              ST33000650SS
Revision:             XRR6
Compliance:           SPC-4
User Capacity:        3,000,592,982,016 bytes [3.00 TB]
Logical block size:   512 bytes
Rotation Rate:        7200 rpm
Form Factor:          3.5 inches
Logical Unit id:      0x5000c50040bd6fcf
Serial number:        Z291DG3P0000921363F4
Device type:          disk
Transport protocol:   SAS (SPL-3)
Local Time is:        Tue Aug 10 12:36:56 2021 BST
SMART support is:     Available - device has SMART capability.
SMART support is:     Enabled
Temperature Warning:  Enabled

=== START OF READ SMART DATA SECTION ===
SMART Health Status: OK

Current Drive Temperature:     42 C
Drive Trip Temperature:        68 C

Going back to my original question. Why does TrueNAS think my pool is degraded? I have presented a single logical disk to it. One of the consequences of RAID 5 is that the physical disks are hidden from the disk driver. *This* is what I want to understand. The data is intact, I have verified the last backup from the original data. It's absolutely fine, aside from 6 files that the last scrub reported. As I mentioned, they actually look OK too. Is it assuming it's degraded because the scrub found errors, or is something actually using the ciss driver to interrogate the disks? I have no clue.

This is what will happen to you, as it did to so many others....
Loosing it all to a Raid controler...

Same server, although I have a P222 (which is very different), he has RAID 0+1 and encryption, really not like my config, and his failure was entirely unlike mine. And just as I see here, (Almost) all that happened in that post was bashing the user for their ignorance, rather than helping them deal with the problem.

I will be very happy to listen to anyone who can answer my question.

jlpellet · Aug 10, 2021

Trying to answer the specific question, as a nonexpert, long-time user. As I understand it, if ZFS, in a scrub, detects a file differing from its checksum, in a nonredundant pool/vdev, it reports the pool as degraded. I'm assuming, since it sees the pool as a single drive, it sees an error it cannot correct. I think deleting the listed files then recopying from known good, would clear the error on the next scrub. I think a zpool clear would clear the reported error until the next scrub.

Good luck.

John

jgreco · Aug 10, 2021

DarthMuppet said:
I came here with a very specific question, which nobody has answered. All I've had is 'you're doing it all wrong, stupid'.

Well, yes, because no one here is doing the thing you're doing, which goes against best practices.

I have never lost a RAID5 set in production. I have overseen the deployment of literally thousands. maybe even tens of thousands, mainly before 2010, when we stopped using it.

I'm sure that plays well in your head, but the problem here is that ZFS is both the RAID *and* the filesystem, and it is quite common for RAID5-backed filesystems to become corrupted, often undetected.

What I want to know is whether ZFS is going to get in the way of my RAID controller,

Like it's going to hop out of the CPU and disconnect your RAID card cables? No, of course it isn't, you already know that.

and why TrueNAS thinks the zpool is degraded. if is just because the scrub found errors, or is something else going on?

Presumably because of the reason ZFS always flags "degraded" for a device, it has seen a bunch of checksum errors. That's in the manual. It's warning you that your disk is going bad. In the case of a RAID controller, it means that it has received a lot of sectors back from the underlying storage that contributed to bad block checksums (really the same thing, just not attributable by ZFS to a particular disk).

None of the links anyone has posted have in any way explained why, or applied to my configuration.

Definitely not true. Being resistant to the information is not the same as your not having been provided the information.

DarthMuppet · Aug 10, 2021

jgreco said:
Incorrect history of the CISS driver. CISS was introduced in FreeBSD 4.8R, released April 2003. Allowing some liberty to keep things simple, CISS was an idea that host-side interfacing for RAID controllers could be standardized, and hoping that additional cards would jump on that bandwagon.

I thought it should have been older, looking at the kit it supports, but I didn't look too hard if i'm honest. Way older than I was able to determine, ironically supporting my point, which was that the driver has a lot of hours under its belt too.

Thank you for explaining to us what you think the bulk of the bullet is about. In my opinion, you're wrong. The bulk of the bullet is about "FreeBSD may or may not have good support for other HBA's/RAID controllers." As author of that line, I think I get to define what my bullets are actually about. And these guys here on the forum have similar experiences or have been listening to me kvetch on about this stuff for years, so many of them have a pretty good idea about it too.

You wrote it, you know what it's about, and I would not seek to debate that with you. Ask yourself if it has any relevance to the context in which I am here (why does zfs think my pool is degraded). It doesn't. It tells me why I should not have done what I did, and that is not relevant to my question. I value experience. Experience determined that the best bet is an LSI card, and you conveyed that in the post. Sadly, I have a P222, so I can't (OK, I have a 9211-8i spare now so if things go TU, I will likely use it). pretty sure that the card I am using has nothing to do with my failed disk.

I have spent years working with these things, not necessarily in a ZFS context. I have a bunch of Proliants that have CISS-based controllers in them and were used for years in high traffic contexts. The CISS stuff isn't horrible if you use the controllers for RAID1 or RAID5; they cleanly hide problems with failed sectors and if you can afford a bay for a warm spare, will swap out a failed disk. However, array I/O may still lock up while the controller is waiting for a disk to time out. More or less "the expected stuff."

We have similar experience and expectations. 25 years at a Fortune 100 company, supporting and designing solutions like this. In this context I am a single user NAS serving video and audio to my Pi4 running Kodi and my iPhone. Totally different requirements allow a totally different solution.

Because in the USENET business we normally did redundant servers rather than relying on on-host RAID, and the storage needs were immense, we did NOT use these in RAID1/5, but rather in a JBOD configuration. And there, they sucked. They often hid what the exact failure was behind the scenes, would return zero-filled blocks without reporting an actual error when a sector failed to read, and had a variety of other foibles once things started to go wrong. Foibles that made it hard to spot things going wrong, and hard to identify WHAT was going wrong.

This is what I value. Experience. Honestly it would never have occurred to me to do it that way on a RAID card, but clearly it's been done.

I'll concede the quoted article doesn't specifically discuss your RAID5 case, because the quoted article is primarily trying to scare people off using RAID card JBOD modes, but a lot of the article absolutely does apply to you and you are absolutely wrong.

The article has no relevance to the question I asked. None at all. It just explains why other ways are preferred and I do not disagree.

The undiscussed (i.e. your) problem is that ZFS relies on its redundancy to maintain the integrity of the pool. When ZFS encounters an error, let's say for example a block checksum fails, it rebuilds the block from redundancy, writes it back out, and then gets on with life, happy once again.

Having redundancy with ZFS is pretty much not optional -- it's required. If you do not have redundancy, a corruption introduced into the pool is uncorrectable. ZFS does not have tools like a "fsck" tool, which would require immense amounts of memory and pool I/O to do, so ZFS *relies* on its amazing ability to have a reliable and correct pool.

Once you introduce an error into the pool, that's bad. If it is merely file data, that's "recoverable" in that you can remove the file and the pool itself is not permanently damaged. However, if pool metadata gets hosed, that may well be unfixable, and at that point, your options quickly dwindle down to the dreaded "back up whatever you can and then create a fresh ZFS pool."

I understand that, I am not disputing it, and i have acknowledged throughout that I am not using it optimally. I have full backups and I test them. I believe that this also answers the actual question I asked, and I sincerely appreciate that.

All these people here, myself included, are trying to help you get to a point where you have a more reliable system. Arguing to the contrary is pointless, except insofar as it educates you as to the realities, but that is already largely done in stickies so that you have a high quality answer to refer to.

You have applied real-world experience and explained 'why'. The only place I argued was that your post didn't apply to my situation. Bullet 2 aside, and as I said above, it has absolutely no bearing on the question I asked (i.e. it doesn't in any way tell me why zfs thinks my pool is degraded). If nobody knows, that's fine.

Appreciate you taking the time, thank you.

DarthMuppet · Aug 10, 2021

jgreco said:
I'm sure that plays well in your head, but the problem here is that ZFS is both the RAID *and* the filesystem, and it is quite common for RAID5-backed filesystems to become corrupted, often undetected.

Never deployed zfs on RAID5, should have made that clear. First time i used ZFS was in about 2011 on a Sunfire X4500 connected to another SUN disk array (the model number escapes me) with about 50 300GB SATA drives in it. Awesome machine.

Like it's going to hop out of the CPU and disconnect your RAID card cables? No, of course it isn't, you already know that.

Actually, as I'm not familiar with FreeBSD's architecture, I only have what I would call common sense to assume that it wouldn't. but as you gave the me the *real* answer just below, i'm inclined to go with 'no.

Presumably because of the reason ZFS always flags "degraded" for a device, it has seen a bunch of checksum errors. That's in the manual. It's warning you that your disk is going bad. In the case of a RAID controller, it means that it has received a lot of sectors back from the underlying storage that contributed to bad block checksums (really the same thing, just not attributable by ZFS to a particular disk).

Nailed it - that's exactly what I wanted to know. Thank you.

Definitely not true. Being resistant to the information is not the same as your not having been provided the information.

Definitely true. Your post, even in its excellence, has no relevance to my question about why ZFS says my pool is degraded. It tells me that I would not need to ask the question if I'd followed its advice (again, i don't disagree). Totally not the same thing. You answered my question yourself, and the information you gave me there is not in your article. if it was, I'd agree with you. Let's agree to disagree. And maybe next time i'll RTFM.

Arwen · Aug 10, 2021

I'd like to say something.

Under normal conditions, a hardware RAID-5 LUN with ZFS on top, and a single failed disk in the RAID-5 would almost certainly NOT cause ZFS to find checksum errors, nor degrade the pool, nor declare files bad. (See Note below.)

However, since some hardware RAID controllers do not have bit rot detection, a SECOND, (or even multiple), disk may have lost some blocks. So, on loss of an entire disk, some of the RAID-5 stripes would not be recoverable. Thus, ZFS likely detected the failure as checksum error.

I have PERSONAL experience with this issue. Their were some disks of lesser quality, so I got bad blocks. On multiple disks in the same RAID-5 set. But none overlapped, so no data loss. Except I could not replace any of the disks because then I WOULD have data loss. Had to backup, re-create the RAID-5 set and restore. (This was a LONG time ago, like 2002 time frame.)

To be fair to some hardware RAID controllers, I have seen ones allow scheduling a sanity check of the RAID set. Sometimes this is called patrol read. And some hardware RAID controllers allow it to be done at a lower priority than normal reads. So it should not be impacting.

The problem with this is that unless a disk read fails, verifying the RAID-5 parity does nothing. If the parity is good, no action is needed. If the parity verify is bad, and no disk blocks returned an error on read, their is no way to know WHAT block(s) are bad. Thus, RAID-5 in this scenario won't detect bit rot. Yet RAID-6 would be able to recover.

Note: Their may be one case where ZFS reports an error without data loss on a RAID-5 LUN. If the hardware RAID controller passes read errors up stream so the host server can recognize the problem, but returns good data on a retry. ZFS would likely report a READ error, not CHECKSUM error.

DarthMuppet · Aug 10, 2021

Arwen said:
I'd like to say something.

Under normal conditions, a hardware RAID-5 LUN with ZFS on top, and a single failed disk in the RAID-5 would almost certainly NOT cause ZFS to find checksum errors, nor degrade the pool, nor declare files bad. (See Note below.)

I considered that too, and I came to the same conclusion, purely on the grounds that there's no reason why it should be related. A look at the logs shows that not only did the OS see the failed physical disk (90 minutes after it failed), it correctly identified it as FRU 3. There are some other SCSI errors, all of which were marked as recovered. The last scrub started at midnight 8th Aug, the disk failed 3.14 the next morning, The scrub usually takes about 14 hours, so should be no overlap. The drive is so dead that I can't get to the SMART data on it sadly.

However, since some hardware RAID controllers do not have bit rot detection, a SECOND, (or even multiple), disk may have lost some blocks. So, on loss of an entire disk, some of the RAID-5 stripes would not be recoverable. Thus, ZFS likely detected the failure as checksum error.

All the remaining drives show no errors, and no grown defects, although some hundreds of 'rewrite in place' logs from background media scan. I knew buying SAS disks was a good idea

Note: Their may be one case where ZFS reports an error without data loss on a RAID-5 LUN. If the hardware RAID controller passes read errors up stream so the host server can recognize the problem, but returns good data on a retry. ZFS would likely report a READ error, not CHECKSUM error.

ciss reported the read errors - logs show 29 (recovered) errors over 12 minutes, about 30 minutes after the drive failure, I have no idea when the scrub errors were first reported, but I can't see a reason to link it to the disk failure. One of the allegedly corrupt files is a binary, and it still runs. As the files seem OK, is it possible that the issue here is that the checksums were not right?

Thanks for those insights, very useful.

jgreco · Aug 11, 2021

DarthMuppet said:
Definitely true. Your post, even in its excellence, has no relevance to my question about why ZFS says my pool is degraded. It tells me that I would not need to ask the question if I'd followed its advice (again, i don't disagree). Totally not the same thing. You answered my question yourself, and the information you gave me there is not in your article. if it was, I'd agree with you. Let's agree to disagree. And maybe next time i'll RTFM.

I think the problem here is that no one read your question as "what does degraded mean", because that's defined in the manual and trivially findable as a Google search, and taking it that way would result in a trite answer.

If you re-read your initial post, you don't ask a clear question that is answerable unambiguously, and people are trying to tell you that your problem is that you've built a pool on top of RAID, which is actually a correct diagnosis and reasonable answer. The problem is that you've given ZFS a disk device that is unreliable in a direction ZFS doesn't expect; when an underlying RAID disk device starts throwing errors or timing out, you start getting missing or errant sectors being fed up to ZFS, and that looks like a super-flaky disk device to ZFS (which it is). No one knows exactly what will happen to your RAID5 because it's dependent on factors we cannot know. Presumably it has a good chance of being okay, but part of the reason ZFS was designed is because of RAID5's track history for corruption and bit loss due to the decoupled nature of the redundancy from the filesystem.

More generally --

It's totally reasonable for people to be giving you correct, responsive answers to the issue you were having. Telling them, or me, that our answers have no relevance isn't particularly nice, and (moderator hat on) you are cautioned that you should start from a position of assuming that people are trying to help you solve your problem and end up with a stable ZFS system, and if you are not getting the answers you want, proceed politely with the conversation. People here on the forum will deep dive into topics to nearly ridiculous depths, which may not always result in the answer to your question, but is worthwhile anyways because this is a community discussion forum, and not just an iX tech support forum with paid drones who are expected to provide just the exact answer to the question you meant to ask.

Important Announcement for the TrueNAS Community.

Failed Drive

DarthMuppet

Dabbler

Heracles

Wizard

DarthMuppet

Dabbler

Spearfoot

He of the long foot

ChrisRJ

Wizard

DarthMuppet

Dabbler

Heracles

Wizard

Spearfoot

He of the long foot

jgreco

Resident Grinch

DarthMuppet

Dabbler

jlpellet

Patron

jgreco

Resident Grinch

DarthMuppet

Dabbler

DarthMuppet

Dabbler

Arwen

MVP

DarthMuppet

Dabbler

jgreco

Resident Grinch

Similar threads

Important Announcement for the TrueNAS Community.

Failed Drive

Dabbler

Wizard

Dabbler

He of the long foot

Wizard

Dabbler

Wizard

He of the long foot

Resident Grinch

Dabbler

Patron

Resident Grinch

Dabbler

Dabbler

MVP

Dabbler

Resident Grinch

Important Announcement for the TrueNAS Community.

Related topics on forums.truenas.com for thread: "Failed Drive"

Similar threads