Scrub didn't clear the error?

Samuel Tai · Jan 14, 2021

zpool scrub {PoolName}

ISJ · Jan 14, 2021

Thanks for all the help @Chris Moore and @Samuel Tai.

Samuel Tai · Jan 14, 2021

You're welcome.

UltrashRicco · May 9, 2023

Samuel Tai said:
OK, if you've already deleted the file, try rescrubbing the pool again. If you have snapshots that refer to the deleted file, you should delete those too. Once there are no more references to the corrupt file, scrub your pool again. It may take multiple scrubs before ZFS is happy again.

This... thank you! Let me explain what happenned to me:

I sufferred a power loss during a drive replacement (resilvering of the new drive), and consequently got checksum errors on all the 4 drives of my RAIDZ2 pool, relative to only ONE file.

After deleting the file, running "zpool clear pool" and scrubbing the pool, I still got checksum errors relative to the first daily snapshot after the incident happenned.

I deleted the snapshop, cleared the errors and scrubbed, then got more checksum errors related to the next snapshot...

So I gave it some thought and deleted all subsequent snapshots until original problematic file deletion, then performed a "zpool clear pool", and performed another scrub.

I did not understand why, but I still got an error showing, with the pool and filename showing inside brackets, like <pool>:<filename> (meaning the errors relate to a deleted file).

I was becoming a bit desperate and was considering destroying my datasets/pools to recreate them from scratch... fortunately, with a bit of searching, I found this thread explaining that several successive scrubs are necessary to eventually clear the error for good, see screenshot in attachment.

So, thank you all very much for the explanations! I hope this can help someone else facing the same situation!
Regards,

UltrashRicco · May 9, 2023

UltrashRicco said:
This... thank you! Let me explain what happenned to me:

I sufferred a power loss during a drive replacement (resilvering of the new drive), and consequently got checksum errors on all the 4 drives of my RAIDZ2 pool, relative to only ONE file.

After deleting the file, running "zpool clear pool" and scrubbing the pool, I still got checksum errors relative to the first daily snapshot after the incident happenned.

I deleted the snapshop, cleared the errors and scrubbed, then got more checksum errors related to the next snapshot...

So I gave it some thought and deleted all subsequent snapshots until original problematic file deletion, then performed a "zpool clear pool", and performed another scrub.

I did not understand why, but I still got an error showing, with the pool and filename showing inside brackets, like <pool>:<filename> (meaning the errors relate to a deleted file).

I was becoming a bit desperate and was considering destroying my datasets/pools to recreate them from scratch... fortunately, with a bit of searching, I found this thread explaining that several successive scrubs are necessary to eventually clear the error for good, see screenshot in attachment.

So, thank you all very much for the explanations! I hope this can help someone else facing the same situation!
Regards,

I forgot to mention I am using TrueNAS Scale 22.12.2, so this is still pretty much valid, most likely for any ZFS system actually.

jgreco · May 9, 2023

UltrashRicco said:
I forgot to mention I am using TrueNAS Scale 22.12.2, so this is still pretty much valid, most likely for any ZFS system actually.

This sort of error often seems to pop up when you're using a bad disk controller. How are your drives attached to the NAS system? What specific controller? SATA AHCI? LSI HBA? RAID?

UltrashRicco · May 9, 2023

jgreco said:
This sort of error often seems to pop up when you're using a bad disk controller. How are your drives attached to the NAS system? What specific controller? SATA AHCI? LSI HBA? RAID?

My hardware is my former desktop PC with the following hardware:
Asus ROG Strix Z270E motherboard with 6 onboard SATA ports in AHCI mode and 2 NVMe ports.
Intel Core i7 6700K
32 GB of DDR4 RAM, non-ECC (not supported by the processor or motherboard)
boot pool: 256GB SATA SSD
data pool:
4 x 2TB Seagate Ironwolf drives in RAIDZ2
1 x 2TB Seagate Ironwolf drive as a spare
2 x 256GB mirrorred NVMe drives as Log

I could directly trace the issue back to a loss of power during a resilver while I was attempting to replace a disk. I had sufferred a disk failure and was returning the spare disk to spare, as I received a replacement drive (recertified Seagate drive as a replacement for the failed drive).
After the loss of power, I was unable to boot the server, as I was getting Grub errors (Grub alloc magic is broken).
I installed Truenas scale again (I did not need to but I did not know that) but it did not help.
In order to get the server to boot, I had to disconnect all disks, boot the server, hot-connect the disks and import the existing pool again, which resumed the disk replacement operation.
After that, the pool started resilvering automatically.
Once done resilvering, everything worked fine except for some checksum errors tracing back to only one file.
I had to delete the file and all snapshots, then run a "zpool clear", then scrub twice in order to solve everything.

@jgreco This is my first homemade home NAS, so not a critical application, and I do backups of my important data. It had been running fine since early January.
I do know this is not ideal bombproof recommended reliable hardware, and I WILL definitely consider a LSI HBA card in IT mode for subsequent needs. Oh and probably an UPS too... :)

jgreco · May 10, 2023

UltrashRicco said:
I do know this is not ideal bombproof recommended reliable hardware, and I WILL definitely consider a LSI HBA card in IT mode for subsequent needs.

Well, even in the realm of suboptimal hardware, there can be better choices and less-good choices. So while I'm not seeing anything obvious that would have led to your pool issues, I do want to comment:

UltrashRicco said:
WILL definitely consider a LSI HBA

An LSI HBA is probably going to do nothing for you as you seem to have sufficient AHCI ports. AHCI ports can typically run at full speed, while early HBA's such as the LSI 2008 may only run at a fraction (think: maybe 80%) of the speed at least if all ports are busy.

UltrashRicco said:
2 x 256GB mirrorred NVMe drives as Log

I notice that your board has two M.2 NVMe slots. Also, 256GB is a common consumer SSD size. Since you say these are for "Log", I'm guessing you mean SLOG, and if so, two notes --

1) You don't need to mirror SLOG.

2) A SLOG device really needs power loss protection, or some similar feature such as Optane's cacheless write, or else the SLOG does not serve its intended function correctly. You will just be burning through the endurance on your SSD's.

UltrashRicco · May 10, 2023

jgreco said:
Well, even in the realm of suboptimal hardware, there can be better choices and less-good choices. So while I'm not seeing anything obvious that would have led to your pool issues, I do want to comment:

An LSI HBA is probably going to do nothing for you as you seem to have sufficient AHCI ports. AHCI ports can typically run at full speed, while early HBA's such as the LSI 2008 may only run at a fraction (think: maybe 80%) of the speed at least if all ports are busy.

I notice that your board has two M.2 NVMe slots. Also, 256GB is a common consumer SSD size. Since you say these are for "Log", I'm guessing you mean SLOG, and if so, two notes --

1) You don't need to mirror SLOG.

2) A SLOG device really needs power loss protection, or some similar feature such as Optane's cacheless write, or else the SLOG does not serve its intended function correctly. You will just be burning through the endurance on your SSD's.

Thank you for the advice!

Honestly, I put this server together by merely purchasing hard drives, as I had the rest laying around, without doing too much research. I had a prior successful experience with a Raspberry Pi and a couple of external USB drives, running Open Media Vault, and I wanted to go with something a little more robust and powerful, without paying the premium for a commercial NAS enclosure (which do not have much processing power anyway).

I did some research while repurposing my old PC (and since), and I leant about LSI HBAs. I would consider one if I ever need to extend my setup with more drives, as I have now ran out of ports. What I did not know was that LSI HBAs could run slower in certain circumstances.

Yes I meant SLOG, this is confusing as Truenas Scale labels it as log, and does label L2ARC as cache too.
I was initially using one of the NVMe SSDs as a L2ARC device and the other as SLOG, but I realized that the hit ratio on my ARC was almost everytime 100% (with 32GB of RAM). As many people new to ZFS, I though that "cache" (L2ARC) would speed up read speeds, when in reality it would be counterproductive in most home user scenarios (L2ARC cannot beat RAM speed, and L2ARC does consume a little bit of RAM too).
So I repurposed this SSD to complement the exisiting SLOG SSD, and I made a mirror with both drives. I doubt they have power loss protection. I understand this is not ideal for SLOG. Should I get rid of a dedicated SLOG drive altogether?

I might as well only use one NVMe SSD as SLOG and use the other one to host the boot-pool, which would free one of the SATA ports.

Thanks again for your constructive help and advice, and for contributing to the community for such a long time! :)

jgreco · May 10, 2023

UltrashRicco said:
What I did not know was that LSI HBAs could run slower in certain circumstances.

They are basically a PowerPC CPU that has a protocol that talks to the host PC, and then more or less proxies stuff out to the SAS drives/expanders/whatever you have. Because the early ones for 6Gbps SAS were actually designed for hard drives, and because they served as "budget RAID controllers" as well as HBA's, so if you have all HDD, you're normally going to be okay, especially if you only have 8 HDD's and you are not doing 100% sequential reads. But once we moved on to SSD's, that could peak out at 6Gbps rather than the 2.5-3Gbps of a HDD, and if you have a larger array behind an SAS expander, the 2008 has the potential to be underpowered, especially if all ports are running full speed.

The 2008 is a PowerPC 533 MHz CPU, the 2308 is an 800 MHz, and the 3008 is a 1.2GHz. Even the 3008 is a bit slower than SATA AHCI, because there's some overhead passing commands around from one CPU to the other. But they're all great cards. Don't let me scare you into buying something you don't need. 2008's ran my storage pools for many years, and the only reason I moved on to the 3008 was a VMware compatibility issue.

UltrashRicco said:
As many people new to ZFS, I though that "cache" (L2ARC) would speed up read speeds, when in reality it would be counterproductive in most home user scenarios (L2ARC cannot beat RAM speed, and L2ARC does consume a little bit of RAM too).

Very nice that you did the homework. Saves a bunch of typing.

UltrashRicco said:
So I repurposed this SSD to complement the exisiting SLOG SSD, and I made a mirror with both drives. I doubt they have power loss protection. I understand this is not ideal for SLOG. Should I get rid of a dedicated SLOG drive altogether?

The purpose of the SLOG is to ensure that transactions get committed to the pool. This is done REGARDLESS, but may be slow in some use cases, such as where you are asking for sync writes. If you are not using the filer to service databases, block storage for NFS or iSCSI, or insane amounts of metadata updates, it is quite likely that your ZIL requirements are minimal, and it may be better just not to worry about it. I don't have the time for a deep dive right now but I'll suggest you eyeball the following:

Some insights into SLOG/ZIL with ZFS on FreeNAS

What is the ZIL? POSIX provides a facility for the system or an application to make sure that data requested to be written is actually committed to stable storage: a synchronous write request. Upon completion of a sync write request, the underlying filesystem is supposed to guarantee that a...

www.truenas.com

UltrashRicco said:
Thanks again for your constructive help and advice, and for contributing to the community for such a long time! :)

It's always a pleasure talking to other technically minded people, especially if I can help ease them into ZFS, which is like trying to scale a cliff.

UltrashRicco · May 10, 2023

Thanks for the details about the different existing LSI HBAs. I'll make sure to dive into the specifics if/when I decide I need one.

I already read about SLOG/ZIL before, and I thought I understood that the faster the drive (SSD VS HDD), the fewer risks of data loss, and that a SLOG on a SSD could be a good idea. I did overlook the power loss protection part.
I am now reading your article (takes me a while to fully understand, as I am no native English speaker), and I believe I need to do quite a bit more research to help me weigh the pros and cons of using a NVMe SSD with no power loss protection as a dedicated SLOG drive... :)
My server is merely a home media server, that also stores a copy of all my important data and backups of my home PCs. No intensive I/Os in play.

Thanks again!! :)

Important Announcement for the TrueNAS Community.

Scrub didn't clear the error?

Samuel Tai

Never underestimate your own stupidity

ISJ

Dabbler

Samuel Tai

Never underestimate your own stupidity

UltrashRicco

Cadet

Attachments

UltrashRicco

Cadet

jgreco

Resident Grinch

UltrashRicco

Cadet

jgreco

Resident Grinch

UltrashRicco

Cadet

jgreco

Resident Grinch

Some insights into SLOG/ZIL with ZFS on FreeNAS

UltrashRicco

Cadet

Similar threads

Important Announcement for the TrueNAS Community.

Scrub didn't clear the error?

Never underestimate your own stupidity

Dabbler

Never underestimate your own stupidity

Cadet

Attachments

Cadet

Resident Grinch

Cadet

Resident Grinch

Cadet

Resident Grinch

Cadet

Important Announcement for the TrueNAS Community.

Related topics on forums.truenas.com for thread: "Scrub didn't clear the error?"

Similar threads