FreeNAS 9.3 ZFS State UNKNOWN

strobes

Cadet
Joined
Feb 9, 2022
Messages
8
Hi,
as you can see, the system was configured around 2018 and was very stable until one of the drives in the hardware RAID 5 failed.
The drive was replaced and the RAID was rebuilt by the RAID controller, with no more alarms. But /mnt/ftphome is lost.
I'm new to FreeNAS/TrueNAS and want to be cautious not to lose data if it is still there.
Please let me know if additional information is required. And in general, can you point me to a good source of information on how to deal with this problem?
Thank you.
[Attachments: error.JPG, status.JPG]
 

Samuel Tai

Never underestimate your own stupidity
Moderator
Joined
Apr 24, 2020
Messages
5,399

I'm sorry, there's no recovery in this case, unless you have offboard backups.
 

strobes

Cadet
Joined
Feb 9, 2022
Messages
8
Hi,
if a hardware RAID 5 loses one drive, the dedicated hardware RAID controller can rebuild all the data from the other two drives. And this completed without issue, based on the RAID controller logs. Can you please give me an idea why, in your opinion, the data is lost?
 

strobes

Cadet
Joined
Feb 9, 2022
Messages
8
Please help me understand this better so I can avoid the problem later.
In this case, after the drive failure and replacement, the HW RAID rebuilt all the data in the RAID 5. And of course the RAID 5 is presented to the OS as a virtual drive that appears to be online with the data intact. Here is the zpool output. (I need to check how many drives are actually connected to the HW RAID; maybe it had a spare too.)
So if the data on the virtual drive is present, why can't FreeNAS see it?
[Attachment: zpool.JPG]
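
In case text is easier to read than the screenshot, I believe the equivalent output can be gathered with commands along these lines (I'm new to this, so treat the list as a rough sketch rather than exactly what I ran):

zpool status -v       # pool state, vdev layout, and per-device READ/WRITE/CKSUM error counters
zpool import          # lists any pools that are exported or otherwise not currently imported
camcontrol devlist    # FreeBSD: which disks (or RAID virtual disks) the OS actually sees
glabel status         # maps the gptid/... labels in zpool output back to da0, da1, ...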
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
Because rather than giving ZFS access to the redundancy information and allowing ZFS to lovingly caress the sectors on the disks with care and checksums and fault tolerance, you instead gave the disks to a crappy RAID controller, which, not giving a damn about the correctness of the data it was "rebuilding", rebuilt it without knowing whether the result was actually correct or contained damaged sectors and read errors. That hides all the redundancy from ZFS, so now ZFS cannot really help you.
 

strobes

Cadet
Joined
Feb 9, 2022
Messages
8
Hmm, it's a bold statement to call an LSI MegaRAID SAS 9260-8i a crappy RAID controller. The system is an Intel S2600GZ. I don't want to get into a discussion about whether ZFS/FreeNAS would do a better job of preserving data compared to LSI/Broadcom/Avago/Intel and the code used in these controllers, which serve in every industry you can think of. If FreeNAS can't play nicely with external RAID controllers, that is a different story.
Now I'm curious whether this was actually published back in 2018.
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
Fine. Obviously I have no clue what I'm talking about, have zero experience with the awesome LSI RAID controllers, and your pool is in a happy state and your data is safe and secure.

@Samuel Tai already gave you a link to a resource that explains some of the issues here. The guy who wrote it only looks like me.

Or this article that's nearing a decade old.


I don't want to get into a discussion about whether ZFS/FreeNAS would do a better job of preserving data compared to LSI/Broadcom/Avago/Intel and the code used in these controllers, which serve in every industry you can think of.

I don't want to go into such a discussion either. If you were interested in learning, I would gladly go into a deep conversation about this, but really, message #6 above concisely summarizes the issue. If you want some deeper explanation, feel free to check out


and then scroll down to where it says

Hardware RAID controllers should not be used with ZFS.

and then start reading from there. So your opinion is noted, but it is at odds with the experiences of the people here, including me (I do this professionally), and with the OpenZFS folks too.
 

strobes

Cadet
Joined
Feb 9, 2022
Messages
8
To the guy who only looks like you: no offense. I'm in a position to learn, as the issue at hand needs to be fixed and the system upgraded to TrueNAS. All the info provided is greatly appreciated.
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
So your takeaway REALLY needs to be that the RAID thing is super-bad if it compromises redundancy.

ZFS *needs* redundancy. If a block is in error, it needs to be able to read from redundancy and will then correct the error on the errant disk. ZFS virtually *requires* redundancy or you are looking at eventual data loss of some sort because storage devices develop faults over time. Due to the sheer size and complexity of pools that can range into the petabytes, there are no fsck or chkdsk tools. ZFS needs to be able to maintain the integrity of the pool on an ongoing basis.

If you have a situation such as a single HDD where there is no redundancy, and a sector fails to read or gets corrupted, that data is lost. ZFS notices such errors because it checksums all data, but with no redundant copy it cannot repair them.

If you have two HDD's in a mirror, ZFS will read the "bad" sector from the redundant source and write it back to the first drive, healing the error.

Your RAID5 is a problem though. To ZFS, it looks like a single HDD. It has nowhere to pull redundancy from. So if there is a sector that fails to read (and yes your RAID5 protects you from THAT) or gets corrupted (the RAID5 cannot save you from that), then that block is lost.

The problem is that when a RAID5 controller rebuilds an array (and let's concede that it is largely successful most of the time), any corrupted sector or read error encountered during the rebuild is going to result in incorrect data. ZFS will NOTICE that, but can't really do anything to correct it.

Now, worse, a lot of RAID controllers are just not designed for the massive I/O loads ZFS tends to throw at them, so there are other edge cases where crap gets written out to the RAID5 volume, potentially causing other problems. @Samuel Tai provided a link explaining some of this already. So while in theory you could mirror two RAID5 volumes as a ZFS mirror vdev, and that would provide redundancy, it still isn't a good idea, because the experience of others indicates that this tends to fail in practice.
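
To make the contrast concrete, here is a rough sketch of what giving ZFS its own redundancy looks like (pool and device names are placeholders, not a prescription for your particular box):

zpool create tank mirror da1 da2   # ZFS sees both raw disks, so the redundancy belongs to ZFS
zpool scrub tank                   # reads every allocated block, verifies checksums, repairs bad copies from the mirror
zpool status -v tank               # per-disk READ/WRITE/CKSUM counters show exactly which device is misbehaving

Do that with disks hanging off a plain HBA and ZFS can self-heal; do it on top of a single RAID5 virtual disk and ZFS can only tell you that something is wrong.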

Related reading:

 

Arwen

MVP
Joined
May 17, 2014
Messages
3,611
@strobes - I have been in a situation where a 4-disk hardware RAID-5 set lost a disk, but also had a bad block on another disk that it did not know about. So, during the rebuild, the array failed. Total file system loss, and a restore from backups.

Newer hardware RAID-5/6 controllers (including a firmware update for the one in my example) include a patrol read. (Different vendors may call it something else.) That's similar to a ZFS scrub, and in theory it would have prevented the problem I experienced.
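
On the ZFS side the equivalent check is just a scrub, which you can kick off and verify by hand; for a pool named "tank" (name is only an example):

zpool scrub tank      # read every allocated block and verify its checksum against the redundant copy/parity
zpool status -x       # quick health summary; reports "all pools are healthy" when nothing is wrong
zpool status tank     # also reports when the last scrub ran and whether anything was repaired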

But do I want to trust that their firmware is "perfect"?
That patrol reads are activated by default?
That they also check the RAID-5/6 parity against the data stripe during a patrol read?

For the big boys, EMC, Hitachi, etc... yes, I'd trust their SAN storage, (with underlying RAID-1/5/6 whatever).

So, @jgreco's comments are spot on, from what I know...
Your RAID5 is a problem though. To ZFS, it looks like a single HDD. It has nowhere to pull redundancy from. So if there is a sector that fails to read (and yes your RAID5 protects you from THAT) or gets corrupted (the RAID5 cannot save you from that), then that block is lost.

PS: I define one main criterion for storage firmware being "perfect" as experiencing no data loss. Irregular performance I may dislike or outright hate, but data loss from preventable causes, no thank you. (I am looking at you, Western Digital, and your new Red SMR disks!)
 

strobes

Cadet
Joined
Feb 9, 2022
Messages
8
The guy who built FreeNAS back in 2018, who looked like me, is old school, with the idea in mind that data redundancy is better offloaded from the host to reduce the associated CPU and I/O load (and backup is a critical part of any system), but he overlooked the HW RAID vs. ZFS situation.
Thanks.
 

Arwen

MVP
Joined
May 17, 2014
Messages
3,611
@strobes - Understood.

One other point: today's CPUs are 10 times faster than many PCIe-based HW RAID cards. Most HW RAID cards likely have no more than 2 cores, generally running at less than 1 GHz, plus limited memory, sometimes DDR3 (or worse, DDR2).

So having ZFS do compression, checksums, and potentially encryption, along with the redundancy, is trivial compared to most lower-end HW RAID cards that do only redundancy.

Sun Microsystems (the inventor of ZFS) even found that with compressible data, I/O was reduced because less data had to be written to or read from main storage. (Back then, main storage was still exclusively slower spinning disks.) Meaning it was faster to read compressed data and decompress it than to read the uncompressed data directly. That's less relevant today with SSDs that don't have long access/seek times.
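
As a rough illustration (the dataset name below is just an example), turning compression on and checking what it actually saves looks like this on any ZFS system:

zfs set compression=lz4 tank/data             # lz4 is cheap enough on a modern CPU to be close to free
zfs get compression,compressratio tank/data   # compressratio reports how much space compression is actually saving

Compression only applies to data written after it's enabled, so the ratio grows as new data lands on the dataset.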
 