Pool degraded

Stef_go

Cadet
Joined
May 21, 2023
Messages
3
Hi everyone,

I have a degraded pool problem on my fresh TrueNAS SCALE build.
Here is the configuration:

Motherboard: ASRock H510M-ITX/ac (Intel H510, LGA 1200, Mini ITX)
CPU: Intel Pentium Gold G6400
RAM: 32 GB
Boot drive: Kingston NV2 NVMe PCIe 4.0 SSD, 250 GB
Performance pool (mirror), connected to an LSI 9211-8i HBA, firmware P20:
- Samsung 850 EVO SSD, 250 GB
- Kingston SA400S3, 240 GB
DataPool (RAIDZ1), connected to the onboard SATA:
- 4x Western Digital WD40EFAX-68J 4 TB Red drives
PSU: Corsair 550 W

Everything is OK, but as soon as I copy data to the DataPool I get the "Degraded Pool" alert.
Here is the result of zpool status:
[Screenshot (2023-05-19): zpool status output showing the degraded pool]

The strange thing is that the "error" is not always located on the same drive.
A zpool clear fixes the problem for a few seconds, then it reappears.
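For reference, here is roughly what I keep running to check and clear it (just a sketch; DataPool is my pool name from above):

# Show pool health plus the files affected by any errors
zpool status -v DataPool
# Reset the error counters; the DEGRADED state comes back within seconds of writing
zpool clear DataPool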

I've tried and checked many things:
- Memtest86 ran for 5 hours without any errors
- SMART tests (long, short, offline) all pass (commands sketched after this list)
- I disconnected the DataPool from the motherboard and plugged it into the LSI HBA. Same issue.
- Checked for BIOS and firmware updates for the motherboard and the WD hard drives: already on the latest versions.
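
The SMART tests were kicked off roughly like this (a sketch; /dev/sda is a placeholder, repeated for each of the four drives):

# Start a long self-test on one of the WD Red drives
smartctl -t long /dev/sda
# Once it finishes, read back the attributes and the self-test log
smartctl -a /dev/sda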

Since it is not in production, I tried switching to openmediavault, just to see. No problems of any kind.
Went back to TrueNAS SCALE: the problem came back as soon as I wrote data to the pool.
Decided to push the testing further by switching to TrueNAS CORE, and a miracle happened:
NO PROBLEM AT ALL.
But I need the apps/Docker features from TrueNAS SCALE... :(

My brain is melting right now... what am I missing?
Any advice, folks?

Thanks !
 

joeschmuck

Old Man
Moderator
Joined
May 28, 2011
Messages
10,994

Stef_go

Cadet
Joined
May 21, 2023
Messages
3
Thank you for your reply, and for the link.
It is very interesting, and quite odd, as I have absolutely no errors on CORE...
I guess there is something different in the ZFS implementation there...
So now, I'm wondering what my options are to be able to use SCALE, as I can't afford to replace all four drives...
A TrueNAS VM on Proxmox?
 

Stef_go

Cadet
Joined
May 21, 2023
Messages
3
No, there isn't.
So how come I have no problems, then? Don't get me wrong, it is pure technical curiosity.
It also means that I might run into problems sooner or later...

Generally a bad idea if you can avoid it. How about a Linux VM on CORE to run your containers?
I think so too; it looks like too many virtualization layers to be efficient and reliable...

I guess I'll stick with CORE and go with the Linux VM... or maybe an LXC container on my Proxmox server.
Sounds like fun :)

Anyway, thank you guys for your help and your advice :)
 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194
So how come I have no problems, then?
Different OS/driver code dealing differently with disk bugs, either explicitly avoiding them or being lucky and not running into them. Happens occasionally. It could be something as simple as a timeout for writes being set to a different value.
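
As a rough illustration of the kind of knob involved, Linux exposes a per-device command timeout in sysfs; this is only a sketch (sda is a placeholder), not a fix for this particular problem:

# Current command timeout, in seconds, for one disk
cat /sys/block/sda/device/timeout
# Raise it temporarily as an experiment (does not persist across reboots)
echo 120 > /sys/block/sda/device/timeout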
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
So how come I have no problems, then? Don't get me wrong, it is pure technical curiosity.

You want my opinion? "Cuz LinSux."

Both FreeBSD and Linux now use the same OpenZFS code base. However, FreeBSD has an extra decade of integration effort behind it. Linux people hate it when I say that ZFS on FreeBSD is a higher quality product. This is just opinion, but it seems to be backed up by observations. Plus all the stupid LinSuxisms like ARC only using half the system memory.

The problem you're having isn't with ZFS in any case. Well, doubtful; I doubt it's with ZFS. It's that the OS is reporting errors of some sort to ZFS, or possibly corrupting data. The underpinnings of the OS need to be solid, including 100.0000% reliable device drivers for the silicon you're using. People are always going on about how "drivers on FreeBSD suck compared to Linux", but that isn't necessarily true. Drivers on Linux try harder to make sucky hardware work, and maybe that's okay for a desktop web-browsing and OpenOffice workstation. It isn't good for ZFS, though.

That's my technical opinion and I hope you don't mind that it is mildly vague, since I don't actually know what's causing your issue.
 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194
Plus all the stupid LinSuxisms like ARC only using half the system memory.
Thanks to weird Linux kernel nonsense, memory management for ARC requires a byzantine setup whose side effects seem to include the kernel not knowing when ZFS has released ARC memory until things get desperate. Best case, the ARC gets emptied by the kernel repeatedly asking for memory back and then not realizing that it got it back; worst case, memory allocations start failing randomly, despite memory being available, until the kernel figures out that it had a truckload of memory freed back to it.
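
For context, the half-of-RAM cap mentioned above is just the default of a module parameter. A minimal sketch of inspecting and overriding it (the 16 GiB value is arbitrary, and this is not a recommendation for SCALE):

# Current ARC cap in bytes; 0 means "use the default", which on Linux is roughly half of RAM
cat /sys/module/zfs/parameters/zfs_arc_max
# Example of raising the cap to 16 GiB at runtime (illustrative value only)
echo 17179869184 > /sys/module/zfs/parameters/zfs_arc_max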
 

Whattteva

Wizard
Joined
Mar 5, 2013
Messages
1,824
However, FreeBSD has an extra decade of integration effort behind it. Linux people hate it when I say that ZFS on FreeBSD is a higher quality product. This is just opinion, but it seems to be backed up by observations. Plus all the stupid LinSuxisms like ARC only using half the system memory.
ZFS on Linux is likely never going to be of the same quality as on FreeBSD because of license incompatibilities. Linus Torvalds himself has stated that it will probably never be supported directly in the kernel source tree unless he gets a letter signed by Oracle's CEO saying that it's permissible. He's also on record saying this:
"Don't use ZFS," Torvalds wrote. "It's that simple. It was always more of a buzzword than anything else, I feel, and the licensing issues just make it a non-starter for me. The benchmarks I've seen do not make ZFS look all that great."
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
It was always more of a buzzword than anything else, I feel, and the licensing issues just make it a non-starter for me.

We feel the same way about your PoS, Linus.

The benchmarks I've seen do not make ZFS look all that great.

Do they even have anything over there that can scale into the multipetabyte region? Or is he just talking out his rear as usual?
 

morganL

Captain Morgan
Administrator
Moderator
iXsystems
Joined
Mar 10, 2018
Messages
2,694
We feel the same way about your PoS, Linus.



Do they even have anything over there that can scale into the multipetabyte region? Or is he just talking out his rear as usual?

This thread is conflating multiple different issues:
  1. SMR drives are not good with ZFS. In particular, writes can be slow, which probably causes the errors that started this thread.
  2. TrueNAS is not tested or supported with SMR drives, so faults can happen. Timeouts may be different between Linux and FreeBSD, but neither is expected to work reliably.
  3. ZFS (CDDL) and the Linux kernel (GPLv2) have incompatible licenses. That's annoying for Linus, and it means ZFS won't become native in Linux, but it's not a major issue for TrueNAS. Btrfs is not a reasonable substitute, and other storage-class file systems are proprietary.
  4. TrueNAS SCALE (Linux) is not yet as reliable as TrueNAS CORE (FreeBSD). True for now, but with testing, the reliability and performance gap is closing.
The general reliability of ZFS on SCALE (Linux) is very good. There are remaining issues with ARC efficiency and scalability beyond 5 PB.
On Linux, NFS is probably better, SMB is equivalent, and iSCSI is not as performant yet.

So Linus may not be a storage expert, but he has no good GPL storage choices. TrueNAS does have good storage choices and it can be run on Linux.

Time and effort can solve a lot of problems...
 

joeschmuck

Old Man
Moderator
Joined
May 28, 2011
Messages
10,994
So now, I'm wondering what my options are to be able to use SCALE, as I can't afford to replace all four drives...
If you must use those drives then, in my opinion, you will need to find some other storage platform that supports SMR drives without issue. There are other options; even a Windoze server is an option that might work well with those drives. Again, just my opinion.

One option for you, if you just purchased these drives: see if you can return them for the WD Red Plus (CMR) models and tell the seller that the SMR drives do not work in your NAS. WD recognizes the problem as well, hence the "Plus" line now. Always verify that a drive is not SMR before buying; a quick check is sketched below.
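
A quick way to check what a drive actually is (a sketch; /dev/sda is a placeholder, and the model mapping comes from WD's published CMR/SMR lists):

# Print the drive model, then compare it against WD's CMR/SMR documentation
smartctl -i /dev/sda | grep -i "device model"
# WD40EFAX (Red)        -> SMR, problematic with ZFS
# WD40EFRX / Red Plus   -> CMR, fine for ZFS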

Best of luck to you.
 