Unusual activity on a 10-drive raidz3

brunomsqt | Cadet | Joined Aug 13, 2023 | Messages: 5

Hi
First and foremost, thank you for creating and maintaining such a large community.
I've ventured into TrueNAS Core after running a NAS I had built on CentOS. It's 10x 8 TB drives in RAIDZ3. The other specs are a humble i3-6100 and 32 GB of RAM.
I've taken my time to read up and configure everything, and I believe I have it all up and running!

But...

When I check disk activity with gstat -dp, two HDDs show a much higher %busy than all the others. As soon as I move files onto the NAS, those two rarely drop below 80%, while the rest stay at around 20% at most.
When no data is being read or written, both drop to zero along with all the others.
It doesn't matter whether the transfer is local, over Samba, or over SFTP.

I replaced both drives: same behaviour. I ran a SMART long test on every drive, and all of them passed.

Can anyone give me a hint as to why this happens, and why specifically on those two HDDs?
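To make the symptom concrete, here is a small sketch that flags devices whose %busy sits far above the rest of the pool. It is a hypothetical helper, not part of gstat or TrueNAS: it does not parse gstat output, it just takes %busy values copied out of gstat -dp by hand. The device names and numbers in SAMPLE_BUSY are made up to match the shape of the symptom in the post (not real readings), and the thresholds (3x the median, at least 50%) are arbitrary assumptions.

```python
from __future__ import annotations

from statistics import median

# Illustrative values only: shaped like the symptom in the post (two drives
# above 80%, the rest around 20% or less). Not real gstat output.
SAMPLE_BUSY = {
    "ada0": 18.0, "ada1": 15.0, "ada2": 20.0, "ada3": 17.0, "ada4": 16.0,
    "ada5": 19.0, "ada6": 85.0, "ada7": 18.0, "ada8": 82.0, "ada9": 17.0,
}


def flag_busy_outliers(busy: dict[str, float],
                       ratio: float = 3.0,
                       floor: float = 50.0) -> list[str]:
    """Return devices whose %busy is at least `floor` percent AND at least
    `ratio` times the median %busy of all devices (hypothetical thresholds)."""
    if not busy:
        return []
    typical = median(busy.values())
    return sorted(
        name for name, pct in busy.items()
        if pct >= floor and pct >= ratio * typical
    )


if __name__ == "__main__":
    print(flag_busy_outliers(SAMPLE_BUSY))  # ['ada6', 'ada8']
```

Comparing against the median rather than the mean keeps the two busy drives from dragging up the baseline they are measured against. If the same device names keep getting flagged after the physical disks are swapped, as happened later in this thread, that points at the port or controller rather than the drives.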
 

c77dk | Patron | Joined Nov 27, 2019 | Messages: 468

How are the drives connected (controller/backplane; the same as the rest of the drives?), and which models are they?
 

brunomsqt | Cadet | Joined Aug 13, 2023 | Messages: 5

They are all scattered across four PCIe x1 SATA expansion cards. I've tried swapping them around and using different expansion cards; the behaviour is always the same.
The models are 6x WDC WD80EDAZ and 4x ST8000DM004.

The first time I noticed it, it was two Seagates. I removed them, put in a couple of WDs and resilvered, but afterwards the behaviour was the same: always ada6 and ada8.

Apart from that, everything seems to be working fine...
 

artlessknave | Wizard | Joined Oct 29, 2016 | Messages: 1,506

Please have a read of the forum rules.

Note that SATA expansion cards are typically iffy. You would likely be much better off getting a supported HBA. A single HBA would probably have cost about the same as four SATA PCIe cards, but it would be more reliable than any of them. LSI 2008-based 9201/9211-type cards are extremely well supported in TrueNAS and can generally be found at decent prices. One card will run 8 drives, or up to 256 with expanders.

As you haven't posted your motherboard, I'm going to guess it's also something unsupported, and I strongly suspect that it's the source of the interesting behaviour you are seeing.
 

brunomsqt | Cadet | Joined Aug 13, 2023 | Messages: 5

I see. As I said before, I was running the same hardware with CentOS without issues, so it didn't cross my mind that TrueNAS would have a specific list of supported components. When I first installed it, everything seemed to work out of the box, so I just kept going and reading from that point on.
I'm currently using a Z170 board (not using any RAID) with Intel i219 LAN.

So you're telling me I should have gone with a couple of these, with some SFF-8087 cables? https://aliexpress.com/item/1005005028899772.html
 

ChrisRJ | Wizard | Joined Oct 23, 2020 | Messages: 1,919

@brunomsqt, I would not buy an HBA from AliExpress; the chance of getting a fake one is simply too big.

The board looks good. Please note that Supermicro makes quite a few boards with 10 or more SATA/SAS connectors. With such a board you wouldn't need a separate HBA, which also needs cooling and adds to the electricity bill.

As to things seeming to work under CentOS: think of it as ZFS looking much deeper into things (by means of checksumming). It is a bit like in a hospital, where an MRI shows things you cannot see on an X-ray.
 

brunomsqt | Cadet | Joined Aug 13, 2023 | Messages: 5

Thank you for all of your time. I'll look into swapping the board and HBA (if needed). I realize now that I started out quite uninformed and need to redo things.

Thanks again.
 

jgreco | Resident Grinch | Joined May 29, 2011 | Messages: 18,680

brunomsqt said:
They are all scattered across four PCIe x1 SATA expansion cards.

brunomsqt said:
So you're telling me I should have gone with a couple of these, with some SFF-8087 cables? https://aliexpress.com/item/1005005028899772.html

Probably not, the AliExpress pictures scream "knockoff".

It's possible that they work just fine, and if so, well, that's fine. But it's also possible they don't, in which case you're stuck trying to ship the thing back to China. There are reputable vendors in the US and EU that you should consider instead, probably at a somewhat higher cost.
 

artlessknave | Wizard | Joined Oct 29, 2016 | Messages: 1,506

jgreco said:
Probably not, the AliExpress pictures scream "knockoff".

Yup, those are a hard pass. They didn't even bother to fake the LSI logo.
 

jgreco | Resident Grinch | Joined May 29, 2011 | Messages: 18,680

artlessknave said:
Yup, those are a hard pass. They didn't even bother to fake the LSI logo.

Well, there's a certain sort of honesty to be had there. I am a lot more worried about cards that pretend to be something they're not. There are manufacturers like Supermicro, Silicom, or 10GTek who buy the rights and proudly put their name on a card.

It's like walking into a grocery store for a can of peas. You might find "Green Giant" brand at a premium price, but you might also find "Kroger" brand. Either one should contain peas and be safe to eat. The Green Giant ones may be better; opinions vary. But I wouldn't trust walking into a dollar store and getting a can of peas with no label where someone has written "PEEZ" in Sharpie on the can.

Properly applying the logic that you use in everyday life to your tech purchases is a good idea.
 

ChrisRJ | Wizard | Joined Oct 23, 2020 | Messages: 1,919

jgreco said:
It's possible that they work just fine, and if so, well, that's fine. But it's also possible they don't, in which case you're stuck trying to ship the thing back to China.

While it is implied, I would like to point out that getting your money back may be the least of your problems. What if the card sort-of worked initially but corrupted data that cannot be retrieved from a backup?
 

jgreco | Resident Grinch | Joined May 29, 2011 | Messages: 18,680

ChrisRJ said:
What if the card sort-of worked initially but corrupted data that cannot be retrieved from a backup?

If you want to venture down that rabbit hole, it is worth noting that LSIs have a tragic history of corrupting data when they are (for example) overheating, which can result in garbage being written to all the disks of your pool. ZFS does not verify on write, and it is a problem if data is miswritten to disk, especially if that happens at an incorrect LBA (potentially overwriting older data that had been written correctly). I have long suspected that this may explain some of the CKSUM errors seen in scrubs after an overheating controller has been remediated by users who installed hot-running LSI HBAs in non-rackmount PCs. Just a grinchly theory, as I don't have a way to prove it.
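The "ZFS does not verify on write" point can be illustrated with a toy model. This is a hypothetical simulation, not ZFS code and not its on-disk format: SHA-256 stands in for the block checksum, the expected checksums live in a separate table (loosely like ZFS keeping a block's checksum in its parent block pointer rather than next to the data), and the made-up `actual_lba` parameter simulates a controller that puts data at the wrong LBA. Nothing is read back at write time, so the mistake only surfaces when a scrub re-reads the blocks.

```python
from __future__ import annotations

import hashlib
from typing import Optional


def checksum(data: bytes) -> str:
    # SHA-256 stands in for the block checksum in this toy model.
    return hashlib.sha256(data).hexdigest()


class ToyDisk:
    """Hypothetical toy model of a disk behind a possibly faulty controller.

    Data blocks are indexed by LBA; the checksum the filesystem expects for
    each LBA is kept in a separate table, loosely like ZFS storing checksums
    in the parent block pointer rather than next to the data.
    """

    def __init__(self) -> None:
        self.blocks: dict[int, bytes] = {}
        self.expected: dict[int, str] = {}

    def write(self, lba: int, data: bytes, actual_lba: Optional[int] = None) -> None:
        # The filesystem records the checksum for the LBA it intended to write.
        self.expected[lba] = checksum(data)
        # A faulty controller may put the data at a different LBA. Nothing is
        # read back here, so the mistake is not noticed at write time.
        target = lba if actual_lba is None else actual_lba
        self.blocks[target] = data

    def scrub(self) -> list[int]:
        # Re-read every block the filesystem knows about, compare it with its
        # expected checksum, and return the failing LBAs (like CKSUM errors).
        return sorted(
            lba for lba, digest in self.expected.items()
            if checksum(self.blocks.get(lba, b"")) != digest
        )


if __name__ == "__main__":
    disk = ToyDisk()
    disk.write(0, b"older data, written correctly")
    print(disk.scrub())  # []
    # Misdirected write: meant for LBA 1, but lands on LBA 0.
    disk.write(1, b"new data", actual_lba=0)
    print(disk.scrub())  # [0, 1]
```

In this model a single misdirected write costs two blocks at scrub time: the intended block, which never received its data, and the older block that was silently overwritten. With redundancy, as in the raidz3 pool in this thread, ZFS can often repair such blocks from the other disks, but only if the controller did not write the same garbage to enough of them at once.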
 