Replacing drive, scan/read normal but issue/write slow - SED drive issue?

jhl

Dabbler
Joined
Mar 5, 2023
Messages
27
TrueNAS CORE 13.0-U3.1
Current pool:
2x 256GB NVMe metadata special device mirror
3x 4TB WD Blue in RAID-Z1
51% of ~7.5TiB capacity used

All disks are healthy but I want to replace with SAS drives so I can move the WD Blues to a computer that doesn't support SAS.

I have Dell-branded Seagate 4TB SAS self-encrypting drives. I had a hard time making them work, which I think was related to the SED feature and TrueNAS SCALE, but on CORE at least one seems to be working fine now. In its own single-disk pool I can write data at 150MB/s, scrub without errors, and read data back successfully. I'm waiting on the results of a long SMART test, but I have found no reason yet to think this drive is faulty.

However when I try to Replace one of my existing disks, I have 2 problems:
* Even if I just wiped the drive from the Disk menu in the GUI, it says partitions were found and I need to Force use of the disk as a replacement.
* Resilvering is insanely slow. I've seen posts with "slow" 30MB/s resilvering speeds, but their zpool status readouts look symmetrical between scanning and issuing: X GB scanned at 30MB/s, X GB issued at 30MB/s. Mine looks like a massive bottleneck writing to the drive. After running overnight for 12 hours it reported roughly 360GB scanned at 55MB/s, 12GB issued at 3.5MB/s. The ETA went up to a few weeks before it stopped giving an ETA. Overall resilver progress after 12 hours was 2%, at which point I stopped. I don't have enough RAM to read 360GB but not write it out anywhere. In the context of replacing a healthy drive, what is the resilver doing if not writing to the SAS drive?

Basically I am wondering if this is an incompatibility with these weird SAS drives, or a mixed pool of SATA/SAS drives, or if my expectations are just way off for how long this could take. If it is normal that it would take 25 days to replace each of these drives, I'll need to figure out another way to migrate to the SAS drives (likely export this pool to another machine and move the data back over to a new SAS pool).
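
For reference, the 25-day figure is a plain linear extrapolation from the progress reported above (2% after 12 hours). A minimal sketch of that arithmetic, assuming progress continues at the same average rate, which a real resilver may well not do; the function name is made up for illustration:

```python
def extrapolate_total_hours(elapsed_hours: float, fraction_done: float) -> float:
    """Total run time if progress keeps the same average rate (linear assumption)."""
    if not 0 < fraction_done <= 1:
        raise ValueError("fraction_done must be in (0, 1]")
    return elapsed_hours / fraction_done


# Numbers from the post: 2% complete after 12 hours.
total_hours = extrapolate_total_hours(12, 0.02)
total_days = total_hours / 24
print(f"Estimated total: about {total_days:.0f} days per drive")
```

With three drives to swap one at a time, that estimate would have to be paid three times over.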

Thanks in advance for any help!
 

sretalla

Powered by Neutrality
Moderator
Joined
Jan 1, 2016
Messages
9,703
* Even if I just wiped the drive from the Disk menu in the GUI, it says partitions were found and I need to Force use of the disk as a replacement.
Fine as long as you're confident you're identifying the right disk.

* Resilvering is insanely slow. I've seen posts with "slow" 30MB/s resilvering speeds, but their zpool status readouts look symmetrical between scanning and issuing: X GB scanned at 30MB/s, X GB issued at 30MB/s. Mine looks like a massive bottleneck writing to the drive. After running overnight for 12 hours it reported roughly 360GB scanned at 55MB/s, 12GB issued at 3.5MB/s. The ETA went up to a few weeks before it stopped giving an ETA. Overall resilver progress after 12 hours was 2%, at which point I stopped. I don't have enough RAM to read 360GB but not write it out anywhere. In the context of replacing a healthy drive, what is the resilver doing if not writing to the SAS drive?
I don't know exactly which model we're talking about here, but I see some Dell-branded 4TB disks showing up as Constellation ES.3, which perhaps translates to model number ST4000NM0023. That model doesn't seem to be on the SMR drives list (https://www.truenas.com/community/resources/list-of-known-smr-drives.141/), even though the symptoms you see look a lot like SMR.

Depending on how your disks are attached, maybe there's something going on which is making your controller struggle with the mix of SAS/SATA, but it should normally not be an issue to mix them as I understand it.
 

jhl

Dabbler
Joined
Mar 5, 2023
Messages
27
Thank you!

I didn't even think to check whether this drive is SMR because it's smaller and older. It's an ST4000NM0135. I can't find a datasheet for it, but at least one site selling them claims they are CMR.

I think that disk might have something wrong with it. It actually passed a long SMART test after I made my post. But another disk from the batch has slightly better performance on write tests and appears to have slightly better performance resilvering even though it's too early to tell for sure.
 

jhl

Dabbler
Joined
Mar 5, 2023
Messages
27
Well, I found a way to check if my drive is working normally, I think?

Resilvering with the other drive is going about twice as fast but still very slow. When I check Disk Operations in the Reporting tab the drive appears to be processing 115 write operations per second. Even though it's only writing about 1.43MB/s at this point, 115 IOPS sounds about right for a spinning hard disk, doesn't it?

So for some reason my dataset is very slow to resilver with lots of small operations but the disk is handling many small operations about as well as you'd expect it to. If I'm understanding the numbers right.
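
To put a number on "many small operations": dividing the reported throughput by the reported operation rate gives the average size of each write. A rough sketch, assuming the Reporting tab's MB/s figure is decimal megabytes (if it is actually MiB the result shifts only slightly); the helper name is made up for illustration:

```python
def avg_write_size_kib(throughput_mb_per_s: float, ops_per_s: float) -> float:
    """Average bytes per write operation, expressed in KiB."""
    bytes_per_s = throughput_mb_per_s * 1_000_000  # assumption: decimal MB
    return bytes_per_s / ops_per_s / 1024


# Numbers from the post: about 1.43MB/s at about 115 write operations per second.
size_kib = avg_write_size_kib(1.43, 115)
print(f"Average write size: roughly {size_kib:.0f} KiB")
```

That works out to roughly 12 KiB per write, so the disk is spending its operations on many small writes rather than on large sequential ones.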
 