Make kernel constants such as BLK_MIN_SG_TIMEOUT configurable for slow disks?

Hoozemans

Cadet
Joined
May 25, 2023
Messages
6
Good morning!

I have a bunch of SMR disks lying around, and I figured I'd try and do something with them. Since SMR is becoming more prevalent, we'll run into enterprise solutions employing the technology for storage sooner or later, so NAS software should be figuring out ways to deal with it as well.

Predictably, I keep running into errors like this:
Code:
Aug  9 17:00:10 nas kernel: sd 8:0:0:0: attempting task abort!scmd(0x00000000c4e14ac5), outstanding for 1420 ms & timeout 1000 ms
Aug  9 17:00:10 nas kernel: sd 8:0:0:0: tag#35 CDB: ATA command pass through(16) 85 08 0e 00 d5 00 01 00 e0 00 4f 00 c2 00 b0 00
Aug  9 17:00:10 nas kernel: scsi target8:0:0: handle(0x000a), sas_address(0x5001e677b7d18fe8), phy(8)
Aug  9 17:00:10 nas kernel: scsi target8:0:0: enclosure logical id(0x5001e677b7d18fff), slot(8)
Aug  9 17:00:10 nas kernel: sd 8:0:0:0: device_block, handle(0x000a)
Aug  9 17:00:10 nas kernel: sd 8:0:0:0: task abort: SUCCESS scmd(0x00000000c4e14ac5)
Aug  9 17:00:10 nas kernel: sd 8:0:0:0: device_unblock and setting to running, handle(0x000a)
Aug  9 17:00:11 nas kernel: sd 8:0:0:0: Power-on or device reset occurred

Surprisingly, there's nothing much that the base configuration of TrueNAS Scale allows me to do about this. The timeout in question appears to be hardcoded in a kernel driver somewhere. I suspect it's related to
Code:
/usr/src/linux-headers-5.10.142+truenas/include/linux/blkdev.h:BLK_MIN_SG_TIMEOUT
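
For reference, in the 5.10 headers that constant sits next to the default SG timeout, roughly like this (quoting from memory, so treat the exact values as an assumption):
Code:
/* include/linux/blkdev.h (5.10-era) -- values quoted from memory */
#define BLK_DEFAULT_SG_TIMEOUT  (60 * HZ)
#define BLK_MIN_SG_TIMEOUT      (7 * HZ)

Whether this particular constant is what produces the 1000 ms limit in the log above, I honestly can't tell from the headers alone.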


Now, I'm no whiz kid, and recompiling the kernel to see whether increasing this value makes SMR disks usable is something I'd rather not try -- also because I'm not sure what will happen the next time I update the TrueNAS Scale box I'm running this experiment on.

So I have a couple of questions:
Thanks!
 

sretalla

Powered by Neutrality
Moderator
Joined
Jan 1, 2016
Messages
9,703
You're not going to get a lot of love with this topic...

Those timeouts are set with performance in mind and messing with them will have flow-on effects even if they do help to not get SMR disks kicked out of your pools.

I recommend using the SMR disks with a different filesystem connected to a different OS.
 

Hoozemans

Cadet
Joined
May 25, 2023
Messages
6
You're not going to get a lot of love with this topic...
It's okay; I have a girlfriend: I do her laundry, she laughs at my jokes.

Those timeouts are set with performance in mind and messing with them will have flow-on effects even if they do help to not get SMR disks kicked out of your pools.
I should think that different hardware requires different parameters for maximum performance. It can't be true that a single set of numbers is the best possible solution for all conceivable combinations of hardware and firmware out there.

I recommend using the SMR disks with a different filesystem connected to a different OS.
I assure you, I wasn't going to recompile your box, or anyone else's -- just my own ;)
 

Hoozemans

Cadet
Joined
May 25, 2023
Messages
6
Those timeouts are set with performance in mind and messing with them will have flow-on effects even if they do help to not get SMR disks kicked out of your pools.
Also (sorry, it appears editing is not an option here), tweaking the TrueNAS install to keep disks from being kicked is easy enough -- but the constant resets are killing performance. I'd think that you could get better performance by increasing the timeout to prevent disks from being reset.
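
The closest thing to a runtime knob I've found so far is the generic per-device SCSI command timeout in sysfs. Whether that is the same timeout as the 1000 ms one in my log is an assumption I still have to test, but the idea would be something like this:
Code:
/* Sketch: raise the generic SCSI command timeout for one disk via sysfs.
 * Assumption: /sys/block/sdX/device/timeout (in seconds, default 30) is
 * the timeout that matters here -- it may well not be the one in the log. */
#include <stdio.h>

int main(void)
{
    FILE *f = fopen("/sys/block/sda/device/timeout", "w");
    if (!f) {
        perror("fopen");
        return 1;
    }
    fprintf(f, "60\n");   /* 60 seconds instead of the 30 second default */
    return fclose(f) == 0 ? 0 : 1;
}

Anything written there is gone after a reboot, so it would have to be reapplied from a post-init task.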
 

Arwen

MVP
Joined
May 17, 2014
Messages
3,611
Perhaps changing the timeouts would help for SMR disks. Ideally, a user would be able to set them on a disk-by-disk basis, so that the new, longer timeout is applied only to the SMR disks.
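
Something along those lines can at least be prototyped today, since each disk already exposes its own generic command timeout in sysfs. A rough sketch of the per-disk idea is below; note that the "zoned" attribute only identifies host-aware and host-managed disks, not the common drive-managed SMR models, and that this generic timeout may not be the one from the log above.
Code:
/* Sketch: give zoned (host-aware / host-managed) disks a longer SCSI timeout.
 * Drive-managed SMR disks report "none" in queue/zoned, so they would have
 * to be listed by hand instead. */
#include <dirent.h>
#include <stdio.h>
#include <string.h>

int main(void)
{
    DIR *blk = opendir("/sys/block");
    struct dirent *d;

    if (!blk) {
        perror("opendir");
        return 1;
    }
    while ((d = readdir(blk)) != NULL) {
        char path[512], zoned[32] = "";
        FILE *f;

        if (strncmp(d->d_name, "sd", 2) != 0)
            continue;                        /* only plain SCSI/SATA disks */

        snprintf(path, sizeof(path), "/sys/block/%s/queue/zoned", d->d_name);
        f = fopen(path, "r");
        if (!f)
            continue;
        fgets(zoned, sizeof(zoned), f);
        fclose(f);

        if (strncmp(zoned, "none", 4) == 0)
            continue;                        /* not a detectably zoned disk */

        snprintf(path, sizeof(path), "/sys/block/%s/device/timeout", d->d_name);
        f = fopen(path, "w");
        if (!f)
            continue;
        fprintf(f, "120\n");                 /* longer, arbitrary timeout */
        fclose(f);
    }
    closedir(blk);
    return 0;
}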

However, that is not the only problem with SMR disks & TrueNAS. TrueNAS only uses ZFS for data pools, which means that some of the ZFS methodology can hurt SMR disks on top of the timeouts: scrubs and resilvers, for example, can take many times longer than they would on CMR disks. SMR disks are basically not suited for RAID configurations, and really not suited for ZFS.

Even non-SMR disks can have problems with ZFS. For example:
  • Desktop disks with a long (over one minute) time limit for error recovery (TLER)
  • Aggressive head parking
Both of those can cause problems: the first can cause disks to be considered failed when they may have only one bad block; the second can cause timeouts and excessive wear on the head-parking mechanism. In essence, desktop disks are generally not suitable for NAS or ZFS.
 

Hoozemans

Cadet
Joined
May 25, 2023
Messages
6
SMR disks are basically not suited for RAID configurations, and really not suited for ZFS.
True. However, I would suggest that the demand for high-density storage is on the rise, and rising faster than the price per GiB of SSDs is dropping. I'm predicting there will be more attempts to make NASes (using RAID/ZFS) work with SMR disks, and I'm curious how that will pan out, and what modifications will be made both by the software engineers coding storage solutions and by the disk manufacturers.

Thanks!
 