Script to enable disk write caches stopped working [SOLVED]

FlyingBear

Dabbler
Joined
Aug 15, 2018
Messages
24
For a couple of years, I've used the script kindly posted by tvsjr that enables the write caches on my SAS drives:

for file in /dev/da? /dev/da??; do echo "WCE: 1" | camcontrol modepage $file -m 0x08 -e; done

Somewhere along the way as I upgraded my FreeNAS version -- sadly I don't know when -- this script stopped working. It now returns "camcontrol: error sending mode select command" for each drive.

My similar script for turning OFF the write caches, i.e. using "WCE : 0", doesn't return errors. Since the caches already show as off, I can't tell whether it's actually doing anything, but at least it doesn't generate errors.

My hardware hasn't changed at all since I installed FreeNAS. Any ideas on what is causing this issue please? Many thanks.
 

darkfader

Cadet
Joined
Mar 9, 2020
Messages
3
Same thing here, but on a plain FreeBSD 12 system.
So this is an upstream thingy, nothing to do with FreeNAS.

Code:
[root@fs03 /data]# for file in /dev/da? /dev/da??; do echo "WCE: 1" | camcontrol modepage $file -m 0x08 -e; done
camcontrol: error sending mode select command
camcontrol: error sending mode select command
camcontrol: error sending mode select command
camcontrol: error sending mode select command
camcontrol: error sending mode select command
 

darkfader

Cadet
Joined
Mar 9, 2020
Messages
3
one more detail:
I haven't found a workaround so far. I don't know if I'll get around to opening a PR for it.
 

sretalla

Powered by Neutrality
Moderator
Joined
Jan 1, 2016
Messages
9,703
enables the write caches on my SAS drives
Enabling write cache on drives isn't great for ZFS. You will risk data loss in the case of power loss to the system.
If you're not concerned with data loss, the speed increase will probably be appreciable, so I see why you would do it (just wanted to highlight for anyone finding the thread that you are doing something risky... if you can work out how to keep doing it).
 

darkfader

Cadet
Joined
Mar 9, 2020
Messages
3
Yes, there are circumstances where this is true.
It can get worse the more tradeoffs we have to make.

It would still probably be more productive to first worry about CAM bugs. This one is not gonna hurt ZFS, but what happened here? Some bug got introduced in the most trustworthy SCSI stack of all (living) Unixes. Are there others? Which ZFS functions could be affected?



The main benefit of the cache is during rebuilds, where apparently there are still some code inefficiencies: a <2 hour rebuild with streaming IOs vs. a >20 hour rebuild hammering small IOs. If we talk risks, you can answer for yourself which scenario puts the data at higher risk.
Yes, a warning is advisable, but this has been posted 1000s of times in ZFS land. And then people put their stuff on SATA drives that probably just ignore the cache settings anyway. And people should have a UPS anyway if at all possible. Although in this city, the grid is known to stay up longer than the typical UPS stays alive ;-)
 

FlyingBear

Dabbler
Joined
Aug 15, 2018
Messages
24
Interesting, and thanks for the reply @darkfader. I found the issue when troubleshooting a slowdown in the performance of a small encrypted pool: large writes stall for a few seconds every few GB. My FreeNAS server is on a commercial-grade UPS and I've not worried about potential data loss with the SAS drive write caches enabled. It is disturbing that something got changed or broken in the SCSI stack....
 

HoneyBadger

actually does care
Administrator
Moderator
iXsystems
Joined
Feb 6, 2014
Messages
5,112
Enabling write cache on drives isn't great for ZFS. You will risk data loss in the case of power loss to the system.

Unless the dataset is sync=always, data is at risk in case of power loss; drive cache has no bearing on this, as ZFS can issue cache flushes when committing sync writes to vdevs. If cache is enabled, it can also push async writes at much higher speeds.

@FlyingBear what version of FreeNAS are you on? It might be as simple as a syntax change on camcontrol.

I found the issue when troubleshooting a slowdown in the performance of a small encrypted pool: large writes stall for a few seconds every few GB.

It's probably the write throttle kicking in that's actually causing the stall, but no write cache on your pool devices would certainly put the brakes on your vdev speeds.
 

FlyingBear

Dabbler
Joined
Aug 15, 2018
Messages
24
Thanks for your help, @HoneyBadger! If it's just a syntax change, that would be great. I'm on 11.3-RELEASE, i.e. I haven't updated to 11.3 U1 yet.
 

FlyingBear

Dabbler
Joined
Aug 15, 2018
Messages
24
I spent some time trying to pin down this issue, with only a little progress: it doesn't appear to be a syntax issue. I can interactively edit the 0x08 modepage and change WCE from 0 to 1. But when I save the modepage file, I get the "error sending mode select" message. So the problem seems to lie in the SCSI stack somewhere, maybe a permissions issue.
 

FlyingBear

Dabbler
Joined
Aug 15, 2018
Messages
24
Thank you for your help. I upgraded to 11.3-U3.1 and that didn't fix the issue.

As root, I tried:

echo "WCE: 1" | camcontrol modepage da1 -m 0x08 -v -e

The -v flag yielded more information than I got before:

(pass1:mps0:0:11:0): MODE SELECT(10). CDB: 55 10 00 00 00 00 00 00 1c 00
(pass1:mps0:0:11:0): CAM status: SCSI Status Error
(pass1:mps0:0:11:0): SCSI status: Check Condition
(pass1:mps0:0:11:0): SCSI sense: ILLEGAL REQUEST asc:26,0 (Invalid field in parameter list)
(pass1:mps0:0:11:0): Field Replaceable Unit: 4
(pass1:mps0:0:11:0): Data byte 27 bit 7 is invalid
(pass1:mps0:0:11:0): Descriptor 0x80: 00 00 05 26 00 04 ff ff ff ff ff ff 00 00
camcontrol: error sending mode select command

Of course, I get the same result in interactive editing mode, setting WCE to 1 and saving the temp file. That temp file that the -e flag creates is:

IC: 0
CAP: 0
DISC: 1
WCE: 0
RCD: 0
Minimum Pre-fetch: 0
Maximum Pre-fetch: 65535
FSW (Force Sequential Write): 0
DRA (Disable Read-Ahead): 0
Number of Cache Segments: 16

Changing a different parameter, e.g. setting DRA to 1, yields the exact same error messages.
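To check the current WCE value without opening the editor, the value can be pulled out of a "Name: value" listing like the one above. This is only a sketch: get_wce is a made-up helper name, and it assumes that camcontrol modepage without -e prints the page in the same format as the temp file.

```shell
#!/bin/sh
# get_wce is an illustrative helper name, not part of camcontrol.
# It reads "Name: value" lines on stdin and prints the value of the WCE field.
get_wce() {
    awk -F: '$1 == "WCE" { gsub(/[ \t]/, "", $2); print $2 }'
}

# Sample input: a few lines copied from the temp file listing above.
sample='DISC: 1
WCE: 0
RCD: 0'
printf '%s\n' "$sample" | get_wce

# On a live system (assumption: the output uses the same format):
#   camcontrol modepage da1 -m 0x08 | get_wce
```

Running it before and after the "WCE: 1" command would show whether the setting actually changed.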

According to the Seagate SCSI commands reference manual, mode page 08 has 20 bytes, so I don't know what the reference to data byte 27 in the error messages means.
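One possible reading, offered as a sketch rather than a diagnosis: the byte number in the sense data is probably counted from the start of the whole MODE SELECT parameter list, not from the start of page 08. Bytes 7-8 of the CDB give the parameter list length, here 0x1c = 28 bytes, which is the 20-byte page plus 8 bytes in front of it. That matches the 8-byte MODE SELECT(10) parameter header, assuming no block descriptor was sent. On that assumption, byte 27 would be the last byte of the page. The arithmetic:

```shell
#!/bin/sh
# CDB bytes copied from the -v output above.
cdb="55 10 00 00 00 00 00 00 1c 00"
set -- $cdb

# MODE SELECT(10): bytes 7-8 of the CDB (0-based) hold the parameter list length.
param_len=$(( 0x$8 * 256 + 0x$9 ))

page_len=20                               # size of mode page 08, per the Seagate manual
before_page=$(( param_len - page_len ))   # bytes sent ahead of the page itself
page_byte=$(( 27 - before_page ))         # where "data byte 27" falls inside the page

echo "parameter list length: $param_len"
echo "bytes before the page: $before_page"
echo "data byte 27 -> page byte $page_byte"
```

This does not explain why the drive rejects that byte, only where it sits in what was sent.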

The disks I'm trying to enable write cache on are IBM-badged Seagate ST8000NM0135 8TB SAS 12Gbps hard drives. They power on with write cache disabled, for reasons unknown, but a FreeNAS startup script that enabled write caching worked fine for years. Sadly, I don't know where along the FreeNAS upgrade route this script began to fail.
 

FlyingBear

Dabbler
Joined
Aug 15, 2018
Messages
24
AHA!!! I found the fix. You have to use the "-6" argument to force 6-byte versus the default 10-byte MODE commands. So the script now reads:

for file in /dev/da? /dev/da??; do echo "WCE: 1" | camcontrol modepage $file -m 0x08 -6 -e; done
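For anyone putting this in a startup script, here is a slightly more defensive sketch of the same loop. The names CAMCONTROL and enable_wce are made up for illustration; the camcontrol arguments are the ones from the one-liner above. It skips glob patterns that match nothing (in sh, an unmatched /dev/da?? is passed through literally) and reports which drive failed.

```shell
#!/bin/sh
# Sketch: enable the write cache (WCE) on every /dev/daN device,
# forcing 6-byte MODE commands with -6.
# CAMCONTROL and enable_wce are illustrative names, not from the thread.
CAMCONTROL="${CAMCONTROL:-camcontrol}"

enable_wce() {
    # $1 is a device node such as /dev/da1
    echo "WCE: 1" | "$CAMCONTROL" modepage "$1" -m 0x08 -6 -e
}

for dev in /dev/da? /dev/da??; do
    # An unmatched glob stays literal in sh, so skip anything that is not a real node.
    [ -e "$dev" ] || continue
    enable_wce "$dev" || echo "failed to set WCE on $dev" >&2
done
```

Setting CAMCONTROL=echo before running it prints the commands instead of sending them, which is a cheap way to check the loop first.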
 

alexhore

Explorer
Joined
Sep 24, 2014
Messages
52
Sorry for kicking an old thread, this works nice:
Code:
for drive in {0..9}; do smartctl -s wcache,on /dev/da$drive; done


smartctl -s wcache does not seem to support the ",p" ("preserved") suffix that the ATA-only wcache-sct option does, which is a shame:

wcache-sct[,ata|on|off[,p]] - [ATA only] Gets/sets the write cache feature through SCT Feature Control (if supported). The state of write cache in SCT Feature Control could be "Controlled by ATA", "Force Enabled", or "Force Disabled". SCT Feature control overwrites the setting by ATA Set Features command (wcache[,on|off] option). If SCT Feature Control sets write cache as "Force Enabled" or "Force Disabled", the setting of wcache[,on|off] is ignored by the drive. SCT Feature Control usually sets write cache as "Controlled by ATA" by default. If ',p' is specified, the setting is preserved across power cycles.
 