Current alert - all my drives are causing slow I/O on my pool Main (RAIDZ2)

overshoot

Explorer
Joined
Jul 16, 2019
Messages
80
Hi,

I have just upgraded to TrueNAS Core from FreeNAS 11.3, and since then I have been getting these alerts.
So far, all the drives are reported to be causing slow I/O on my pool Main.
To be noted: I have both a log and a cache device on this pool.

I have 2 kinds of drives:
- WDC WD20PURZ-85G (WD Purple)
- TOSHIBA MG03SCA200

I have 4 of each in a RAIDZ2 pool.
I understand the WD Purples are not great for the job, but that's all I have at the moment.
However, all of these are CMR drives, if I am correct?!

Is that something I should worry about?
Is there anything I could do to clear those alerts, i.e. to actually improve the situation?
 

joeschmuck

Old Man
Moderator
Joined
May 28, 2011
Messages
10,994
Please follow the forum rules and post all your system specs, and also post the specific error messages you are getting. You could also do an internet search for "truenas" plus your error message and see what pops up; odds are someone has mentioned it already.

I understand the WD Purples are not great for the job, but that's all I have at the moment.
Yea, I cringed when I read that. They are good drives, but they are designed as video drives, meaning the drive firmware is okay if a bit or two are corrupt; it doesn't affect a typical video stream, but from a data-integrity perspective it could be very bad. It sounds like you already know this, so I'm not going to try to make you feel bad about it. I would recommend that, if you have important data on these drives, you replace them one at a time as you can afford to.
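For reference, a one-at-a-time swap looks roughly like this from a shell (just a sketch; "Main" is your pool and "da3"/"da8" are placeholder device names, and the same steps are normally done through the pool status screen in the web UI):

  zpool status Main            # note which disk you're swapping out
  zpool offline Main da3       # take the old disk offline (placeholder name)
  # physically replace the disk, then:
  zpool replace Main da3 da8   # resilver onto the new disk (placeholder name)
  zpool status Main            # watch the resilver progress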
 

overshoot

Explorer
Joined
Jul 16, 2019
Messages
80
Thanks for your answer.
I actually did a search on that question, but the answer I found mentioned SMR drives as being the problem.
In my case, none of them are SMR (apparently).
I also found some similar questions on Reddit, where users mentioned that it happened after the TrueNAS upgrade.
Under FreeNAS, I never got any mention of slow I/O.

Well, my setup... ;-)
I am running TrueNAS as a Proxmox VM; it's been almost 2 years now and it's working well.
The server is a Dell PowerEdge T430 with its RAID card set to HBA mode and passed through to the VM.
I have 128GB of RAM, an NVMe drive as cache, and a PCIe SSD with battery as log device.
Both the log and cache devices are passed through to the VM, as well as a 10Gb Ethernet adapter.

Now, when it comes to the WD Purples, I did understand I was supposed to get Red drives; however, I was not aware the firmware was OK with corrupted data.
I read that other users are using WD Blue drives with no issues, so I thought that with Purple ones I would be kind of OK.
Are you saying that, because of the behavior of those Purple drives, I am putting my data at risk?

What also bugs me is that the Toshiba drives are listed as causing slow I/O too.
These are SAS drives, and I don't get why they would not work well in a RAID setup.

So, having made a mistake with the Purple drives is something I can understand.
But why would all the drives suddenly become slow, especially after an upgrade?! That seems fishy to me.
It kind of feels like Apple upgrading the OS and the battery then showing "needs service"... ;-)

I've run the iostat command, but the output looked like Greek to me.
 

LarsR

Guru
Joined
Oct 23, 2020
Messages
719
Did you update all the way to U2? There were some SMB issues with 12.0 and U1 that were resolved in U2.
 

overshoot

Explorer
Joined
Jul 16, 2019
Messages
80
I went straight to version U2.

I saw that the alerts were cleared this morning.
However, I believe they will come back, especially when the server has to push harder.
That's what others have reported on Reddit, where I saw the problem mentioned.
 

joeschmuck

Old Man
Moderator
Joined
May 28, 2011
Messages
10,994
Now, when it comes to the WD Purples, I did understand I was supposed to get Red drives; however, I was not aware the firmware was OK with corrupted data.
I read that other users are using WD Blue drives with no issues, so I thought that with Purple ones I would be kind of OK.
Are you saying that, because of the behavior of those Purple drives, I am putting my data at risk?
The Blue drives are data drives; the Purple drives are optimized for video content. It's the firmware that is optimized for video content, and that is what potentially puts your data at risk. So IF your drive has a read error, it will retry several times, but if the read error persists, the drive will give up and continue reading, ignoring the failure. I'm not describing exactly how it works, but that is the idea. So it's very important to do your SMART long testing and your scrubs for those Purple drives (well, for all drives, to be honest).
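From a shell, that boils down to something like this (a sketch; /dev/da0 and "Main" are placeholders, and on TrueNAS you would normally schedule these as periodic S.M.A.R.T. and scrub tasks in the web UI):

  smartctl -t long /dev/da0    # start a long self-test on one drive
  smartctl -a /dev/da0         # read the self-test log once it completes
  zpool scrub Main             # start a scrub of the pool
  zpool status Main            # shows scrub progress and any repaired errors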
 

HoneyBadger

actually does care
Administrator
Moderator
iXsystems
Joined
Feb 6, 2014
Messages
5,112
Minor note and nitpick about the WD Purples and other "Surveillance/Video Streaming" drives.

These drives have firmware that implements the "ATA Streaming" feature set - this is what allows them to accept the "writes with corruption" or "writes with a time limit" but they don't default to it by any means. Things like video metadata can't tolerate the same type of "lossy writes" so it wouldn't work if a Purple drive decided on its own that a metadata write would be okay missing a few bits.

Given that ZFS is designed to avoid data corruption/loss, it wouldn't make sense for it to send "streamed writes" to devices. Purple and other such drives should be fine, and the presence of TLER is a net benefit: it means a drive won't stall for an endless/unknown period of time retrying a sector when it could instead fail within a given timeframe and let ZFS repair the data upstream with its own redundancy measures. (Of course, if you're using the drive in a stripe, all bets are off for data integrity.)
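If you want to see where a given drive stands, smartctl can query the SCT Error Recovery Control (TLER) settings on the SATA drives; a sketch, with /dev/ada0 as a placeholder (the SAS Toshibas manage recovery limits through their own mode pages instead):

  smartctl -l scterc /dev/ada0          # show the current read/write recovery limits
  smartctl -l scterc,70,70 /dev/ada0    # set both limits to 7.0 s (units are 100 ms)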
 

joeschmuck

Old Man
Moderator
Joined
May 28, 2011
Messages
10,994
These drives have firmware that implements the "ATA Streaming" feature set - this is what allows them to accept the "writes with corruption" or "writes with a time limit" but they don't default to it by any means.
Thanks for the clarification, it's appreciated. It contradicts some of the stuff I read, but it makes more sense to me.
 

HoneyBadger

actually does care
Administrator
Moderator
iXsystems
Joined
Feb 6, 2014
Messages
5,112
Thanks for the clarification, it's appreciated. It contradicts some of the stuff I read, but it makes more sense to me.
I would appreciate seeing the links that you've pulled up. Manufacturers often aren't clear about their firmware, so I'm trying to interpret the ATA command whitepapers. There are different methods by which the streaming command set handles errors, which seem to imply the drives are willing to just "tolerate and log" a write error after a set period rather than retry endlessly, similar to TLER on reads.

If the drive is being used for surveillance purposes, it could be receiving multiple camera feeds - better to have one feed miss a frame than to have all feeds suffer a long pause.

I have 128GB of RAM, an NVMe drive as cache, and a PCIe SSD with battery as log device.

Couple questions for you.

Is 128GB the total amount in the host, or the amount assigned to the VM?

What kind of workload are you putting on the unit? An 8-drive Z2 will be good for sequential workloads but not if you're hosting random I/O or VMs.

Can you elaborate a bit on the cache and log device types? "PCIe SSD with battery" - do you mean something like an NVRAM device (RMS-200?) or a regular M.2 SSD mounted in one of the adapter cards with an onboard capacitor?
 

G8One2

Patron
Joined
Jan 2, 2017
Messages
248
I'm curious how full your pool is...
 

overshoot

Explorer
Joined
Jul 16, 2019
Messages
80
Couple questions for you.

Is 128GB the total amount in the host, or the amount assigned to the VM?

What kind of workload are you putting on the unit? An 8-drive Z2 will be good for sequential workloads but not if you're hosting random I/O or VMs.

Can you elaborate a bit on the cache and log device types? "PCIe SSD with battery" - do you mean something like an NVRAM device (RMS-200?) or a regular M.2 SSD mounted in one of the adapter cards with an onboard capacitor?

128GB is the amount assigned to the FreeNAS VM.
We're using that pool as a file server only; we're not hosting any VMs on it, nor using it for a DB.
We have a dozen Macs/PCs accessing the server.
The server is on a 10Gb link, whereas the clients are each on a 1Gb link.
The cache device is a MO0400KEFHN (Intel DC P3700) 400GB PCIe NVMe drive in an adapter card.
The log device is a Samsung SSD 970 EVO 250GB.

The pool is 70% full.
 

HoneyBadger

actually does care
Administrator
Moderator
iXsystems
Joined
Feb 6, 2014
Messages
5,112
128GB is the amount assigned to the FreeNAS VM.
We're using that pool as a file server only; we're not hosting any VMs on it, nor using it for a DB.
We have a dozen Macs/PCs accessing the server.
The server is on a 10Gb link, whereas the clients are each on a 1Gb link.
The cache device is a MO0400KEFHN (Intel DC P3700) 400GB PCIe NVMe drive in an adapter card.
The log device is a Samsung SSD 970 EVO 250GB.

The pool is 70% full.
Unless you've forced sync writes, you're very likely not making any real use of that SLOG device (although Mac clients do default to some SMB sync work). But if you are doing sync writes, you should swap those two devices: the P3700 has more endurance and power-loss protection, so it's the better SLOG, whereas your 970 EVO is better as a cache device. How does the "battery" statement fit in here? An external UPS?
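Checking and swapping is low-risk, since log and cache vdevs can be removed and re-added without touching pool data; a sketch, with "Main" and nvd0/nvd1 as placeholder names:

  zfs get sync Main          # "standard" = only explicitly-requested sync writes
  zpool iostat -v Main 5     # watch whether the log vdev actually sees writes
  # to swap the roles:
  zpool remove Main nvd0     # remove the current log device (placeholder name)
  zpool remove Main nvd1     # remove the current cache device (placeholder name)
  zpool add Main log nvd1    # re-add with the roles swapped
  zpool add Main cache nvd0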

Read-wise, you're certainly not short on RAM in the VM; what do the ARC hit rates look like?
Write-wise, if this is an active file server with lots of overwrites and changes, you might be seeing a lot of fragmentation on the data vdevs. If you open gstat -dp in a shell window, do you see higher latency in the ms/r or ms/w column?
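For the ARC numbers and for reading gstat, something along these lines (a sketch; the sysctl names are from OpenZFS on FreeBSD, and the column names come from gstat's header):

  sysctl kstat.zfs.misc.arcstats.hits kstat.zfs.misc.arcstats.misses
                             # hit ratio = hits / (hits + misses)
  gstat -dp                  # -p: physical disks only, -d: include delete ops
  # ms/r and ms/w are the average per-operation read/write latencies;
  # %busy near 100 on a subset of disks points at those disks (or their path).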

The server is a Dell PowerEdge T430 with its RAID card set to HBA mode and passed through to the VM.

What is the exact model of the card, and its firmware? You're in the right generation to have a PERC H330, which notably isn't the same as the PERC HBA330 - the former is a discount RAID card, the latter a proper HBA.
 

overshoot

Explorer
Joined
Jul 16, 2019
Messages
80
My bad, it's just me being stupid.
It's the other way around: the Samsung EVO is used as cache and the P3700 is used as SLOG.
Doesn't the Intel drive have enough capacitors to act as a battery and protect the data during a server power loss?!
At least, that's what I meant by "battery".

The ARC hit ratio is constantly at 100%.

I see higher latencies on the writes.
Actually, that's a great tool to know about!
I can see the huge difference between the two kinds of drives:
10-20ms for the Toshiba drives compared to around 1-2ms for the WD drives.
The busy rate is also around 50% on the Toshiba drives.

The RAID card is a MegaRAID SAS-3 3108.
I can't remember the firmware, but I believe the card was purchased on eBay already set up as an HBA, if my memory serves me well.
 

HoneyBadger

actually does care
Administrator
Moderator
iXsystems
Joined
Feb 6, 2014
Messages
5,112
That makes more sense with the P3700 being the SLOG. If you run zilstat from a shell do you see activity on it during periods of writes? I'd only expect to see any if your Mac endpoints are writing.
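For example (assuming the zilstat script that ships with TrueNAS CORE):

  zilstat 5    # 5-second intervals; sustained non-zero byte/op counts mean
               # the ZIL (and therefore your SLOG) is actually being written to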

A maxed-out ARC hit ratio means that your reads are coming from RAM, which is good. 100GB+ of the most recently/frequently accessed data is definitely a reasonable "working set", so you're getting the big wins from that.

The 10x difference in write latency between the Toshiba and WD drives is definitely significant. Are they all on the same cable off the HBA by any chance, or do they share some other common connection point (power, backplane, etc.)? It almost seems as if the device write cache is disabled on them somehow.

Regarding the SAS3108, it's technically a RAID controller, but in HBA mode it will use the mrsas driver and should be fine.
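Two quick checks from a shell when you get a chance (a sketch; da0 is a placeholder for one of the Toshiba SAS drives):

  camcontrol modepage da0 -m 8    # caching mode page; "WCE: 1" = write cache enabled
  dmesg | grep -i mrsas           # confirms the 3108 attached via the mrsas driver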
 

overshoot

Explorer
Joined
Jul 16, 2019
Messages
80
Sorry for the late reply.
I can't access the machine for a while, as I am working offsite.
And I don't recall how the card is connected to the server's front backplane.

Is that something we can find out through some command-line tool?
 