Measure disk reads from time X to Y?

ImConfuzzled · Apr 20, 2022

We have a TruenAS machine that is used for Veeam repos, among other things. But, it is purely a backup system, mostly taking writes, with no normal daily users, except for a few backup sources. It's a Supermicro, with a Xeon Bronze, 32GB RAM, 4 Red Pros in a RAID 10, and plenty of room to expand. I don't have the exact models handy, but all that's really important here is that it's relying on HDDs for IOPS, and that it was conservatively configured, with expansion in mind, rather than trying to predict future hardware needs.

Performance is generally quite good, if not often rather impressive. There are, however, two cases, both with Veeam backups, that are a problem. On is scrubbing the backups, where it seems that it blows out the caches, and overlapping backup jobs bog the poor array down. The other is trying to restore files from within a backup, but not an entire snapshot, be it inside a machine backup, or a file share backup. I don't have exceptional performance needs, here, either. If searches inside the backups can be brought down to a few minutes, that will be adequate. I don't have to do file/folder restoration often, but when I do, I usually get partial names from a regular user, not a specific file path, and sometimes not even a good estimate of the last time it existed, so searching is a must. All I'd like for the scrubs is to get them completed during the weekend, with time to spare, without having to carefully schedule other backup jobs to try to not overlap.

I've found threads about related types of issues, and messed with some tunables (back to defaults, now). Long story short, all signs point to it being primarily a hardware limitation, and not one where more HDDs would help too much. That was quite unexpected, as if the Spanish Inquisition had just shown up, with a comfy chair. The disks get busy and stay busy, the GbE network is barely utilized, the NAS's RAM fills with caches, and no settings changes make any meaningful changes to performance.

So, to decide what the best path would be, I would like to quantify the problem, now that I've sufficiently qualified it. What comes to mind is to measure how much data is being read from the disks, over a known time period, to help figure out how much of a cache size increase would be ideal. Or, if there's a tool that can track repeated block reads (meaning they got evicted, but wouldn't have with a big enough cache), which could help quantify an ideal cache size, that would do, as well, and probably give more meaningful results. Like, SSH in, run a command or two to start, wait, and then stop it and get a result, in some usable Byte-based dimensions. If I could do that, I could combine that with other info, and make a more informed decision, with an estimate on how much RAM, and/or how big of an L2ARC (if any), should do the job. On, that, I'm stumped.

NugentS · Apr 20, 2022

More ARC, Moooorrrrrreeeeee ARC. Lots more (hey its TrueNAS/ZFS - the answer is almost always lots more ARC)

How does Veeam talk to the repository (SMB, NFS, iSCSi, AFB, FTP, SFTP, WebDav, "insert something I have forgotten"?
How does Veeam store files. Is it lots of small files or just a few really big ones? [Thinking of amount of metadata]

As for measuring how much more I have no idea - but I suspect that L2ARC will not help you at this stage as you don't have enough ARC to use L2ARC

Whats your ARC Hit Ratio?
How big is the backup total size and how much disk space do you have? So we have an idea of scale
Which CPU specifically?

Extra Mirrors would help with IOPS which would probably improve things a bit - but at the moment ARC I would suspect is your issue. CPU might also be a problem, in particular if using SMB

joeschmuck · Apr 20, 2022

ImConfuzzled said:
But, it is purely a backup system, mostly taking writes, with no normal daily users, except for a few backup sources. It's a Supermicro, with a Xeon Bronze, 32GB RAM, 4 Red Pros in a RAID 10, and plenty of room to expand.

If it's just a backup system, is it working well for a backup system? Is sounds like you are doing a lot of recovery from this backup system which is where your issue is.

As @NugentS stated, you likely need more ARC (RAM). The L2ARC is likely not the correct patch for you unless you are accessing the same data frequently. And L2ARC is generally made from one or more SSD to enhance your read speeds over a HDD. You should think about how the computer works when it comes to ARC, L2ARC and such. Briefly, if you need to open a large file (a large single backup file for example) then the file is read into the ARC, if it can't fit into the ARC then it will read pieces over and over again until it collects it's data. With an L2ARC, if that backup file is read often then that file is copied into the L2ARC and now the reads happen from the L2ARC vice the hard drives thus speeding things up a bit over HDD reads. But nothing is faster than RAM in our situation here so install as much RAM as possible. I am by far not the expert on ARC's but if your backup file is 100GB, then I myself, would have likely installed 256GB of RAM because I like overkill but I'd think you would need just over 100GB of RAM to fit the entire file and some overhead in this scenario. If your motherboard supports 256GB, that is what I'd put into it but you can base it off of your file size and how many you desire to access at the same time (fit into RAM). That is a generalized idea of how it works, and don't quote me

.

Also, in the ZFS world we do not use RAID10, it would be a stripped mirrored vdev. When I see RAID I start getting concerned that someone is using a RAID controller with ZFS and really messing up the configuration.

ImConfuzzled · Apr 21, 2022

NugentS said:
How does Veeam talk to the repository (SMB, NFS, iSCSi, AFB, FTP, SFTP, WebDav, "insert something I have forgotten"?

SMB and NFS, with plans to go all NFS, soon.

How does Veeam store files. Is it lots of small files or just a few really big ones? [Thinking of amount of metadata]

Lots of medium sized files. Average size is 53MB, and it tries to keep them no larger 64MB. The largest single backup is is 1.5TB, with about 28K files.

Whats your ARC Hit Ratio?
How big is the backup total size and how much disk space do you have? So we have an idea of scale
Which CPU specifically?

1) Usually stays at 100%, with very occasional little spikes down. But, I noticed that it will do that, lie normal, for awhile, under these loads, then just start getting all spikey. ARC size tends to stay around 18-19GB.
2) About 5TB total, right now, with 4TB current, in a 20TB (4x10TB) array. The largest are about 1.5TB, 1.1TB, 250GB, 75GB, and then several much smaller. Also, backup jobs themselves do very little reading, as VBR keeps a local cache of the last backup state. Total daily changes range from <1MB to 20GB.
3) Bronze 3106. Even having somewhat slow clocks, unless I set compression (which is off for this dataset) to use gzip, I've never seen a single core reach even 50%.

ImConfuzzled · Apr 21, 2022

joeschmuck said:
If it's just a backup system, is it working well for a backup system? Is sounds like you are doing a lot of recovery from this backup system which is where your issue is.

So far, only once. Every other time has been me doing it, to create the same kind of load on the NAS.

As @NugentS stated, you likely need more ARC (RAM). The L2ARC is likely not the correct patch for you unless you are accessing the same data frequently. And L2ARC is generally made from one or more SSD to enhance your read speeds over a HDD. You should think about how the computer works when it comes to ARC, L2ARC and such. Briefly, if you need to open a large file (a large single backup file for example) then the file is read into the ARC, if it can't fit into the ARC then it will read pieces over and over again until it collects it's data. With an L2ARC, if that backup file is read often then that file is copied into the L2ARC and now the reads happen from the L2ARC vice the hard drives thus speeding things up a bit over HDD reads. But nothing is faster than RAM in our situation here so install as much RAM as possible. I am by far not the expert on ARC's but if your backup file is 100GB, then I myself, would have likely installed 256GB of RAM because I like overkill but I'd think you would need just over 100GB of RAM to fit the entire file and some overhead in this scenario. If your motherboard supports 256GB, that is what I'd put into it but you can base it off of your file size and how many you desire to access at the same time (fit into RAM). That is a generalized idea of how it works, and don't quote me .

My understanding was that it worked on records or LBAs, in which case most of the reading would generally be smaller than the files, except when verifying them. The board allows for 768GB of RAM, though that's probably a bit overkill.

Also, in the ZFS world we do not use RAID10, it would be a stripped mirrored vdev. When I see RAID I start getting concerned that someone is using a RAID controller with ZFS and really messing up the configuration.

Click to expand...

Yes, mirrors then stripes, using ZFS, on an Intel controller, with an LSI in there, waiting for more drives to be added.

Click to expand...

NugentS · Apr 21, 2022

I would seriously upgrade your ARC, although not perhaps to 768GB that might be a tad expensive
What specific motherboard are you using as that will define what is sensible, and what will be painfully expensive.

Important Announcement for the TrueNAS Community.

Measure disk reads from time X to Y?

ImConfuzzled

Cadet

NugentS

MVP

joeschmuck

Old Man

ImConfuzzled

Cadet

ImConfuzzled

Cadet

NugentS

MVP

Similar threads

Important Announcement for the TrueNAS Community.

Measure disk reads from time X to Y?

ImConfuzzled

Cadet

NugentS

MVP

joeschmuck

Old Man

ImConfuzzled

Cadet

ImConfuzzled

Cadet

NugentS

MVP

Important Announcement for the TrueNAS Community.

Related topics on forums.truenas.com for thread: "Measure disk reads from time X to Y?"

Similar threads