ImConfuzzled
Cadet · Joined: Apr 20, 2022 · Messages: 3
We have a TrueNAS machine that is used for Veeam repos, among other things. It's purely a backup system, mostly taking writes, with no regular daily users beyond a few backup sources. It's a Supermicro with a Xeon Bronze, 32GB RAM, 4 Red Pros in a RAID 10, and plenty of room to expand. I don't have the exact models handy, but all that really matters here is that it relies on HDDs for IOPS, and that it was conservatively configured with expansion in mind, rather than trying to predict future hardware needs.
Performance is generally quite good, often rather impressive. There are, however, two problem cases, both involving Veeam backups. One is scrubbing the backups: it seems to blow out the caches, and overlapping backup jobs bog the poor array down. The other is restoring individual files from within a backup, rather than an entire snapshot, whether it's a machine backup or a file share backup. I don't have exceptional performance needs here, either. If searches inside the backups can be brought down to a few minutes, that will be adequate. I don't have to do file/folder restoration often, but when I do, I usually get partial names from a regular user, not a specific file path, and sometimes not even a good estimate of when the file last existed, so searching is a must. All I'd like for the scrubs is for them to complete over the weekend, with time to spare, without having to carefully schedule other backup jobs to avoid overlap.
I've found threads about related issues and messed with some tunables (back to defaults now). Long story short, all signs point to this being primarily a hardware limitation, and not one where more HDDs would help much. That was quite unexpected, as if the Spanish Inquisition had just shown up, with a comfy chair. The disks get busy and stay busy, the GbE network is barely utilized, the NAS's RAM fills with caches, and no tunable changes make any meaningful difference to performance.
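For reference, this is roughly how I've been watching the ARC while jobs run. It's a quick sketch assuming SCALE's /proc interface; on Core the same counters come out of `sysctl kstat.zfs.misc.arcstats` instead:

```python
#!/usr/bin/env python3
"""Rough ARC watcher: prints ARC size vs. its cap and the hit ratio
over each 10-second window. Assumes TrueNAS SCALE, where the ZFS
module exposes counters at /proc/spl/kstat/zfs/arcstats."""
import time

ARCSTATS = "/proc/spl/kstat/zfs/arcstats"

def read_arcstats():
    """Parse arcstats into a dict of counter name -> int value."""
    stats = {}
    with open(ARCSTATS) as f:
        for line in f.readlines()[2:]:  # skip the two header lines
            name, _type, value = line.split()
            stats[name] = int(value)
    return stats

prev = read_arcstats()
while True:
    time.sleep(10)
    cur = read_arcstats()
    hits = cur["hits"] - prev["hits"]
    misses = cur["misses"] - prev["misses"]
    total = hits + misses
    ratio = 100.0 * hits / total if total else 0.0
    print(f"ARC {cur['size'] / 2**30:6.1f} GiB of "
          f"{cur['c_max'] / 2**30:.1f} GiB max, "
          f"hit ratio {ratio:5.1f}% over last 10 s")
    prev = cur
```

That's how I know the RAM fills with caches and stays full: `size` pins at `c_max` and the hit ratio craters during scrubs and overlapping jobs.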
So, to decide on the best path, I would like to quantify the problem, now that I've sufficiently qualified it. What comes to mind is to measure how much data is being read from the disks over a known time period, to help figure out how much of a cache size increase would be ideal. Or, if there's a tool that can track repeated block reads (meaning they were evicted, but wouldn't have been with a big enough cache), that would do as well, and probably give more meaningful results. Something like: SSH in, run a command or two to start, wait, then stop it and get a result in some usable byte-based figures. If I could do that, I could combine it with other info and make a more informed decision, with an estimate of how much RAM, and/or how big an L2ARC (if any), should do the job. On that, I'm stumped.
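The closest I've come is a rough sketch like the one below, watching the ARC "ghost list" hit counters (reads for blocks that were evicted but requested again) alongside `zpool iostat`, but I'm not at all sure I'm reading them right. The pool name and the /proc path are placeholders for my SCALE box:

```python
#!/usr/bin/env python3
"""Sketch of the measurement I'm after: sample ARC counters, measure
pool reads for a while, then report how many misses hit the ghost
lists (evicted blocks requested again -- would-have-been cache hits)
plus raw bytes read. Assumes SCALE's /proc arcstats; "tank" is a
placeholder pool name."""
import subprocess

ARCSTATS = "/proc/spl/kstat/zfs/arcstats"
POOL = "tank"  # placeholder pool name

def arc_counters():
    """Parse arcstats into a dict of counter name -> int value."""
    stats = {}
    with open(ARCSTATS) as f:
        for line in f.readlines()[2:]:  # skip the two header lines
            name, _type, value = line.split()
            stats[name] = int(value)
    return stats

def pool_read_bytes(seconds):
    """Approximate bytes read from the pool over `seconds` via
    zpool iostat. -H: scripted (tab-separated), -p: exact values.
    The first sample is the since-boot average, so take the second."""
    out = subprocess.check_output(
        ["zpool", "iostat", "-Hp", POOL, str(seconds), "2"], text=True)
    interval = out.strip().splitlines()[-1].split("\t")
    read_bw = int(interval[5])  # read bandwidth, bytes/sec
    return read_bw * seconds

INTERVAL = 600  # measure for 10 minutes
before = arc_counters()
read_bytes = pool_read_bytes(INTERVAL)  # blocks for INTERVAL seconds
after = arc_counters()

ghost = ((after["mru_ghost_hits"] - before["mru_ghost_hits"])
         + (after["mfu_ghost_hits"] - before["mfu_ghost_hits"]))
misses = after["misses"] - before["misses"]
print(f"{read_bytes / 2**30:.1f} GiB read from {POOL} in {INTERVAL} s")
print(f"{misses} ARC misses, of which {ghost} hit a ghost list "
      f"(a bigger ARC/L2ARC might have caught these)")
```

My naive reading: if a big share of the misses are ghost-list hits, more RAM or an L2ARC should soak them up; if not, the working set just doesn't cache, and no amount of RAM helps. But I'd love confirmation of that, or a pointer to a proper tool that answers the sizing question directly.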