SMB performance issues after Veeam update

GeorgePatches

Dabbler
Joined
Dec 10, 2019
Messages
39
So this is a work problem that has me a little baffled. We use a TrueNAS X10 with 10 drives in a RAIDZ1, shared over SMB, as the storage target for our Veeam backups. We recently updated from Veeam v10 to v11, and our active full backup times have increased ~30% while our incremental times have increased ~200%. Incrementals now take nearly an entire work day, whereas before they started around midnight and finished about 7am.

I'm working with Veeam support on the issue, and one of the things they had me do is run Microsoft's diskspd.exe disk benchmarking tool (instructions here https://www.veeam.com/kb2014)(command line parameters https://github.com/Microsoft/diskspd/wiki/Command-line-and-parameters). With a single thread I see around 2 Gbit/s transfer speeds, and with 2 threads I max out the single-threaded SMB service around 3 Gbit/s. However, when I use the -Sh flag (as instructed by Veeam) to disable caching, performance plummets to 160 Mbit/s. This seems to explain our performance issue: one of the "features" of Veeam v11 is that it disables all the OS-level caching so that fancy storage arrays can better use their own caching schemes. My question to everyone is: why does disabling caching cause SMB performance to drop off such a cliff? I get that not using caching could decrease performance, but a 90% hit seems excessive.
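For what it's worth, the drop works out to slightly more than 90%. A quick sanity check on the figures above:

```python
# Figures from the diskspd runs described above, in Mbit/s
cached_single_thread = 2000   # ~2 Gbit/s with OS caching enabled
uncached = 160                # with -Sh (caching disabled)

drop = 1 - uncached / cached_single_thread
print(f"throughput drop with -Sh: {drop:.0%}")  # 92%
```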

The pool again is 10 HDDs in RAIDZ1, around 60% full, running on a TrueNAS X10. Anything I should double-check? I'm stumped right now.
 

NugentS

MVP
Joined
Apr 16, 2020
Messages
2,947
My initial thought is that you have the throughput of a single HDD in your array. But that doesn't explain the sudden increase in backup time.
 

GeorgePatches

Dabbler
Joined
Dec 10, 2019
Messages
39
Yeah, I realize it's not the most performant setup; it's optimized for capacity, not speed. It was solidly good enough, though, until this last week. No individual drive is showing excessive disk busy: they average about 20% busy during a backup, with spikes only up to 60%. In fact, you could basically lay the I/O and busy charts on top of each other for all 10 drives. Nothing stands out; they're all the same.
 

NugentS

MVP
Joined
Apr 16, 2020
Messages
2,947
Veeam, in order to do backups, in particular incrementals/differentials, has to do a boatload of reads to know what to update. You are backing up a lot of data if your incrementals were taking about 7 hours. With the Veeam cache off, it has to retrieve everything it needs from disk each time, as your ARC is nowhere near big enough (and probably never will be). This might explain the Veeam cache behavior, but not the sudden slowdown.

You could, assuming I am right (which is far from certain), increase your RAM (to account for the L2ARC headers) and put a really big (multiple TB) SSD in as L2ARC on the pool. It won't improve write speeds, but it might improve the Veeam processing time. I would test with a medium-size SSD if you have one lying around first, rather than buy a new big one. The memory might be expensive for just a test, too.
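On the memory point: every block cached in L2ARC keeps a small header in main memory, so a very large L2ARC eats into ARC RAM. A rough sizing sketch; the ~80-byte header size is an approximation that varies by OpenZFS version, and the 2 TB device and 128K recordsize are made-up example numbers:

```python
def l2arc_header_ram(l2arc_bytes, recordsize=128 * 1024, header_bytes=80):
    """Estimate ARC RAM consumed by L2ARC headers.

    Each block cached on the L2ARC device keeps a header in main
    memory; with many records this adds up. header_bytes is an
    assumed per-block overhead, not an exact OpenZFS constant.
    """
    n_blocks = l2arc_bytes // recordsize
    return n_blocks * header_bytes

# A hypothetical 2 TB SSD as L2ARC, default 128K recordsize:
ram = l2arc_header_ram(2 * 10**12)
print(f"{ram / 2**30:.1f} GiB of ARC spent on headers")  # ~1.1 GiB here
```

With small recordsizes the overhead grows fast, which is why "just add a huge L2ARC" can backfire on a RAM-limited box.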

Note I am guessing here
 

GeorgePatches

Dabbler
Joined
Dec 10, 2019
Messages
39
Reasonable guesses, but unfortunately I don't think that's the issue. The ARC hit rate is in the high 90s, and I'm not seeing massive read I/O on the disks. However, I am seeing a big spike in the data sent by the TrueNAS during the incremental jobs. I think I'll check the CPU usage of the Samba daemon when the next incremental runs; maybe it's maxing out Samba's single-thread performance with read requests, even if they're mostly satisfied by the ARC.
 

NugentS

MVP
Joined
Apr 16, 2020
Messages
2,947
Again (and it doesn't explain everything, not even most of it), you are asking a Z1 array, which is poor at random reads/writes, to serve up a load of metadata and maybe more (I really don't know the low-level mechanics of exactly how an incremental is done, or how files are updated and rolled into each other as incrementals are retired).

It does, however, seem very sensible for Veeam to cache as much as possible to take load off the NAS. Presumably mostly the metadata, so other requests are more targeted.

But I am guessing.
160 Mbit/s is much slower than a single HDD can do, which is more in the range of 160-200 MB/s (as long as the heads don't have to fly around).

What is/are the CPU(s) in the X10? (iX's website is very quiet on the subject, according to a cursory look.)

Also, what happens if you run the disk benchmark twice with a dataset small enough to fit in the ARC and nothing else hitting the box? The first run might be slow, but the second should fly, as the data should be in ARC. If the second run is slow as well, it would suggest an issue with the X10/TrueNAS (note the program is not one I am familiar with).
 

GeorgePatches

Dabbler
Joined
Dec 10, 2019
Messages
39
I realize Z1 is not the best at a lot of things, but I needed to make the most space possible out of 10x10TB drives with a tiny bit of fault tolerance. As for the mechanics of Veeam, we're not doing anything fancy. We write one big full backup over the weekend and then we use changed block tracking to only pull the changes and make incrementals from that. This is the least disk-thrashy backup configuration I could come up with. There's no consolidation, no roll-ups; once a backup ages out, it is simply deleted. Yes, technically Veeam still needs to read from the TrueNAS some to figure out what's changed, but if the disks aren't thrashing and sitting at 100% busy, it seems like the array can handle what is being asked of it.

I agree that 160 Mbit/s (20 MB/s) is way too slow. I'd be perfectly happy with 160-200 MB/s given the known compromises I have made.
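For anyone following along, the unit conversion (decimal prefixes, 8 bits per byte):

```python
def mbit_to_mbyte_per_s(mbit_per_s):
    """Convert Mbit/s to MB/s (8 bits per byte, decimal prefixes)."""
    return mbit_per_s / 8

print(mbit_to_mbyte_per_s(160))   # 20.0 MB/s -- the uncached diskspd result
print(mbit_to_mbyte_per_s(1600))  # 200.0 MB/s -- roughly one healthy HDD
```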

The CPU is a single Intel Xeon D-1531: 6 cores, 12 threads. https://ark.intel.com/content/www/u...-xeon-processor-d-1531-9m-cache-2-20-ghz.html

That's an interesting test idea. I'll try that out and get back to you.


I don't think it'll make much of a difference given how Veeam has written their test commands. They specify 100% writes for the active full and forward incremental tests (the backup scheme we use).
 