Read speeds tank after prolonged reads?


diehard

Contributor
Joined
Mar 21, 2013
Messages
162
Hey everybody! I've been having some problems with read speeds when running a verification job (100% read) from a Commvault backup system. Read speeds on the job will be an acceptable ~170MB/sec for about 30-45 minutes, and then something I can't figure out happens: speeds drop to around ~20MB/sec.

Checking gstat, drive usage looks optimal, and then when the speeds tank I get drives with a very high ms/r, killing performance. Which drives they are, and how many, is completely random.. sometimes it's 4, sometimes 6. The only thing I have noticed is that they tend to be in the same vdev. Stopping all services that cause disk activity returns the read speeds to normal.. again for about 30-45 minutes.
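
For reference, this is roughly how I'm watching it from the shell ("tank" below stands in for the real pool name):

  gstat -p -I 1s           # per-disk view; ms/r is the column that spikes
  zpool iostat -v tank 1   # per-vdev breakdown, refreshed every second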

Pool Specs:
3x RAIDZ2 vdevs striped; each vdev consists of 11x 3TB WD Red drives.
41 TiB zvol iSCSI extent, about 60% of pool size; zpool get fragmentation reports 13%.
Data on the vdevs is balanced, and iostat shows reads are even across the vdevs.
SLOG is Intel P3700 200GB.

System Specs:
Supermicro X9DRH-iTF, 2x Intel Xeon E5-2609 v2
128GB ECC memory
2x LSI 9207 running firmware/driver P16
Intel X540 10GbE NIC
Supermicro SC847 E26
running FreeNAS-9.3-STABLE-201506042008
Autotune disabled, no fancy tweaking or sysctls.

The backup server has the same X540 and runs Server 2008 R2. iSCSI is connected with the MS iSCSI initiator, using MPIO with separate subnets and VLANs. The drive is NTFS with a 64K allocation unit size, as the backup files tend to be large. I can give more info about this, but I really think the problem is on the storage side.

Has anyone else had problems with drives "choking" after prolonged reads? I've attached screenshots of gstat running 10 minutes in, and of what happens after 30-45 minutes.
 

Attachments

  • choking.JPG
  • optimal.JPG

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
That sounds very much like a description of what happens on a highly fragmented iSCSI datastore. You're probably blowing through whatever happened to be in ARC/L2ARC (which gives great speeds), and then you get to a point where you're no longer hitting ARC *and* you're reading stuff that's not contiguous on disk, so ms/r explodes and life sucks.

The most practical way to address this is to maintain a larger free space reserve on the pool. This allows ZFS to do a better job of allocating contiguous runs of blocks. At 60% full, your iSCSI pool is as full as an iSCSI pool ever should be, and fragmentation will be killing you as writes occur. This has been discussed endlessly on the forum.

https://forums.freenas.org/index.ph...sing-zfs-box-for-vms.32899/page-2#post-207069
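
If you want to sanity-check where the pool sits, something like this shows allocation and fragmentation side by side (pool name is a placeholder):

  zpool list -o name,size,allocated,free,capacity,fragmentation tank
  zpool get capacity,fragmentation tank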
 

diehard

Contributor
Joined
Mar 21, 2013
Messages
162
But I don't think the data is actually in ARC; the backup job is paused, then resumed, and after just a few seconds speed is restored, reading entirely new data. The exact same data that was "choking" these disks reads perfectly after a pause of a few seconds.
ARC hit ratio is 23%.
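
One way to eyeball that number, roughly hits / (hits + misses), is from the raw ARC counters:

  sysctl kstat.zfs.misc.arcstats.hits kstat.zfs.misc.arcstats.misses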

And almost all this data was a full backup written to an entirely empty zvol. I'd say there has been a total of about 25GB of data written to the 63TiB zpool.

TBH, I thought my only hope would be either you or cyberjock.. ugh :(
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
Then I may be on the wrong track.

But if "zpool get fragmentation" returns 13%, ...

Now, wait. What do you mean by an "entirely empty zvol"? A newly created zvol? Or just one that you've created an NTFS filesystem on top of, and NTFS has no files stored in it?
 

diehard

Contributor
Joined
Mar 21, 2013
Messages
162
Newly created zvol and pool (well, a couple of months old?). I could be off on the amount of data written to the volume.. it's been crazy around here lately.

I could zero the free space (currently only about 9TB of active data) to see if it improves performance, but a lot of things have led me to believe the problem is elsewhere.

Thanks for your time btw.
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
No, please listen carefully and think about this.

That's not a newly created zvol or pool.

A zvol stores raw data. Your NTFS layered on top may show "no files" and 100% free space, but blocks that were previously written may well still be consuming space out on the ZFS pool. NTFS doesn't zero out space when it deletes data; it just marks the blocks as free in its internal space map.

So, now, think again: how much data has been written to this zvol over its lifetime? I am guessing 100TB or more. Am I correct?
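
A quick way to see what the zvol is actually consuming on the pool, independent of what NTFS thinks (dataset name here is a placeholder):

  zfs get volsize,used,referenced,logicalused tank/backup-zvol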
 

diehard

Contributor
Joined
Mar 21, 2013
Messages
162
Yes, I know about ZFS being agnostic to NTFS file deletes (the iSCSI initiator in 2008 R2 doesn't support UNMAP; I'm trying to move to 2012 R2 since it supposedly does). I was referring to using sdelete to write zeros to the NTFS filesystem. Sorry for not expressing that better.
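
To be concrete, the zero-fill I mean is just Sysinternals sdelete run against the iSCSI volume (drive letter here is a placeholder for ours):

  sdelete.exe -z E:

The -z option writes zeros over the free space, which the ZFS side can collapse to nearly nothing as long as compression is on.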

I believe there have been around ~20-25TB of total writes to the pool: one full backup and incrementals since then. Again, I could be a bit off on that..


I made a huge mistake/typo earlier by saying "the exact same data", which would support the theory of ARC being responsible for the speed.. I meant to say new data that would definitely not be in ARC. Sorry.
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
diehard said:
Yes, I know about ZFS being agnostic to NTFS file deletes (2008 R2 doesn't support UNMAP). I am using sdelete to write zeros to the NTFS filesystem.

I believe there have been around ~20-25TB of total writes to the pool: one full backup and incrementals since then. Again, I could be a bit off on that..

Do you have compression enabled on your pool? (If not, the zeroes are doing nothing for you.)

This still really really sounds like a fragmentation thing.
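
Quick check (dataset name is a placeholder):

  zfs get compression,compressratio tank/backup-zvol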
 

diehard

Contributor
Joined
Mar 21, 2013
Messages
162
Added another note to my last post.

Compression is enabled.

If it were fragmentation, a simple pause and resume of the transfer shouldn't help speed, right? Fragmented data should have a "set" speed when it's not in ARC; letting the disks rest for several seconds and getting an ~8x speed improvement seems really odd to me.
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
iSCSI really is a mess because ZFS can't cache the workload particularly well. zvols are nothing but block storage, and since there is no logical sorting by files or anything else for ZFS to go on, the best it can do is cache things based on how frequently blocks are used.

So to counter this, one of the many things you need to do is increase the IOPS that your zpool can do. Normally the answer is mirrors, as you can take X number of disks and make as many vdevs as possible with mirrors. In your case you have just 3 vdevs, and they are beyond the width we recommend for RAIDZ2.

I think this issue is a combination of the zpool layout not being ideal, your workload not being particularly cacheable, and the fragmentation you already have in your zpool. ZFS, when under heavy load and when things aren't ideal, can go "off the deep end" in terms of what the zpool and disks are actually doing. I'm guessing that by pausing the backup you're allowing ZFS to recover from the "deep end" it fell into, and then it works fine until it goes "off the deep end" again.

For iSCSI I pretty much recommend mirrors or very small RAIDZ1 vdevs and nothing else. I/O is almost always the major problem with iSCSI, and the only way to resolve it is to power through it with more vdevs or SSDs (which have orders of magnitude more I/O).
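
For illustration only, with the same 33 disks a mirror layout would be 16 two-way mirror vdevs plus one hot spare, built as a new pool rather than converted in place (device names are placeholders):

  zpool create tank \
      mirror da0 da1 \
      mirror da2 da3 \
      mirror da4 da5 \
      mirror da6 da7
  # ...continuing the mirror pairs up to da31, leaving da32 as a hot spare

That gives roughly 48TB usable instead of ~81TB, but 16 vdevs instead of 3, which is where the extra IOPS come from.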

I do appreciate the very precise initial post. You called out the exact hardware you are using, even mentioning details like running the right firmware on the LSI cards and not using autotune or sysctls. Very impressed (which is why you got a response from me).
 

diehard

Contributor
Joined
Mar 21, 2013
Messages
162
Yeah, I do believe your "deep end" theory is correct.. it just hits a point where it starts to choke and can't recover. It's kind of frustrating, because an hour or so of good transfers almost makes it seem like the pool is fine. Speed wasn't a big issue, which is why I went with a less-than-great zpool configuration.. but 20MB/sec is so... ugh.

I guess I was just hoping someone else had run into a similar problem and improved it with tuning (possibly limiting read speed?) or some such.

Thanks for the reply.
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
I have to agree with cj that it's going off the deep end, but the stats you posted regarding drive utilization make it look like something fragmentation-related - it looks like the disks are seeking a lot. I guess I don't really know what to suggest, since the normal mitigations are pretty much to use mirror vdevs.
 

diehard

Contributor
Joined
Mar 21, 2013
Messages
162
Well, dang. Thanks for the help; let me know if you think of anything I can tweak without destroying the pool.

I will try to re-create it at some point, but I can't foresee going without backups for a couple of days being OK with our IT director, lol.

I believe we put in a request to move to Veeam, so I might just have to do it then. Would 11 vdevs of 3-wide RAIDZ1 be an alright configuration? Enough to mitigate, say, an increase in zpool usage from ~55% to ~65%?
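
Rough numbers for that layout, before any ZFS overhead:

  Current:  3 x 11-wide RAIDZ2 =  3 vdevs,  3 x 9 x 3TB = ~81TB raw usable
  Proposed: 11 x 3-wide RAIDZ1 = 11 vdevs, 11 x 2 x 3TB = ~66TB raw usable

So roughly 3-4x the vdev count (and random-read IOPS) in exchange for ~15TB of raw space and single-parity redundancy per vdev.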
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
You might be able to disable ZFS prefetch to help the situation. Note that it will hurt performance in some ways, but will help in others. For your workload it might be a net gain in performance. You just have to set the sysctl vfs.zfs.prefetch_disable=1 in the WebGUI and reboot.
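
If you want to try it before making it permanent, it's also a live sysctl, so something along these lines from the shell should take effect immediately (the WebGUI tunable just makes it persist across reboots):

  sysctl vfs.zfs.prefetch_disable=1   # disable ZFS prefetch at runtime
  sysctl vfs.zfs.prefetch_disable     # verify the current value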
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
If you are doing iSCSI, you are virtually forced to do many small vdevs and keep max disk usage to about 60%. Of course you can do anything, but everything you do that goes "above and beyond" what is recommended will hurt performance to some extent. If this is backups only and it only runs on the weekend, you might not care if it takes 24 hours. If the backup runs every night, then letting it run for 24 hours is obviously not an option. ;)
 

Chris Moore

Hall of Famer
Joined
May 2, 2015
Messages
10,080
I have a significantly less powerful configuration and ran into what I think is the same problem after a prolonged transfer, where I was trying to duplicate all the data on my primary FreeNAS to my secondary as a backup. Speed was great at the start but tapered off gradually over time. It should have only taken 10 hours to copy everything but ended up taking over two days. I wish I had some way to show documentation, as it might help with a solution.
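
For next time, a crude way to capture that kind of evidence during a long transfer would be something like this (pool name is a placeholder):

  while true; do date; gstat -bp; zpool iostat -v tank; sleep 60; done >> /var/tmp/perf.log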
 