clifford64
Any ideas?
I'm afraid I don't know what's going on. There almost seems to be a pathological issue impacting iSCSI performance in the latest release(s), as performance "out of the gate" on TN12 was very promising; but the potential data corruption issue necessitated a quick follow-up fix.
Do you think it would be worth testing bypassing the switch and going directly to the SAN on one host to see if that makes a difference?
It's worth a shot certainly, just to eliminate the switch as a potential bottleneck.
DQLEN dropping on your iSCSI LUNs is indicative of either VMware adaptive queueing or SIOC kicking in (and I'd wager you haven't got access to or haven't enabled the latter) - this happens when VMware sees the SCSI sense codes for BUSY or QUEUE FULL.
At this point I'm guessing your disks are just getting bogged down trying to randomly seek around when asked to piece together that large VM for an svMotion. I also wonder if maybe those WD EARS disks are choking things up; did you ever use the wdidle tool to stop their aggressive head-parking?
Can you pull SMART stats on everything and attach as a .txt? If you've got a disk or two that's choking hard and throwing errors that will drag the whole thing down.
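If it's easier than clicking through the GUI, a quick loop from the shell should dump everything to text files - a rough sketch, assuming da0-style device names (adjust the glob to match your system):

Code:
# dump full SMART output for each disk to its own file
for d in /dev/da?; do
  smartctl -a "$d" > /tmp/smart_$(basename $d).txt
done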
Another question: how old is the pool itself, as in when was it created? Over time, copy-on-write will fragment the data.
I am not sure what that is or how to change it. I don't think I have configured that in ESXi on my hosts. I do know that my two hosts are showing different values for DQLEN when viewing esxtop on each host.
I have not used this tool. Can you provide more info on it and how to use it? Will it do anything to my pool?
SMART info is attached in a zip file; it includes all drives in my system. I am missing SMART tests on some of my drives because I replaced them, and I guess replacing a drive doesn't add it to the SMART test cycle - I noticed that today when I was poking around. I originally set up my pool with all drives performing SMART tests regularly.
I recreated the pool back in the summer of 2020. I originally made it in 2019 with two RAIDZ2 vdevs of six drives each, but I was trying to get better performance and went to six mirrors. I backed up the VMs with Veeam, recreated the pool, and then restored the VMs.
Adaptive queue length is enabled by default, and this is likely what's causing the queue depth to get chopped. If the storage array is overwhelmed or fills its own device queue it sends back SCSI codes stating it's BUSY or QUEUE FULL, which VMware responds to by trying to throttle back how much data it's sending.
SIOC (Storage I/O Control) requires you to manually switch it on and is basically "Quality of Service" for storage: it tries to prioritize "fairness" and prevents any one VM or workload from stomping all over the others.
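If you want to see what the adaptive side is doing on your hosts, the two advanced options that control it can be read from the ESXi shell - reading them is harmless, and I'd check the values before changing anything:

Code:
# sample window and trigger for VMware's adaptive queue-depth throttling
esxcli system settings advanced list -o /Disk/QFullSampleSize
esxcli system settings advanced list -o /Disk/QFullThreshold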
wdidle is designed to stop the overly-aggressive head-parking timer on the WD Greens - but if you've never heard of it, the proverbial damage is probably already done on your Green drives.
Check this very old thread by @cyberjock for its usage - https://www.truenas.com/community/threads/hacking-wd-greens-and-reds-with-wdidle3-exe.18171/ - the "Ultimate Boot CD" still apparently contains it. Shouldn't do anything to your pool, but again, I say "shouldn't" - if your Green drives decide to give up the ghost, then your pool would likely be damaged.
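Usage from the bootable DOS environment is simple - this is from memory, so double-check against the readme on the disc:

Code:
WDIDLE3 /R      (report the current idle timer)
WDIDLE3 /S300   (set the timer to 300 seconds)
WDIDLE3 /D      (disable the timer entirely)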
I took a quick look at the SMART data; only one drive had a single reallocated sector, which is good, but your load cycle counts are enormous. I didn't see a total count under five figures on any one drive, most of your Seagates were in the 50K range, and the WD Greens ranged from about 180K on the low end to these two, which are downright narcoleptic with how often they've tried to go to sleep on you:
Code:
mirror0_drive2:
  9 Power_On_Hours    0x0032  034  034  000  Old_age  Always  -  48702
193 Load_Cycle_Count  0x0032  001  001  000  Old_age  Always  -  1058439

mirror5_drive2:
  9 Power_On_Hours    0x0032  024  024  000  Old_age  Always  -  56135
193 Load_Cycle_Count  0x0032  001  001  000  Old_age  Always  -  708539
For comparison, I have a drive over nine years spinning with 290 load cycles. Not 290,000 - 290.
Is ditching the Green drives an option, or at least replacing the two called out above?
Is there a lot of activity on the datastore itself (deletes, overwrites, changes) that would fragment it? But right now I'm calling "green sus".
The main datastore changes would be a media server that transcodes to H.265 and a Minecraft server. The Minecraft server is only 80GB. The media server is 6TB, and I zero out its free space every now and then because my backups grow otherwise.
I do have plans to replace all drives with shucked 8TB Elements, but I have mostly been waiting for drive failures. I got all my drives for free and they are pretty darn old. I have one free drive left, plus one 8TB that I've already shucked, and I've started buying more. I suspect over the next year I will start replacing the drives. (Going with 8TB Elements because WD has no SMR at 8TB or above, and it's cheaper to buy Elements than 4TB Red Pros. I'd rather upgrade now than buy 2TB drives and have to upgrade again in a year or two.)
Should I leave adaptive queue length enabled in the ESXi config?
As for wdidle, I would think that this shouldn't be having an impact on drive performance during sequential reads and writes, right? It should only kick in if the pool goes idle for some amount of time?

The Green drives specifically like to park their heads after only a few seconds of idle time, so it's entirely possible they're parking while you're trying to drive I/O to them. Or they've done it so often in the past that the actuator motor for the disk arm is wearing out.
How often does the content on the media server change, and what method are you using to zero free space?
From esxtop, go to the "u" screen for disk device stats, then hit "f" for field changes and make it so that only A and O (NAME and VAAI Stats) are visible. Do you have values under the ZERO or DELETE columns? A proper VMFS setup should pass the UNMAP commands through and register them as DELETE - if you're seeing ZERO it might not be configured for UNMAP.
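You can also sanity-check whether the host thinks the LUN supports the VAAI primitives at all - this is read-only and safe to run:

Code:
esxcli storage core device vaai status get

Look for "Delete Status: supported" on the iSCSI device.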
See if you can accelerate the replacement, even if you can find another couple of cheap 2TB Seagates to fill in for now. If you have recent backups (although that's the source of these woes, so I'm not banking on it) you could even manually offline the two drives with very high LCC values and see if that changes things; if they're being particularly slow to respond, that could drag your whole pool down.
There's a significant amount of overhead here with your Plex server being a VM on a remote hypervisor. Is it possible and/or have you considered making the media server share an SMB mount point on the TrueNAS/FreeNAS machine itself, and having the Plex VM connect to/index it remotely?
I run tdarr, which automatically transcodes everything imported into the server. I haven't had any major imports recently - mostly about 5-10GB a week. Currently, my backup disk is smaller than the full disk of my VM, but the actual used space should still fit on the backup. I can trick Veeam into only backing up the used space and it will fit. I currently have 1TB free on the backup RAID.
I zero out the free space using the zerofree tool in Ubuntu. I boot the main Plex VM into a live Ubuntu environment and run the command to zero out unused space.
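For reference, the actual invocation is about as simple as it gets (the device name depends on how the live session enumerates the VM's disk, and it assumes an ext filesystem):

Code:
# run from the live Ubuntu session with the filesystem unmounted
sudo zerofree -v /dev/sda1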
I believe the main LUN is going to be the middle one. All VMs and VMDKs are thick provisioned.
View attachment 46089
I will do my best. I will probably be buying a drive every two weeks.
I have thought about it, but I was trying to go for a setup that allowed me to learn about vCenter, ESXi, and iSCSI. I work as a sysadmin so being able to play around with the technologies is a little more important. Yeah, it would probably be better to use it as an SMB share rather than iSCSI, but I also like having iSCSI because of being able to use HA resources with vCenter.
I do have current backups. The system works and I am not getting any performance issues within the VMs themselves yet. Really the bottlenecks seem to be when doing svMotions and backups. I will work on replacing my drives and hopefully that will help with it.
It's sending DELETE commands through, so UNMAP is being properly passed down. You're reclaiming space at least, but right now I believe there's a lot of fragmentation of your media server's VMDK - it might think it's writing to LBA 1, 2, 3, 4 in the guest, but that might be getting scattered across the full range of disk space on the ZFS pool depending on what gets written where.
Let's see what happens with the removal/replacement of the Greens. Do you have an extra/available slot to replace the drive without degrading the mirror (eg: make it a mirror-3 first, then remove the Green to return to mirror-2)?
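If a port does free up, the attach/detach dance is short - a sketch with placeholder names, so substitute your actual pool and gptid/da device names:

Code:
# add the new disk as an extra member of the existing mirror vdev
zpool attach tank <current-mirror-member> <new-disk>
# wait for the resilver to finish, then drop the Green
zpool detach tank <old-green-disk>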
The reason I ask here is because the iSCSI overhead is chopping those media files up into tiny little pieces (16K max volblocksize unless you've adjusted it) rather than letting them be larger (128K or even 1M if adjusted) chunks if accessed directly over SMB. You could still use an iSCSI ZVOL to hold the VM's boot device, benefit from being able to migrate the VM around live between hosts, and further learn about the VMware technology stack, but let the large media files sit on the SMB share for efficiency reasons. VMFS is also another layer of abstraction that could cause things to fragment around more under read/write/modify.
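If you ever rebuild along those lines, the split is only two commands - a sketch with made-up pool/dataset names and sizes:

Code:
# sparse zvol for the VM boot disk, shared out over iSCSI
zfs create -s -V 100G -o volblocksize=16K tank/vms/plex-boot
# dataset for the media library, shared out over SMB
zfs create -o recordsize=1M tank/media

The recordsize=1M is what lets those big media files be stored as large blocks instead of 16K shreds.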
Keep us updated.
Would there be a good way to defrag it? At the business I work at, we have quite a few DBs and such on datastores - wouldn't those get fragmented as well?
I'll begin working on replacing the greens first, but it might take some time. Unfortunately, I have all drive slots used up. I have a few offline cold spares that I keep around for when I need to do replacements.
I take it there is no way to change the volblocksize without re-doing the pool? If I need to re-do it in the future, then I would probably set it up that way first.
You can't change volblocksize after a zvol has been created - you'd have to create a new zvol, but even then, don't use a 1M volblocksize as it will really hurt random I/O. You can change recordsize on datasets (eg: SMB mounts) and it will take effect on newly written data.

I don't have much experience with FreeNAS as a NAS - would you recommend two different pools for that? Or an SMB share for Plex and a zvol for iSCSI?