SSD Pool / TRIM misbehaving?

Andres Biront · Jun 9, 2017

Hi,

I'm seeing weird behavior on my SSD pool, and I'm not sure where the issue lies. It came to my attention that I had latency spikes only on my SSD pool, which serves as an iSCSI target for my VMware cluster.

First of all, my hardware configuration.
DISCLAIMER: This is a home-lab. Before going crazy about "You fool will lose everything" take that into consideration. And I have backups =)

FreeNAS box:
Intel Xeon X3430
Intel ServerBoard S3420GPX
LSI 9210-8i IT mode
32GB DDR3-1333 ECC RAM
3 x 1Gbps Intel Ethernet
Silverstone 400W PSU
6 x 1TB WD Blue
2 x 500GB SK Hynix SSD

Pools:
2 x 3x1TB WD RAIDZ (compression, no deduplication)
2 x 500GB SSD Mirror (compression, deduplication)

Why dedup? SSD is not cheap. 500GB is not much. RAM should be enough. VMware Datastore Usage = 150GB. Pool actual usage = 50GB. I'm loving it.

Ok... so, what's the problem? Latency spikes. I don't know why, but I think it's related to the SSDs TRIMing constantly.
For example:

The RAID is actually "idle" (idle VMs are online), but it's constantly deleting something. It's been like that since day 1. So, I thought I should check if it was TRIMing and found:

kstat.zfs.misc.zio_trim.failed: 0
kstat.zfs.misc.zio_trim.unsupported: 1767
kstat.zfs.misc.zio_trim.success: 96464032
kstat.zfs.misc.zio_trim.bytes: 1062348283904

If I'm not mistaken... it trimmed 989GB. The pool was created 1 week ago. 1TB in 1 week... It's going to kill those SSDs.
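A quick way to check that conversion is to parse the counters and convert the byte total. This is only a sketch: the sample text is the sysctl output pasted above, and the helper name `parse_trim_kstats` is made up for illustration.

```python
import re

# sysctl output copied from the post above
SAMPLE = """\
kstat.zfs.misc.zio_trim.failed: 0
kstat.zfs.misc.zio_trim.unsupported: 1767
kstat.zfs.misc.zio_trim.success: 96464032
kstat.zfs.misc.zio_trim.bytes: 1062348283904
"""

def parse_trim_kstats(text):
    """Return {counter_name: value} for every zio_trim line in the text."""
    stats = {}
    for line in text.splitlines():
        m = re.match(r"kstat\.zfs\.misc\.zio_trim\.(\w+):\s*(\d+)$", line.strip())
        if m:
            stats[m.group(1)] = int(m.group(2))
    return stats

stats = parse_trim_kstats(SAMPLE)
trim_bytes = stats["bytes"]

gib = trim_bytes / 2**30               # binary units, what "989GB" refers to
gb = trim_bytes / 10**9                # decimal units, as drive vendors count
avg = trim_bytes / stats["success"]    # average size of one successful TRIM

print(f"{gib:.1f} GiB = {gb:.1f} GB, about {avg:.0f} bytes per TRIM request")
```

So the 989 figure is in binary gigabytes; in the decimal units used on SSD spec sheets the same counter is a little over 1 TB.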

Can anyone help me pin down what's happening? Maybe dedup has problems with my config. Maybe TRIM is not functioning properly. Any ideas?

Thanks in advance.

EDIT:
Can I disable TRIM entirely? I need to protect the SSDs from certain death!

Forgot a crucial part... FreeNAS 9.10u4

EDIT 2:

For troubleshooting I tried to disable TRIM by setting:

sysctl -w vfs.zfs.trim.enabled=0

But it said "Tunable values are set in /boot/loader.conf"

So I edited loader.conf with that line, but it does not apply. I also created a Tunable from the GUI, but it didn't apply either. When I query that sysctl I always get:

# sysctl vfs.zfs.trim.enabled
vfs.zfs.trim.enabled: 1

But now that I have the system cleanly booted up without iSCSI traffic (since my servers are awaiting a manual rescan), I checked the status of both disks and they weren't issuing any delete commands, and "trim.bytes" wasn't growing.

I have ESXi 6.5 with VMFS 6, so I thought maybe the auto UNMAP operations were causing issues, and I proceeded to disable UNMAP (automatic space reclamation). But the moment I power up a VM, delete operations start rising again. It's been online for less than an hour with only 1 VM powered up and it already grew to 7GB (!)

rs225 · Jun 9, 2017

Is it possible that most of your TRIM is the initial trim that ZFS does when the pool is created?

If not, you still don't need to be worried about TRIM. In fact, you need to undo anything you have done to disable TRIM. TRIM is good for SSDs.

If you have a problem, the problem is writes. You can use zpool iostat -v pool 1 to watch your I/O activity. You may need to examine your VM to determine what it is doing that may cause writes.

The latency graph doesn't seem to agree with the numbers beneath, unless I am misunderstanding the numbers.

Andres Biront · Jun 10, 2017

No, I don't think it's the initial TRIM. It looks like the TRIM counters start from 0 on every restart, and it was restarted two days ago.

So, it "deleted" 1TB of data in two days. It TRIMed 10GB of data with my ESXi hosts accessing the pool for less than 1 hour. But it has not grown a single bit since I shut them down.

Both of my pools serve as VMware Datastores, but the only one that is constantly deleting is the SSD pool.

da2 is an HDD, da3 is an SSD. After I disconnected the ESXi hosts the delete operations stopped. The TRIM counter goes up constantly. So there has to be an issue with the drives trimming.

I will have to migrate the VMs and start over...

joeschmuck · Jun 10, 2017

I guess I don't understand why you have dedup turned on for a pool that is only hosting iSCSI. I thought this was not good practice and dedup should be run inside the VM instead.

I know, this may have nothing to do with your TRIM issue but maybe it does, after all, this seems to be a clear difference between the two pools, besides the media type.

Andres Biront · Jun 10, 2017

What's the issue with dedup and iSCSI? I didn't know it wasn't a good idea. It's actually working and deduplicating according to the statistics. But it could be related.

There is no VM level deduplication. Dedup is only available in VSAN, AFAIK.

I'll migrate everything and restart without dedup, if there is no other choice.

joeschmuck · Jun 10, 2017

Do an internet search on "iscsi dedup" and you will find some data about it. I also think it's worth a shot to rebuild your SSD pool with dedup turned off so you can at least eliminate that as a cause.

Andres Biront · Jun 10, 2017

I'm halfway rebuilding. I'll report when it's finished.

I really wanted dedup on the SSDs :(

EDIT:

It's done. I'm monitoring. It looks like delete operations are down, but not at 0 as on the HDDs. It may be expected behavior on SSDs, I guess. But I'll give it some time.

I was reading about dedup with iSCSI. The only issue I found was the 8k block size, which translates into greater RAM usage. It's a 420GB datastore on a 32GB RAM server, so I don't really mind the RAM usage. It will never be big enough to be a problem. I can't find a real issue with dedup and iSCSI that would explain what I'm seeing on my SSD pool.
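To put a rough number on how the 8k block size drives RAM usage, here is a back-of-the-envelope sketch. The figure of about 320 bytes of RAM per dedup-table entry is a commonly quoted rule of thumb, not something measured on this system, and the function name is made up for illustration. It also assumes the worst case where every block is unique.

```python
BLOCK_SIZE = 8 * 1024          # 8k block size mentioned in the post
BYTES_PER_DDT_ENTRY = 320      # ASSUMPTION: rule-of-thumb size of one dedup-table entry

def ddt_ram_gib(data_gib, block_size=BLOCK_SIZE, entry_bytes=BYTES_PER_DDT_ENTRY):
    """Estimate dedup-table RAM in GiB if every block of data_gib is unique."""
    blocks = data_gib * 2**30 / block_size
    return blocks * entry_bytes / 2**30

full_datastore = ddt_ram_gib(420)   # the whole 420GB datastore filled
current_usage = ddt_ram_gib(150)    # the 150GB of datastore usage quoted earlier

print(f"420 GiB of unique 8k blocks: ~{full_datastore:.1f} GiB of dedup table")
print(f"150 GiB of unique 8k blocks: ~{current_usage:.1f} GiB of dedup table")
```

Real usage should be lower because duplicate blocks share one entry. On a live pool, `zpool status -D` reports the actual dedup-table statistics, which is a better basis than this estimate.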

Anyway, I'll continue to look into it.

EDIT2:

Ok. It went down but I'm not satisfied with the results.

It's "deleting" 6.25GB per hour, according to TRIM stats. 150GB per day. That's 1 full disk write per week.
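The extrapolation above can be written out as a small sketch. Whether the TRIM counter aggregates both mirror members is an assumption here, not something the kstat output confirms; if it does, each 500GB disk sees about half of the total.

```python
HOURLY_TRIM_GB = 6.25    # rate taken from the TRIM stats
DISK_GB = 500            # nominal size of each SSD in the mirror
MIRROR_MEMBERS = 2       # ASSUMPTION: the counter sums both sides of the mirror

daily_gb = HOURLY_TRIM_GB * 24
weekly_gb = daily_gb * 7
per_disk_weekly_gb = weekly_gb / MIRROR_MEMBERS
disk_equivalents_per_week = per_disk_weekly_gb / DISK_GB

print(f"{daily_gb:.0f} GB/day, {weekly_gb:.0f} GB/week in total")
print(f"{per_disk_weekly_gb:.0f} GB/week per disk, "
      f"{disk_equivalents_per_week:.2f} disk capacities per week")
```

If the counter is per device instead, set `MIRROR_MEMBERS = 1` and the weekly figure per disk doubles.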

There are virtual machines running, but they are idling, and this does not happen when the VMs are running on the HDD pool.

EDIT3:

The disks are new. They are 1 week old. But the Wear Level Status on SMART already grew to 2. It took me years to even hit 1 on my old 64GB SSD cache... ...great.

joeschmuck · Jun 10, 2017

You need to start terminating your VMs to see if it's any VM or a specific VM causing the issue. With luck it will be one specific VM.

Andres Biront · Jun 10, 2017

I'm giving NFS a try. But performance is... abysmal in comparison. NFS 4.1 mounts as read only (it looks like a VMware issue), so I have to mount NFS v3.

I'm copying a VM, about 50GB. It takes forever. FreeNAS CPUs are at ~15%. With iSCSI and VMware offloading the copy task, CPUs were at 80% and it copied in about a minute.

It's causing me too many headaches. I'll probably go with more HDDs and give these SSDs another use.

EDIT:

Task Name,Target,Details,Initiator,Start Time,Completion Time
Relocate virtual machine,LNXVCSRV01,,LAB\\Andi,6/10/2017 11:23:32 AM,6/10/2017 11:24:22 AM

From SSD Datastore to HDD Datastore. 54.77GB total. iSCSI
Time: 50 seconds
Average: 1121 MB/s

Task Name,Target,Details,Initiator,Start Time,Completion Time
Relocate virtual machine,LNXVCSRV01,,LAB\\Andi,6/10/2017 11:47:29 AM,6/10/2017 11:50:49 AM

From HDD Datastore to SSD Datastore. 54.77GB total. iSCSI
Time: 200 seconds
Average: 280 MB/s

Task Name,Target,Initiator,Queued For,Start Time,Completion Time,Server
Relocate virtual machine,LNXVCSRV01,LAB\\Andi,59 ms,6/10/2017 7:05:03 PM,6/10/2017 7:43:22 PM

From HDD Datastore to SSD Datastore. 54.77GB total. iSCSI to NFS
Time: 2299 seconds
Average: 24 MB/s
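The times and averages for the three migrations can be reproduced from the task timestamps. A minimal sketch, assuming 1 GB is counted as 1024 MB (which is what matches the averages quoted):

```python
from datetime import datetime

FMT = "%m/%d/%Y %I:%M:%S %p"   # e.g. "6/10/2017 11:23:32 AM"
SIZE_GB = 54.77                # VM size reported for each migration

# (label, start time, completion time) copied from the vCenter task rows
MIGRATIONS = [
    ("SSD -> HDD, iSCSI", "6/10/2017 11:23:32 AM", "6/10/2017 11:24:22 AM"),
    ("HDD -> SSD, iSCSI", "6/10/2017 11:47:29 AM", "6/10/2017 11:50:49 AM"),
    ("HDD -> SSD, iSCSI to NFS", "6/10/2017 7:05:03 PM", "6/10/2017 7:43:22 PM"),
]

def transfer_stats(start, end, size_gb=SIZE_GB):
    """Return (elapsed seconds, average MB/s), taking 1 GB as 1024 MB."""
    delta = datetime.strptime(end, FMT) - datetime.strptime(start, FMT)
    elapsed = delta.total_seconds()
    return elapsed, size_gb * 1024 / elapsed

RESULTS = {label: transfer_stats(start, end) for label, start, end in MIGRATIONS}

for label, (elapsed, mbps) in RESULTS.items():
    print(f"{label}: {elapsed:.0f} s, {mbps:.0f} MB/s")
```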

That's the vCenter Server, which caused heavy "delete" operations and Trimming on the SSD when using iSCSI. It's been running on the NFS datastore for 20 minutes, and there has been zero trimming ops.

sysctl kstat.zfs.misc.zio_trim.
kstat.zfs.misc.zio_trim.failed: 0
kstat.zfs.misc.zio_trim.unsupported: 0
kstat.zfs.misc.zio_trim.success: 0
kstat.zfs.misc.zio_trim.bytes: 0

Delete operations are at 0, just like the HDDs on iSCSI. Deduplication is enabled, so it's not the culprit.

Does anyone have an SSD pool serving as an iSCSI target for ESXi servers? According to this, there has to be an issue...

joeschmuck · Jun 11, 2017

If I had a SSD on my FreeNAS (besides the boot drive), I'd go ahead and try to recreate your issue but I don't.

Good luck!

Andres Biront · Jun 11, 2017

Well, it looks like an ESXi issue. After disabling TRIM the constant write operations became obvious.

I should have started there...

It looks like the ESXi hosts are writing something constantly when a VM is turned on. And it's not any VM in particular.

I monitored every VM, and at idle they write 7 to 30 KBytes per second. With all of my SSD VMs I can account for maybe 200 KBytes per second total. But the hosts are writing between 4 and 5 MBytes per second.

I'm not comfortable with TLC drives at that rate.

It really is a shame. I could boot storm 10 VMS almost instantly with the SSD mirror, something that would drive my HDD pool to its knees.


Andres Biront · Jun 12, 2017

I'm not giving this up just yet. At least I want to understand what the ESXi servers are writing. Or why they are so... chatty.

With the info I gathered, I wouldn't recommend TLC SSDs as VMware Datastores to anyone who expects them to outlive their warranty.

So, I monitored the ESXi Datastore writes. That is, the VMs' actual write needs.

ESXi 1 Write Rate: 87 KB/s
ESXi 2 Write Rate: 53 KB/s
ESXi 3 Write Rate: 0 KB/s (0 VMs running. Datastore Mounted)

FreeNAS DA3 and DA4 statistics: 1.4 MB/s
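The gap between what the VMs write and what reaches the disks can be expressed as a ratio. This is a rough sketch: it treats 1.4 MB/s as the rate landing on the mirror, uses decimal units (1 MB = 1000 KB), and assumes the rate stays constant all day, none of which the numbers above confirm.

```python
# Write rates per host as measured on the ESXi side, in KB/s
vm_writes_kbps = {"ESXi 1": 87, "ESXi 2": 53, "ESXi 3": 0}

DISK_WRITE_MBPS = 1.4          # da3/da4 rate seen on FreeNAS
SECONDS_PER_DAY = 24 * 60 * 60

vm_total_kbps = sum(vm_writes_kbps.values())
disk_kbps = DISK_WRITE_MBPS * 1000

amplification = disk_kbps / vm_total_kbps                 # disk writes per VM write
daily_gb = DISK_WRITE_MBPS * SECONDS_PER_DAY / 1000       # if the rate never drops

print(f"VMs write {vm_total_kbps} KB/s, disks see {disk_kbps:.0f} KB/s")
print(f"about {amplification:.0f}x more than the VMs account for, "
      f"roughly {daily_gb:.0f} GB/day")
```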

Storage I/O Control: Disabled
Automatic Space Reclamation: Disabled
VMware HA storage heartbeat: Disabled
VMware Distributed Virtual Switch: Enabled. It updates the DVS data of every VM that is connected to the DVS and running on that Datastore at 5-minute intervals. Should not be an issue.

Next thing on the schedule:

Power on a sniffer on the iSCSI VLANs and see what the actual f*ck (pardon my French) the servers are writing.

joeschmuck · Jun 12, 2017

You would think that something like this would be well documented on the ESXi forums. Many places use SSD's on ESXi so maybe you need to make some sort of tweak to make things right.

Andres Biront · Jun 12, 2017

It looks like I'm not the only one who found this out:

https://communities.vmware.com/thread/509252

ESXi writes stuff constantly. This will trigger TRIM also, so you would have an SSD writing and deleting stuff all day.

This is a non-issue with enterprise-grade SSDs. It would be less evident if I had more SSDs in RAIDZ or several mirrors (like my HDDs, which are subject to the same constant behavior, but the write load is divided across 4 HDDs, so it is less evident).

I'll use the SSDs on another project. I'm not really comfortable... the wear level grew another digit.

joeschmuck · Jun 12, 2017

Glad you found some supporting document from the ESXi community. Sorry it didn't turn out the way you wanted. Just go and lay out $10,000 and buy an entire pool of enterprise grade SSD's. Well win the lottery first, it helps.

I have a SSD in my ESXi server, I'm going to check it for wear level, I hope it's not taking a beating.

EDIT: I forgot that ESXi sucks at delivering SMART data about a drive. My values were meaningless.

Stux · Jun 12, 2017

Well, that thread is inconclusive. It's two people saying they have an issue.

That's it.

Andres Biront · Jun 16, 2017

That's because it is not an issue on enterprise grade hardware... where VMware lives.

ESXi writes and deletes stuff from Datastores even if there isn't a VM writing to it. I found out because the wear level on my SSD pool grew in less than 1 week. That constant writing will void the warranty on these TLC drives in less than 2 years, and maybe render them useless. On a nearly idle home environment...

That constant writing that will surely degrade my SSD is just noise for an enterprise grade storage. It's not an issue.

That VMware forum post is just evidence that the ESXi behavior is normal.

ThatCdnGuy · Dec 18, 2017

Responding to an old posting because google showed it to me - anyways, I went through and hunted this out. The data (via collectd/disk latency) is recording the latency of the three disk operations - read/write/delete. On HDDs, there's no delete function - it's just another write, so will always return 0. On SSDs, there's a specific delete function that usually uses the TRIM function if it's there.

The trim counter (kstat.zfs.misc.zio_trim.bytes) is a counter of data writes saved compared to not using TRIM (i.e. just plain writes, like on an HDD). So if you've got SSD drives, you want to see this.

So generally relax about the drive, but do note that something (ESX in this case) is doing a lot of disk accesses which could be of concern.

Also good advice is to buy larger drives of better quality (more space = more wear levelling!) if you're using VMs.

logan893 · May 2, 2018

Andres Biront said:
I'm giving NFS a try. But performance is... abysmal in comparison. NFS 4.1 mounts as read only (it looks like a VMware issue), so I have to mount NFS v3.

Sorry for digging up an old thread once more.

Did you uncover what caused the writes?

Are they triggered by ESXi?
Is it the VMFS locking mechanism, e.g. ATS heartbeat or SCSI read/write?

For NFS, you can control how often it's written (every NFS.DiskFileLockUpdateFreq seconds), and it's ~84 bytes per write. Not sure if the same is configurable for iSCSI locking. (More info: https://www.vmware.com/content/dam/...r/vmware-nfs-bestpractices-white-paper-en.pdf ) Setting this too high may cause issues (with lock recovery) if your ESXi server crashes, but I suppose it'll be fine when performing graceful reboots.

Starting with vSphere 5.5 U2, VMFS uses ATS for updating its heartbeat, compared to plain SCSI writes earlier.
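To get a feel for how small that lock traffic is, here is a rough sketch of the payload it adds up to per day. The 10-second interval and the count of 20 lock files are illustrative assumptions, not values taken from this setup, and the function name is made up. It counts only the ~84-byte payload, not any filesystem or protocol overhead.

```python
LOCK_WRITE_BYTES = 84          # approximate size of one lock update, per the post
SECONDS_PER_DAY = 24 * 60 * 60

def lock_bytes_per_day(update_freq_s, lock_files=1):
    """Payload bytes written per day by periodic lock updates."""
    updates_per_day = SECONDS_PER_DAY / update_freq_s
    return LOCK_WRITE_BYTES * lock_files * updates_per_day

# ASSUMPTION: a 10 s update interval and 20 lock files, purely for illustration
one_lock = lock_bytes_per_day(10)
twenty_locks = lock_bytes_per_day(10, lock_files=20)

print(f"1 lock file:   {one_lock / 1e6:.2f} MB/day")
print(f"20 lock files: {twenty_locks / 1e6:.2f} MB/day")
```

Under those assumptions the lock payload stays in the megabytes-per-day range, so on its own it would not account for a sustained rate in the MB/s range.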

https://kb.vmware.com/s/article/2146451

As for why NFS gives you abysmal performance, it's likely the sync writes, which are enabled by default when using NFS with VMware ESXi. If you don't care about the data integrity, turn that off, and NFS should be about as quick as iSCSI (when sync isn't explicitly forced enabled).
