Curious L2ARC/ARC Re-Hydration Issue

ridyre

Dabbler
Joined
Oct 1, 2020
Messages
12
Hi All,

Long time lurker here running the following configuration:
Dell R720 with 256GB RAM
FreeNAS-11.3-U4.1
Dual controller 8 Gbps link to enclosure containing pool disks

I am hitting an issue when I migrate virtual machine storage off of this platform and onto another (both presented via iSCSI on 10 Gbps with jumbo frames end-to-end): the device drops a significant amount of both ARC and L2ARC cache data, including cached data belonging to neighboring VMs that remain on the storage. What is more curious is that this occurs at the end of the "copy" operation, on the final delete of the source VM files. Read latency increases significantly for all virtual machines running on this pool as the shared cache data is flushed. Sync writes are enabled. The pool is around 60% used and is presented to the hypervisor via iSCSI.

My pool is configured with the following:
17x mirrored vDevs containing 2x 4TB in each
2x 780GB read-intensive SSDs as cache (L2ARC)
2x ZeusRAM devices configured as SLOG

Detailed below -- the dips all correspond with a virtual machine being migrated off of the storage platform, and they occur immediately after the delete operation that follows the completed storage migration.
[Attached image: ARC/L2ARC cache size and read latency graph showing the dips described above]


Thoughts or ideas here? I haven't had throughput issues, but the latency is a killer when the ARC/L2ARC take these hits. It appears that far more data is released on a delete than I'm used to seeing with Oracle ZFS.
 

ridyre

Dabbler
Joined
Oct 1, 2020
Messages
12
Worth noting: I had another occurrence while migrating a 1 TB disk off of this platform and saw the cache drop:
[Attached image: cache size graph showing the drop during the 1 TB migration]


What's curious is that other virtual machines that had not been exhibiting any disk latency symptoms experience severe disk latency while the cache is being flushed, which resolves once the system "levels off."
[Attached image: disk latency graph for the other virtual machines during the cache flush]
 

HoneyBadger

actually does care
Administrator
Moderator
iXsystems
Joined
Feb 6, 2014
Messages
5,112
Welcome to the forums. Nice rig; I've got a couple ideas here.

Obligatory - you aren't using deduplication, I assume.

Questions: What's the hypervisor type, disk format, and other relevant tunables (eg: "VMware vSphere 6.7U3, VMFS6, auto-reclamation enabled at Low rate") we're dealing with here?

What's curious is that other virtual machines that had not been exhibiting any disk latency symptoms experience severe disk latency while the cache is being flushed, which resolves once the system "levels off."

Are these other virtual machines relying heavily on (or residing almost entirely in) L2ARC? If so, the SSDs you're using may be optimized for reads to the point where any kind of mixed I/O (including deletes) is making them choke up quite severely. What's the exact model of them, how are they connected, etc?
 

ridyre

Dabbler
Joined
Oct 1, 2020
Messages
12
Thanks for the welcome and response HoneyBadger,

Obligatory - you aren't using deduplication, I assume.
Not using deduplication here, definitely optimizing towards performance and not shy on disks provisioned for this use-case.

Questions: What's the hypervisor type, disk format, and other relevant tunables (eg: "VMware vSphere 6.7U3, VMFS6, auto-reclamation enabled at Low rate") we're dealing with here?
Currently we're using Hyper-V 2016 (a lab environment representing production for a site) in a failover cluster; the disks are presented via iSCSI and mapped into the system as Cluster Shared Volumes formatted with NTFS using a 64K allocation unit size.
We have enabled CSV caching (which works very similarly to an ARC cache); it stores frequently accessed read data in RAM on each host. This is a 4-node Hyper-V cluster of HP DL380 G9 servers with 512 GB RAM each and 2x Intel Ethernet Server Adapter X520-2 in each host, with one interface on each card dedicated to iSCSI traffic and the other dedicated to VM network traffic.
Jumbo frames are enabled on the Nexus switches, the Dell (FreeNAS) box, and the Hyper-V hosts, and I can confirm that fragmentation is not occurring at this level with a ping -f -l 8900. I have also set the recommended iSCSI tunable, TcpAckFrequency=1, on each of the appropriate interfaces.
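Roughly what those checks look like on the Windows side, for reference (the target IP and interface GUID shown are placeholders, not our actual values):

Code:
REM verify jumbo frames end-to-end: 8900-byte payload with the don't-fragment flag set
ping -f -l 8900 192.168.10.50
REM TcpAckFrequency is a per-interface DWORD under the Tcpip parameters key
reg add "HKLM\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters\Interfaces\{INTERFACE-GUID}" /v TcpAckFrequency /t REG_DWORD /d 1 /f
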
The storage migration copies all data within a virtual disk to the cluster volume presented from a different storage system; upon completion of the copy it sends over one final delta and completes the migration without an outage to the VM in question (assuming all storage sub-systems remain healthy). At that point it performs a simple delete operation against the related file(s) that have been migrated to the other platform.

Are these other virtual machines relying heavily on (or residing almost entirely in) L2ARC? If so, the SSDs you're using may be optimized for reads to the point where any kind of mixed I/O (including deletes) is making them choke up quite severely. What's the exact model of them, how are they connected, etc?
I would say this is definitely around a 75% read / 25% write workload overall. Backups use next-gen software (Veeam, Rubrik, etc.) with a volume filter driver that enables CBT, so only the changed blocks are read from storage, which is more efficient. The SSDs themselves are 2x Toshiba PX03SNF080 solid state drives - 800 GB, SAS 12Gb/s. They're sitting inside the storage enclosure (JBOD, a dual-controller 12G SAS MD3060) attached via HBA to the Dell box.

Thanks again HoneyBadger, this problem is one that baffles me, as the rest of the time this rig is pretty solid.
 

HoneyBadger

actually does care
Administrator
Moderator
iXsystems
Joined
Feb 6, 2014
Messages
5,112
Currently we're using Hyper-V 2016 (a lab environment representing production for a site) in a failover cluster; the disks are presented via iSCSI and mapped into the system as Cluster Shared Volumes formatted with NTFS using a 64K allocation unit size.

I spend way more time in the VMware world, so pardon my assumption (and any VMware-isms that sneak into the posting) - but at least with this being a "lab environment" there's more opportunity for making adjustments and observing results.

We have enabled CSV caching (which works very similarly to an ARC cache)
This is technically duplicating some of the effort ZFS is doing (probably more effectively/efficiently) in ARC. Of course, CSV/hypervisor-level caching will be more granular at the disk level, as well as being able to cache more than just ZFS filesystems. Assuming you're not under memory pressure, you can keep this enabled; however, if you find that you're short on RAM on your hypervisors, this would be the first place I'd look to scavenge or reduce the footprint.

I would say this is definitely falling along a 75% read workload, and a 25% write workload overall.

Since it looks like this is repeatable, I'd be curious to have a look at your arc_summary output in both a "steady state" and while your L2ARC is "under eviction notice", as well as the results of gstat -dpB logged to a file (e.g. gstat -dpB > /mnt/yourpool/somefile.txt). The arc_summary results can be attached here in CODE tags, but the gstat results will be huge and should be zipped and attached.
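Something like this, as a rough sketch (the file names and pool path are just examples):

Code:
# steady-state snapshot of ARC/L2ARC stats
arc_summary.py > /mnt/yourpool/arc_steady.txt
# grab another one mid-event, right after the delete kicks off
arc_summary.py > /mnt/yourpool/arc_during.txt
# endless batch mode, physical providers only, including delete (BIO_DELETE) stats
gstat -dpB > /mnt/yourpool/gstat_log.txt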

If your drives are indeed spending too much time on BIO_DELETE (the d/s and ms/d columns in gstat) then we could look at adjusting the number of deletes permitted per txg or in the queue, and perhaps limiting the TRIM frequency if it's your SSDs that are taking up the time (TRIM should be hitting L2ARC devices in FreeBSD ZFS, I believe; I don't know if the ZoL builds do it yet).
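For reference, you can check the current values of the knobs I have in mind in one shot before we touch anything:

Code:
sysctl vfs.zfs.per_txg_dirty_frees_percent vfs.zfs.free_min_time_ms vfs.zfs.trim.txg_delay vfs.zfs.vdev.trim_max_active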

For your 4TB capacity drives, I assume you're using SAS here as well based on the dual HBA config, so something silly like SMR (shingled drives) shouldn't be popping up.
 

ridyre

Dabbler
Joined
Oct 1, 2020
Messages
12
Thanks again HoneyBadger!

I spend way more time in the VMware world, so pardon my assumption (and any VMware-isms that sneak into the posting) - but at least with this being a "lab environment" there's more opportunity for making adjustments and observing results.
Yeah, we're definitely not a Hyper-V shop by choice, but... internal politics and some higher-ups not liking VMware (gotta love the non-technical getting involved in technical decisions).

This is technically duplicating some of the effort ZFS is doing (probably more effectively/efficiently) in ARC. Of course, CSV/hypervisor-level caching will be more granular at the disk level, as well as being able to cache more than just ZFS filesystems. Assuming you're not under memory pressure, you can keep this enabled; however, if you find that you're short on RAM on your hypervisors, this would be the first place I'd look to scavenge or reduce the footprint.
We tend to dedicate around 10% of host memory to the CSV cache; it even helps in our Pure SAN environments running all-NVMe/flash disks. Good stuff for sure. :)

Since it looks like this is repeatable, I'd be curious to have a look at your arc_summary output in both a "steady state" and while your L2ARC is "under eviction notice", as well as the results of gstat -dpB logged to a file (e.g. gstat -dpB > /mnt/yourpool/somefile.txt). The arc_summary results can be attached here in CODE tags, but the gstat results will be huge and should be zipped and attached.

Now for the requested data - this is a fun platform! Love all the tunables!

Code:
root@zfsloaner[~]# arc_summary.py
System Memory:

        0.00%   9.35    MiB Active,     0.65%   1.62    GiB Inact
        96.20%  240.03  GiB Wired,      0.00%   0       Bytes Cache
        2.74%   6.83    GiB Free,       0.40%   1.01    GiB Gap

        Real Installed:                         256.00  GiB
        Real Available:                 99.97%  255.91  GiB
        Real Managed:                   97.49%  249.50  GiB

        Logical Total:                          256.00  GiB
        Logical Used:                   96.70%  247.55  GiB
        Logical Free:                   3.30%   8.45    GiB

Kernel Memory:                                  3.56    GiB
        Data:                           98.74%  3.51    GiB
        Text:                           1.26%   45.93   MiB

Kernel Memory Map:                              249.50  GiB
        Size:                           4.57%   11.39   GiB
        Free:                           95.43%  238.11  GiB
                                                                Page:  1
------------------------------------------------------------------------

ARC Summary: (HEALTHY)
        Storage pool Version:                   5000
        Filesystem Version:                     5
        Memory Throttle Count:                  0

ARC Misc:
        Deleted:                                4.07b
        Mutex Misses:                           2.56m
        Evict Skips:                            2.56m

ARC Size:                               88.39%  219.66  GiB
        Target Size: (Adaptive)         88.06%  218.82  GiB
        Min Size (Hard Limit):          12.50%  31.06   GiB
        Max Size (High Water):          8:1     248.50  GiB

ARC Size Breakdown:
        Recently Used Cache Size:       42.08%  92.44   GiB
        Frequently Used Cache Size:     57.92%  127.22  GiB

ARC Hash Breakdown:
        Elements Max:                           118.47m
        Elements Current:               35.70%  42.30m
        Collisions:                             8.02b
        Chain Max:                              19
        Chains:                                 12.05m
                                                                Page:  2
------------------------------------------------------------------------

ARC Total accesses:                                     19.95b
        Cache Hit Ratio:                84.08%  16.78b
        Cache Miss Ratio:               15.92%  3.18b
        Actual Hit Ratio:               83.44%  16.65b

        Data Demand Efficiency:         77.82%  7.97b
        Data Prefetch Efficiency:       18.55%  1.67b

        CACHE HITS BY CACHE LIST:
          Most Recently Used:           14.38%  2.41b
          Most Frequently Used:         84.85%  14.24b
          Most Recently Used Ghost:     1.99%   334.00m
          Most Frequently Used Ghost:   1.62%   272.42m

        CACHE HITS BY DATA TYPE:
          Demand Data:                  36.96%  6.20b
          Prefetch Data:                1.85%   310.05m
          Demand Metadata:              61.11%  10.25b
          Prefetch Metadata:            0.08%   13.96m

        CACHE MISSES BY DATA TYPE:
          Demand Data:                  55.63%  1.77b
          Prefetch Data:                42.85%  1.36b
          Demand Metadata:              1.30%   41.14m
          Prefetch Metadata:            0.22%   7.01m
                                                                Page:  3
------------------------------------------------------------------------

L2 ARC Summary: (HEALTHY)
        Passed Headroom:                        16.46m
        Tried Lock Failures:                    19.79m
        IO In Progress:                         871
        Low Memory Aborts:                      2
        Free on Write:                          7.41m
        Writes While Full:                      3.78m
        R/W Clashes:                            0
        Bad Checksums:                          0
        IO Errors:                              0
        SPA Mismatch:                           117.45m

L2 ARC Size: (Adaptive)                         495.83  GiB
        Compressed:                     99.08%  491.28  GiB
        Header Size:                    0.27%   1.33    GiB

L2 ARC Evicts:
        Lock Retries:                           291.99k
        Upon Reading:                           70

L2 ARC Breakdown:                               3.18b
        Hit Ratio:                      23.20%  736.97m
        Miss Ratio:                     76.80%  2.44b
        Feeds:                                  5.57m

L2 ARC Buffer:
        Bytes Scanned:                          581.67  TiB
        Buffer Iterations:                      5.57m
        List Iterations:                        21.62m
        NULL List Iterations:                   60.19k

L2 ARC Writes:
        Writes Sent:                    100.00% 4.66m
                                                                Page:  4
------------------------------------------------------------------------

DMU Prefetch Efficiency:                        2.38b
        Hit Ratio:                      8.25%   196.00m
        Miss Ratio:                     91.75%  2.18b

                                                                Page:  5
------------------------------------------------------------------------

                                                                Page:  6
------------------------------------------------------------------------

ZFS Tunable (sysctl):
        kern.maxusers                           16714
        vm.kmem_size                            267898855424
        vm.kmem_size_scale                      1
        vm.kmem_size_min                        0
        vm.kmem_size_max                        1319413950874
        vfs.zfs.vol.immediate_write_sz          32768
        vfs.zfs.vol.unmap_sync_enabled          0
        vfs.zfs.vol.unmap_enabled               1
        vfs.zfs.vol.recursive                   0
        vfs.zfs.vol.mode                        2
        vfs.zfs.sync_pass_rewrite               2
        vfs.zfs.sync_pass_dont_compress         5
        vfs.zfs.sync_pass_deferred_free         2
        vfs.zfs.zio.dva_throttle_enabled        1
        vfs.zfs.zio.exclude_metadata            0
        vfs.zfs.zio.use_uma                     1
        vfs.zfs.zio.taskq_batch_pct             75
        vfs.zfs.zil_maxblocksize                131072
        vfs.zfs.zil_slog_bulk                   786432
        vfs.zfs.zil_nocacheflush                0
        vfs.zfs.zil_replay_disable              0
        vfs.zfs.version.zpl                     5
        vfs.zfs.version.spa                     5000
        vfs.zfs.version.acl                     1
        vfs.zfs.version.ioctl                   7
        vfs.zfs.debug                           0
        vfs.zfs.super_owner                     0
        vfs.zfs.immediate_write_sz              32768
        vfs.zfs.cache_flush_disable             0
        vfs.zfs.standard_sm_blksz               131072
        vfs.zfs.dtl_sm_blksz                    4096
        vfs.zfs.min_auto_ashift                 12
        vfs.zfs.max_auto_ashift                 13
        vfs.zfs.vdev.def_queue_depth            32
        vfs.zfs.vdev.queue_depth_pct            1000
        vfs.zfs.vdev.write_gap_limit            4096
        vfs.zfs.vdev.read_gap_limit             32768
        vfs.zfs.vdev.aggregation_limit_non_rotating131072
        vfs.zfs.vdev.aggregation_limit          1048576
        vfs.zfs.vdev.initializing_max_active    1
        vfs.zfs.vdev.initializing_min_active    1
        vfs.zfs.vdev.removal_max_active         2
        vfs.zfs.vdev.removal_min_active         1
        vfs.zfs.vdev.trim_max_active            64
        vfs.zfs.vdev.trim_min_active            1
        vfs.zfs.vdev.scrub_max_active           2
        vfs.zfs.vdev.scrub_min_active           1
        vfs.zfs.vdev.async_write_max_active     10
        vfs.zfs.vdev.async_write_min_active     1
        vfs.zfs.vdev.async_read_max_active      3
        vfs.zfs.vdev.async_read_min_active      1
        vfs.zfs.vdev.sync_write_max_active      10
        vfs.zfs.vdev.sync_write_min_active      10
        vfs.zfs.vdev.sync_read_max_active       10
        vfs.zfs.vdev.sync_read_min_active       10
        vfs.zfs.vdev.max_active                 1000
        vfs.zfs.vdev.async_write_active_max_dirty_percent60
        vfs.zfs.vdev.async_write_active_min_dirty_percent30
        vfs.zfs.vdev.mirror.non_rotating_seek_inc1
        vfs.zfs.vdev.mirror.non_rotating_inc    0
        vfs.zfs.vdev.mirror.rotating_seek_offset1048576
        vfs.zfs.vdev.mirror.rotating_seek_inc   5
        vfs.zfs.vdev.mirror.rotating_inc        0
        vfs.zfs.vdev.trim_on_init               1
        vfs.zfs.vdev.bio_delete_disable         0
        vfs.zfs.vdev.bio_flush_disable          0
        vfs.zfs.vdev.cache.bshift               16
        vfs.zfs.vdev.cache.size                 0
        vfs.zfs.vdev.cache.max                  16384
        vfs.zfs.vdev.validate_skip              0
        vfs.zfs.vdev.max_ms_shift               38
        vfs.zfs.vdev.default_ms_shift           29
        vfs.zfs.vdev.max_ms_count_limit         131072
        vfs.zfs.vdev.min_ms_count               16
        vfs.zfs.vdev.max_ms_count               200
        vfs.zfs.vdev.trim_max_pending           10000
        vfs.zfs.txg.timeout                     5
        vfs.zfs.trim.enabled                    1
        vfs.zfs.trim.max_interval               1
        vfs.zfs.trim.timeout                    30
        vfs.zfs.trim.txg_delay                  32
        vfs.zfs.space_map_ibs                   14
        vfs.zfs.spa_allocators                  4
        vfs.zfs.spa_min_slop                    134217728
        vfs.zfs.spa_slop_shift                  5
        vfs.zfs.spa_asize_inflation             24
        vfs.zfs.deadman_enabled                 1
        vfs.zfs.deadman_checktime_ms            60000
        vfs.zfs.deadman_synctime_ms             600000
        vfs.zfs.debug_flags                     0
        vfs.zfs.debugflags                      0
        vfs.zfs.recover                         0
        vfs.zfs.spa_load_verify_data            1
        vfs.zfs.spa_load_verify_metadata        1
        vfs.zfs.spa_load_verify_maxinflight     10000
        vfs.zfs.max_missing_tvds_scan           0
        vfs.zfs.max_missing_tvds_cachefile      2
        vfs.zfs.max_missing_tvds                0
        vfs.zfs.spa_load_print_vdev_tree        0
        vfs.zfs.ccw_retry_interval              300
        vfs.zfs.check_hostid                    1
        vfs.zfs.mg_fragmentation_threshold      85
        vfs.zfs.mg_noalloc_threshold            0
        vfs.zfs.condense_pct                    200
        vfs.zfs.metaslab_sm_blksz               4096
        vfs.zfs.metaslab.bias_enabled           1
        vfs.zfs.metaslab.lba_weighting_enabled  1
        vfs.zfs.metaslab.fragmentation_factor_enabled1
        vfs.zfs.metaslab.preload_enabled        1
        vfs.zfs.metaslab.preload_limit          3
        vfs.zfs.metaslab.unload_delay           8
        vfs.zfs.metaslab.load_pct               50
        vfs.zfs.metaslab.min_alloc_size         33554432
        vfs.zfs.metaslab.df_free_pct            4
        vfs.zfs.metaslab.df_alloc_threshold     131072
        vfs.zfs.metaslab.debug_unload           0
        vfs.zfs.metaslab.debug_load             0
        vfs.zfs.metaslab.fragmentation_threshold70
        vfs.zfs.metaslab.force_ganging          16777217
        vfs.zfs.free_bpobj_enabled              1
        vfs.zfs.free_max_blocks                 18446744073709551615
        vfs.zfs.zfs_scan_checkpoint_interval    7200
        vfs.zfs.zfs_scan_legacy                 0
        vfs.zfs.no_scrub_prefetch               0
        vfs.zfs.no_scrub_io                     0
        vfs.zfs.resilver_min_time_ms            3000
        vfs.zfs.free_min_time_ms                1000
        vfs.zfs.scan_min_time_ms                1000
        vfs.zfs.scan_idle                       50
        vfs.zfs.scrub_delay                     4
        vfs.zfs.resilver_delay                  2
        vfs.zfs.top_maxinflight                 32
        vfs.zfs.delay_scale                     500000
        vfs.zfs.delay_min_dirty_percent         60
        vfs.zfs.dirty_data_sync_pct             20
        vfs.zfs.dirty_data_max_percent          10
        vfs.zfs.dirty_data_max_max              4294967296
        vfs.zfs.dirty_data_max                  4294967296
        vfs.zfs.max_recordsize                  1048576
        vfs.zfs.default_ibs                     15
        vfs.zfs.default_bs                      9
        vfs.zfs.zfetch.array_rd_sz              1048576
        vfs.zfs.zfetch.max_idistance            67108864
        vfs.zfs.zfetch.max_distance             8388608
        vfs.zfs.zfetch.min_sec_reap             2
        vfs.zfs.zfetch.max_streams              8
        vfs.zfs.prefetch_disable                0
        vfs.zfs.send_holes_without_birth_time   1
        vfs.zfs.mdcomp_disable                  0
        vfs.zfs.per_txg_dirty_frees_percent     30
        vfs.zfs.nopwrite_enabled                1
        vfs.zfs.dedup.prefetch                  1
        vfs.zfs.dbuf_cache_lowater_pct          10
        vfs.zfs.dbuf_cache_hiwater_pct          10
        vfs.zfs.dbuf_metadata_cache_overflow    0
        vfs.zfs.dbuf_metadata_cache_shift       6
        vfs.zfs.dbuf_cache_shift                5
        vfs.zfs.dbuf_metadata_cache_max_bytes   4169142400
        vfs.zfs.dbuf_cache_max_bytes            8338284800
        vfs.zfs.arc_min_prescient_prefetch_ms   6
        vfs.zfs.arc_min_prefetch_ms             1
        vfs.zfs.l2c_only_size                   0
        vfs.zfs.mfu_ghost_data_esize            68436099072
        vfs.zfs.mfu_ghost_metadata_esize        15772458496
        vfs.zfs.mfu_ghost_size                  84208557568
        vfs.zfs.mfu_data_esize                  130986194432
        vfs.zfs.mfu_metadata_esize              1738931712
        vfs.zfs.mfu_size                        135426170880
        vfs.zfs.mru_ghost_data_esize            110242906112
        vfs.zfs.mru_ghost_metadata_esize        32502020608
        vfs.zfs.mru_ghost_size                  142744926720
        vfs.zfs.mru_data_esize                  78199796736
        vfs.zfs.mru_metadata_esize              1565204992
        vfs.zfs.mru_size                        92982868992
        vfs.zfs.anon_data_esize                 0
        vfs.zfs.anon_metadata_esize             0
        vfs.zfs.anon_size                       37813760
        vfs.zfs.l2arc_norw                      1
        vfs.zfs.l2arc_feed_again                1
        vfs.zfs.l2arc_noprefetch                1
        vfs.zfs.l2arc_feed_min_ms               200
        vfs.zfs.l2arc_feed_secs                 1
        vfs.zfs.l2arc_headroom                  2
        vfs.zfs.l2arc_write_boost               8388608
        vfs.zfs.l2arc_write_max                 8388608
        vfs.zfs.arc_meta_limit                  66706278400
        vfs.zfs.arc_free_target                 1393320
        vfs.zfs.arc_kmem_cache_reap_retry_ms    1000
        vfs.zfs.compressed_arc_enabled          1
        vfs.zfs.arc_grow_retry                  60
        vfs.zfs.arc_shrink_shift                7
        vfs.zfs.arc_average_blocksize           8192
        vfs.zfs.arc_no_grow_shift               5
        vfs.zfs.arc_min                         33353139200
        vfs.zfs.arc_max                         266825113600
        vfs.zfs.abd_chunk_size                  4096
        vfs.zfs.abd_scatter_enabled             1
                                                                Page:  7
------------------------------------------------------------------------


Additional post coming with the during scenario.
 

ridyre

Dabbler
Joined
Oct 1, 2020
Messages
12
The "during" scenario -- captured immediately after the delete:

Code:
root@zfsloaner[~]# arc_summary.py
System Memory:

        0.00%   7.80    MiB Active,     0.65%   1.62    GiB Inact
        96.64%  241.12  GiB Wired,      0.00%   0       Bytes Cache
        2.30%   5.74    GiB Free,       0.41%   1.01    GiB Gap

        Real Installed:                         256.00  GiB
        Real Available:                 99.97%  255.91  GiB
        Real Managed:                   97.49%  249.50  GiB

        Logical Total:                          256.00  GiB
        Logical Used:                   97.13%  248.64  GiB
        Logical Free:                   2.87%   7.36    GiB

Kernel Memory:                                  3.39    GiB
        Data:                           98.68%  3.34    GiB
        Text:                           1.32%   45.93   MiB

Kernel Memory Map:                              249.50  GiB
        Size:                           4.91%   12.25   GiB
        Free:                           95.09%  237.25  GiB
                                                                Page:  1
------------------------------------------------------------------------

ARC Summary: (HEALTHY)
        Storage pool Version:                   5000
        Filesystem Version:                     5
        Memory Throttle Count:                  0

ARC Misc:
        Deleted:                                4.07b
        Mutex Misses:                           2.56m
        Evict Skips:                            2.56m

ARC Size:                               74.83%  185.95  GiB
        Target Size: (Adaptive)         88.06%  218.82  GiB
        Min Size (Hard Limit):          12.50%  31.06   GiB
        Max Size (High Water):          8:1     248.50  GiB

ARC Size Breakdown:
        Recently Used Cache Size:       42.28%  92.52   GiB
        Frequently Used Cache Size:     57.72%  126.30  GiB

ARC Hash Breakdown:
        Elements Max:                           118.47m
        Elements Current:               36.69%  43.46m
        Collisions:                             8.03b
        Chain Max:                              19
        Chains:                                 12.47m
                                                                Page:  2
------------------------------------------------------------------------

ARC Total accesses:                                     19.98b
        Cache Hit Ratio:                84.08%  16.80b
        Cache Miss Ratio:               15.92%  3.18b
        Actual Hit Ratio:               83.44%  16.67b

        Data Demand Efficiency:         77.84%  7.97b
        Data Prefetch Efficiency:       18.53%  1.67b

        CACHE HITS BY CACHE LIST:
          Most Recently Used:           14.40%  2.42b
          Most Frequently Used:         84.84%  14.25b
          Most Recently Used Ghost:     1.99%   334.10m
          Most Frequently Used Ghost:   1.62%   272.50m

        CACHE HITS BY DATA TYPE:
          Demand Data:                  36.95%  6.21b
          Prefetch Data:                1.85%   310.16m
          Demand Metadata:              61.12%  10.27b
          Prefetch Metadata:            0.08%   13.98m

        CACHE MISSES BY DATA TYPE:
          Demand Data:                  55.59%  1.77b
          Prefetch Data:                42.90%  1.36b
          Demand Metadata:              1.30%   41.21m
          Prefetch Metadata:            0.22%   7.02m
                                                                Page:  3
------------------------------------------------------------------------

L2 ARC Summary: (HEALTHY)
        Passed Headroom:                        16.49m
        Tried Lock Failures:                    19.80m
        IO In Progress:                         871
        Low Memory Aborts:                      2
        Free on Write:                          7.42m
        Writes While Full:                      3.78m
        R/W Clashes:                            0
        Bad Checksums:                          0
        IO Errors:                              0
        SPA Mismatch:                           117.45m

L2 ARC Size: (Adaptive)                         506.63  GiB
        Compressed:                     99.16%  502.37  GiB
        Header Size:                    0.31%   1.55    GiB

L2 ARC Evicts:
        Lock Retries:                           292.14k
        Upon Reading:                           70

L2 ARC Breakdown:                               3.18b
        Hit Ratio:                      23.18%  737.12m
        Miss Ratio:                     76.82%  2.44b
        Feeds:                                  5.58m

L2 ARC Buffer:
        Bytes Scanned:                          582.70  TiB
        Buffer Iterations:                      5.58m
        List Iterations:                        21.66m
        NULL List Iterations:                   60.19k

L2 ARC Writes:
        Writes Sent:                    100.00% 4.67m
                                                                Page:  4
------------------------------------------------------------------------

DMU Prefetch Efficiency:                        2.38b
        Hit Ratio:                      8.25%   196.37m
        Miss Ratio:                     91.75%  2.18b

                                                                Page:  5
------------------------------------------------------------------------

                                                                Page:  6
------------------------------------------------------------------------

ZFS Tunable (sysctl):
        kern.maxusers                           16714
        vm.kmem_size                            267898855424
        vm.kmem_size_scale                      1
        vm.kmem_size_min                        0
        vm.kmem_size_max                        1319413950874
        vfs.zfs.vol.immediate_write_sz          32768
        vfs.zfs.vol.unmap_sync_enabled          0
        vfs.zfs.vol.unmap_enabled               1
        vfs.zfs.vol.recursive                   0
        vfs.zfs.vol.mode                        2
        vfs.zfs.sync_pass_rewrite               2
        vfs.zfs.sync_pass_dont_compress         5
        vfs.zfs.sync_pass_deferred_free         2
        vfs.zfs.zio.dva_throttle_enabled        1
        vfs.zfs.zio.exclude_metadata            0
        vfs.zfs.zio.use_uma                     1
        vfs.zfs.zio.taskq_batch_pct             75
        vfs.zfs.zil_maxblocksize                131072
        vfs.zfs.zil_slog_bulk                   786432
        vfs.zfs.zil_nocacheflush                0
        vfs.zfs.zil_replay_disable              0
        vfs.zfs.version.zpl                     5
        vfs.zfs.version.spa                     5000
        vfs.zfs.version.acl                     1
        vfs.zfs.version.ioctl                   7
        vfs.zfs.debug                           0
        vfs.zfs.super_owner                     0
        vfs.zfs.immediate_write_sz              32768
        vfs.zfs.cache_flush_disable             0
        vfs.zfs.standard_sm_blksz               131072
        vfs.zfs.dtl_sm_blksz                    4096
        vfs.zfs.min_auto_ashift                 12
        vfs.zfs.max_auto_ashift                 13
        vfs.zfs.vdev.def_queue_depth            32
        vfs.zfs.vdev.queue_depth_pct            1000
        vfs.zfs.vdev.write_gap_limit            4096
        vfs.zfs.vdev.read_gap_limit             32768
        vfs.zfs.vdev.aggregation_limit_non_rotating131072
        vfs.zfs.vdev.aggregation_limit          1048576
        vfs.zfs.vdev.initializing_max_active    1
        vfs.zfs.vdev.initializing_min_active    1
        vfs.zfs.vdev.removal_max_active         2
        vfs.zfs.vdev.removal_min_active         1
        vfs.zfs.vdev.trim_max_active            64
        vfs.zfs.vdev.trim_min_active            1
        vfs.zfs.vdev.scrub_max_active           2
        vfs.zfs.vdev.scrub_min_active           1
        vfs.zfs.vdev.async_write_max_active     10
        vfs.zfs.vdev.async_write_min_active     1
        vfs.zfs.vdev.async_read_max_active      3
        vfs.zfs.vdev.async_read_min_active      1
        vfs.zfs.vdev.sync_write_max_active      10
        vfs.zfs.vdev.sync_write_min_active      10
        vfs.zfs.vdev.sync_read_max_active       10
        vfs.zfs.vdev.sync_read_min_active       10
        vfs.zfs.vdev.max_active                 1000
        vfs.zfs.vdev.async_write_active_max_dirty_percent60
        vfs.zfs.vdev.async_write_active_min_dirty_percent30
        vfs.zfs.vdev.mirror.non_rotating_seek_inc1
        vfs.zfs.vdev.mirror.non_rotating_inc    0
        vfs.zfs.vdev.mirror.rotating_seek_offset1048576
        vfs.zfs.vdev.mirror.rotating_seek_inc   5
        vfs.zfs.vdev.mirror.rotating_inc        0
        vfs.zfs.vdev.trim_on_init               1
        vfs.zfs.vdev.bio_delete_disable         0
        vfs.zfs.vdev.bio_flush_disable          0
        vfs.zfs.vdev.cache.bshift               16
        vfs.zfs.vdev.cache.size                 0
        vfs.zfs.vdev.cache.max                  16384
        vfs.zfs.vdev.validate_skip              0
        vfs.zfs.vdev.max_ms_shift               38
        vfs.zfs.vdev.default_ms_shift           29
        vfs.zfs.vdev.max_ms_count_limit         131072
        vfs.zfs.vdev.min_ms_count               16
        vfs.zfs.vdev.max_ms_count               200
        vfs.zfs.vdev.trim_max_pending           10000
        vfs.zfs.txg.timeout                     5
        vfs.zfs.trim.enabled                    1
        vfs.zfs.trim.max_interval               1
        vfs.zfs.trim.timeout                    30
        vfs.zfs.trim.txg_delay                  32
        vfs.zfs.space_map_ibs                   14
        vfs.zfs.spa_allocators                  4
        vfs.zfs.spa_min_slop                    134217728
        vfs.zfs.spa_slop_shift                  5
        vfs.zfs.spa_asize_inflation             24
        vfs.zfs.deadman_enabled                 1
        vfs.zfs.deadman_checktime_ms            60000
        vfs.zfs.deadman_synctime_ms             600000
        vfs.zfs.debug_flags                     0
        vfs.zfs.debugflags                      0
        vfs.zfs.recover                         0
        vfs.zfs.spa_load_verify_data            1
        vfs.zfs.spa_load_verify_metadata        1
        vfs.zfs.spa_load_verify_maxinflight     10000
        vfs.zfs.max_missing_tvds_scan           0
        vfs.zfs.max_missing_tvds_cachefile      2
        vfs.zfs.max_missing_tvds                0
        vfs.zfs.spa_load_print_vdev_tree        0
        vfs.zfs.ccw_retry_interval              300
        vfs.zfs.check_hostid                    1
        vfs.zfs.mg_fragmentation_threshold      85
        vfs.zfs.mg_noalloc_threshold            0
        vfs.zfs.condense_pct                    200
        vfs.zfs.metaslab_sm_blksz               4096
        vfs.zfs.metaslab.bias_enabled           1
        vfs.zfs.metaslab.lba_weighting_enabled  1
        vfs.zfs.metaslab.fragmentation_factor_enabled1
        vfs.zfs.metaslab.preload_enabled        1
        vfs.zfs.metaslab.preload_limit          3
        vfs.zfs.metaslab.unload_delay           8
        vfs.zfs.metaslab.load_pct               50
        vfs.zfs.metaslab.min_alloc_size         33554432
        vfs.zfs.metaslab.df_free_pct            4
        vfs.zfs.metaslab.df_alloc_threshold     131072
        vfs.zfs.metaslab.debug_unload           0
        vfs.zfs.metaslab.debug_load             0
        vfs.zfs.metaslab.fragmentation_threshold70
        vfs.zfs.metaslab.force_ganging          16777217
        vfs.zfs.free_bpobj_enabled              1
        vfs.zfs.free_max_blocks                 18446744073709551615
        vfs.zfs.zfs_scan_checkpoint_interval    7200
        vfs.zfs.zfs_scan_legacy                 0
        vfs.zfs.no_scrub_prefetch               0
        vfs.zfs.no_scrub_io                     0
        vfs.zfs.resilver_min_time_ms            3000
        vfs.zfs.free_min_time_ms                1000
        vfs.zfs.scan_min_time_ms                1000
        vfs.zfs.scan_idle                       50
        vfs.zfs.scrub_delay                     4
        vfs.zfs.resilver_delay                  2
        vfs.zfs.top_maxinflight                 32
        vfs.zfs.delay_scale                     500000
        vfs.zfs.delay_min_dirty_percent         60
        vfs.zfs.dirty_data_sync_pct             20
        vfs.zfs.dirty_data_max_percent          10
        vfs.zfs.dirty_data_max_max              4294967296
        vfs.zfs.dirty_data_max                  4294967296
        vfs.zfs.max_recordsize                  1048576
        vfs.zfs.default_ibs                     15
        vfs.zfs.default_bs                      9
        vfs.zfs.zfetch.array_rd_sz              1048576
        vfs.zfs.zfetch.max_idistance            67108864
        vfs.zfs.zfetch.max_distance             8388608
        vfs.zfs.zfetch.min_sec_reap             2
        vfs.zfs.zfetch.max_streams              8
        vfs.zfs.prefetch_disable                0
        vfs.zfs.send_holes_without_birth_time   1
        vfs.zfs.mdcomp_disable                  0
        vfs.zfs.per_txg_dirty_frees_percent     30
        vfs.zfs.nopwrite_enabled                1
        vfs.zfs.dedup.prefetch                  1
        vfs.zfs.dbuf_cache_lowater_pct          10
        vfs.zfs.dbuf_cache_hiwater_pct          10
        vfs.zfs.dbuf_metadata_cache_overflow    0
        vfs.zfs.dbuf_metadata_cache_shift       6
        vfs.zfs.dbuf_cache_shift                5
        vfs.zfs.dbuf_metadata_cache_max_bytes   4169142400
        vfs.zfs.dbuf_cache_max_bytes            8338284800
        vfs.zfs.arc_min_prescient_prefetch_ms   6
        vfs.zfs.arc_min_prefetch_ms             1
        vfs.zfs.l2c_only_size                   0
        vfs.zfs.mfu_ghost_data_esize            71155318784
        vfs.zfs.mfu_ghost_metadata_esize        15555812864
        vfs.zfs.mfu_ghost_size                  86711131648
        vfs.zfs.mfu_data_esize                  105338381312
        vfs.zfs.mfu_metadata_esize              1744975360
        vfs.zfs.mfu_size                        107169764864
        vfs.zfs.mru_ghost_data_esize            116197904896
        vfs.zfs.mru_ghost_metadata_esize        32025270784
        vfs.zfs.mru_ghost_size                  148223175680
        vfs.zfs.mru_data_esize                  73167927296
        vfs.zfs.mru_metadata_esize              1949081600
        vfs.zfs.mru_size                        84323441664
        vfs.zfs.anon_data_esize                 0
        vfs.zfs.anon_metadata_esize             0
        vfs.zfs.anon_size                       14411776
        vfs.zfs.l2arc_norw                      1
        vfs.zfs.l2arc_feed_again                1
        vfs.zfs.l2arc_noprefetch                1
        vfs.zfs.l2arc_feed_min_ms               200
        vfs.zfs.l2arc_feed_secs                 1
        vfs.zfs.l2arc_headroom                  2
        vfs.zfs.l2arc_write_boost               8388608
        vfs.zfs.l2arc_write_max                 8388608
        vfs.zfs.arc_meta_limit                  66706278400
        vfs.zfs.arc_free_target                 1393320
        vfs.zfs.arc_kmem_cache_reap_retry_ms    1000
        vfs.zfs.compressed_arc_enabled          1
        vfs.zfs.arc_grow_retry                  60
        vfs.zfs.arc_shrink_shift                7
        vfs.zfs.arc_average_blocksize           8192
        vfs.zfs.arc_no_grow_shift               5
        vfs.zfs.arc_min                         33353139200
        vfs.zfs.arc_max                         266825113600
        vfs.zfs.abd_chunk_size                  4096
        vfs.zfs.abd_scatter_enabled             1
                                                                Page:  7
------------------------------------------------------------------------


Attached is the disk results summary from gstat -dpB (it also contains healthy disk activity later in the file, as this was a smaller migration of around 80 GB).
 

Attachments

  • results10082020.zip
    45.5 KB

HoneyBadger

actually does care
Administrator
Moderator
iXsystems
Joined
Feb 6, 2014
Messages
5,112
Thanks for the data dumps. I'm not seeing a big drop in L2ARC size (it actually grew, from 495.83 GiB to 506.63 GiB), but your ARC itself appears to have dumped pretty heavily (219.66 GiB down to 185.95 GiB).

I wonder if your L2ARC devices are getting TRIMs at all. Can you run sysctl kstat.zfs.misc.zio_trim? If the success counter is zero, then tuning TRIM won't do anything, but if it's nonzero then we might apply a TRIM throttle to see if it's a poorly-implemented or non-queued operation there.

I'm seeing the expected burst of writes to the filesystem to record the freed blocks there. I think what's happening is that the big deletes after the migration (1 TB or so) are simply hogging all of the space/time of the transaction groups. I'd suggest we apply some of the following tunables in order to stretch out the deletions over a longer period of time, or "flatten the curve" as we've all heard so many times of late. (Cue the groans.)

The ones I'm looking at here are:

vfs.zfs.per_txg_dirty_frees_percent: Percentage of dirtied blocks from frees in one txg (default 30)
vfs.zfs.free_min_time_ms: Min millisecs to free per txg (default 1000)

Specifically with an eye on that last one. If ZFS is trying to spend a minimum of 1000 ms per transaction group doing frees on your vdevs, that doesn't leave a lot of time for fast response to other writes. Cut it in half and see if you get a corresponding drop in latency; but bear in mind that you can't reduce it too far or you'll never complete the deletions.
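As a sketch, trying it at runtime would look like this (starting values to experiment with, not gospel; add them as sysctl-type Tunables in the GUI if you want them to survive a reboot):

Code:
# halve the minimum time spent on frees per txg (default 1000)
sysctl vfs.zfs.free_min_time_ms=500
# optionally also lower the share of a txg that frees may dirty (default 30)
sysctl vfs.zfs.per_txg_dirty_frees_percent=15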

Your prefetch hit rate is also pretty low (Data Prefetch Efficiency: 18.53% 1.67b), which is expected here; virtual environments tend to have effectively random access patterns. If this unit is only serving the iSCSI ZVOL for your CSV LUN, you might consider disabling prefetch entirely so that your disks aren't trying to prefetch when a request for a read comes in.
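If you want to try that, it's a single sysctl (set it back to 0 to undo; make it a Tunable to persist):

Code:
# disable ZFS prefetch entirely
sysctl vfs.zfs.prefetch_disable=1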
 

ridyre

Dabbler
Joined
Oct 1, 2020
Messages
12
Thanks for the review there HoneyBadger. This was definitely a smaller drop for this particular disk being removed, and it's interesting to see the L2ARC grow while the ARC drops; I'd say I usually see at least one (if not both) drop on larger deletions.

I wonder if your L2ARC devices are getting TRIMs at all. Can you run sysctl kstat.zfs.misc.zio_trim? If the success counter is zero, then tuning TRIM won't do anything, but if it's nonzero then we might apply a TRIM throttle to see if it's a poorly-implemented or non-queued operation there.

Please see the TRIM output below -- it looks like a handful of operations completed successfully, along with a much larger number reported as unsupported:

Code:
root@zfsloaner[~]# sysctl kstat.zfs.misc.zio_trim
kstat.zfs.misc.zio_trim.failed: 0
kstat.zfs.misc.zio_trim.unsupported: 948
kstat.zfs.misc.zio_trim.success: 4
kstat.zfs.misc.zio_trim.bytes: 3200663879680


I'd suggest we apply some of the following tunables in order to stretch out the deletions over a longer period of time, or "flatten the curve" as we've all heard so many times of late. (Cue the groans.)
Haha... yeah I definitely am with you on the groaning there. :D

vfs.zfs.per_txg_dirty_frees_percent: Percentage of dirtied blocks from frees in one txg (default 30)
vfs.zfs.free_min_time_ms: Min millisecs to free per txg (default 1000)
Thanks! I'll work on playing with these and report back what we come up with.

Your prefetch hit rate is also pretty low (Data Prefetch Efficiency: 18.53% 1.67b), which is expected here; virtual environments tend to have effectively random access patterns. If this unit is only serving the iSCSI ZVOL for your CSV LUN, you might consider disabling prefetch entirely so that your disks aren't trying to prefetch when a request for a read comes in.
Interesting -- this device is indeed dedicated to serving block storage to VM hosts. I'll experiment with this as well and report back.

Thanks again for the stellar advice and insight!
 

HoneyBadger

actually does care
Administrator
Moderator
iXsystems
Joined
Feb 6, 2014
Messages
5,112
kstat.zfs.misc.zio_trim.bytes: 3200663879680
That's roughly 2.91 TiB of data TRIMmed since your last boot; if it's your L2ARC devices doing that, this could be hurting things.

Can you issue camcontrol identify daX against your Toshiba SSDs and ZeusRAM devices, and post the lines related to/around:

Code:
Data Set Management (DSM/TRIM) no
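
A quick way to pull just that line for each device, as a rough one-liner (daX below is a placeholder for each SSD/ZeusRAM device node):

Code:
camcontrol identify daX | grep -i "data set management"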


If it's only your L2ARC that reports that it's handling TRIMs, let's drop the value of vfs.zfs.vdev.trim_max_active from the default 64 - I'd start with an aggressive drop to like 16 or lower.
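Roughly:

Code:
# default is 64; drop hard and back off if deletes start taking too long to finish
sysctl vfs.zfs.vdev.trim_max_active=16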

This could also be impacted by how aggressively Hyper-V is trying to reclaim the space. VMware throttles the deletions via the VMFS space reclamation rate - if there's no equivalent throttle in Hyper-V, it could actually be the hypervisor itself where the bottleneck is being hit. From the perspective of the hypervisor, the iSCSI LUN is a SCSI device with a fixed queue depth - and if the hypervisor has no throttle, it might be clumsily stuffing that queue with deletes.
 

ridyre

Dabbler
Joined
Oct 1, 2020
Messages
12
Thanks again for the insight HoneyBadger I did do the following -

Can you issue camcontrol identify daX against your Toshiba SSDs and ZeusRAM devices, and post the lines related to/around:

I ran this, but the command doesn't appear to work against these SAS devices, and with the "semi-equivalent" inquiry command I'm not finding anything that seems relevant to TRIM, even in verbose mode:

Code:
root@zfsloaner[~]# camcontrol inquiry da43
pass45: <STEC ZeusRAM C025> Fixed Direct Access SPC-4 SCSI device
pass45: Serial Number STM0001952B9
pass45: 600.000MB/s transfers, Command Queueing Enabled
root@zfsloaner[~]# camcontrol inquiry da52
pass54: <STEC ZeusRAM C025> Fixed Direct Access SPC-4 SCSI device
pass54: Serial Number STM0001952B5
pass54: 600.000MB/s transfers, Command Queueing Enabled
root@zfsloaner[~]# camcontrol inquiry da60
pass62: <STEC ZeusRAM C025> Fixed Direct Access SPC-4 SCSI device
pass62: Serial Number STM00019529B
pass62: 600.000MB/s transfers, Command Queueing Enabled
root@zfsloaner[~]# camcontrol inquiry da46
pass48: <TOSHIBA PX03SNF080 A5AC> Fixed Direct Access SPC-4 SCSI device
pass48: Serial Number 15Q0A02ET0XB
pass48: 600.000MB/s transfers, Command Queueing Enabled
root@zfsloaner[~]# camcontrol inquiry da76
pass78: <TOSHIBA PX03SNF080 A5AC> Fixed Direct Access SPC-4 SCSI device
pass78: Serial Number 15I0A061T0XB
pass78: 600.000MB/s transfers, Command Queueing Enabled
root@zfsloaner[~]# camcontrol identify da76
camcontrol: ATA ATA_IDENTIFY via pass_16 failed
camcontrol: ATA ATAPI_IDENTIFY via pass_16 failed


This could also be impacted by how aggressively Hyper-V is trying to reclaim the space. VMware throttles the deletions via the VMFS space reclamation rate - if there's no equivalent throttle in Hyper-V, it could actually be the hypervisor itself where the bottleneck is being hit. From the perspective of the hypervisor, the iSCSI LUN is a SCSI device with a fixed queue depth - and if the hypervisor has no throttle, it might be clumsily stuffing that queue with deletes.

Discussing this with the team, we believe you are 100% on the right track with the hypervisor layer being responsible. Microsoft doesn't appear to have any documentation related to tuning space reclamation, and it would appear that it just goes as fast as it can without any sort of throttle.

If it's only your L2ARC that reports that it's handling TRIMs, let's drop the value of vfs.zfs.vdev.trim_max_active from the default 64 - I'd start with an aggressive drop to like 16 or lower.

I'd still be happy to do some tuning here and report back.

Thanks again for all of your help/insight HoneyBadger!
 

HoneyBadger

actually does care
Administrator
Moderator
iXsystems
Joined
Feb 6, 2014
Messages
5,112
Let's do a quick and dirty test here - issue this command on one of your lab Hyper-V hosts to disable TRIM/UNMAP entirely, migrate a machine off of it, and see if things choke again.

fsutil behavior set DisableDeleteNotify 1
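To keep yourself honest, query the current value first and remember to flip it back after the test (rough sketch, from an elevated prompt):

Code:
REM 0 = delete notifications (TRIM/UNMAP) enabled, 1 = disabled
fsutil behavior query DisableDeleteNotify
REM after the test, re-enable:
fsutil behavior set DisableDeleteNotify 0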
 

ridyre

Dabbler
Joined
Oct 1, 2020
Messages
12
Let's do a quick and dirty test here - issue this command on one of your lab Hyper-V hosts to disable TRIM/UNMAP entirely, migrate a machine off of it, and see if things choke again.

fsutil behavior set DisableDeleteNotify 1

Thanks again HoneyBadger, you keep hitting the nail on the head with these. I disabled TRIM and migrated a VM off and, voila, there was no latency spike or issue. I have opened a case with Microsoft support to discuss this "feature" and will report back on whether or not there is a tunable buried somewhere in the operating system to throttle.
 

HoneyBadger

actually does care
Administrator
Moderator
iXsystems
Joined
Feb 6, 2014
Messages
5,112
I have opened a case with Microsoft support to discuss this "feature" and will report back on whether or not there is a tunable buried somewhere in the operating system to throttle.

Do let me know if they give you anything; but from my time reviewing it here, I haven't found a solution. Windows just seems to want to make your poor FreeNAS machine "drink from the firehose" of deletions. It seems as if Hyper-V is sending the UNMAP/TRIMs as a SCSI WRITE_SAME with a list of zeroes - which lz4 will happily compact to nothing, but it's still being parsed as "writes". (You do have compression on, right?)

One other thing I noted while reviewing your gstat output as well - your ZeusRAM devices were queried above as da43, da52, and da60 (multipathing, I assume?) - but in your gstat output I never saw those devices busy. I'm a little concerned that somehow your SLOGs are being left out of the party here.

Does zilstat give you non-zero values? Check to confirm that your iSCSI ZVOLs have sync=always set as well.
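Roughly, for both checks (the dataset path is a placeholder for whatever backs your iSCSI extent):

Code:
# per-interval ZIL activity; non-zero ops/bytes mean the SLOG is actually in play
zilstat 1 10
# confirm sync behavior on the zvol backing the iSCSI extent
zfs get sync yourpool/yourzvol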
 

ridyre

Dabbler
Joined
Oct 1, 2020
Messages
12
Do let me know if they give you anything; but from my time reviewing it here, I haven't found a solution. Windows just seems to want to make your poor FreeNAS machine "drink from the firehose" of deletions. It seems as if Hyper-V is sending the UNMAP/TRIMs as a SCSI WRITE_SAME with a list of zeroes - which lz4 will happily compact to nothing, but it's still being parsed as "writes". (You do have compression on, right?)

Thanks for the responses HoneyBadger!

I'm still pushing Microsoft for a response; their initial suggestion appears to be to disable TRIM, since that worked... :| I'm pushing further for them to either admit the knob doesn't exist or dig it up from some buried registry setting.

I just ran
Code:
zilstat
and I do see non-zero information here and have validated that
Code:
sync=always
is set on the ZVOL as well.
 

HoneyBadger

actually does care
Administrator
Moderator
iXsystems
Joined
Feb 6, 2014
Messages
5,112
I'm still pushing Microsoft for a response; their initial suggestion appears to be to disable TRIM, since that worked... :| I'm pushing further for them to either admit the knob doesn't exist or dig it up from some buried registry setting.

Hopefully they give you an answer. For the long term, disabling TRIM will add some overhead on the SAN side, since the logical records in ZFS will never be freed, only overwritten. If there's a period of low activity where you can tolerate some latency, you can re-enable TRIM with the same fsutil command (using a 0 on the end) and then run Optimize-Volume -DriveLetter X -ReTrim -Verbose in PowerShell to force a retrim. (Ctrl-C out of it if the latency gets too crazy, then flip the fsutil bit back to disable it.)
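Sketching out that sequence (the drive letter is whatever the volume maps to on the host):

Code:
REM re-enable delete notifications (TRIM/UNMAP)
fsutil behavior set DisableDeleteNotify 0
REM then, from PowerShell, force a full retrim during a quiet window
powershell -Command "Optimize-Volume -DriveLetter X -ReTrim -Verbose"
REM if latency climbs too far, Ctrl-C the retrim and set DisableDeleteNotify back to 1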
 

ridyre

Dabbler
Joined
Oct 1, 2020
Messages
12
Hopefully they give you an answer. For the long term, disabling TRIM will add some overhead on the SAN side, since the logical records in ZFS will never be freed, only overwritten. If there's a period of low activity where you can tolerate some latency, you can re-enable TRIM with the same fsutil command (using a 0 on the end) and then run Optimize-Volume -DriveLetter X -ReTrim -Verbose in PowerShell to force a retrim. (Ctrl-C out of it if the latency gets too crazy, then flip the fsutil bit back to disable it.)

Thanks so much HoneyBadger,

I have an official response from Microsoft -- the feature itself does not exist. I am pushing back to see if it is on their roadmap, since the new solutions they're pushing are heavily based on Hyper-V.

Microsoft Premier Support said:
I've been discussing this internally and I received the answer and its no, Hyper-V doesn't have a way to set the data rate for the TRIM or UNMAP functions. Here you have a link that clarifies how both work on Hyper-V: https://docs.microsoft.com/en-us/windows-hardware/drivers/storage/thin-provisioning.
 

ridyre

Dabbler
Joined
Oct 1, 2020
Messages
12
So as an update here, I have presented the business case and demonstrated the capabilities of the VMware platform to Microsoft's Hyper-V development team. I'm awaiting their approval on getting this much-needed feature added to the Hyper-V platform. Thanks so much for your assistance on this, HoneyBadger!
 

HoneyBadger

actually does care
Administrator
Moderator
iXsystems
Joined
Feb 6, 2014
Messages
5,112
Thanks for the update. I'm curious to try reproducing this myself (since I spend very little time with Hyper-V) to see if there's a way to throttle the deletes on the FreeNAS side of things.

One additional note is that the tunables for frees have changed quite a bit in TrueNAS 12.0 - although there's currently a strange CPU usage bug affecting a few users. The new OpenZFS implementation has tunables for the number of TRIM operations that can be queued against a vdev and the minimum/maximum size allowed to be TRIMmed in a single operation (32K/128M), and the default percentage of frees allowed in a TXG dropped from 30% to 5%.
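For anyone who finds this later: assuming those OpenZFS parameter names map straight into the FreeBSD sysctl tree on 12.0 (worth double-checking on an actual TrueNAS 12.0 box before relying on it), the knobs in question would look something like:

Code:
# concurrent TRIM I/Os allowed per vdev
sysctl vfs.zfs.vdev.trim_max_active
# smallest/largest extent TRIMmed in a single operation (roughly 32K / 128M defaults)
sysctl vfs.zfs.trim.extent_bytes_min vfs.zfs.trim.extent_bytes_max
# share of a txg's dirty data that frees may consume (default now 5)
sysctl vfs.zfs.per_txg_dirty_frees_percent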

This might be significantly mitigated in the new release; but like I said, the CPU usage bug is a little alarming for a few users. If you do upgrade, don't upgrade your pool yet.

(And having the variable present in Hyper-V/Windows itself to tune is also valuable, so let's not discourage Microsoft from doing something to benefit the end-user for a change. ;) )
 

ridyre

Dabbler
Joined
Oct 1, 2020
Messages
12
Afternoon all, sorry for the thread necro, but there is some traction from Microsoft on this. I'm going to be building this environment up again to provide the relevant/necessary data, and I anticipate encountering the same issue.
Thanks for the update. I'm curious to try reproducing this myself (since I spend very little time with Hyper-V) to see if there's a way to throttle the deletes on the FreeNAS side of things.
ReFS/NTFS deletes are likely where we're leaning here for the throttling mechanism.

(And having the variable present in Hyper-V/Windows itself to tune is also valuable, so let's not discourage Microsoft from doing something to benefit the end-user for a change. ;) )
Looking forward to seeing what they come back with; the experts have been engaged.
 