VMware, iSCSI, dropped connections and lockups

2twisty

Contributor
Joined
Mar 18, 2020
Messages
145
10Gbps? With nine HDD vdevs? Very, very, very sketchy. It totally depends on what you're actually expecting out of the thing. I would think you'd probably be fine at 1Gbps, except that I know I could break it if given a free hand to place a torturous workload on it. If you aren't popping surprise stressy write workloads on it that break the ZFS write throttle, 10Gbps could be fine, but I guarantee it to be breakable if you get even moderately aggressive.
Right now, the effective max input to the server would be 40Gbps: each of our 3 servers has 2 10Gbps links (60Gbps of potential demand), but the SAN has only 4 10Gbps links, so even if all 3 were pushing their max it would cap at 40. That said, the third server is a dev box that doesn't really do much.

We could easily drop that to a 20Gbps max by removing 2 of the 10Gbps links. We could further slow to a max of 10Gbps by turning off round robin on the VM hosts, so that only one 10Gbps NIC is used at a time, reserving the secondary as failover.
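
If we go that route, I assume the per-LUN change on each ESXi host would look something like this (the naa. device ID is a placeholder):
Code:
# List iSCSI devices and their current path selection policy (PSP):
esxcli storage nmp device list
# Pin a LUN to a fixed preferred path instead of round robin, leaving the
# second 10Gbps path as failover only (naa.xxxx is a placeholder):
esxcli storage nmp device set --device naa.xxxx --psp VMW_PSP_FIXED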

If we slow to 10Gbps, I think we can squeak by on 9 vdevs because our workloads aren't that intensive under normal circumstances. In fact, the setup we have now works as long as I don't start a huge file copy or a live vMotion.

I know the write throttle for ZFS can be tuned; if we were to make adjustments to that, what would you suggest? I think the default is 60% before it starts throttling. Would lowering it to 40 have any negative consequences? Is there a way in TrueNAS to traffic-shape and slow the packet traffic down? We have 10G Dell switches that all of these connect to. Not sure of the specs, but they may have some shaping capabilities as well.

What think ye?
 

2twisty

Contributor
Joined
Mar 18, 2020
Messages
145
Also -- we are thinking of getting a used NetApp DS4246 (24-bay, 6G SAS, 2 SAS expanders, etc.) and 12 more 6G SAS drives.

The existing 6 are 12G SAS. Will mixing these speeds in the pool cause a problem, aside from the slower speed of the 6G drives? Will it even matter?
 

Mlovelace

Guru
Joined
Aug 19, 2014
Messages
1,111
.... Or should I put all 18 drives in the shelf so they all go 6G?
There is no harm in mixing 12G SAS and 6G SAS when you're talking about 7.2K rpm spindles.
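
Back-of-envelope, with the per-spindle number being a ballpark assumption:
Code:
# ~200 MB/s peak sequential per 7.2K RPM spindle (assumed)
#  6G SAS lane ≈ 600 MB/s usable per drive slot
# 12G SAS lane ≈ 1200 MB/s usable per drive slot
# A single spindle tops out well below even the 6G link rate, so the
# slower shelf never becomes the bottleneck for these drives.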

I'm getting the impression you're running into budget constraints, or aren't properly expressing the need to management. Several well-founded and purposeful recommendations have been given, which you keep undercutting. DIY storage solutions can be perfectly reasonable professional deployments, but DIY doesn't translate to cheap storage.
 

2twisty

Contributor
Joined
Mar 18, 2020
Messages
145
So, one of the suggestions was to use 24 drives instead of 12. Is that a better solution? I was only stuck on 12 8TB drives because my vdevs are already 8TB, and adding 24 8TB drives would be WAAAAAAAAAAAAAY more storage than we need. Our existing 24TB is about 60% full, so I'd like more space so I have room to grow and stay under the 50% recommendation for block storage.

Would 24 4TB drives be a better solution? It would be twice as many spindles, but my vdevs would be 4TB, which means I could not incorporate the existing 8TB drives into the pool. I have something else I could use them for (an offsite backup server we are building). If we moved to 24 4TB drives, that would give me 48TB of storage, which would drop my utilization down to 30%.
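
Checking my own math, assuming 2-way mirrors throughout:
Code:
# 24 x 4TB drives as 2-way mirrors -> 12 vdevs x 4TB = 48TB usable
# Current data: ~60% of 24TB ≈ 14.4TB
# 14.4TB / 48TB ≈ 30% utilization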

Yes, budgets are a strain -- I am the one who built out this system a year ago, when I didn't know what I didn't know. We just spent 60k on these servers, and asking to spend more is painful to me. So yes, I am looking for an economical solution -- not a "cheap" one. If that NetApp shelf will provide us with what we need, I don't see why we should more than triple our cost to have 12G SAS.

I am willing to entertain any specific configurations you suggest -- most of the discussion has been "theoretical" or "best practice" which is VERY useful, but I also have to work within a budget.

If it is IMPOSSIBLE to achieve our goals without spending a ton, fine -- I will go the more expensive route so long as I can document WHY the less expensive option won't work.
 

Mlovelace

Guru
Joined
Aug 19, 2014
Messages
1,111
The NetApp shelf makes sense as an economical solution. You need to increase the IOPS of your tier 2 pool, and the way to do that is to increase the vdev count as much as you can while also gaining as much free pool space as is affordable. If the two options are 12 8TB drives or 24 4TB drives, I would go with the 24 4TB drives, as you will have the greater IOPS. If you're going to repurpose the current 6 8TB drives, then buy 30 4TB drives and wind up with 15 vdevs after you migrate the data.
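
Purely as an illustration -- the da0..da29 device names are placeholders, and on TrueNAS you'd normally build this in the GUI rather than at the shell:
Code:
# Assemble 15 two-way mirrors from 30 placeholder devices (da0..da29),
# then create the pool in one command. Sketch only, not a recommendation
# to bypass the middleware.
vdevs=""
for i in $(seq 0 2 29); do
    vdevs="$vdevs mirror da${i} da$((i+1))"
done
zpool create -o ashift=12 Tier2new $vdevs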
 

2twisty

Contributor
Joined
Mar 18, 2020
Messages
145
Thanks. That was what I was thinking. I'd get 32 drives so I could have 2 cold spares, as well.

With 15 vdevs, do you think I could take a 10G input?
 

HoneyBadger

actually does care
Administrator
Moderator
iXsystems
Joined
Feb 6, 2014
Messages
5,112
I know the write throttle for ZFS can be tuned; if we were to make adjustments to that, what would you suggest? I think the default is 60% before it starts throttling. Would lowering it to 40 have any negative consequences? Is there a way in TrueNAS to traffic-shape and slow the packet traffic down? We have 10G Dell switches that all of these connect to. Not sure of the specs, but they may have some shaping capabilities as well.
Don't employ any kind of traffic shaping on the Dell switches. Other than enabling bidirectional flow control, they should be doing their best to do absolutely nothing to the packets, letting the iSCSI protocol handle pause/backoff as a result of the ZFS throttle.

And as far as tuning the throttle goes, you could adjust its starting point downwards to 40% or even below that (provided it doesn't throttle the low level of day-to-day traffic), but I still suspect there is the potential for an overwhelming amount of data.
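
On TrueNAS CORE that starting point is a FreeBSD sysctl; a minimal sketch, assuming stock defaults (set it under System > Tunables, type sysctl, to make it persistent):
Code:
# Percent of dirty_data_max at which ZFS begins injecting write delays;
# your tunables dump shows the default of 60:
sysctl vfs.zfs.delay_min_dirty_percent
sysctl vfs.zfs.dirty_data_max
# Start throttling earlier, at 40% of dirty_data_max:
sysctl vfs.zfs.delay_min_dirty_percent=40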

Have your deduplication table stats improved any?

With 15 vdevs, do you think I could take a 10G input?
You'll have a hell of a lot better odds than with 3, but as @jgreco says, you can definitely still find a workload that would make it sweat, and probably an extremely tough one that would break it. Simultaneous read/write access is still going to bring the overall throughput numbers down, but the floor is going to be a lot higher, as you're now talking about 5x the IOPS.
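
Back-of-envelope, with the per-vdev figure being an assumption rather than a benchmark:
Code:
# Assume ~150 sustained write IOPS per 7.2K RPM mirror vdev:
#  3 vdevs x 150 ≈  450 write IOPS (current pool)
# 15 vdevs x 150 ≈ 2250 write IOPS (proposed pool) -- the 5x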

Question: what kind of hit rates are you getting out of your L2ARC? Can you dump an arc_summary.py here? You want to lighten the load on those disks as much as you can, but we want to make sure the L2ARC is doing its job.
 

2twisty

Contributor
Joined
Mar 18, 2020
Messages
145
Question: what kind of hit rates are you getting out of your L2ARC? Can you dump an arc_summary.py here? You want to lighten the load on those disks as much as you can, but we want to make sure the L2ARC is doing its job.

Where would I find this .py file? "find / -name arc_summary.py" gives an empty result.
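
(Answering my own question, I think: on TrueNAS 12 the OpenZFS tool appears to ship as plain arc_summary, no .py extension.)
Code:
arc_summary | less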
 

2twisty

Contributor
Joined
Mar 18, 2020
Messages
145
Dedupe hasn't changed much, because we wanted to resolve this write speed issue before we started moving the data around to remove the dedupe.

Code:
NAME        SIZE  ALLOC   FREE  CKPOINT  EXPANDSZ   FRAG    CAP  DEDUP    HEALTH  ALTROOT
Tier1      2.77T   192G  2.58T        -         -    49%     6%  1.30x    ONLINE  /mnt
Tier2      21.8T  6.67T  15.1T        -         -    67%    30%  1.55x    ONLINE  /mnt
boot-pool   444G  4.49G   440G        -         -     0%     1%  1.00x    ONLINE  -
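
If the full dedup-table histogram is useful, I believe that comes from:
Code:
# Per-pool dedup table (DDT) histogram and in-core size estimate:
zpool status -D Tier2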
 

2twisty

Contributor
Joined
Mar 18, 2020
Messages
145
arc_summary
Code:
------------------------------------------------------------------------
ZFS Subsystem Report                            Tue Jul 27 15:41:37 2021
FreeBSD 12.2-RELEASE-p6                                    zpl version 5
Machine: locutus.acglp.com (amd64)                      spa version 5000

ARC status:                                                      HEALTHY
        Memory throttle count:                                         0

ARC size (current):                                    87.4 %  222.7 GiB
        Target size (adaptive):                        87.4 %  222.6 GiB
        Min size (hard limit):                          3.1 %    8.0 GiB
        Max size (high water):                           31:1  254.7 GiB
        Most Frequently Used (MFU) cache size:         38.3 %   80.6 GiB
        Most Recently Used (MRU) cache size:           61.7 %  129.9 GiB
        Metadata cache size (hard limit):              75.0 %  191.0 GiB
        Metadata cache size (current):                  8.4 %   16.1 GiB
        Dnode cache size (hard limit):                 10.0 %   19.1 GiB
        Dnode cache size (current):                     0.2 %   37.9 MiB

ARC hash breakdown:
        Elements max:                                              81.8M
        Elements current:                              88.7 %      72.5M
        Collisions:                                               387.9M
        Chain max:                                                    15
        Chains:                                                    21.3M

ARC misc:
        Deleted:                                                  241.4M
        Mutex misses:                                              63.0k
        Eviction skips:                                             4.1k

ARC total accesses (hits + misses):                                 1.5G
        Cache hit ratio:                               84.7 %       1.3G
        Cache miss ratio:                              15.3 %     232.9M
        Actual hit ratio (MFU + MRU hits):             84.3 %       1.3G
        Data demand efficiency:                        77.9 %     347.9M
        Data prefetch efficiency:                       4.0 %     157.0M

Cache hits by cache type:
        Most frequently used (MFU):                    79.5 %       1.0G
        Most recently used (MRU):                      20.1 %     258.8M
        Most frequently used (MFU) ghost:               0.3 %       4.2M
        Most recently used (MRU) ghost:                 0.1 %       1.5M

Cache hits by data type:
        Demand data:                                   21.1 %     271.1M
        Demand prefetch data:                           0.5 %       6.2M
        Demand metadata:                               78.4 %       1.0G
        Demand prefetch metadata:                       0.1 %     705.8k

Cache misses by data type:
        Demand data:                                   33.0 %      76.8M
        Demand prefetch data:                          64.8 %     150.8M
        Demand metadata:                                1.8 %       4.3M
        Demand prefetch metadata:                       0.4 %     972.1k

DMU prefetch efficiency:                                           93.3M
        Hit ratio:                                     21.7 %      20.2M
        Miss ratio:                                    78.3 %      73.1M

L2ARC status:                                                    HEALTHY
        Low memory aborts:                                             0
        Free on write:                                             30.1k
        R/W clashes:                                                   0
        Bad checksums:                                                 0
        I/O errors:                                                    0

L2ARC size (adaptive):                                         880.6 GiB
        Compressed:                                    99.1 %  872.9 GiB
        Header size:                                    0.4 %    4.0 GiB

L2ARC breakdown:                                                  232.9M
        Hit ratio:                                      2.2 %       5.1M
        Miss ratio:                                    97.8 %     227.8M
        Feeds:                                                    477.6k

L2ARC writes:
        Writes sent:                                    100 %     337.2k

L2ARC evicts:
        Lock retries:                                               1.1k
        Upon reading:                                                  3

Tunables:
        abd_chunk_size                                              4096
        abd_scatter_enabled                                            1
        allow_redacted_dataset_mount                                   0
        anon_data_esize                                                0
        anon_metadata_esize                                            0
        anon_size                                                1032192
        arc.average_blocksize                                       8192
        arc.dnode_limit                                                0
        arc.dnode_limit_percent                                       10
        arc.dnode_reduce_percent                                      10
        arc.evict_batch_limit                                         10
        arc.eviction_pct                                             200
        arc.grow_retry                                                 0
        arc.lotsfree_percent                                          10
        arc.max                                                        0
        arc.meta_adjust_restarts                                    4096
        arc.meta_limit                                                 0
        arc.meta_limit_percent                                        75
        arc.meta_min                                                   0
        arc.meta_prune                                             10000
        arc.meta_strategy                                              1
        arc.min                                                        0
        arc.min_prefetch_ms                                            0
        arc.min_prescient_prefetch_ms                                  0
        arc.p_dampener_disable                                         1
        arc.p_min_shift                                                0
        arc.pc_percent                                                 0
        arc.shrink_shift                                               0
        arc.sys_free                                                   0
        arc_free_target                                          1391234
        arc_max                                                        0
        arc_min                                                        0
        arc_no_grow_shift                                              5
        async_block_max_blocks                      18446744073709551615
        autoimport_disable                                             1
        ccw_retry_interval                                           300
        checksum_events_per_second                                    20
        commit_timeout_pct                                             5
        compressed_arc_enabled                                         1
        condense.indirect_commit_entry_delay_ms                        0
        condense.indirect_obsolete_pct                                25
        condense.indirect_vdevs_enable                                 1
        condense.max_obsolete_bytes                           1073741824
        condense.min_mapping_bytes                                131072
        condense_pct                                                 200
        crypt_sessions                                         303874026
        dbgmsg_enable                                                  1
        dbgmsg_maxsize                                           4194304
        dbuf.cache_shift                                               5
        dbuf.metadata_cache_max_bytes               18446744073709551615
        dbuf.metadata_cache_shift                                      6
        dbuf_cache.hiwater_pct                                        10
        dbuf_cache.lowater_pct                                        10
        dbuf_cache.max_bytes                        18446744073709551615
        dbuf_state_index                                               0
        ddt_data_is_special                                            1
        deadman.checktime_ms                                       60000
        deadman.enabled                                                1
        deadman.failmode                                            wait
        deadman.synctime_ms                                       600000
        deadman.ziotime_ms                                        300000
        debug                                                          0
        debugflags                                                     0
        dedup.prefetch                                                 0
        default_bs                                                     9
        default_ibs                                                   15
        delay_min_dirty_percent                                       60
        delay_scale                                               500000
        dirty_data_max                                        4294967296
        dirty_data_max_max                                    4294967296
        dirty_data_max_max_percent                                    25
        dirty_data_max_percent                                        10
        dirty_data_sync_percent                                       20
        disable_ivset_guid_check                                       0
        dmu_object_alloc_chunk_shift                                   7
        dmu_offset_next_sync                                           0
        dmu_prefetch_max                                       134217728
        dtl_sm_blksz                                                4096
        flags                                                          0
        fletcher_4_impl [fastest] scalar superscalar superscalar4 sse2 ssse3 avx2
        free_bpobj_enabled                                             1
        free_leak_on_eio                                               0
        free_min_time_ms                                            1000
        history_output_max                                       1048576
        immediate_write_sz                                         32768
        initialize_chunk_size                                    1048576
        initialize_value                            16045690984833335022
        keep_log_spacemaps_at_export                                   0
        l2arc.feed_again                                               1
        l2arc.feed_min_ms                                            200
        l2arc.feed_secs                                                1
        l2arc.headroom                                                 2
        l2arc.headroom_boost                                         200
        l2arc.meta_percent                                            33
        l2arc.mfuonly                                                  0
        l2arc.noprefetch                                               1
        l2arc.norw                                                     0
        l2arc.rebuild_blocks_min_l2size                       1073741824
        l2arc.rebuild_enabled                                          0
        l2arc.trim_ahead                                               0
        l2arc.write_boost                                        8388608
        l2arc.write_max                                          8388608
        l2arc_feed_again                                               1
        l2arc_feed_min_ms                                            200
        l2arc_feed_secs                                                1
        l2arc_headroom                                                 2
        l2arc_noprefetch                                               1
        l2arc_norw                                                     0
        l2arc_write_boost                                        8388608
        l2arc_write_max                                          8388608
        l2c_only_size                                                  0
        livelist.condense.new_alloc                                    0
        livelist.condense.sync_cancel                                  0
        livelist.condense.sync_pause                                   0
        livelist.condense.zthr_cancel                                  0
        livelist.condense.zthr_pause                                   0
        livelist.max_entries                                      500000
        livelist.min_percent_shared                                   75
        lua.max_instrlimit                                     100000000
        lua.max_memlimit                                       104857600
        max_async_dedup_frees                                     100000
        max_auto_ashift                                               16
        max_dataset_nesting                                           50
        max_log_walking                                                5
        max_logsm_summary_length                                      10
        max_missing_tvds                                               0
        max_missing_tvds_cachefile                                     2
        max_missing_tvds_scan                                          0
        max_nvlist_src_size                                            0
        max_recordsize                                           1048576
        metaslab.aliquot                                          524288
        metaslab.bias_enabled                                          1
        metaslab.debug_load                                            0
        metaslab.debug_unload                                          0
        metaslab.df_alloc_threshold                               131072
        metaslab.df_free_pct                                           4
        metaslab.df_max_search                                  16777216
        metaslab.df_use_largest_segment                                0
        metaslab.force_ganging                                  16777217
        metaslab.fragmentation_factor_enabled                          1
        metaslab.fragmentation_threshold                              70
        metaslab.lba_weighting_enabled                                 1
        metaslab.load_pct                                             50
        metaslab.max_size_cache_sec                                 3600
        metaslab.mem_limit                                            75
        metaslab.preload_enabled                                       1
        metaslab.preload_limit                                        10
        metaslab.segment_weight_enabled                                1
        metaslab.sm_blksz_no_log                                   16384
        metaslab.sm_blksz_with_log                                131072
        metaslab.switch_threshold                                      2
        metaslab.unload_delay                                         32
        metaslab.unload_delay_ms                                  600000
        mfu_data_esize                                       76074066432
        mfu_ghost_data_esize                                113183535104
        mfu_ghost_metadata_esize                             26762951680
        mfu_ghost_size                                      139946486784
        mfu_metadata_esize                                     817084928
        mfu_size                                             86566412288
        mg.fragmentation_threshold                                    95
        mg.noalloc_threshold                                           0
        min_auto_ashift                                                9
        min_metaslabs_to_flush                                         1
        mru_data_esize                                      133782996992
        mru_ghost_data_esize                                 83510132736
        mru_ghost_metadata_esize                             15499380224
        mru_ghost_size                                       99009512960
        mru_metadata_esize                                    1743021568
        mru_size                                            139491125760
        multihost.fail_intervals                                      10
        multihost.history                                              0
        multihost.import_intervals                                    20
        multihost.interval                                          1000
        multilist_num_sublists                                         0
        no_scrub_io                                                    0
        no_scrub_prefetch                                              0
        nocacheflush                                                   0
        nopwrite_enabled                                               1
        obsolete_min_time_ms                                         500
        pd_bytes_max                                            52428800
        per_txg_dirty_frees_percent                                    5
        prefetch.array_rd_sz                                     1048576
        prefetch.disable                                               0
        prefetch.max_distance                                    8388608
        prefetch.max_idistance                                  67108864
        prefetch.max_streams                                           8
        prefetch.min_sec_reap                                          2
        read_history                                                   0
        read_history_hits                                              0
        rebuild_max_segment                                      1048576
        reconstruct.indirect_combinations_max                       4096
        recover                                                        0
        recv.queue_ff                                                 20
        recv.queue_length                                       16777216
        recv.write_batch_size                                    1048576
        reference_tracking_enable                                      0
        removal_suspend_progress                                       0
        remove_max_segment                                      16777216
        resilver_disable_defer                                         0
        resilver_min_time_ms                                        3000
        scan_checkpoint_intval                                      7200
        scan_fill_weight                                               3
        scan_ignore_errors                                             0
        scan_issue_strategy                                            0
        scan_legacy                                                    0
        scan_max_ext_gap                                         2097152
        scan_mem_lim_fact                                             20
        scan_mem_lim_soft_fact                                        20
        scan_strict_mem_lim                                            0
        scan_suspend_progress                                          0
        scan_vdev_limit                                          4194304
        scrub_min_time_ms                                           1000
        send.corrupt_data                                              0
        send.no_prefetch_queue_ff                                     20
        send.no_prefetch_queue_length                            1048576
        send.override_estimate_recordsize                              0
        send.queue_ff                                                 20
        send.queue_length                                       16777216
        send.unmodified_spill_blocks                                   1
        send_holes_without_birth_time                                  1
        slow_io_events_per_second                                     20
        spa.asize_inflation                                           24
        spa.discard_memory_limit                                16777216
        spa.load_print_vdev_tree                                       0
        spa.load_verify_data                                           1
        spa.load_verify_metadata                                       1
        spa.load_verify_shift                                          4
        spa.slop_shift                                                 5
        space_map_ibs                                                 14
        special_class_metadata_reserve_pct                            25
        standard_sm_blksz                                         131072
        super_owner                                                    0
        sync_pass_deferred_free                                        2
        sync_pass_dont_compress                                        8
        sync_pass_rewrite                                              2
        sync_taskq_batch_pct                                          75
        top_maxinflight                                             1000
        traverse_indirect_prefetch_limit                              32
        trim.extent_bytes_max                                  134217728
        trim.extent_bytes_min                                      32768
        trim.metaslab_skip                                             0
        trim.queue_limit                                              10
        trim.txg_batch                                                32
        txg.history                                                  100
        txg.timeout                                                    1
        unflushed_log_block_max                                   262144
        unflushed_log_block_min                                     1000
        unflushed_log_block_pct                                      400
        unflushed_max_mem_amt                                 1073741824
        unflushed_max_mem_ppm                                       1000
        user_indirect_is_special                                       1
        validate_skip                                                  0
        vdev.aggregate_trim                                            0
        vdev.aggregation_limit                                   1048576
        vdev.aggregation_limit_non_rotating                       131072
        vdev.async_read_max_active                                     3
        vdev.async_read_min_active                                     1
        vdev.async_write_active_max_dirty_percent                     60
        vdev.async_write_active_min_dirty_percent                     30
        vdev.async_write_max_active                                    5
        vdev.async_write_min_active                                    1
        vdev.bio_delete_disable                                        0
        vdev.bio_flush_disable                                         0
        vdev.cache_bshift                                             16
        vdev.cache_max                                             16384
        vdev.cache_size                                                0
        vdev.def_queue_depth                                          32
        vdev.default_ms_count                                        200
        vdev.default_ms_shift                                         29
        vdev.file.logical_ashift                                       9
        vdev.file.physical_ashift                                      9
        vdev.initializing_max_active                                   1
        vdev.initializing_min_active                                   1
        vdev.max_active                                             1000
        vdev.max_auto_ashift                                          16
        vdev.min_auto_ashift                                           9
        vdev.min_ms_count                                             16
        vdev.mirror.non_rotating_inc                                   0
        vdev.mirror.non_rotating_seek_inc                              1
        vdev.mirror.rotating_inc                                       0
        vdev.mirror.rotating_seek_inc                                  5
        vdev.mirror.rotating_seek_offset                         1048576
        vdev.ms_count_limit                                       131072
        vdev.nia_credit                                                5
        vdev.nia_delay                                                 5
        vdev.queue_depth_pct                                        1000
        vdev.read_gap_limit                                        32768
        vdev.rebuild_max_active                                        3
        vdev.rebuild_min_active                                        1
        vdev.removal_ignore_errors                                     0
        vdev.removal_max_active                                        2
        vdev.removal_max_span                                      32768
        vdev.removal_min_active                                        1
        vdev.removal_suspend_progress                                  0
        vdev.remove_max_segment                                 16777216
        vdev.scrub_max_active                                          3
        vdev.scrub_min_active                                          1
        vdev.sync_read_max_active                                     10
        vdev.sync_read_min_active                                     10
        vdev.sync_write_max_active                                    10
        vdev.sync_write_min_active                                    10
        vdev.trim_max_active                                           2
        vdev.trim_min_active                                           1
        vdev.validate_skip                                             0
        vdev.write_gap_limit                                        4096
        version.acl                                                    1
        version.ioctl                                                 15
        version.module v2021052700-zfs_b4f504202869094d805defb3c5d7938116ba1226
        version.spa                                                 5000
        version.zpl                                                    5
        vnops.read_chunk_size                                    1048576
        vol.mode                                                       2
        vol.recursive                                                  0
        vol.unmap_enabled                                              1
        zap_iterate_prefetch                                           1
        zevent.cols                                                   80
        zevent.console                                                 0
        zevent.len_max                                               512
        zevent.retain_expire_secs                                    900
        zevent.retain_max                                           2000
        zfetch.max_distance                                      8388608
        zfetch.max_idistance                                    67108864
        zil.clean_taskq_maxalloc                                 1048576
        zil.clean_taskq_minalloc                                    1024
        zil.clean_taskq_nthr_pct                                     100
        zil.maxblocksize                                          131072
        zil.nocacheflush                                               0
        zil.replay_disable                                             0
        zil.slog_bulk                                             786432
        zio.deadman_log_all                                            0
        zio.dva_throttle_enabled                                       1
        zio.exclude_metadata                                           0
        zio.requeue_io_start_cut_in_line                               1
        zio.slow_io_ms                                             30000
        zio.taskq_batch_pct                                           80
        zio.taskq_batch_tpq                                            0
        zio.use_uma                                                    1

VDEV cache disabled, skipping section

ZIL committed transactions:                                        50.7M
        Commit requests:                                           40.0M
        Flushes to stable storage:                                 38.8M
        Transactions to SLOG storage pool:            1.8 TiB      36.8M
        Transactions to non-SLOG storage pool:       87.0 GiB       1.5M




arcstat
Code:
    time  read  miss  miss%  dmis  dm%  pmis  pm%  mmis  mm%  size     c  avail
15:42:55    70    23     32    23   32     0    0     2    7  223G  222G   5.6G
 

HoneyBadger

actually does care
Administrator
Moderator
iXsystems
Joined
Feb 6, 2014
Messages
5,112
Code:
ARC size (current):                                    87.4 %  222.7 GiB
        Most Frequently Used (MFU) cache size:         38.3 %   80.6 GiB
        Most Recently Used (MRU) cache size:           61.7 %  129.9 GiB

Cache hits by cache type:
        Most frequently used (MFU):                    79.5 %       1.0G
        Most recently used (MRU):                      20.1 %     258.8M


Was the system rebooted somewhat recently? These numbers show a large amount of churn in the ARC. Your MFU cache is only 38% of the space, but it's responsible for the vast majority (nearly 80%) of your hits.

Code:
L2ARC breakdown:                                                  232.9M
        Hit ratio:                                      2.2 %       5.1M
        Miss ratio:                                    97.8 %     227.8M


And your L2ARC is just suffering. 2.2% is really low, even for a VM workload.
 

2twisty

Contributor
Joined
Mar 18, 2020
Messages
145
The server was rebooted on Saturday, as part of the diagnostic testing we were doing. Boss wanted us to perform a BIOS hardware diagnostic to rule out a hardware failure on the server. It reported no errors.
 

HoneyBadger

actually does care
Administrator
Moderator
iXsystems
Joined
Feb 6, 2014
Messages
5,112
The server was rebooted on Saturday, as part of the diagnostic testing we were doing.
That explains why your metrics are poor. Generally you want to see that MFU percentage climb as the system figures out what's really the most active data and "pins" it in ARC by letting the MFU portion grow.

A VM backup workload on a fresh boot tends to throw ARC for a bit of a loop if it's not well warmed, because it isn't sure which data is most important when everything has only been read once. Did you run a backup job, by any chance?
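
Two related knobs appear in your tunables dump, for whatever they're worth here; treat these as things to research rather than firm recommendations:
Code:
# l2arc.rebuild_enabled is 0 in your dump, so L2ARC contents are discarded
# at every reboot; enabling persistent L2ARC keeps the cache warm:
sysctl vfs.zfs.l2arc.rebuild_enabled=1
# l2arc.mfuonly feeds L2ARC only from MFU evictions, which can keep a
# one-pass backup read from churning the cache:
sysctl vfs.zfs.l2arc.mfuonly=1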
 

NugentS

MVP
Joined
Apr 16, 2020
Messages
2,949
I am way late to this game and unsure whether my suggestion has any merit, as most of this discussion went whoosh over my head, and I have never dealt with anything on this scale. However, I have built a few virtual infrastructures for SMB clients, and I never use a virtual file server (Windows or whatever). I always use a NAS and serve SMB shares directly from it (the same NAS can also be providing the iSCSI/NFS resources to the virtualisation setup, but not always). Essentially: keep the virtual infrastructure as small as possible and use it to offer services, not files. Obviously SQL, Exchange, etc. get large and have to be virtual, but file shares -- nope. It means I am not beating up my virtual infrastructure and iSCSI network with simple SMB traffic.

From reading this thread, it seems to me that you are killing your VMware by flooding the SMB virtual servers.

I also realize that would be a big change: moving the data and then setting up access privileges, as well as needing lots of space and time to do the work.

Just my 2p worth
 

2twisty

Contributor
Joined
Mar 18, 2020
Messages
145
Well, I no longer work for this company -- not appropriate to detail why here, but suffice it to say, we never got this fully resolved, and they are on their own now. The (ONE!) remaining (and VERY overloaded) IT Admin there has a link to this thread for when he wants (NEEDS!) to revisit this.
 

Mlovelace

Guru
Joined
Aug 19, 2014
Messages
1,111
Well, I no longer work for this company -- not appropriate to detail why here, but suffice it to say, we never got this fully resolved, and they are on their own now. The (ONE!) remaining (and VERY overloaded) IT Admin there has a link to this thread for when he wants (NEEDS!) to revisit this.
Jesus! Well... Here's to bigger and better things; best of luck in your future endeavors. :smile:
 