Hello. First-time poster here on this forum. I've been using FreeNAS on and off for quite a few years and have now moved my home storage array to FreeNAS full time as my sole array.
Looking for some help figuring out what, or even whether, something is in fact wrong with one of my pools. If I posted in the wrong section, I apologize in advance.
The pool in question, VM_T1_SSD, is made up of 4 mirrored vdevs. The drives are 8 Intel 530 SSDs serving 2 VM hosts.
The box has dual Intel Xeon E5645 2.4GHz 6-core CPUs and 32GB of RAM. The SSDs are connected to the motherboard through an Avago 9211-8i HBA, and the backplane is SAS2.
The VM hosts connect to FreeNAS over two 1GbE links configured with MPIO.
The pool has 49% free space, compression = lz4, dedup = disabled, a compression ratio of 1.52x, and the pool is healthy.
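For reference, this is roughly how I pulled those pool numbers (flags typed from memory, so they may not be exact):

  zpool list -o name,size,alloc,free,frag,cap,health VM_T1_SSD
  zfs get compression,compressratio,dedup VM_T1_SSD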
The CPUs are always at about 90% idle, the system load is almost always around 1.54, 1.30, 1.09, and the NIC averages are approximately 22.6M tx and 4.5M rx on each interface.
iSCSI averages 5.6M/min read and 1.3M/min write. Tested with and without jumbo frames.
There are a few other disk pools, which can be found in my signature if that's needed for further diagnostics. Those other pools are giving the expected latency, IOPS, and bandwidth.
The issue I'm having: on the VM hosts I noticed the datastore was experiencing latency at times in the 100-600ms range, with averages of about 2-4ms read and 1-4ms write. I saw this in the datastore performance charts and in esxtop's disk view, where DAVG was the stat that was high. A high DAVG points to the storage array as the culprit according to all the VMware docs I could find, and everything else on the ESXi hosts is normal.
During these periods the disks on the FreeNAS box were pushing under 40 I/Os and less than 40KBps per disk. I watched this with gstat at 1s, 2s, 5s, 10s, and 60s intervals, and saw similar results for the pool with zpool iostat at the same intervals. In gstat the per-disk latency was climbing past 100ms per SSD. That doesn't make sense to me, since these disks can handle far more I/Os and bandwidth before latency should become an issue. I know 2-4ms is nothing to sneeze at, and sub-1ms is the unicorn, but I'm hoping :)
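In case the exact commands matter, this is more or less what I was running on the FreeNAS side while the spikes were happening (the intervals shown are just the ones I tried):

  gstat -p -I 1s               (repeated with -I 2s, 5s, 10s, and 60s)
  zpool iostat VM_T1_SSD 1     (repeated with 2, 5, 10, and 60 second intervals)

On the ESXi side I was in esxtop, pressing d for the disk adapter view and watching the DAVG/cmd column.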
Just to throw it in there, the network switch is configured per best practices and is extremely underutilized.
My first thought was more memory, based on all the posts I've read, but at IOPS and bandwidth that low it doesn't make sense that more cache is what's needed, and these are SSDs.
Hoping someone can point me in the right direction on figuring out what's causing the disk latency and how to improve it.
If it helps, below are my ARC stats. I know the host hasn't been up for too long, but the stats aren't much different than what they were when it was up for a month.
8:37PM up 1 day, 23:07, 1 user, load averages: 1.47, 1.21, 1.04
Pool usage (used / total):
  14.4TiB / 43.5TiB (Pool1)
  48.9GiB / 7.25TiB (Pool2)
  482GiB / 1.35TiB (Pool3)
  432GiB / 880GiB (VM_T1_SSD)
  437GiB / 1.08TiB (VM_T3_10K_HDD)
  1.89GiB / 14.9GiB (freenas-boot)
ARC size: 26.55GiB (MRU: 6.89GiB, MFU: 19.66GiB) / 32.00GiB
Hit ratio -> 82.22% (higher is better)
Prefetch -> 27.64% (higher is better)
Hit MFU:MRU -> 65.13%:32.38% (higher ratio is better)
Hit MRU Ghost -> 2.29% (lower is better)
Hit MFU Ghost -> 1.76% (lower is better)
Can't think of any more stats to provide off hand.
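If raw counters would be more useful than that summary, I can dump the ARC sysctls directly, something along these lines (OID names from memory, so double-check them):

  sysctl kstat.zfs.misc.arcstats.size
  sysctl kstat.zfs.misc.arcstats.hits kstat.zfs.misc.arcstats.misses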
Thanks,