Mirror Performance Question/Odd Behavior

riggieri

Dabbler
Joined
Aug 24, 2018
Messages
42
Hello Everyone

Specs for the machine are listed below. I am seeing some weird mirrored vdev behavior and don't understand it. I have (8) 6TB HGST NAS drives that R/W at about 210MB/s. I set them all up in a 2x4 mirror vdev pool expecting about 800MB/s write and 1600MB/s read, but I was only getting 800MB/s write and 700MB/s read. The same machine is running 4TB HGST NAS drives in a 2x14 mirror vdev pool, and there I was able to get about 1400MB/s write and 2500MB/s read. So down the troubleshooting path I went.

First I ran the solnet array test on the 6TB drives. It reported roughly 200 MB/s (+/-) for all drives. I then decided to build 4 single-vdev pools and test them individually. Using dd from /dev/zero with compression off, I would get equal read and write performance, just over 220MB/s.
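For reference, the per-drive tests were along the lines of the following (a sketch only; the pool name test1 and the file size are illustrative):

Code:
# rough shape of each single-disk pool test; "test1" is a placeholder pool name
zfs set compression=off test1
dd if=/dev/zero of=/mnt/test1/ddfile bs=1048k count=120000   # write test
dd if=/mnt/test1/ddfile of=/dev/null bs=1048k count=120000   # read the file back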

Why am I not getting the doubled read performance a two-drive mirror should provide?

Specs
FreeNAS-11.2-U3
Chenbro NR 40700
SuperMicro X8
96GB ECC RAM
Chelsio T420-BT
LSI 9210i
LSI 9707e

Internal Drives (All One Pool)
36 8TB Drives in 5 RAIDz2 vdevs
6 3TB drives in RAIDz2

Expansion Chassis
Supermicro CSE847 45 Drive JBOD
All 24 front drives loaded with 4TB HGST NAS drives
4 additional 4TB HGST drives on rear backplane
2x 1TB SanDisk SSDs on rear backplane
8x 6TB HGST NAS drives on rear backplane


I am sure I am doing something wrong, but how can I see true mirrored vdev performance?
 
Last edited:

Chris Moore

Hall of Famer
Joined
May 2, 2015
Messages
10,080
What kind of workload is that? Some highly random work does much better when distributed across more spindles.
 

riggieri

Dabbler
Joined
Aug 24, 2018
Messages
42
Hi Chris

The 40 internal drives in the Chenbro chassis are used mostly for static long-term storage.

The mirrored 4TB pool (2x14) is used for production video editorial. We have 3-4 workstations, all on 10G, and each of them can R/W at between 700-1000MB/s. When they are all working concurrently, we get about 200-300MB/s to each workstation; if they are all reading, even higher.

The 2-drive SSD pool is used to store FCPX libraries, which are basically databases made up of tons of small files.

The 8x 6TB drives I want to use as an iSCSI extent (zvol), so that our main workstation can read and write DPX frames; DPX frames do not work over SMB or AFP shares. The workload is not random at all, very sequential, but the goal is to be able to see 1600MB/s of read for a 4K 16-bit DPX sequence, which comes in at around 1200MB/s. The eventual plan is a 40GbE direct connection for just this one workstation.
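Roughly what I have in mind for the backing zvol is something like this (just a sketch; the pool/zvol names and the size are placeholders, and I would normally create it through the GUI):

Code:
# names and size are placeholders; a large volblocksize to suit long sequential DPX reads
zfs create -V 10T -o volblocksize=128K pool/dpx-extent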

I am still perplexed as to why I am not getting the expected mirror performance.
 

Chris Moore

Hall of Famer
Joined
May 2, 2015
Messages
10,080

riggieri

Dabbler
Joined
Aug 24, 2018
Messages
42
Chris

While that may be true, it still doesn't get to the root of the issue.

Why am I only seeing single-drive read performance on a mirrored vdev? At first I thought I might have one slow drive, but breaking the 8 drives into 4 separate vdevs demonstrated that was not true.
 

sretalla

Powered by Neutrality
Moderator
Joined
Jan 1, 2016
Messages
9,703
For you to get full theoretical read speeds, you need to have everything aligned...

All of the reads on all drives need to be either alternately consecutive or interleaved perfectly, meaning that when one drive is finished delivering the last block, the other one is ready to deliver the next, and so on. All it takes to bring you down to single-drive performance is a lack of this perfect alignment (seeking instead of reading at the needed moment).

Obviously the onboard cache can come into play on the drives to smooth this out, but in a longer operation, the cache is quickly taken out of the equation as it is flushed.

So how do you make sure you get the best alignment? Look at the block sizes and alignment on your disks and pools and check that they are suited to your file sizes. Make sure your writing of those files is sequential and that you're writing to a pool that has plenty of free (ideally unfragmented) space.

Don't use encryption on your disks/pools.

Have a look at this document: http://open-zfs.org/wiki/Performance_tuning
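As a rough sketch of where to check the block sizes, fragmentation and free space mentioned above (substitute your own pool name for "yourpool"):

Code:
zfs get recordsize,compression yourpool      # dataset block size and compression
zpool list -v yourpool                       # capacity and fragmentation per vdev
zdb -U /data/zfs/zpool.cache | grep ashift   # ashift in the pool labels (12 = 4K sectors)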
 

riggieri

Dabbler
Joined
Aug 24, 2018
Messages
42
Ok, so I was able to get back to trying to get this pool set up and ready. For some baseline testing, I decided to make a purely striped pool via the GUI. Here are my dd tests.

Code:
root@SUN:~ # dd if=/dev/zero of=/mnt/Volume04/test/ddfile3 bs=1048k count=120000
120000+0 records in
120000+0 records out
128778240000 bytes transferred in 93.333175 secs (1379769196 bytes/sec)
root@SUN:~ # dd of=/dev/zero if=/mnt/Volume04/test/ddfile3 bs=1048k count=120000
120000+0 records in
120000+0 records out
128778240000 bytes transferred in 133.109048 secs (967464212 bytes/sec)



Now these are completely new disks, and I did use the solnet-array script to verify that each disk can read at between 190-205MB/s. What am I missing in getting ideal performance out of these 8 drives?

Here is the Volume04 section of the output from # zdb -U /data/zfs/zpool.cache:


Code:
Volume04:
    version: 5000
    name: 'Volume04'
    state: 0
    txg: 5
    pool_guid: 1412731503474572994
    hostid: 937832228
    hostname: 'SUN.digitalcave.studio'
    com.delphix:has_per_vdev_zaps
    vdev_children: 8
    vdev_tree:
        type: 'root'
        id: 0
        guid: 1412731503474572994
        create_txg: 4
        children[0]:
            type: 'disk'
            id: 0
            guid: 18415077291149189558
            path: '/dev/gptid/0e5a3b9c-8218-11e9-928a-00259080c872'
            phys_path: 'id1,enc@n5003048000fce1fd/type@0/slot@11/elmdesc@Slot_17/p2'
            whole_disk: 1
            metaslab_array: 66
            metaslab_shift: 35
            ashift: 12
            asize: 5999022833664
            is_log: 0
            create_txg: 4
            com.delphix:vdev_zap_leaf: 36
            com.delphix:vdev_zap_top: 37
        children[1]:
            type: 'disk'
            id: 1
            guid: 15885041709316737798
            path: '/dev/gptid/18f8b728-8218-11e9-928a-00259080c872'
            phys_path: 'id1,enc@n5003048000fce1fd/type@0/slot@6/elmdesc@Slot_06/p2'
            whole_disk: 1
            metaslab_array: 62
            metaslab_shift: 35
            ashift: 12
            asize: 5999022833664
            is_log: 0
            create_txg: 4
            com.delphix:vdev_zap_leaf: 38
            com.delphix:vdev_zap_top: 39
        children[2]:
            type: 'disk'
            id: 2
            guid: 13764782820944831574
            path: '/dev/gptid/23e577ca-8218-11e9-928a-00259080c872'
            phys_path: 'id1,enc@n5003048000fce1fd/type@0/slot@3/elmdesc@Slot_03/p2'
            whole_disk: 1
            metaslab_array: 61
            metaslab_shift: 35
            ashift: 12
            asize: 5999022833664
            is_log: 0
            create_txg: 4
            com.delphix:vdev_zap_leaf: 40
            com.delphix:vdev_zap_top: 41
        children[3]:
            type: 'disk'
            id: 3
            guid: 12181133074135122409
            path: '/dev/gptid/2ea0576c-8218-11e9-928a-00259080c872'
            phys_path: 'id1,enc@n5003048000fce1fd/type@0/slot@5/elmdesc@Slot_05/p2'
            whole_disk: 1
            metaslab_array: 60
            metaslab_shift: 35
            ashift: 12
            asize: 5999022833664
            is_log: 0
            create_txg: 4
            com.delphix:vdev_zap_leaf: 42
            com.delphix:vdev_zap_top: 43
        children[4]:
            type: 'disk'
            id: 4
            guid: 1516060720090841112
            path: '/dev/gptid/39b246ae-8218-11e9-928a-00259080c872'
            phys_path: 'id1,enc@n5003048000fce1fd/type@0/slot@b/elmdesc@Slot_11/p2'
            whole_disk: 1
            metaslab_array: 59
            metaslab_shift: 35
            ashift: 12
            asize: 5999022833664
            is_log: 0
            create_txg: 4
            com.delphix:vdev_zap_leaf: 44
            com.delphix:vdev_zap_top: 45
        children[5]:
            type: 'disk'
            id: 5
            guid: 6282077137085215506
            path: '/dev/gptid/44d4de34-8218-11e9-928a-00259080c872'
            phys_path: 'id1,enc@n5003048000fce1fd/type@0/slot@12/elmdesc@Slot_18/p2'
            whole_disk: 1
            metaslab_array: 58
            metaslab_shift: 35
            ashift: 12
            asize: 5999022833664
            is_log: 0
            create_txg: 4
            com.delphix:vdev_zap_leaf: 46
            com.delphix:vdev_zap_top: 47
        children[6]:
            type: 'disk'
            id: 6
            guid: 2658387308973857441
            path: '/dev/gptid/4fbc77a1-8218-11e9-928a-00259080c872'
            phys_path: 'id1,enc@n5003048000fce1fd/type@0/slot@2/elmdesc@Slot_02/p2'
            whole_disk: 1
            metaslab_array: 57
            metaslab_shift: 35
            ashift: 12
            asize: 5999022833664
            is_log: 0
            create_txg: 4
            com.delphix:vdev_zap_leaf: 48
            com.delphix:vdev_zap_top: 49
        children[7]:
            type: 'disk'
            id: 7
            guid: 14965673189236833943
            path: '/dev/gptid/5a8e8f6a-8218-11e9-928a-00259080c872'
            phys_path: 'id1,enc@n5003048000fce1fd/type@0/slot@c/elmdesc@Slot_12/p2'
            whole_disk: 1
            metaslab_array: 52
            metaslab_shift: 35
            ashift: 12
            asize: 5999022833664
            is_log: 0
            create_txg: 4
            com.delphix:vdev_zap_leaf: 50
            com.delphix:vdev_zap_top: 51
    features_for_read:
        com.delphix:hole_birth
        com.delphix:embedded_data
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
I have (8) 6TB HGST NAS Drives that R/W at about 210MB/s. I set them all up in a 2x4 mirror vdev pool expecting about 800MB/s read and 1600MB/s write, but I only was getting 800MB/s write and 700MB/s read.

I don't get the 1600MBytes/sec write.

So you have four mirror vdevs.

In a mirror, reads can be fulfilled by any component mechanism. You have two mechanisms. This means that there is possibly as much as 2 x 200MBytes/sec read capacity for each vdev. Whether it is actually possible to leverage that is questionable for a single reader, but it will happily allow two independent readers with high performance.

In a mirror, writes must be committed to all component mechanisms. This means you can never exceed around 200MBytes/sec write for each vdev.

You have four vdevs. So that would seem to be a theoretical max write speed of around 800MBytes/sec, and maybe as much as 1600MBytes/sec for reads if you have two readers.

Your reported write speed is 800MBytes/sec which seems spot on. Your reported read is 700MBytes/sec, which is a bit lower than the 800MBytes/sec that'd be nice to see for a single reader.

One of the strange things with these complex storage systems is that operations such as read are responsive/reactive in nature, and where the system can't read your mind to know what's about to happen next, usually you don't get "full" read speed out of things. Write speed is the opposite. You've already shown your intention to write by providing the data, and so you more often end up hitting the near-theoretical limit for write whereas the read is "disappointing."
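If you want a crude way to see the difference, read two separate large files at the same time and compare the aggregate to a single reader. A sketch only; the paths assume two big files have already been written to the pool:

Code:
# rough ceilings for 4 x 2-way mirrors of ~200MByte/sec drives:
#   write: 4 vdevs x ~200MBytes/sec           = ~800MBytes/sec
#   read:  4 vdevs x 2 disks x ~200MBytes/sec = ~1600MBytes/sec, but only with enough
#          concurrent readers to keep both sides of every mirror busy
dd if=/mnt/Volume04/test/ddfile1 of=/dev/null bs=1048k &
dd if=/mnt/Volume04/test/ddfile2 of=/dev/null bs=1048k &
wait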
 

riggieri

Dabbler
Joined
Aug 24, 2018
Messages
42
Hey @jgreco, thanks for the reply. My first message had a typo: I was expecting ~800MB/s write and ~1600MB/s read from a 2x4 mirror setup. I was definitely expecting more than 1GB/s of performance from 8 drives striped together. With these same drives on an ATTO hardware RAID5, I saw 1200MB/s read and write.

How can I improve my read speeds without more spindles? For the application I was hoping to use this pool for, I need to hit 1.4GB/s (4K 16-bit DPX frames over iSCSI).
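For what it's worth, one way to see whether all eight spindles stay busy during a single big read is to watch them with gstat while the read runs (a sketch; the da numbers here are the 6TB drives on this system and will differ elsewhere):

Code:
# watch just the 8 drives in this pool during a long sequential read;
# it should show whether both disks of each mirror are actually being used
gstat -p -f 'da7[1-8]'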
 
Last edited:

riggieri

Dabbler
Joined
Aug 24, 2018
Messages
42
Ok, so I set up the solnet-array test and let it run over the past 24 hours. I think I am seeing a hardware issue, but I don't know how to interpret the results. The parallel vs. serial test is definitely showing an issue, though.

Code:
root@SUN:/mnt/Volume01/private # ./solnet-array-test-v2.sh
sol.net disk array test v2

1) Use all disks (from camcontrol)
2) Use selected disks (from camcontrol|grep)
3) Specify disks
4) Show camcontrol list

Option: 3

Enter disk devices separated by spaces (e.g. da1 da2): da71 da72 da73 da74 da75 da76 da77 da78

Selected disks: da71 da72 da73 da74 da75 da76 da77 da78
<ATA HGST HDN726060AL T517>        at scbus1 target 41 lun 0 (pass75,da71)
<ATA HGST HDN726060AL T517>        at scbus1 target 42 lun 0 (pass76,da72)
<ATA HGST HDN726060AL T517>        at scbus1 target 43 lun 0 (pass77,da73)
<ATA HGST HDN726060AL T517>        at scbus1 target 44 lun 0 (pass78,da74)
<ATA HGST HDN726060AL T517>        at scbus1 target 45 lun 0 (pass79,da75)
<ATA HGST HDN726060AL T517>        at scbus1 target 46 lun 0 (pass80,da76)
<ATA HGST HDN726060AL T517>        at scbus1 target 47 lun 0 (pass81,da77)
<ATA HGST HDN726060AL T517>        at scbus1 target 48 lun 0 (pass82,da78)
Is this correct? (y/N): y
Performing initial serial array read (baseline speeds)
Wed May 29 10:03:36 EDT 2019
Unable to determine disk da75 size from dmesg file (not fatal but odd!)
Wed May 29 10:21:40 EDT 2019               
Completed: initial serial array read (baseline speeds)

Array's average speed is 211.946 MB/sec per disk

Disk    Disk Size  MB/sec %ofAvg
------- ---------- ------ ------
da71     5723166MB    208     98
da72     5723166MB    217    103
da73     5723166MB    216    102
da74     5723166MB    218    103
da75           0MB    217    102
da76     5723166MB    216    102
da77     5723166MB    187     88 --SLOW--
da78     5723166MB    217    102

Performing initial parallel array read
Wed May 29 10:21:40 EDT 2019
The disk da71 appears to be 5723166 MB.       
Disk is reading at about 208 MB/sec         
This suggests that this pass may take around 459 minutes
                                            
                   Serial Parall % of
Disk    Disk Size  MB/sec MB/sec Serial
------- ---------- ------ ------ ------
da71     5723166MB    208    208    100
da72     5723166MB    217    215     99
da73     5723166MB    216    217    100
da74     5723166MB    218    216     99
da75           0MB    217    217    100
da76     5723166MB    216    216    100
da77     5723166MB    187    210    112 ++FAST++
da78     5723166MB    217    215     99

Awaiting completion: initial parallel array read
Wed May 29 20:02:07 EDT 2019
Completed: initial parallel array read

Disk's average time is 34257 seconds per disk

Disk    Bytes Transferred Seconds %ofAvg
------- ----------------- ------- ------
da71        6001175126016   34827    102
da72        6001175126016   33642     98
da73        6001175126016   34416    100
da74        6001175126016   33820     99
da75        6001175126016   34325    100
da76        6001175126016   34515    101
da77        6001175126016   34554    101
da78        6001175126016   33958     99

Performing initial parallel seek-stress array read
Wed May 29 20:02:07 EDT 2019
The disk da71 appears to be 5723166 MB.       
Disk is reading at about 81 MB/sec         
This suggests that this pass may take around 1178 minutes
                                            
                   Serial Parall % of
Disk    Disk Size  MB/sec MB/sec Serial
------- ---------- ------ ------ ------
da71     5723166MB    208     77     37
da72     5723166MB    217     79     37
da73     5723166MB    216     83     38
da74     5723166MB    218     81     37
da75           0MB    217     80     37
da76     5723166MB    216     81     38
da77     5723166MB    187     79     42
da78     5723166MB    217     84     39

Awaiting completion: initial parallel seek-stress array read
 

zvans18

Dabbler
Joined
Sep 6, 2016
Messages
23
I was expecting ~800MB/s write and ~1600MB/s read from a 2x4 mirror setup up. I was defiantly expecting more than 1GB/sec performance from 8 drives striped together.
I'm hard stuck at around ~800MB/s write and ~800MB/s read from 14 drives (7 mirror pairs) of mixed WD Reds and HGST Deskstar NAS ¯\_(ツ)_/¯
 