SMB/AFP - Poor write performance, decent read

nielsen01

Cadet
Joined
Nov 12, 2018
Messages
5
Hi guys,

I have some problems with a system that was recently delivered to a client.
Unfortunately, the overall system doesn't perform as expected when it's shared out through FreeNAS.

The system stores a lot of 4K DPX frame stacks, so a lot of fairly small files of around 20-30 MB each. One sequence is about 50,000 frames.

An iperf test between my MacBook (with a Sonnet Solo10GSFP+) and the NIC directly is fine, showing a ~9.6 Gbit/s connection. But when I run an AJA speed test (File by frame mode) against the system, it's about 120 MB/s write and 500 MB/s read. The BlackMagic Speed Test shows the same.
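To put those numbers side by side, here is a quick back-of-the-envelope conversion. Python is only used as a calculator here. The 25 MB average frame size is an assumption (the middle of the 20-30 MB range above), decimal units are assumed, and protocol overhead is ignored.

```python
# Back-of-the-envelope comparison of link speed and measured throughput.
# Assumptions: decimal units (1 MB = 10**6 bytes) and a 25 MB average frame.

LINK_GBIT_S = 9.6     # iperf result from the post
WRITE_MB_S = 120.0    # AJA write result
READ_MB_S = 500.0     # AJA read result
FRAME_MB = 25.0       # assumed midpoint of the 20-30 MB range


def gbit_to_mb_s(gbit_s: float) -> float:
    """Convert Gbit/s to MB/s (decimal units, no protocol overhead)."""
    return gbit_s * 1000.0 / 8.0


link_mb_s = gbit_to_mb_s(LINK_GBIT_S)
write_share = WRITE_MB_S / link_mb_s
write_fps = WRITE_MB_S / FRAME_MB
read_fps = READ_MB_S / FRAME_MB

print(f"link capacity : {link_mb_s:.0f} MB/s")
print(f"write uses    : {write_share:.0%} of the link, ~{write_fps:.1f} frames/s")
print(f"read          : ~{read_fps:.1f} frames/s")
```

Under these assumptions the write test uses only about a tenth of what the link can carry, which is why the network itself looks like an unlikely culprit.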

The system is a Supermicro system: a 36-bay head unit and a 44-bay JBOD.
It's set up as 1 zpool with 5 vdevs containing 12 drives each.

It's running FreeNAS 11.2 U1.

The specs are as follows:
Supermicro X10DRH-CLN4 - 2xIntel Xeon E5v3/v4 16xDDR4 4x1GbE 7xPCIe LSISAS 3008 (IT MODE)
2x Intel Xeon E5-2620v4 - 8-Core 2.10GHz 20MB 8.0GT/s LGA2011-3
8x 32GB DDR4 2666MHz ECC Registered 2Rx4 Micron
2x Supermicro SATA3 DOM 16GB 1DWPD - Mirror Boot DOM
60x HGST Ultrastar He12 HDD - 12TB SAS3 3.5" 7k2 256MB ISE 512e
Supermicro 2-Port SFP+ Intel 10GbE LAN card PCI-e LP 82599ES DA
Fiberworks SFP+, 10G Ethernet, DDM, 300m 850nm, 4dB, MM, Intel

HBA to JBOD: Avago 9300-8e 12Gbps SATA/SAS PCIe 3.0 HBA External 2x8644 LP SGL (IT MODE)

I've tried swapping the NIC for a new one, both fibre and copper, and that didn't resolve the issue, which makes me think it's not a defect in that part, at least.

Any advice, guys? All advice is appreciated, because I'm kind of at a loss here.

Code:
root@freenas[~]# testparm -v
Load smb config files from /usr/local/etc/smb4.conf
Processing section "[Resto]"
Processing section "[StjernholmTest]"
Loaded services file OK.
WARNING: You have some share names that are longer than 12 characters.
These may not be accessible to some older clients.
(Eg. Windows9x, WindowsMe, and smbclient prior to Samba 3.0.)
Server role: ROLE_STANDALONE

Press enter to see a dump of your service definitions

# Global parameters
[global]
    abort shutdown script =
    add group script =
    add machine script =
    addport command =
    addprinter command =
    add share command =
    add user script =
    add user to group script =
    ads dns update = Yes
    afs token lifetime = 604800
    afs username map =
    aio max threads = 100
    algorithmic rid base = 1000
    allow dcerpc auth level connect = No
    allow dns updates = secure only
    allow insecure wide links = No
    allow nt4 crypto = No
    allow trusted domains = Yes
    allow unsafe cluster upgrade = No
    async smb echo handler = No
    auth event notification = No
    auth methods =
    auto services =
    bind interfaces only = No
    browse list = Yes
    cache directory = /var/db/samba4
    change notify = Yes
    change share command =
    check password script =
    cldap port = 389
    client ipc max protocol = default
    client ipc min protocol = default
    client ipc signing = default
    client lanman auth = No
    client ldap sasl wrapping = sign
    client max protocol = default
    client min protocol = CORE
    client NTLMv2 auth = Yes
    client plaintext auth = No
    client schannel = Auto
    client signing = default
    client use spnego principal = No
    client use spnego = Yes
    cluster addresses =
    clustering = No
    config backend = file
    config file =
    create krb5 conf = Yes
    ctdbd socket =
    ctdb locktime warn threshold = 0
    ctdb timeout = 0
    cups connection timeout = 30
    cups encrypt = No
    cups server =
    dcerpc endpoint servers = epmapper, wkssvc, rpcecho, samr, netlogon, lsarpc, drsuapi, dssetup, unixinfo, browser, eventlog6, backupkey, dnsserver
    deadtime = 15
    debug class = No
    debug hires timestamp = Yes
    debug pid = No
    debug prefix timestamp = No
    debug uid = No
    dedicated keytab file =
    default service =
    defer sharing violations = Yes
    delete group script =
    deleteprinter command =
    delete share command =
    delete user from group script =
    delete user script =
    dgram port = 138
    disable netbios = No
    disable spoolss = Yes
    dns forwarder =
    dns proxy = No
    dns update command = /usr/local/sbin/samba_dnsupdate
    domain logons = No
    domain master = Auto
    dos charset = CP437
    enable asu support = No
    enable core files = Yes
    enable privileges = Yes
    encrypt passwords = Yes
    enhanced browsing = Yes
    enumports command =
    eventlog list =
    get quota command =
    getwd cache = Yes
    guest account = nobody
    homedir map = auto.home
    host msdfs = Yes
    hostname lookups = Yes
    idmap backend = tdb
    idmap cache time = 604800
    idmap gid =
    idmap negative cache time = 120
    idmap uid =
    include system krb5 conf = Yes
    init logon delay = 100
    init logon delayed hosts =
    interfaces =
    iprint server =
    keepalive = 300
    kerberos encryption types = all
    kerberos method = default
    kernel change notify = No
    kpasswd port = 464
    krb5 port = 88
    lanman auth = No
    large readwrite = Yes
    ldap admin dn =
    ldap connection timeout = 2
    ldap debug level = 0
    ldap debug threshold = 10
    ldap delete dn = No
    ldap deref = auto
    ldap follow referral = Auto
    ldap group suffix =
    ldap idmap suffix =
    ldap machine suffix =
    ldap page size = 1000
    ldap passwd sync = no
    ldap replication sleep = 1000
    ldap server require strong auth = Yes
    ldap ssl = start tls
    ldap ssl ads = No
    ldap suffix =
    ldap timeout = 15
    ldap user suffix =
    lm announce = Yes
    lm interval = 60
    load printers = No
    local master = Yes
    lock directory = /var/lock
    lock spin time = 200
    log file =
    logging = file
    log level = 2
    log nt token command =
    logon drive =
    logon home = \\%N\%U
    logon path = \\%N\%U\profile
    logon script =
    log writeable files on exit = No
    lpq cache time = 30
    lsa over netlogon = No
    machine password timeout = 604800
    mangle prefix = 1
    mangling method = hash2
    map to guest = Bad User
    map untrusted to domain = Auto
    max disk size = 0
    max log size = 51200
    max mux = 50
    max open files = 7545368
    max smbd processes = 0
    max stat cache size = 256
    max ttl = 259200
    max wins ttl = 518400
    max xmit = 16644
    message command =
    min receivefile size = 0
    min wins ttl = 21600
    mit kdc command =
    multicast dns register = Yes
    name cache timeout = 660
    name resolve order = lmhosts wins host bcast
    nbt client socket address = 0.0.0.0
    nbt port = 137
    ncalrpc dir = /var/run/samba4/ncalrpc
    netbios aliases =
    netbios name = FREENAS
    netbios scope =
    neutralize nt4 emulation = No
    NIS homedir = No
    nmbd bind explicit broadcast = Yes
    nsupdate command = /usr/local/bin/samba-nsupdate -g
    ntlm auth = ntlmv2-only
    nt pipe support = Yes
    ntp signd socket directory = /var/run/samba4/ntp_signd
    nt status support = Yes
    null passwords = No
    obey pam restrictions = Yes
    old password allowed period = 60
    oplock break wait time = 0
    os2 driver map =
    os level = 20
    pam password change = No
    panic action = /usr/local/libexec/samba/samba-backtrace
    passdb backend = tdbsam
    passdb expand explicit = No
    passwd chat = *new*password* %n\n *new*password* %n\n *changed*
    passwd chat debug = No
    passwd chat timeout = 2
    passwd program =
    password hash gpg key ids =
    password hash userPassword schemes =
    password server = *
    perfcount module =
    pid directory = /var/run/samba4
    preferred master = Auto
    preload modules =
    printcap cache time = 750
    printcap name = /dev/null
    private dir = /var/db/samba4/private
    raw NTLMv2 auth = No
    read raw = Yes
    realm =
    registry shares = No
    reject md5 clients = No
    reject md5 servers = No
    remote announce =
    remote browse sync =
    rename user script =
    require strong key = Yes
    reset on zero vc = No
    restrict anonymous = 0
    rndc command = /usr/sbin/rndc
    root directory =
    rpc big endian = No
    rpc server dynamic port range = 49152-65535
    rpc server port = 0
    samba kcc command = /usr/local/sbin/samba_kcc
    security = USER
    server max protocol = SMB3
    server min protocol = SMB2_02
    server multi channel support = No
    server role = standalone server
    server schannel = Auto
    server services = s3fs, rpc, nbt, wrepl, ldap, cldap, kdc, drepl, winbindd, ntp_signd, kcc, dnsupdate, dns
    server signing = default
    server string = FreeNAS Server
    set primary group script =
    set quota command =
    share backend = classic
    show add printer wizard = Yes
    shutdown script =
    smb2 leases = Yes
    smb2 max credits = 8192
    smb2 max read = 8388608
    smb2 max trans = 8388608
    smb2 max write = 8388608
    smbd profiling level = off
    smb passwd file = /var/db/samba4/private/smbpasswd
    smb ports = 445 139
    socket options = TCP_NODELAY
    spn update command = /usr/local/sbin/samba_spnupdate
    stat cache = Yes
    state directory = /var/db/samba4
    svcctl list =
    syslog = 1
    syslog only = No
    template homedir = /home/%D/%U
    template shell = /bin/false
    time server = Yes
    timestamp logs = Yes
    tls cafile = tls/ca.pem
    tls certfile = tls/cert.pem
    tls crlfile =
    tls dh params file =
    tls enabled = Yes
    tls keyfile = tls/key.pem
    tls priority = NORMAL:-VERS-SSL3.0
    tls verify peer = as_strict_as_possible
    unicode = Yes
    unix charset = UTF-8
    unix extensions = Yes
    unix password sync = No
    use mmap = Yes
    username level = 0
    username map =
    username map cache time = 0
    username map script =
    usershare allow guests = No
    usershare max shares = 0
    usershare owner only = Yes
    usershare path = /var/db/samba4/usershares
    usershare prefix allow list =
    usershare prefix deny list =
    usershare template share =
    use spnego = Yes
    utmp = No
    utmp directory =
    web port = 901
    winbind cache time = 300
    winbindd socket directory = /var/run/samba4/winbindd
    winbind enum groups = No
    winbind enum users = No
    winbind expand groups = 0
    winbind max clients = 200
    winbind max domain connections = 1
    winbind nested groups = Yes
    winbind netbios alias spn = Yes
    winbind normalize names = No
    winbind nss info = template
    winbind offline logon = No
    winbind reconnect delay = 30
    winbind refresh tickets = No
    winbind request timeout = 60
    winbind rpc only = No
    winbind sealed pipes = Yes
    winbind separator = \
    winbind trusted domains only = No
    winbind use default domain = No
    wins hook =
    wins proxy = No
    wins server =
    wins support = No
    workgroup = WORKGROUP
    write raw = Yes
    wtmp directory =
    idmap config *: range = 90000001-100000000
    idmap config * : backend = tdb
    access based share enum = No
    acl allow execute always = Yes
    acl check permissions = Yes
    acl group control = No
    acl map full control = Yes
    administrative share = No
    admin users =
    afs share = No
    aio read size = 0
    aio write behind =
    aio write size = 0
    allocation roundup size = 1048576
    available = Yes
    blocking locks = Yes
    block size = 1024
    browseable = Yes
    case sensitive = Auto
    comment =
    copy =
    create mask = 0666
    csc policy = manual
    cups options =
    default case = lower
    default devmode = Yes
    delete readonly = No
    delete veto files = No
    dfree cache time = 0
    dfree command =
    directory mask = 0777
    directory name cache size = 0
    dmapi support = No
    dont descend =
    dos filemode = Yes
    dos filetime resolution = No
    dos filetimes = Yes
    durable handles = Yes
    ea support = Yes
    fake directory create times = No
    fake oplocks = No
    follow symlinks = Yes
    force create mode = 0000
    force directory mode = 0000
    force group =
    force printername = No
    force unknown acl user = No
    force user =
    fstype = NTFS
    guest ok = No
    guest only = No
    hide dot files = Yes
    hide files =
    hide special files = No
    hide unreadable = No
    hide unwriteable files = No
    hosts allow =
    hosts deny =
    include =
    inherit acls = No
    inherit owner = no
    inherit permissions = No
    invalid users =
    kernel oplocks = No
    kernel share modes = Yes
    level2 oplocks = Yes
    locking = Yes
    lppause command =
    lpq command = lpq -P'%p'
    lpresume command =
    lprm command = lprm -P'%p' %j
    magic output =
    magic script =
    mangled names = yes
    mangling char = ~
    map acl inherit = No
    map archive = Yes
    map hidden = No
    map readonly = yes
    map system = No
    max connections = 0
    max print jobs = 1000
    max reported print jobs = 0
    min print space = 0
    msdfs proxy =
    msdfs root = No
    msdfs shuffle referrals = No
    nt acl support = Yes
    ntvfs handler = unixuid, default
    oplock contention limit = 2
    oplocks = Yes
    path =
    posix locking = Yes
    postexec =
    preexec =
    preexec close = No
    preserve case = Yes
    printable = No
    print command = lpr -r -P'%p' %s
    printer name =
    printing = bsd
    printjob username = %U
    print notify backchannel = No
    profile acls = No
    queuepause command =
    queueresume command =
    read list =
    read only = Yes
    root postexec =
    root preexec =
    root preexec close = No
    short preserve case = Yes
    smb encrypt = default
    spotlight = No
    store dos attributes = Yes
    strict allocate = No
    strict locking = No
    strict rename = No
    strict sync = No
    sync always = No
    use client driver = No
    use sendfile = No
    valid users =
    veto files =
    veto oplock files =
    vfs objects =
    volume =
    wide links = No
    write cache size = 0
    write list =


[Resto]
    path = "/mnt/NaStar/Resto"
    read only = No
    veto files = /.snapshot/.windows/.mac/.zfs/
    vfs objects = zfs_space zfsacl streams_xattr
    zfsacl:acesort = dontcare
    nfs4:chown = true
    nfs4:acedup = merge
    nfs4:mode = special


[StjernholmTest]
    path = "/mnt/NaStar/StjernholmTest"
    read only = No
    veto files = /.snapshot/.windows/.mac/.zfs/
    vfs objects = zfs_space zfsacl streams_xattr
    zfsacl:acesort = dontcare
    nfs4:chown = true
    nfs4:acedup = merge
    nfs4:mode = special


EDIT: Attached AJA test
 

Attachments

  • IMG_6024.JPG

Heracles

Wizard
Joined
Feb 2, 2018
Messages
1,401
Hi,

36 bays + 44 bays = 80
10 vDevs of 12 drives each = 120

So how do you fit 120 drives in 80 bays?

Also, how do you do it with only 60 HDs?

Please review your description first. We cannot help you if the math does not work in the first place.
 

nielsen01

Cadet
Joined
Nov 12, 2018
Messages
5
Were you able to narrow down the cause of this?
No, it's still an issue with the system.

Sorry, I might have been a bit too fast. It's 5 vdevs of 12 drives. It only contains 60 drives because of the cost for the client and the capacity required.
 

Heracles

Wizard
Joined
Feb 2, 2018
Messages
1,401
Hi again,

So 5 vDevs not only fixes the math, it also explains the problem. For random I/O, each vDev performs roughly like a single drive, so consider that your pool has the speed of 5 drives, not 60. Had you built that pool as Raid-10, you would have 30 vDevs of 2 drives each and would be in a completely different (and much faster) situation.

When you do a single file, you are doing a sequential read. RaidZ2 is fast at sequential reads, so you get good speed. When you go frame by frame, you are doing the equivalent of random-access reads, and for that, Raid-10 is the key.

Random reads on a pool of 5 RaidZ2 vDevs cannot be fast for that reason.
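The vDev arithmetic above can be sketched as follows. This is a simplification, assuming the common rule of thumb that random I/O scales with the number of vDevs rather than the number of drives; the function name is made up for illustration and nothing here is measured.

```python
# Rough vDev count for the same 60 drives under two layouts.
# Assumed rule of thumb: random IOPS scale with the number of vDevs.

TOTAL_DRIVES = 60


def vdev_count(total_drives: int, drives_per_vdev: int) -> int:
    """Number of whole vDevs that can be built from the given drives."""
    if drives_per_vdev <= 0:
        raise ValueError("drives_per_vdev must be positive")
    return total_drives // drives_per_vdev


raidz_vdevs = vdev_count(TOTAL_DRIVES, 12)   # current layout: 12 drives per vDev
mirror_vdevs = vdev_count(TOTAL_DRIVES, 2)   # alternative: 2-way mirrors

print(f"12-wide vDevs       : {raidz_vdevs}")
print(f"2-way mirror vDevs  : {mirror_vdevs}")
print(f"vDev ratio          : {mirror_vdevs / raidz_vdevs:.0f}x")
```

The trade-off is capacity: mirrors give up half the raw space, which matters when the drive count was already limited by cost.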
 

nielsen01

Cadet
Joined
Nov 12, 2018
Messages
5
I know this setup might not be optimal, but I would still expect it to perform better than this. 120 MB/s write is still a problem, and indicates that something is wrong.
 

Mlovelace

Guru
Joined
Aug 19, 2014
Messages
1,111
You should test the pool speed directly with dd. If the pool speed is fine, and iperf is okay as you stated, then it's a protocol problem. You mentioned a MacBook: do you have SMB signing turned off, among the other weirdness associated with Apple's SMB?

Test the pool speed by creating a new dataset with compression off, then in the CLI change directory into the new dataset and run dd if=/dev/zero of=testfile bs=1048576 count=100k
 

nielsen01

Cadet
Joined
Nov 12, 2018
Messages
5
Thanks for the input. I'm also pretty convinced it's a protocol issue.
SMB signing is turned off on the machines I was testing with, as per https://support.apple.com/en-us/HT205926

I will perform the dd test on a new dataset tomorrow when I have access to the system, but as I remember from my previous test, it was around 2-2.5 GB/s.
 

Mlovelace

Guru
Joined
Aug 19, 2014
Messages
1,111
Another thing you could try is sharing the FreeNAS volume via NFS to the Apple clients. Unless there are Windows machines connecting to the same volume and you don't want to run into conflicting file locks, NFS is perfectly viable for macOS.
 

nielsen01

Cadet
Joined
Nov 12, 2018
Messages
5
Hi again,

Just performed the test, and the output was
Code:
]# dd if=/dev/zero of=/mnt/NaStar/NoComp/testfile bs=1048576 count=100k
102400+0 records in
102400+0 records out
107374182400 bytes transferred in 81.919113 secs (1310734195 bytes/sec)

So the filesystem itself is performing decently, as I see it.
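For reference, the dd figures can be re-derived in more familiar units. This is just arithmetic on the numbers printed above; the 120 MB/s comparison value is the write speed reported over the share earlier in the thread.

```python
# Re-derive the dd result in more familiar units.
BYTES_WRITTEN = 107_374_182_400   # from the dd output
SECONDS = 81.919113               # from the dd output
SMB_WRITE_MB_S = 120.0            # write speed seen over the share

bytes_per_s = BYTES_WRITTEN / SECONDS
gib_written = BYTES_WRITTEN / 2**30
mb_per_s = bytes_per_s / 10**6
gbit_per_s = bytes_per_s * 8 / 10**9

print(f"written    : {gib_written:.0f} GiB")
print(f"throughput : {mb_per_s:.0f} MB/s ({gbit_per_s:.1f} Gbit/s)")
print(f"vs. share  : {mb_per_s / SMB_WRITE_MB_S:.1f}x the speed seen over SMB/AFP")
```

In other words, the local sequential write rate is above what a single 10GbE link can carry, so for this kind of workload the pool is not what caps writes at 120 MB/s.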
 

Heracles

Wizard
Joined
Feb 2, 2018
Messages
1,401
Hi Nielsen,

Beware here: you have 8x 32G = 256G of RAM. Your test was for 107 374 182 400 bytes, so 107G. As such, it fits entirely in RAM easily. There were some commits to the drives for sure, but the test is not that relevant for measuring the speed of the drives in the pool.

This kind of test gives you a better measurement for reads than for writes: you write such a gigantic file to your pool and reboot the NAS to flush the cache. After the reboot, you use dd again to read that gigantic file and output it to /dev/null. There, 100% of the content must come from the drives. Still, in your case, the test will not be that relevant, because it will be a long sequential read, and that is what RaidZ2 is good at.

To force a simulation of random access, you will need multiple big files, something like 20 files of about 10 GB each. Because ZFS tries to avoid fragmentation, use a script to create all of them in parallel, so that none of them ends up in a single sequence. Once these files are created, reboot to flush the cache. Then use a script to dd all of them to /dev/null at once. This will be closer to random access.

As you see, testing performance in a relevant way is extremely difficult...
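The parallel test described above could be sketched roughly like this. It is a minimal illustration, not a benchmark tool: all function names, the file naming and the demo sizes are hypothetical, and the demo runs against a temporary directory with tiny files so it is safe to try anywhere.

```python
# Sketch of the parallel-write / parallel-read test described above.
# Hypothetical names and sizes: nothing here comes from FreeNAS itself.
import os
import tempfile
import time
from concurrent.futures import ThreadPoolExecutor

CHUNK = 1024 * 1024  # 1 MiB per I/O, the same block size as the dd test


def write_file(path, size_bytes):
    """Write size_bytes of zeros to path in CHUNK-sized pieces."""
    block = bytes(CHUNK)
    written = 0
    with open(path, "wb") as f:
        while written < size_bytes:
            n = min(CHUNK, size_bytes - written)
            f.write(block[:n])
            written += n
        f.flush()
        os.fsync(f.fileno())  # ask the OS to commit the file before returning
    return written


def read_file(path):
    """Read path to the end and discard the data (like dd of=/dev/null)."""
    total = 0
    with open(path, "rb") as f:
        while True:
            data = f.read(CHUNK)
            if not data:
                break
            total += len(data)
    return total


def run_parallel(func, arg_list, workers):
    """Call func(*args) for each args tuple concurrently; return (bytes, seconds)."""
    start = time.monotonic()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        total = sum(pool.map(lambda args: func(*args), arg_list))
    return total, time.monotonic() - start


def paths_for(test_dir, count):
    """Build the list of test file paths inside test_dir."""
    return [os.path.join(test_dir, f"testfile_{i:02d}") for i in range(count)]


# Tiny demo in a temporary directory so the sketch is safe to run anywhere.
# For a real test, point test_dir at a dataset with compression off, use
# something like count=20 and size=10 * 1024**3, and flush the cache
# (for example by rebooting) between the write phase and the read phase.
with tempfile.TemporaryDirectory() as demo_dir:
    count, size = 4, 2 * CHUNK
    files = paths_for(demo_dir, count)
    wrote, w_secs = run_parallel(write_file, [(p, size) for p in files], count)
    read, r_secs = run_parallel(read_file, [(p,) for p in files], count)
    print(f"wrote {wrote} bytes in {w_secs:.3f} s")
    print(f"read  {read} bytes in {r_secs:.3f} s")
```

Note that the read phase in the demo runs straight after the write phase, so it will be served from cache; only the version with a cache flush in between says anything about the drives.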
 

Mlovelace

Guru
Joined
Aug 19, 2014
Messages
1,111
Heracles said: "There were some commits to the drives for sure, but the test is not that relevant for measuring the speed of the drives in the pool."
What do you mean, there were some commits to the drives? The test file was written entirely to the drives. I'll grant you that sequential writes are going to be faster due to the pool design, but the intent was to see if the 120 MB/s bottleneck was being caused by an issue with the pool.
Heracles said: "Beware here: you have 8x 32G = 256G of RAM. Your test was for 107 374 182 400 bytes, so 107G. As such, it fits entirely in RAM easily."
This would matter if we were testing the read speed of the pool, but we were not.
 

kdragon75

Wizard
Joined
Aug 7, 2016
Messages
2,457
Heracles said: "Beware here: you have 8x 32G = 256G of RAM. Your test was for 107 374 182 400 bytes, so 107G. As such, it fits entirely in RAM easily."
You're forgetting the 5-second txg timeout. 5 seconds times the rate of data received is the most you will buffer in RAM*

*There are write throttles in place based on latency to prevent thrashing
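As a rough illustration of that bound, here is the arithmetic using the rate from the dd run earlier in the thread. It assumes the incoming rate equals the dd result and ignores the write throttles mentioned in the footnote, so treat it as an upper-bound sketch only.

```python
# Upper bound on buffered data implied by a 5-second txg timeout.
# Assumption: incoming rate equals the dd result; throttling is ignored.
TXG_TIMEOUT_S = 5
INCOMING_BYTES_PER_S = 1_310_734_195   # rate reported by dd
TEST_FILE_BYTES = 107_374_182_400      # size of the dd test file

max_buffered = TXG_TIMEOUT_S * INCOMING_BYTES_PER_S
fraction = max_buffered / TEST_FILE_BYTES

print(f"max buffered per txg : {max_buffered / 10**9:.1f} GB")
print(f"share of test file   : {fraction:.1%}")
```

So even with 256G of RAM, only a few percent of the test file could have been sitting in memory at any moment; the rest had to be on the drives.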
 