New NAS for Home/Lab use

Status
Not open for further replies.

Chris Moore

Hall of Famer
Joined
May 2, 2015
Messages
10,080
Another item to note: this is a Dell-branded version of the S3700, but I wouldn't expect any significant differences other than needing an LSI controller to update the firmware (already done).
Where did you get the unit? Is it possible that it was defective from the start?
 

tahoward

Dabbler
Joined
Jan 7, 2018
Messages
24
Where did you get the unit? Is it possible that it was defective from the start?
It was an eBay purchase, so it's possible it was defective from the start. Unfortunately, neither the Intel tools nor the SMART data provide any detail that would explain the lower performance.
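For what it's worth, this is the kind of SMART query I mean; the device name is just an example of where the drive shows up behind the HBA:
Code:
# Full SMART/identify output for the suspect SSD
smartctl -a /dev/da0

# Extended output, which sometimes surfaces additional device statistics on Intel DC drives
smartctl -x /dev/da0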
 

tahoward

Dabbler
Joined
Jan 7, 2018
Messages
24
Been running tests on an NFS mount both locally and across 10GbE. Reads almost always saturate 10GbE, but writes seem to depend heavily on file size, starting around 500MB/s and working up to 1GB/s once the file size reaches 64GB:
writer_report-16MB_record_size.png
writer_report-64kB_record_size.png

Curious if there's a way to help close the gap and reach higher transfer rates with smaller files on an NFS mount over 10GbE.
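For reference, the graphs above are from iozone; the invocation was along these lines (the record sizes match the graphs, but the exact flags and size range shown here are an approximation):
Code:
# Sequential write/read sweep on the NFS mount, 64kB and 16MB record sizes, files up to 64GB
iozone -a -i 0 -i 1 -r 64k -r 16m -n 1g -g 64g -f /mnt/nfs-share/iozone.tmp -R -b iozone_results.xls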

Here are the current tunables:

FreeNAS 11.1
freenas_tunables.JPG

CentOS 7
Code:
# Maximum receive socket buffer size
net.core.rmem_max = 33554432

# Maximum send socket buffer size
net.core.wmem_max = 33554432

# Default receive socket buffer size
net.core.rmem_default = 16777216

# Default send socket buffer size
net.core.wmem_default = 16777216

# Minimum, initial and max TCP Receive buffer size in Bytes
net.ipv4.tcp_rmem = 4096 87380 33554432

# Minimum, initial and max TCP Send buffer size in Bytes
net.ipv4.tcp_wmem = 4096 87380 33554432

# Maximum number of packets queued on the input side
net.core.netdev_max_backlog = 300000

net.core.optmem_max = 40960

# Enable TCP receive buffer auto-tuning
net.ipv4.tcp_moderate_rcvbuf = 1

# Don't cache ssthresh from previous connection
net.ipv4.tcp_no_metrics_save = 1

# H-TCP (Hamilton TCP) is a packet-loss-based congestion control algorithm that aggressively pushes up to max bandwidth (total BDP) and favors flows with lower RTT / RTT variance.
net.ipv4.tcp_congestion_control = htcp

# If you are using jumbo frames, set this to avoid MTU black holes.
net.ipv4.tcp_mtu_probing = 1

# Disable TCP slow start on idle connections
net.ipv4.tcp_slow_start_after_idle = 0

# Turn on the tcp_window_scaling
net.ipv4.tcp_window_scaling = 1


MTU on both the 10GbE adapters is set to 9000.
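One quick way to make sure jumbo frames actually survive end to end (rather than fragmenting or black-holing somewhere in the path) is a don't-fragment ping at the largest payload a 9000-byte MTU allows; the target IP here is a placeholder for the FreeNAS box:
Code:
# From the CentOS client: 8972 = 9000 - 20 (IP header) - 8 (ICMP header)
ping -M do -s 8972 -c 4 192.168.1.10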
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
Jumbo frames probably aren't a really good idea. They might seem like it, except for all the downsides.

Writes tend to be slower because it is hard to allocate space, so emptier, less fragmented pools will write quicker than fuller, more heavily fragmented pools. ZFS caching is a real win for read speeds, so it isn't unusual for read to far outperform write. Aside from building transaction groups, writes do not enjoy caching benefits. You can artificially inflate the size of transaction groups to go faster, but if the underlying storage cannot keep up, this causes things to hang. You can also add lots more ARC so that ZFS will be able to cache more metadata and find space to allocate more quickly. Hopefully it is obvious that adding additional vdevs is also a good idea.
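For context, the "inflate the size of transaction groups" part lives in the FreeBSD/FreeNAS tunables; a read-only sketch of the knobs involved (whether to touch them at all depends entirely on whether the vdevs can keep up, as noted above):
Code:
# How much dirty (not-yet-written) data ZFS will accumulate before throttling writers
sysctl vfs.zfs.dirty_data_max

# Maximum number of seconds between forced transaction group commits
sysctl vfs.zfs.txg.timeout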
 

tahoward

Dabbler
Joined
Jan 7, 2018
Messages
24
Jumbo frames probably aren't a really good idea. They might seem like it, except for all the downsides.

Writes tend to be slower because it is hard to allocate space, so emptier, less fragmented pools will write quicker than fuller, more heavily fragmented pools. ZFS caching is a real win for read speeds, so it isn't unusual for read to far outperform write. Aside from building transaction groups, writes do not enjoy caching benefits. You can artificially inflate the size of transaction groups to go faster, but if the underlying storage cannot keep up, this causes things to hang. You can also add lots more ARC so that ZFS will be able to cache more metadata and find space to allocate more quickly. Hopefully it is obvious that adding additional vdevs is also a good idea.

The volume performs admirably when not testing through NFS (I assume this is due to NFS defaulting to sync writes). Here's a modified graph of the 64kB record size write performance for comparison:
writer_report-64kB_record_size-local_zfs.png
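If it is the sync-write default at play, the quick sanity check is the dataset's sync property and an A/B run with it temporarily disabled (dataset name is a placeholder, and sync=disabled is only safe as a test):
Code:
# See whether the share's dataset is honoring synchronous writes
zfs get sync tank/nfs-share

# A/B test only -- re-enable afterwards, as disabled sync risks data loss on power failure
zfs set sync=disabled tank/nfs-share
zfs set sync=standard tank/nfs-share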

ARC usage is relatively low on these iozone tests. Appears to use about 5GB tops; max available is about 64GB.

I'll look at lowering MTU.
 

tahoward

Dabbler
Joined
Jan 7, 2018
Messages
24
Found a 400GB P3700 for dirt cheap. The seller wasn't responding to my inquiries about the drive's condition, but at that price it was worth the risk on Amazon. The drive turned out to be in terrific condition, with only 30GB worth of data written to it if I'm reading 'NAND Bytes Written' correctly. An SMB write test over 10GbE shows ~800MB/s with sync=always set, vs. the S3700 pushing 200MB/s (IMO it should have been around 350-400MB/s ¯\_(ツ)_/¯). Very happy with these results so far. Going to re-run the NFS tests as that's what will ultimately be used for Kubernetes persistent storage...
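Assuming the P3700 goes in as a SLOG (which is how the sync=always comparison reads), the setup boils down to something like this; pool, dataset and device names are placeholders:
Code:
# Attach the NVMe drive as a dedicated log (SLOG) device
zpool add tank log nvd0

# Force all writes on the share's dataset through the ZIL/SLOG for testing
zfs set sync=always tank/nfs-share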

Edit:
Some preliminary NFS results with dd:
Code:
[root@compute ~]# time dd if=/dev/zero of=/mnt/nfs-share/ddfile bs=4M count=16000
16000+0 records in
16000+0 records out
67108864000 bytes (67 GB) copied, 77.559 s, 865 MB/s

real	1m19.655s
user	0m0.010s
sys	 0m34.737s
[root@compute ~]# time dd of=/dev/zero if=/mnt/nfs-share/ddfile
131072000+0 records in
131072000+0 records out
67108864000 bytes (67 GB) copied, 51.8594 s, 1.3 GB/s

real	0m51.862s
user	0m8.101s
sys	 0m43.755s
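A couple of caveats on those dd numbers: /dev/zero produces highly compressible data, the write uses bs=4M while the read-back falls back to dd's default 512-byte blocks, and without a final flush some buffered data may not be counted. A variant like this (same placeholder paths) tightens things up:
Code:
# Flush at the end so buffered writes are included in the timing, and read back with a larger block size
time dd if=/dev/zero of=/mnt/nfs-share/ddfile bs=4M count=16000 conv=fdatasync
time dd if=/mnt/nfs-share/ddfile of=/dev/null bs=4M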
 
Last edited:

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
Driving data through many layers of code, subsystems, etc., usually means that you don't have any realistic chance of getting close to the theoretical potential of the underlying I/O devices. I know it isn't what anyone wants to hear, but it is what it is. :-/ That said, 200MB/sec sounds pretty good.
 

tahoward

Dabbler
Joined
Jan 7, 2018
Messages
24
Driving data through many layers of code, subsystems, etc., usually means that you don't have any realistic chance of getting close to the theoretical potential of the underlying I/O devices. I know it isn't what anyone wants to hear, but it is what it is. :-/ That said, 200MB/sec sounds pretty good.

Fortunately, replacing the S3700 with a P3700 alleviated the slowish NFS synchronous writes. NFS burst writes appear to hit around ~500-600MB/s, and sustained NFS writes peak at ~800-900MB/s. Any operation that isn't an NFS sync write saturates the 10Gbps link (and could potentially saturate a handful of 10Gbps links, if local performance is any indication). Overall I'm very pleased with these results considering the budgetary constraints and the amalgamation of eBay parts.

It's been some time since an update, but I finally got around to installing both nodes and the switch in a cabinet. Prior to that, a buddy and I added Ethernet drops in the office and living area. The rack switch acts as an LACP hub for both attached nodes and the home's router, for when everyone streams all the things:

Server_Cabinet.jpg
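On the FreeNAS side, that LACP bundle is a lagg interface; a minimal sketch assuming two Intel 10GbE ports (interface names and addressing are placeholders, and on FreeNAS this is normally configured through the GUI rather than by hand):
Code:
# Create an LACP aggregate from two 10GbE ports
ifconfig lagg0 create
ifconfig lagg0 laggproto lacp laggport ix0 laggport ix1 192.168.1.10/24 up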
 
Last edited:

Stux

MVP
Joined
Jun 2, 2016
Messages
4,419
Did you use the Intel SSD tools to OP the P3700?

I found more consistent perf after doing that. I also switched the P3700 to 4K sectors. Should reduce transactions 8-fold.
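For reference, the same LBA-format change can also be done with the generic nvme-cli instead of the Intel Data Center Tool (device path and LBA format index below are assumptions, and the format wipes the drive):
Code:
# List supported LBA formats; the 4096-byte one is often index 3 on these drives
nvme id-ns /dev/nvme0n1 -H | grep "LBA Format"

# Reformat the namespace to 4K sectors -- destroys all data on the drive
nvme format /dev/nvme0n1 --lbaf=3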
 

tahoward

Dabbler
Joined
Jan 7, 2018
Messages
24
Did you use the Intel SSD tools to OP the P3700?

I found more consistent perf after doing that. I also switched the P3700 to 4K sectors. Should reduce transactions 8-fold.

Yes, I used the Intel tools to OP the P3700 to 40GB, and a quick diskinfo check shows the sector size is 4kB. As for "consistent" perf, what exactly are you describing? I.e., 800-900MB/s transfer speeds over the 10Gbps network via NFS sync writes, regardless of the write's duration or the composition of its contents? What I've observed so far is that long continuous writes approach the max theoretical limit while small bursts sit at about 50-60%.
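For the record, the sector-size check on the FreeNAS side is just diskinfo against the NVMe device (device name is a placeholder):
Code:
# -v prints sector size, media size and stripe size for the device
diskinfo -v /dev/nvd0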
 

Stux

MVP
Joined
Jun 2, 2016
Messages
4,419
Sounds fairly consistent :)

Inconsistent would be writes fluctuating up/down over time.
 