Write Speed issues

Status
Not open for further replies.

Matute

Dabbler
Joined
Apr 12, 2017
Messages
21
Hi all, I've been wandering around the forum for some hours and have read a lot of posts regarding write speed, but I was not able to find anything to point me in the right direction.
First my build:

HP ML110 G9
Build FreeNAS-11.0-U1 (aa82cc58d)
Platform Intel(R) Xeon(R) CPU E5-2603 v4 @ 1.70GHz
Memory 24413MB (24 GB)

1 Volume made of two 3-disk mirrors (6 drives: two 3-way mirror vdevs striped together... kind of RAID 10)
1 ZIL: Intel® Solid-State Drive DC S3500 Series (80 GB) (I have lots of sync writes)

I'm using the onboard SATA controller set to operate as an HBA. It shows up in dmesg as:
Code:
ahci0: <Intel Wellsburg AHCI SATA controller> port 0x20a0-0x20a7,0x20bc-0x20bf,0x2098-0x209f,0x20b8-0x20bb,0x2040-0x205f mem 0x92c04000-0x92c047ff irq 16 at device 17.4 numa-domain 0 on pci1
ahci1: <Intel Wellsburg AHCI SATA controller> port 0x2078-0x207f,0x20ac-0x20af,0x2070-0x2077,0x20a8-0x20ab,0x2020-0x203f mem 0x92c00000-0x92c007ff irq 16 at device 31.2 numa-domain 0 on pci1


Vol status is:
Code:
  pool: FN1
 state: ONLINE
status: Some supported features are not enabled on the pool. The pool can
        still be used, but some features are unavailable.
action: Enable all features using 'zpool upgrade'. Once this is done, the
        pool may no longer be accessible by software that does not support
        the features. See zpool-features(7) for details.
  scan: scrub repaired 0 in 1h53m with 0 errors on Sun Jul 23 01:54:17 2017
config:

        NAME                                            STATE     READ WRITE CKSUM
        FN1                                             ONLINE       0     0     0
          mirror-0                                      ONLINE       0     0     0
            gptid/cf6a3ef0-2acc-11e7-a24b-9cdc71af6490  ONLINE       0     0     0
            gptid/d02daa79-2acc-11e7-a24b-9cdc71af6490  ONLINE       0     0     0
            gptid/d0f9f75b-2acc-11e7-a24b-9cdc71af6490  ONLINE       0     0     0
          mirror-2                                      ONLINE       0     0     0
            gptid/2cf4bdbd-30d9-11e7-844b-9cdc71af6490  ONLINE       0     0     0
            gptid/2db5bffc-30d9-11e7-844b-9cdc71af6490  ONLINE       0     0     0
            gptid/2e7952d3-30d9-11e7-844b-9cdc71af6490  ONLINE       0     0     0
        logs
          gptid/56b2f0ed-31dc-11e7-afb4-9cdc71af6490    ONLINE       0     0     0

errors: No known data errors


As you can notice, I haven't upgraded the pool since I upgraded to FreeBSD 10, just in case I had to go back. Could that be affecting the performance?
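(For reference, I believe the features that are still disabled on the pool could be listed from the console with something like this; FN1 is my pool name and the grep pattern is just my guess at filtering the output:)
Code:
# List the feature flags and their state (enabled/active/disabled) for the pool
zpool get all FN1 | grep feature@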

Notes:
- I have (partly) ruled out network issues (read speed is excellent, and watching the port at the switch level I don't see dropped packets, collisions, or anything else wrong whatsoever)
- I have ruled out CIFS issues; writing is also slow on an AFP share.
- I have another box on the same network, with a different RAID config and different hardware, which is performing way faster (150x faster).
- Looking at the reports, I see that the ZIL device in the "Disks Operations Detail" is doing fewer than 1 operation/s, whereas the other box, with the same drive as a ZIL, is doing 100 operations/s... (see the command sketch right below this list)
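In case it helps, I believe the same per-device activity can also be watched from the console with something like this (just a sketch; FN1 is the pool name from above, and -v breaks the numbers down per vdev, including the log device):
Code:
# Per-vdev read/write operations and bandwidth, refreshed every second
zpool iostat -v FN1 1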

I did a lot of testing before going into production, tried to do everything the FreeNAS way, and followed every piece of advice I came across, but now that I'm in production I'm running into this issue and it's worrying me...

Let me know if I missed any relevant data.

Any help would be really appreciated.
Thanks a lot
Matias
 

Thomas102

Explorer
Joined
Jun 21, 2017
Messages
83
Hi

By the indentation, it looks like the logs device is not part of your pool? It should be aligned with mirror-0 and mirror-2.
Did you extend your volume when creating the log device in the GUI?
 

bigphil

Patron
Joined
Jan 30, 2014
Messages
486
Hi

By the indentation, it looks like the logs device is not part of your pool? It should be aligned with mirror-0 and mirror-2.
Did you extend your volume when creating the log device in the GUI?

That output is correct and the pool is indeed configured with a SLOG device... it's from "zpool status FN1" and not from the GUI.

I first suggest you remove that from your pool as it's not going to do anything in your usage scenario.

Where did you come up with the suggestion to remove the SLOG? He said he has lots of sync writes and the Intel SSD should serve that purpose better than using an on-pool ZIL.
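If it helps, a quick sanity check would be to look at the sync property on the NFS dataset (just a sketch; FN1/vmbackups is a placeholder name, substitute the real dataset):
Code:
# "standard" or "always" means NFS sync writes will hit the log device
zfs get sync FN1/vmbackups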
 
Last edited:

Matute

Dabbler
Joined
Apr 12, 2017
Messages
21
Hi all, thanks for your replies and sorry for the delay (lunch time here).

Even though I feel pretty comfortable using the console, every configuration was done through the interface. The only things done on the console were the zpool status -v and the cat /var/log/dmesg.today|grep Intel I used to post specific info.
Now @bigphil and @Thomas102: the SLOG is in the volume, no doubt about it. I may have messed up the indentation, sorry if I gave misleading information. And I did in fact execute the above-mentioned command.

@m0nkey_: I have lots of sync writes over NFS (a dataset used to back up VMs, which is mounted and written to from within an ESXi host), and they see huge improvements when I add a SLOG (thanks for the correction regarding my ZIL/SLOG misuse); that's why I use it. But I have also already tried disabling the device in the interface (I immediately received a critical alert about the volume state, which is why I'm so sure the SLOG is in the volume) and the performance remains the same (poor).
I'm going to try iperf as suggested as soon as I get back to the office.

Thanks everyone again!
 

Thomas102

Explorer
Joined
Jun 21, 2017
Messages
83
Yep, my mistake, the indentation is different with a single disk. I checked with a striped mirror and the indentation is correct in your case.

Good commands I've found so far to check for disk performance issues without caching getting in the way are:
diskinfo -t /dev/...
=> read performance of the disk /dev/xxx
zpool iostat 1
=> then run some benchmark like iozone, or copy files; it will display what is effectively read from and written to the volumes.
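Roughly the way I run them (a sketch; ada0 and the dataset path are just example names):
Code:
# Raw read benchmark of a single disk, bypassing ZFS
diskinfo -t /dev/ada0

# In one shell: watch what is actually hitting the pool, once per second
zpool iostat FN1 1

# In another shell: generate load locally on the box (iozone, or a big file copy)
cp /path/to/bigfile /mnt/FN1/somedataset/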
 

Matute

Dabbler
Joined
Apr 12, 2017
Messages
21
@bigphil thanks for your post, I somehow missed it while answering from my phone browser.
@Thomas102, I've played with both commands while copying a 1 GB file (BTW, at less than 100 KB/s). First I ran diskinfo on every disk just to be sure nothing was wrong at the disk level. Every test was of course different and yielded different results (with some dispersion), but I didn't see anything standing out among them. I'll post just one of the results in case someone with better knowledge about disks sees something bad.
Code:
/dev/ada6
        512             # sectorsize
        3000592982016   # mediasize in bytes (2.7T)
        5860533168      # mediasize in sectors
        4096            # stripesize
        0               # stripeoffset
        5814021         # Cylinders according to firmware.
        16              # Heads according to firmware.
        63              # Sectors according to firmware.
        WD-WCC4N6KU8N7Y # Disk ident.
        Not_Zoned       # Zone Mode

Seek times:
        Full stroke:      250 iter in   6.559273 sec =   26.237 msec
        Half stroke:      250 iter in   4.936473 sec =   19.746 msec
        Quarter stroke:   500 iter in   7.341099 sec =   14.682 msec
        Short forward:    400 iter in   3.169958 sec =    7.925 msec
        Short backward:   400 iter in   2.653847 sec =    6.635 msec
        Seq outer:       2048 iter in   0.199483 sec =    0.097 msec
        Seq inner:       2048 iter in   0.126313 sec =    0.062 msec

Transfer rates:
        outside:       102400 kbytes in   0.658817 sec =   155430 kbytes/sec
        middle:        102400 kbytes in   0.815040 sec =   125638 kbytes/sec
        inside:        102400 kbytes in   1.316234 sec =    77798 kbytes/sec


Then I played a bit with zpool iostat 1, and there is something strange:
Code:
               capacity     operations    bandwidth
pool        alloc   free   read  write   read  write
FN1          511G  4.94T      8     68   561K  1009K
FN1          511G  4.94T      0      1      0  30.7K
FN1          511G  4.94T      0    216      0  1.42M
FN1          511G  4.94T      0      0      0      0
FN1          511G  4.94T      0      0      0      0
FN1          511G  4.94T      0      0      0      0
FN1          511G  4.94T      0      0      0      0
FN1          511G  4.94T      0    185      0  1.15M

Many seconds with no write operations...

A sustained ping from another server on the local network, run for 10 minutes while copying the said file, yielded:
Code:
933 packets transmitted, 925 received, 0% packet loss, time 932870ms
rtt min/avg/max/mdev = 0.121/0.258/2.177/0.137 ms


But what I finally got to (it's past 11 at night down here) is this:
Code:
root@freenas1:~ # netstat -h -n -I bge0 1
            input          bge0           output
   packets  errs idrops      bytes    packets  errs      bytes colls
        98    23      0       136K         81     0       5.4K     0
       652    20      0       949K        448     0        27K     0
       463    38      0       665K        333     0        21K     0
       298    23      0       426K        214     0        13K     0

So I think those errors on the incoming packets might be the answer... because on the server that is working fine, a similar test gives:
Code:
root@freenas2:~ # netstat -h -n -I em0 1
            input           em0           output
   packets  errs idrops      bytes    packets  errs      bytes colls
      2.7k     0      0       3.9M       1.4k     0        75K     0
      2.4k     0      0       3.4M       1.2k     0        65K     0
      8.7k     0      0        12M       4.3k     0       232K     0
       11k     0      0        16M       5.4k     0       289K     0

which seems far more reasonable. Curious thing though: I get this while reading from the server that is not performing right. I think I'm facing a hardware issue here (call it the cable, the switch port, the network card...). I'll keep you informed as soon as I get to the bottom of it. Any advice is more than welcome, since I'm more seasoned in Linux than in FreeBSD.
Thanks again to everyone.
 

Thomas102

Explorer
Joined
Jun 21, 2017
Messages
83
Ping loss is not normal. Looks like you've cornered your issue :)
You may have a lot more ping loss if you test ping and benchmark at the same time.

iperf is easy to set up and much more realistic than ping. Just remember the client sends to the server, so to test both directions you must swap the roles. Also, run in UDP mode for packet loss measurement.
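Something along these lines (a sketch; 192.168.1.10 stands in for the FreeNAS box's IP):
Code:
# On the FreeNAS box: run the server
iperf -s
# On the other machine: the client sends to the server, so this measures
# traffic going INTO the FreeNAS box; swap the roles for the other direction
iperf -c 192.168.1.10 -t 30

# UDP mode reports packet loss at the end of the run
iperf -s -u                        # server side
iperf -c 192.168.1.10 -u -b 500M   # client side, target 500 Mbit/s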

I wasn't clear about the zpool iostat command. What I do is run iozone or the file copy directly on the FreeNAS box; that way it removes network and cache behavior when trying to pin down raw performance issues.
In my case I suffered from both networking and RAID card issues. It helped me corner and resolve the disk issue.
 

Matute

Dabbler
Joined
Apr 12, 2017
Messages
21
Thanks @Thomas102! I finally got it: it was a switch port behaving erratically. I just moved the cable to another port and everything started working fine (as it used to) again. A new Cisco is on the way even though the old one can be reconfigured; I'm leaving on holidays in 4 days and am not willing to leave an unstable switch behind.
BTW, right now I get this with netstat -h -n -I bge0 1:
Code:
            input          bge0           output
   packets  errs idrops      bytes    packets  errs      bytes colls
       13k     0      0        19M       6.5k     0       349K     0
       15k     0      0        22M       7.6k     0       407K     0
      6.1k     0      0       8.5M       3.2k     0       252K     0
       11k     0      0        16M       5.8k     0       456K     0
       19k     0      0        27M       9.4k     0       499K     0
       18k     0      0        26M       9.0k     0       477K     0


I used iperf, great tool. Thanks again, issue resolved!
 