NVMe drive upgrades to add a mirrored SLOG and an L2ARC


Stux

MVP
Joined
Jun 2, 2016
Messages
4,358
And you do have the option of using the pair of 900Ps for both L2ARC and SLOG: striped for the L2ARC and mirrored for the SLOG.

This could theoretically impact your SLOG performance, but it will remove that "I'm wasting space" feeling ;)

Whether it actually impacts performance will depend on your usage mix of sync writes vs. L2ARC reads. Perhaps worth testing... testing is safe and relatively easy once you have the hardware.
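For illustration only, a minimal sketch of what that layout could look like, assuming a hypothetical pool name tank, device names nvd0/nvd1, and that each 900P has already been partitioned into a small p1 for the SLOG and a large p2 for the L2ARC:

Code:
# Mirrored SLOG across both Optanes (hypothetical partition names)
zpool add tank log mirror nvd0p1 nvd1p1
# L2ARC cache devices are always used independently, i.e. striped
zpool add tank cache nvd0p2 nvd1p2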

As you might know, not all PCIe slots are necessarily active if not all CPU sockets are utilized.
 

Ender117

Patron
Joined
Aug 20, 2018
Messages
219
Edit: I am losing my freaking mind! Sorry, it has been a rough week or so: roughly 3 all-nighters in 10 days.

My "X" may be off, but the transfer difference was 500-700Mb up to 4Gb. Yes, "M" instead of "K". Argh!
OK, I am surprised that your 900P only brings you to 4Gb/s. My P3700 (supposedly a bit slower than the 900P) can saturate 10Gb no problem with sequential writes. I'm also surprised that you can get 500-700Mb without a SLOG; mine was more like 260Mb when I offlined the SLOG (a pool of 3 mirror vdevs made from six 7.2K SAS drives). I looked at your signature and our specs are not that different.

PS: bless you!
 
Joined
Dec 29, 2014
Messages
1,135
OK, I am surprised that your 900P only brings you to 4Gb/s.

I suspect the problem is the SATA HDDs in the pool (aka spinning rust). I suspect ZFS will only cache a certain amount in the SLOG before it is waiting on the drives in the pool. I'd like a higher rate, but it doesn't matter enough to me to spend a whole bunch more time on it. I am sure it would smoke if my whole pool were SSDs, but that isn't what I have.
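One way to eyeball that theory while a transfer is running is to watch per-vdev throughput once a second (a sketch, assuming the pool name RAIDZ2-I that shows up later in the thread):

Code:
# If the raidz2 vdevs sit at their ceiling while the log device only sees
# short bursts, the spinning disks are the bottleneck rather than the SLOG.
zpool iostat -v RAIDZ2-I 1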
 

Ender117

Patron
Joined
Aug 20, 2018
Messages
219
I suspect the problem is the SATA HDDs in the pool (aka spinning rust). I suspect ZFS will only cache a certain amount in the SLOG before it is waiting on the drives in the pool. I'd like a higher rate, but it doesn't matter enough to me to spend a whole bunch more time on it. I am sure it would smoke if my whole pool were SSDs, but that isn't what I have.
Could also be fragmentation. Otherwise your pool of two 6-drive vdevs should be able to do 8Gb/s of throughput, no problem.
 
Joined
Dec 29, 2014
Messages
1,135
Could also be fragmentation. Otherwise your pool of two 6-drive vdevs should be able to do 8Gb/s of throughput, no problem.

Now you have me curious. Here are all of my "dirty"-related sysctl values.
Code:
root@freenas2:/nonexistent # sysctl -a | grep dirty
vfs.nfs.nfs_keep_dirty_on_error: 0
vfs.zfs.vdev.async_write_active_max_dirty_percent: 60
vfs.zfs.vdev.async_write_active_min_dirty_percent: 30
vfs.zfs.delay_min_dirty_percent: 60
vfs.zfs.dirty_data_sync: 67108864
vfs.zfs.dirty_data_max_percent: 10
vfs.zfs.dirty_data_max_max: 4294967296
vfs.zfs.dirty_data_max: 4294967296
vfs.zfs.per_txg_dirty_frees_percent: 30
vfs.dirtybufthresh: 23851
vfs.hidirtybuffers: 26502
vfs.lodirtybuffers: 13251
vfs.numdirtybuffers: 0
vfs.dirtybufferflushes: 0


Here is my zpool info
Code:
root@freenas2:/nonexistent # zpool status -v RAIDZ2-I
  pool: RAIDZ2-I
 state: ONLINE
  scan: scrub repaired 0 in 0 days 02:47:24 with 0 errors on Wed Oct  3 23:54:43 2018
config:

		NAME											STATE	 READ WRITE CKSUM
		RAIDZ2-I										ONLINE	   0	 0	 0
		  raidz2-0									  ONLINE	   0	 0	 0
			gptid/bd041ac6-9e63-11e7-a091-e4c722848f30  ONLINE	   0	 0	 0
			gptid/bdef2899-9e63-11e7-a091-e4c722848f30  ONLINE	   0	 0	 0
			gptid/bed51d90-9e63-11e7-a091-e4c722848f30  ONLINE	   0	 0	 0
			gptid/bfb76075-9e63-11e7-a091-e4c722848f30  ONLINE	   0	 0	 0
			gptid/c09c704a-9e63-11e7-a091-e4c722848f30  ONLINE	   0	 0	 0
			gptid/c1922b7c-9e63-11e7-a091-e4c722848f30  ONLINE	   0	 0	 0
			gptid/c276eb75-9e63-11e7-a091-e4c722848f30  ONLINE	   0	 0	 0
			gptid/c3724eeb-9e63-11e7-a091-e4c722848f30  ONLINE	   0	 0	 0
		  raidz2-1									  ONLINE	   0	 0	 0
			gptid/a1b7ef4b-3c2a-11e8-978a-e4c722848f30  ONLINE	   0	 0	 0
			gptid/a2eb419f-3c2a-11e8-978a-e4c722848f30  ONLINE	   0	 0	 0
			gptid/a41758d7-3c2a-11e8-978a-e4c722848f30  ONLINE	   0	 0	 0
			gptid/a5444dfb-3c2a-11e8-978a-e4c722848f30  ONLINE	   0	 0	 0
			gptid/a6dcd16f-3c2a-11e8-978a-e4c722848f30  ONLINE	   0	 0	 0
			gptid/a80cd73c-3c2a-11e8-978a-e4c722848f30  ONLINE	   0	 0	 0
			gptid/a94711a5-3c2a-11e8-978a-e4c722848f30  ONLINE	   0	 0	 0
			gptid/aaa6631d-3c2a-11e8-978a-e4c722848f30  ONLINE	   0	 0	 0
		logs
		  nvd0p1										ONLINE	   0	 0	 0
		cache
		  nvd0p4										ONLINE	   0	 0	 0
		spares
		  gptid/4abff125-23a2-11e8-a466-e4c722848f30	AVAIL

errors: No known data errors


Here is how the Optane 900P is partitioned

Code:
root@freenas2:/nonexistent # gpart show nvd0
=>	   40  547002208  nvd0  GPT  (261G)
		 40	   2008		- free -  (1.0M)
	   2048   33554432	 1  freebsd-zfs  (16G)
   33556480   33554432	 2  freebsd-zfs  (16G)
   67110912   33554432	 3  freebsd-zfs  (16G)
  100665344  446334976	 4  freebsd-zfs  (213G)
  547000320	   1928		- free -  (964K)


All the HDDs in this pool are Seagate ST91000640NS (1TB, 7.2K SATA, 2.5"). I'll admit my testing of write throughput wasn't the most scientific; I was looking at what I could get doing a vMotion. This is on my primary NAS (all specs in sig).
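Going back to the partition layout above, a rough sketch of how a 900P could be carved up that way from the CLI might look like this (destructive; the device name, the 1 MiB alignment, and the three 16G slices are assumptions based on the gpart output, not the commands that were actually run):

Code:
gpart create -s gpt nvd0
gpart add -t freebsd-zfs -a 1m -s 16G nvd0   # p1: SLOG
gpart add -t freebsd-zfs -a 1m -s 16G nvd0   # p2: spare 16G slice
gpart add -t freebsd-zfs -a 1m -s 16G nvd0   # p3: spare 16G slice
gpart add -t freebsd-zfs -a 1m nvd0          # p4: remainder for L2ARC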
 

Ender117

Patron
Joined
Aug 20, 2018
Messages
219
[sysctl output, zpool status, and 900P partition layout quoted from the post above]

Well, you can use zpool list -v to see the current fragmentation. ZFS is known to drop in write speed as the pool gets filled. To further confirm the theory, run iozone -a -r 1M -s 128G on the pool both with sync on and off; if the HDDs cannot keep up, the two results should be close. Remember to turn off compression, because iozone test data is compressible.
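A minimal sketch of that comparison, assuming the pool name RAIDZ2-I and a hypothetical throwaway dataset (the 128G file size matches the suggestion above and will take a while on spinning disks):

Code:
zfs create -o compression=off RAIDZ2-I/ioztest
zfs set sync=disabled RAIDZ2-I/ioztest
cd /mnt/RAIDZ2-I/ioztest && iozone -a -r 1M -s 128G -f ioz.sync-off
zfs set sync=always RAIDZ2-I/ioztest
cd /mnt/RAIDZ2-I/ioztest && iozone -a -r 1M -s 128G -f ioz.sync-on
zfs destroy RAIDZ2-I/ioztest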
 
Joined
Dec 29, 2014
Messages
1,135
Zpool list
Code:
root@freenas2:/nonexistent # zpool list -v RAIDZ2-I
NAME									 SIZE  ALLOC   FREE  EXPANDSZ   FRAG	CAP  DEDUP  HEALTH  ALTROOT
RAIDZ2-I								14.5T  5.75T  8.75T		 -	32%	39%  1.00x  ONLINE  /mnt
  raidz2								7.25T  5.52T  1.73T		 -	60%	76%
	gptid/bd041ac6-9e63-11e7-a091-e4c722848f30	  -	  -	  -		 -	  -	  -
	gptid/bdef2899-9e63-11e7-a091-e4c722848f30	  -	  -	  -		 -	  -	  -
	gptid/bed51d90-9e63-11e7-a091-e4c722848f30	  -	  -	  -		 -	  -	  -
	gptid/bfb76075-9e63-11e7-a091-e4c722848f30	  -	  -	  -		 -	  -	  -
	gptid/c09c704a-9e63-11e7-a091-e4c722848f30	  -	  -	  -		 -	  -	  -
	gptid/c1922b7c-9e63-11e7-a091-e4c722848f30	  -	  -	  -		 -	  -	  -
	gptid/c276eb75-9e63-11e7-a091-e4c722848f30	  -	  -	  -		 -	  -	  -
	gptid/c3724eeb-9e63-11e7-a091-e4c722848f30	  -	  -	  -		 -	  -	  -
  raidz2								7.25T   240G  7.02T		 -	 5%	 3%
	gptid/a1b7ef4b-3c2a-11e8-978a-e4c722848f30	  -	  -	  -		 -	  -	  -
	gptid/a2eb419f-3c2a-11e8-978a-e4c722848f30	  -	  -	  -		 -	  -	  -
	gptid/a41758d7-3c2a-11e8-978a-e4c722848f30	  -	  -	  -		 -	  -	  -
	gptid/a5444dfb-3c2a-11e8-978a-e4c722848f30	  -	  -	  -		 -	  -	  -
	gptid/a6dcd16f-3c2a-11e8-978a-e4c722848f30	  -	  -	  -		 -	  -	  -
	gptid/a80cd73c-3c2a-11e8-978a-e4c722848f30	  -	  -	  -		 -	  -	  -
	gptid/a94711a5-3c2a-11e8-978a-e4c722848f30	  -	  -	  -		 -	  -	  -
	gptid/aaa6631d-3c2a-11e8-978a-e4c722848f30	  -	  -	  -		 -	  -	  -
log										 -	  -	  -		 -	  -	  -
  nvd0p1								15.9G  5.65M  15.9G		 -	 0%	 0%
cache									   -	  -	  -		 -	  -	  -
  nvd0p4								 213G  17.3G   196G		 -	 0%	 8%
spare									   -	  -	  -		 -	  -	  -
  gptid/4abff125-23a2-11e8-a466-e4c722848f30	  -	  -	  -		 -	  -	  -


There does seem to be quite a lot of fragmentation on the first vdev. Here are the iozone results.

Code:
+ zpool remove RAIDZ2-I nvd0p1
+ echo 'No SLOG'
No SLOG
+ iozone -a -r 1M -s 128G ioz.foo
		Iozone: Performance Test of File I/O
				Version $Revision: 3.457 $
				Compiled for 64 bit mode.
				Build: freebsd

		Contributors:William Norcott, Don Capps, Isom Crawford, Kirby Collins
					 Al Slater, Scott Rhine, Mike Wisner, Ken Goss
					 Steve Landherr, Brad Smith, Mark Kelly, Dr. Alain CYR,
					 Randy Dunlap, Mark Montague, Dan Million, Gavin Brebner,
					 Jean-Marc Zucconi, Jeff Blomberg, Benny Halevy, Dave Boone,
					 Erik Habbinga, Kris Strecker, Walter Wong, Joshua Root,
					 Fabrice Bacchella, Zhenghua Xue, Qin Li, Darren Sawyer,
					 Vangel Bojaxhi, Ben England, Vikentsi Lapa,
					 Alexey Skidanov.

		Run began: Wed Oct 31 16:48:24 2018

		Auto Mode
		Record Size 1024 kB
		File size set to 134217728 kB
		Command line used: iozone -a -r 1M -s 128G ioz.foo
		Output is in kBytes/sec
		Time Resolution = 0.000001 seconds.
		Processor cache size set to 1024 kBytes.
		Processor cache line size set to 32 bytes.
		File stride size set to 17 * record size.
															  random	random	 bkwd	record	stride
			  kB  reclen	write  rewrite	read	reread	read	 write	 read   rewrite	  read   fwrite frewrite	fread  freread
	   134217728	1024   400957   366973   592830  1039347   211512   334612   281699  11116220	214646   336425   338694   665110  1432903

iozone test complete.
+ zpool add RAIDZ2-I log nvd0p1
+ echo 'With SLOG'
With SLOG
+ iozone -a -r 1M -s 128G ioz.foo
		Iozone: Performance Test of File I/O
				Version $Revision: 3.457 $
				Compiled for 64 bit mode.
				Build: freebsd

		Contributors:William Norcott, Don Capps, Isom Crawford, Kirby Collins
					 Al Slater, Scott Rhine, Mike Wisner, Ken Goss
					 Steve Landherr, Brad Smith, Mark Kelly, Dr. Alain CYR,
					 Randy Dunlap, Mark Montague, Dan Million, Gavin Brebner,
					 Jean-Marc Zucconi, Jeff Blomberg, Benny Halevy, Dave Boone,
					 Erik Habbinga, Kris Strecker, Walter Wong, Joshua Root,
					 Fabrice Bacchella, Zhenghua Xue, Qin Li, Darren Sawyer,
					 Vangel Bojaxhi, Ben England, Vikentsi Lapa,
					 Alexey Skidanov.

		Run began: Wed Oct 31 18:30:59 2018

		Auto Mode
		Record Size 1024 kB
		File size set to 134217728 kB
		Command line used: iozone -a -r 1M -s 128G ioz.foo
		Output is in kBytes/sec
		Time Resolution = 0.000001 seconds.
		Processor cache size set to 1024 kBytes.
		Processor cache line size set to 32 bytes.
		File stride size set to 17 * record size.
															  random	random	 bkwd	record	stride
			  kB  reclen	write  rewrite	read	reread	read	 write	 read   rewrite	  read   fwrite frewrite	fread  freread
	   134217728	1024   406358   403717   674476  1171143   284423   362094   203309  11029127	235751   408443   402536   638808  1154220

iozone test complete.
 

Ender117

Patron
Joined
Aug 20, 2018
Messages
219
[zpool list -v output and iozone results quoted from the post above]
Sorry, I meant to say test with sync on and off, not with and without the SLOG. I typed it wrong at first and edited it soon after; you may still see the original in the email notification.

That being said, if your results with and without the SLOG are almost the same, you most likely do not have sync=always turned on and are not hitting the SLOG in either test. If you set sync=always with the SLOG and the result is still close to 400MB/s, then your bottleneck is confirmed to be the HDDs, not the 900P.

For my pool, once the test size is well above dirty_data_max, it's about 450MB/s, so ~150MB/s for each vdev, which is a typical number for a mirror vdev. For an x-drive RAIDZn vdev, though, max throughput should be roughly (x-n) times the throughput of a single disk. You are most likely only seeing ~400MB/s because of the fragmentation on the first vdev.
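As a rough sanity check of that formula (assuming a ballpark 150MB/s of sequential throughput per 7.2K SATA drive, which is an estimate rather than a measured number): each 8-drive RAIDZ2 vdev works out to about (8-2) x 150MB/s ≈ 900MB/s, and the two vdevs together roughly double that, so the observed ~400MB/s is well below what the raw disks could deliver.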


PS: If you are interested, would you also share the result of iozone -a -s 512M -O on a 900P-backed, sync=always pool? This should show how it performs as an SLOG. I got numbers that are only a fraction of the diskinfo -wS result and am not sure why: https://forums.freenas.org/index.php?threads/performance-of-sync-writes.70470/
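If it helps, a minimal sketch of how that test could be set up (the dataset name is hypothetical; -O makes iozone report operations per second instead of kB/s):

Code:
zfs create -o compression=off -o sync=always RAIDZ2-I/slogtest
cd /mnt/RAIDZ2-I/slogtest && iozone -a -s 512M -O -f ioz.slog
zfs destroy RAIDZ2-I/slogtest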
 
Joined
Dec 29, 2014
Messages
1,135
I meant to say test with sync on and off, not with and without the SLOG. I typed it wrong at first and edited it soon after; you may still see the original in the email notification.

That being said, if your results with and without the SLOG are almost the same, you most likely do not have sync=always turned on and are not hitting the SLOG in either test.

I saw that, but then I got hung up on the with/without SLOG thing. In any case, I don't think there is any solution to the fragmentation other than deleting and recreating the pool. I started off with the single vdev and then added the second one later; not sure if that somehow jacked things up. In any event, I am running my production VMs from the local datastore on one of the VM hosts and copying the data from the primary to the secondary box now. Once that is done, I will delete and recreate the pool. Rather tedious, but I do now actually have the storage space to do that.
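For anyone following along, one way to shuttle the data over and back is plain zfs send/receive; this is only a sketch with a hypothetical secondary host (backupnas) and pool (backuppool), and scheduled replication tasks in the GUI would do the same job:

Code:
zfs snapshot -r RAIDZ2-I@migrate1
zfs send -R RAIDZ2-I@migrate1 | ssh backupnas zfs receive -uF backuppool/RAIDZ2-I
# ...destroy and recreate RAIDZ2-I, then reverse the direction...
ssh backupnas zfs send -R backuppool/RAIDZ2-I@migrate1 | zfs receive -uF RAIDZ2-I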
 

Ender117

Patron
Joined
Aug 20, 2018
Messages
219
I saw that, but then I got hung up on the with/without SLOG thing. In any case, I don't think there is any solution to the fragmentation other than deleting and recreating the pool. I started off with the single vdev and then added the second one later; not sure if that somehow jacked things up. In any event, I am running my production VMs from the local datastore on one of the VM hosts and copying the data from the primary to the secondary box now. Once that is done, I will delete and recreate the pool. Rather tedious, but I do now actually have the storage space to do that.
Nice. Once it's done, be sure to let us know the true potential of the 900P ;)
 
Joined
Dec 29, 2014
Messages
1,135
I am in the process of moving data back to the recreated pool, but I am really confused about how the vdevs look. To make sure I got only the devices I intended, I disconnected all of them prior to building the pool. I first plugged in 8 devices and created the pool as RAIDZ2 from the GUI, then added 8 more devices to extend the pool. Then I plugged in the last device and added it as a spare, and did the CLI config to add the SLOG and L2ARC. What is confusing me is that I now have 4 vdevs instead of the 2 I was expecting.
Code:
root@freenas2:/nonexistent # zpool list -v RAIDZ2-I
NAME									 SIZE  ALLOC   FREE  EXPANDSZ   FRAG	CAP  DEDUP  HEALTH  ALTROOT
RAIDZ2-I								14.5T   158G  14.3T		 -	 0%	 1%  1.00x  ONLINE  /mnt
  raidz2								3.62T  39.6G  3.59T		 -	 0%	 1%
	gptid/70852685-dddc-11e8-adca-e4c722848f30	  -	  -	  -		 -	  -	  -
	gptid/71686954-dddc-11e8-adca-e4c722848f30	  -	  -	  -		 -	  -	  -
	gptid/724e4021-dddc-11e8-adca-e4c722848f30	  -	  -	  -		 -	  -	  -
	gptid/73554422-dddc-11e8-adca-e4c722848f30	  -	  -	  -		 -	  -	  -
  raidz2								3.62T  39.6G  3.59T		 -	 0%	 1%
	gptid/746dafd4-dddc-11e8-adca-e4c722848f30	  -	  -	  -		 -	  -	  -
	gptid/75626368-dddc-11e8-adca-e4c722848f30	  -	  -	  -		 -	  -	  -
	gptid/764f85ad-dddc-11e8-adca-e4c722848f30	  -	  -	  -		 -	  -	  -
	gptid/779e221d-dddc-11e8-adca-e4c722848f30	  -	  -	  -		 -	  -	  -
  raidz2								3.62T  39.6G  3.59T		 -	 0%	 1%
	gptid/ce00e493-dddc-11e8-adca-e4c722848f30	  -	  -	  -		 -	  -	  -
	gptid/ceee50a2-dddc-11e8-adca-e4c722848f30	  -	  -	  -		 -	  -	  -
	gptid/cfe1d0e6-dddc-11e8-adca-e4c722848f30	  -	  -	  -		 -	  -	  -
	gptid/d0d66ee5-dddc-11e8-adca-e4c722848f30	  -	  -	  -		 -	  -	  -
  raidz2								3.62T  39.6G  3.59T		 -	 0%	 1%
	gptid/d450f2fb-dddc-11e8-adca-e4c722848f30	  -	  -	  -		 -	  -	  -
	gptid/d55539a0-dddc-11e8-adca-e4c722848f30	  -	  -	  -		 -	  -	  -
	gptid/d64a2170-dddc-11e8-adca-e4c722848f30	  -	  -	  -		 -	  -	  -
	gptid/d7709adf-dddc-11e8-adca-e4c722848f30	  -	  -	  -		 -	  -	  -
log										 -	  -	  -		 -	  -	  -
  nvd0p1								15.9G	  0  15.9G		 -	 0%	 0%
cache									   -	  -	  -		 -	  -	  -
  nvd0p4								 213G  17.5G   195G		 -	 0%	 8%
spare									   -	  -	  -		 -	  -	  -
  gptid/f3fa63e8-dddc-11e8-adca-e4c722848f30	  -	  -	  -		 -	  -	  -


It looks to be giving me about the amount of space I was expecting (roughly the capacity of 12 drives out of the 16), but it isn't laid out at all like it was before. Is this expected, and why is it different?
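For comparison, the two-vdev layout that was expected can be spelled out unambiguously from the CLI; this is only an illustration with hypothetical device names (FreeNAS normally partitions the disks and references them by gptid, so the GUI/middleware route remains the supported one):

Code:
zpool create RAIDZ2-I \
  raidz2 da0 da1 da2 da3 da4 da5 da6 da7 \
  raidz2 da8 da9 da10 da11 da12 da13 da14 da15 \
  spare da16 \
  log nvd0p1 \
  cache nvd0p4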
 

HoneyBadger

actually does care
Administrator
Moderator
iXsystems
Joined
Feb 6, 2014
Messages
5,110
Not to backseat-moderate or anything; I like the open discussion and all, because everyone can learn, but maybe this needs another thread so we don't derail the OP's.

@Elliot Dierksen tag me in on the new thread; I've got ideas so you can poke at your performance tunables and track the txg times with some dtrace scripts - but I don't want to (further) drag this one off on a tangent. ;)
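As a flavor of what tracking txg times means, here is a minimal dtrace sketch (not HoneyBadger's actual script) that assumes the fbt provider can see spa_sync in the ZFS kernel module; it histograms how long each txg sync takes:

Code:
dtrace -n '
fbt::spa_sync:entry  { self->ts = timestamp; }
fbt::spa_sync:return /self->ts/ {
    @["txg sync time (ms)"] = quantize((timestamp - self->ts) / 1000000);
    self->ts = 0;
}'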
 