
SLOG benchmarking and finding the best SLOG

HoneyBadger

actually does care
Administrator
Moderator
iXsystems
Joined
Feb 6, 2014
Messages
5,110
Awesome to see the little Optane Memory devices still able to swing some serious numbers at small record sizes. For a homelab scenario this would make a great SLOG, able to handle a pair of 1Gbps links in MPIO round-robin at full speed (although it might wear the poor thing out awful quick!)
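(Back-of-the-envelope: two 1Gbps links in round-robin work out to roughly 2 x 117 MB/s ≈ 235 MB/s of incoming sync writes, give or take protocol overhead.)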

Couple it with something like the upcoming M.2-to-5.25" NVMe tray from IcyDock and you could even have it be hot-swappable in case of failure.
 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,175
NVME slot.
There is no such thing. NVMe is a protocol on top of PCIe and as such works over the whole range of PCIe interfaces: standard expansion slots, M.2, Thunderbolt, ExpressCard, U.2, ...

The M.2 socket is itself a series of standards covering all sorts of things, the most common being storage over 4x PCIe, plus 1x SATA, USB and SMBus/I2C. And that one connector is just the tip of the iceberg.
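(Side note for anyone poking at this on FreeBSD: however the drive is physically attached, it shows up the same way, because the OS just sees a PCIe device bound to the nvme driver. A quick, purely illustrative check:)

Code:
# list NVMe controllers/namespaces, regardless of M.2 / U.2 / add-in card
nvmecontrol devlist
# ...and the underlying PCIe device each one sits on
pciconf -lv | grep -A3 nvme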
 

Ender117

Patron
Joined
Aug 20, 2018
Messages
219
OK, I am here to share some results, as promised:)
Code:
root@freenas:/dev # diskinfo -wS /dev/nvd0
/dev/nvd0
		512			 # sectorsize
		400088457216	# mediasize in bytes (373G)
		781422768	   # mediasize in sectors
		131072		  # stripesize
		0			   # stripeoffset
		INTEL SSDPEDMD400G4	 # Disk descr.
		CVFT7112001A400LGN	  # Disk ident.

Synchronous random writes:
		 0.5 kbytes:	 23.1 usec/IO =	 21.1 Mbytes/s
		   1 kbytes:	 23.0 usec/IO =	 42.5 Mbytes/s
		   2 kbytes:	 23.4 usec/IO =	 83.6 Mbytes/s
		   4 kbytes:	 21.3 usec/IO =	183.5 Mbytes/s
		   8 kbytes:	 23.7 usec/IO =	329.8 Mbytes/s
		  16 kbytes:	 30.5 usec/IO =	512.9 Mbytes/s
		  32 kbytes:	 38.3 usec/IO =	816.7 Mbytes/s
		  64 kbytes:	 72.8 usec/IO =	859.0 Mbytes/s
		 128 kbytes:	144.1 usec/IO =	867.6 Mbytes/s
		 256 kbytes:	240.5 usec/IO =   1039.6 Mbytes/s
		 512 kbytes:	469.6 usec/IO =   1064.8 Mbytes/s
		1024 kbytes:	945.7 usec/IO =   1057.4 Mbytes/s
		2048 kbytes:   1867.8 usec/IO =   1070.8 Mbytes/s
		4096 kbytes:   3719.5 usec/IO =   1075.4 Mbytes/s
		8192 kbytes:   7510.7 usec/IO =   1065.1 Mbytes/s

root@freenas:/dev # smartctl -a /dev/nvme0
smartctl 6.6 2017-11-05 r4594 [FreeBSD 11.1-STABLE amd64] (local build)
Copyright (C) 2002-17, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Number:					   INTEL SSDPEDMD400G4
Serial Number:					  CVFT7112001A400LGN
Firmware Version:				   8DV101H0
PCI Vendor/Subsystem ID:			0x8086
IEEE OUI Identifier:				0x5cd2e4
Controller ID:					  0
Number of Namespaces:			   1
Namespace 1 Size/Capacity:		  400,088,457,216 [400 GB]
Namespace 1 Formatted LBA Size:	 512
Local Time is:					  Fri Sep 28 11:14:53 2018 PDT
Firmware Updates (0x02):			1 Slot
Optional Admin Commands (0x0006):   Format Frmw_DL
Optional NVM Commands (0x0006):	 Wr_Unc DS_Mngmt
Maximum Data Transfer Size:		 32 Pages

Supported Power States
St Op	 Max   Active	 Idle   RL RT WL WT  Ent_Lat  Ex_Lat
 0 +	25.00W	   -		-	0  0  0  0		0	   0

Supported LBA Sizes (NSID 0x1)
Id Fmt  Data  Metadt  Rel_Perf
 0 -	 512	   0		 2
 1 -	 512	   8		 2
 2 -	 512	  16		 2
 3 -	4096	   0		 0
 4 -	4096	   8		 0
 5 -	4096	  64		 0
 6 -	4096	 128		 0

=== START OF SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

SMART/Health Information (NVMe Log 0x02, NSID 0xffffffff)
Critical Warning:				   0x00
Temperature:						26 Celsius
Available Spare:					100%
Available Spare Threshold:		  10%
Percentage Used:					0%
Data Units Read:					37,308,773 [19.1 TB]
Data Units Written:				 49,439,367 [25.3 TB]
Host Read Commands:				 470,446,516
Host Write Commands:				752,312,561
Controller Busy Time:			   0
Power Cycles:					   108
Power On Hours:					 6,463
Unsafe Shutdowns:				   11
Media and Data Integrity Errors:	0
Error Information Log Entries:	  0

Error Information (NVMe Log 0x01, max 64 entries)
No Errors Logged


Looks like mine is slower than what others have seen :( , not sure why.


And on the idea of abusing the 1GB onboard cache of the H710P: this is on a RAID 1 of two 10K SAS HDDs, write back, adaptive read ahead, 64K stripe size:

Code:
root@freenas:/dev # smartctl -a /dev/da1
smartctl 6.6 2017-11-05 r4594 [FreeBSD 11.1-STABLE amd64] (local build)
Copyright (C) 2002-17, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Vendor:			   DELL
Product:			  PERC H710P
Revision:			 3.13
User Capacity:		299,439,751,168 bytes [299 GB]
Logical block size:   512 bytes
>> Terminate command early due to bad response to IEC mode page
A mandatory SMART command failed: exiting. To continue, add one or more '-T permissive' options.
root@freenas:/dev # diskinfo -wS /dev/da1
/dev/da1
		512			 # sectorsize
		299439751168	# mediasize in bytes (279G)
		584843264	   # mediasize in sectors
		0			   # stripesize
		0			   # stripeoffset
		36404		   # Cylinders according to firmware.
		255			 # Heads according to firmware.
		63			  # Sectors according to firmware.
		DELL PERC H710P # Disk descr.
		00e3687f0a01dfe522004564e760f681		# Disk ident.
		Not_Zoned	   # Zone Mode

Synchronous random writes:
		 0.5 kbytes:	322.9 usec/IO =	  1.5 Mbytes/s
		   1 kbytes:   2367.3 usec/IO =	  0.4 Mbytes/s
		   2 kbytes:	929.6 usec/IO =	  2.1 Mbytes/s
		   4 kbytes:	891.1 usec/IO =	  4.4 Mbytes/s
		   8 kbytes:	902.2 usec/IO =	  8.7 Mbytes/s
		  16 kbytes:	863.9 usec/IO =	 18.1 Mbytes/s
		  32 kbytes:	903.2 usec/IO =	 34.6 Mbytes/s
		  64 kbytes:	896.4 usec/IO =	 69.7 Mbytes/s
		 128 kbytes:   2266.5 usec/IO =	 55.2 Mbytes/s
		 256 kbytes:   4645.2 usec/IO =	 53.8 Mbytes/s
		 512 kbytes:   8638.2 usec/IO =	 57.9 Mbytes/s
		1024 kbytes:  20928.2 usec/IO =	 47.8 Mbytes/s
		2048 kbytes:  32966.4 usec/IO =	 60.7 Mbytes/s
		4096 kbytes:  46831.1 usec/IO =	 85.4 Mbytes/s
		8192 kbytes:  86138.7 usec/IO =	 92.9 Mbytes/s



You are better off just buying a cheapo S3500 lol


PS1: How do I overprovision the P3700? I used hdparm with SATA SSDs before, but that doesn't seem to work with NVMe.
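The only stop-gap I can think of is to partition just a slice of the drive and leave the rest untouched, which should give the controller plenty of spare area to play with. A rough sketch (sizes are illustrative, and it wipes any existing partitioning):

Code:
# rough sketch only -- destroys any existing partition table on nvd0
gpart destroy -F nvd0
gpart create -s gpt nvd0
gpart add -t freebsd-zfs -a 1m -s 40g nvd0   # use ~40G, leave the rest unallocated

...but if there is a proper way to shrink the reported capacity (the way hdparm -N does for SATA), I'd love to hear it.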

PS2: https://forums.freenas.org/index.php?threads/troubleshooting-low-disk-write-speed.70217/ Anyone got an idea? It turns out to be FreeBSD related, but I am sure someone here has used LSI HBAs + SAS HDDs in FreeNAS boxes.
 

Stux

MVP
Joined
Jun 2, 2016
Messages
4,358

Ender117

Patron
Joined
Aug 20, 2018
Messages
219
@Ender117 - Also make sure you change the drive to use 4K sectors, which @Stux has helpfully also included in his guide as well.
Will look into that. I also found my numbers to be lower than others' with exactly the same model #; maybe that's the reason?

Update: reformatted to 4K and HPAed down to 40G; not seeing any meaningful changes (actually a bit slower). Oh well, maybe just bad luck in the silicon lottery.
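For anyone wanting to reproduce the 4K reformat: newer FreeBSD releases can do it straight from nvmecontrol (check nvmecontrol(8) first - I don't believe the format subcommand exists on 11.1, where Intel's own tooling / @Stux's guide is the way to go). LBA format 3 is the 4096-byte entry from the smartctl output earlier, and this erases the namespace:

Code:
# assumes a FreeBSD release whose nvmecontrol(8) has "format" -- wipes the namespace!
nvmecontrol format -f 3 nvme0ns1
nvmecontrol identify nvme0ns1    # confirm the current LBA format is now the 4096-byte one

Anyway, the post-reformat numbers: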
Code:
root@freenas:~ # diskinfo -wS /dev/nvd0
/dev/nvd0
		4096			# sectorsize
		42999996416	 # mediasize in bytes (40G)
		10498046		# mediasize in sectors
		131072		  # stripesize
		0			   # stripeoffset
		INTEL SSDPEDMD400G4	 # Disk descr.
		CVFT7112001A400LGN	  # Disk ident.

Synchronous random writes:
		   4 kbytes:	 22.1 usec/IO =	176.7 Mbytes/s
		   8 kbytes:	 24.2 usec/IO =	322.4 Mbytes/s
		  16 kbytes:	 29.3 usec/IO =	533.3 Mbytes/s
		  32 kbytes:	 39.4 usec/IO =	793.8 Mbytes/s
		  64 kbytes:	 72.4 usec/IO =	862.7 Mbytes/s
		 128 kbytes:	147.8 usec/IO =	845.8 Mbytes/s
		 256 kbytes:	250.1 usec/IO =	999.5 Mbytes/s
		 512 kbytes:	490.6 usec/IO =   1019.2 Mbytes/s
		1024 kbytes:	961.4 usec/IO =   1040.1 Mbytes/s
		2048 kbytes:   1926.7 usec/IO =   1038.1 Mbytes/s
		4096 kbytes:   3806.6 usec/IO =   1050.8 Mbytes/s
		8192 kbytes:   7432.6 usec/IO =   1076.3 Mbytes/s

 
Last edited:

Sirius

Dabbler
Joined
Mar 1, 2018
Messages
41
Intel Optane 800p 118GB
Code:
/dev/nvd0
		512			 # sectorsize
		118410444800	# mediasize in bytes (110G)
		231270400	   # mediasize in sectors
		0			   # stripesize
		0			   # stripeoffset
		INTEL SSDPEK1W120GA	 # Disk descr.
		PHBT80450161128R		# Disk ident.

Synchronous random writes:
		 0.5 kbytes:	 83.8 usec/IO =	  5.8 Mbytes/s
		   1 kbytes:	 81.2 usec/IO =	 12.0 Mbytes/s
		   2 kbytes:	 84.4 usec/IO =	 23.1 Mbytes/s
		   4 kbytes:	 87.2 usec/IO =	 44.8 Mbytes/s
		   8 kbytes:	 98.9 usec/IO =	 79.0 Mbytes/s
		  16 kbytes:	118.2 usec/IO =	132.2 Mbytes/s
		  32 kbytes:	157.4 usec/IO =	198.5 Mbytes/s
		  64 kbytes:	221.1 usec/IO =	282.7 Mbytes/s
		 128 kbytes:	368.9 usec/IO =	338.9 Mbytes/s
		 256 kbytes:	590.0 usec/IO =	423.7 Mbytes/s
		 512 kbytes:   1040.3 usec/IO =	480.6 Mbytes/s
		1024 kbytes:   1945.8 usec/IO =	513.9 Mbytes/s
		2048 kbytes:   3736.8 usec/IO =	535.2 Mbytes/s
		4096 kbytes:   7325.6 usec/IO =	546.0 Mbytes/s
		8192 kbytes:  14913.1 usec/IO =	536.4 Mbytes/s


Intel Optane 800p 58GB
Code:
/dev/nvd1
		512			 # sectorsize
		58977157120	 # mediasize in bytes (55G)
		115189760	   # mediasize in sectors
		0			   # stripesize
		0			   # stripeoffset
		INTEL SSDPEK1W060GA	 # Disk descr.
		PHBT803301ML064Q		# Disk ident.

Synchronous random writes:
		 0.5 kbytes:	 89.4 usec/IO =	  5.5 Mbytes/s
		   1 kbytes:	 91.6 usec/IO =	 10.7 Mbytes/s
		   2 kbytes:	 90.6 usec/IO =	 21.6 Mbytes/s
		   4 kbytes:	 94.6 usec/IO =	 41.3 Mbytes/s
		   8 kbytes:	 98.8 usec/IO =	 79.1 Mbytes/s
		  16 kbytes:	121.7 usec/IO =	128.3 Mbytes/s
		  32 kbytes:	161.8 usec/IO =	193.1 Mbytes/s
		  64 kbytes:	230.3 usec/IO =	271.4 Mbytes/s
		 128 kbytes:	365.8 usec/IO =	341.7 Mbytes/s
		 256 kbytes:	586.8 usec/IO =	426.0 Mbytes/s
		 512 kbytes:   1059.6 usec/IO =	471.9 Mbytes/s
		1024 kbytes:   1968.8 usec/IO =	507.9 Mbytes/s
		2048 kbytes:   3810.8 usec/IO =	524.8 Mbytes/s
		4096 kbytes:   7420.3 usec/IO =	539.1 Mbytes/s
		8192 kbytes:  14648.0 usec/IO =	546.2 Mbytes/s


This is a dual socket system with the Optanes passed through to FreeNAS. It's possible I didn't pin the VM to the right cores yet so some of the latency is probably from going over the QPI... I think.

Also, I got this error when using smartctl on the Optanes...
Code:
smartctl -a /dev/nvd0
smartctl 6.6 2017-11-05 r4594 [FreeBSD 11.1-STABLE amd64] (local build)
Copyright (C) 2002-17, Bruce Allen, Christian Franke, www.smartmontools.org

/dev/nvd0: Unable to detect device type
Please specify device type with the -d option.

Use smartctl -h to get a usage summary
 
Last edited:

HoneyBadger

actually does care
Administrator
Moderator
iXsystems
Joined
Feb 6, 2014
Messages
5,110
This is a dual socket system with the Optanes passed through to FreeNAS. It's possible I didn't pin the VM to the right cores yet so some of the latency is probably from going over the QPI... I think.

Yeah, something's amiss - those drives are getting spanked by the teeny little 32GB Optane Memory from the last page, at least until the record size exceeds 128K. If it's having to hop across a socket link, that'll kill it in a hurry.

Also, I got this error when using smartctl on the Optanes...

Try smartctl -a /dev/nvme0
 

Sirius

Dabbler
Joined
Mar 1, 2018
Messages
41
Yeah, something's amiss - those drives are getting spanked by the teeny little 32GB Optane Memory from the last page, at least until the record size exceeds 128K. If it's having to hop across a socket link, that'll kill it in a hurry.



Try smartctl -a /dev/nvme0

Ah thanks for that tip! And yeah if it's getting smashed by the Optane memory then something isn't right. I'll see what I can do to optimise things further - probably just pinning the VM to the CPU with the Optanes.
 
Joined
Dec 29, 2014
Messages
1,135
Check this out. I changed out the CPU (E5-2660 v2 -> E5-2637 v2) in my primary FreeNAS, and now I am getting a little bit higher results on the bigger block size tests.

Code:
root@freenas2:/nonexistent # diskinfo -wS /dev/nvd0
/dev/nvd0
		512			 # sectorsize
		280065171456	# mediasize in bytes (261G)
		547002288	   # mediasize in sectors
		0			   # stripesize
		0			   # stripeoffset
		INTEL SSDPED1D280GA	 # Disk descr.
		PHMB742401A6280CGN	  # Disk ident.

Synchronous random writes:
		 0.5 kbytes:	 18.0 usec/IO =	 27.1 Mbytes/s
		   1 kbytes:	 18.6 usec/IO =	 52.5 Mbytes/s
		   2 kbytes:	 19.8 usec/IO =	 98.6 Mbytes/s
		   4 kbytes:	 15.6 usec/IO =	249.7 Mbytes/s
		   8 kbytes:	 18.0 usec/IO =	434.8 Mbytes/s
		  16 kbytes:	 24.7 usec/IO =	632.3 Mbytes/s
		  32 kbytes:	 35.3 usec/IO =	884.4 Mbytes/s
		  64 kbytes:	 49.6 usec/IO =   1258.9 Mbytes/s
		 128 kbytes:	 84.0 usec/IO =   1488.1 Mbytes/s
		 256 kbytes:	148.2 usec/IO =   1687.4 Mbytes/s
		 512 kbytes:	274.7 usec/IO =   1820.2 Mbytes/s
		1024 kbytes:	530.1 usec/IO =   1886.5 Mbytes/s
		2048 kbytes:   1046.5 usec/IO =   1911.1 Mbytes/s
		4096 kbytes:   2060.7 usec/IO =   1941.1 Mbytes/s
		8192 kbytes:   4072.1 usec/IO =   1964.6 Mbytes/s


That was the best of 3 runs, but the 8192K speeds were 1964.6, 1961.3, and 1957.1. Not a huge difference, but up 40-50MB/s from previous results. The system isn't idle either; I have 3 VMs running on ESXi using NFS datastores.
 

svtkobra7

Patron
Joined
Jan 12, 2017
Messages
202
I'll see what I can do to optimise things further - probably just pinning the VM to the CPU with the Optanes.

Any chance you could go into a bit more detail on (a) how you pin the cores and determine which to pin, and (b) the implications of not doing so???

I've long struggled with the "optimal" PCIe slots to plug 2 Optane NVMe drives into and suspected that it actually matters. To wit, in my scenario CPU1 = PCIe slots 1-3 and CPU2 = slots 4-7, and the limited knowledge I have on the topic tells me that I would want both Optane drives on slots controlled by the same CPU. Even though at 8 GT/s the QPI isn't going to be a bottleneck, if devices that need to speak to one another are connected to different CPUs, they must traverse the QPI, increasing latency. That is my personal un-intelligent supposition, and from far wiser folk I've heard both that it does and that it does not matter. So if it does matter, and I want them both on the same CPU, then I have to make a sub-optimal trade-off between (a) cooling (better on CPU2; slot 1 at the edge is a hot mess) and (b) their proximity to the SAS2208 (CPU1), which they certainly also must speak to. So if you pressed me for where to position my two drives, I would say PCIe slots 1-3 (picking two), as the SAS2208 is on the same CPU. I believe you are alluding to a similar concept in your post.

Assuming I'm strolling down the right path, let's bring that full circle to your post. I'm interested both academically and selfishly in optimizing my chassis. I was more likely than not quite off with that prior assertion and expect to be more so here: I had always thought that my FreeNAS VM consumed enough of the available system resources such that it automagically assigned threads to both CPUs, and there was no need to pin:
[a] FreeNAS VM: 2 sockets, 4 cores per socket, and 224 GB of memory (87.5%)
[b] ESXi host: 2 sockets, 10 cores per socket, 40 threads, all DIMM slots full (16 x 16 GB), memory reserved
Is that a correct assumption?

Also, on a related topic: whenever I assign X CPUs to a new VM, it defaults to one core per socket and X sockets. While it's a silly reason, I thought to myself, "well, since I have 2 sockets, I should reverse that and assign X/2 cores per socket such that sockets always = 2."

Block diagram presented below in case it is of assistance, and thanks in advance.

block.jpg
 

svtkobra7

Patron
Joined
Jan 12, 2017
Messages
202
Check this out. I changed out the CPU (E5-2660 v2 -> E5-2637 v2) in my primary FreeNAS, and now I am getting a little bit higher results on the bigger block size tests.

That's like free NVMe - can't beat that! :):):)

Question for you, sir: I see from your sig you have 2 x 900ps per system (myself as well), (a) how do you have them passed through, and (b) how are you using them? If you've presented this somewhere in a post already, I wouldn't take offense at a link. ;)

Edit below:

Looking at your disk description, you must have performed the passthrough workaround.
Comment: I tried that route and it worked briefly and then crashed, so I suppose I need to use either vDisks or RDM.
 
Last edited:
Joined
Dec 29, 2014
Messages
1,135
Question for you, sir: I see from your sig you have 2 x 900ps per system (myself as well), (a) how do you have them passed through, and (b) how are you using them?

Both my FreeNAS systems are bare metal. The VMs all run on ESXi, and I have 4 separate hosts for that. Each of my FreeNAS systems has a single Optane 900P. In the backup system I manually partitioned it so I could use parts of the drive as SLOGs for different pools. You can't do that from the GUI, and you have to manually add the SLOG to the pool from the CLI as well. I did remove the SLOG from the pool prior to running the tests and added it back in afterwards. Without the SLOG, my NFS write performance is kind of crappy.
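The general shape of it, with made-up labels, sizes, and pool names purely for illustration, is a couple of GPT partitions and then adding each one as a log vdev from the CLI:

Code:
# illustrative only -- labels, sizes and pool names are placeholders
gpart create -s gpt nvd0
gpart add -t freebsd-zfs -a 1m -s 16g -l slog-pool1 nvd0
gpart add -t freebsd-zfs -a 1m -s 16g -l slog-pool2 nvd0
zpool add pool1 log gpt/slog-pool1
zpool add pool2 log gpt/slog-pool2
# and to pull one out before benchmarking the raw device:
zpool remove pool1 gpt/slog-pool1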
 

svtkobra7

Patron
Joined
Jan 12, 2017
Messages
202
Thanks for your prompt reply. That's a LOT of servers! I see you like the 9207-8i ... I managed to flash my board's integrated 2208 into a 9207-8i and put it in IT mode. No need for an HBA for me (and it has been working without issue for quite some time).

So I was thinking of this to fit my use case:

(gptid = illustrative)

nvme0 [also esxi / freenas boot]
gpart add -i 1 -t freebsd-zfs -s 20g nvd0
gptid/opt.slog1
gpart add -i 2 -t freebsd-zfs -s 240g nvd0
gptid/opt.nfs1

nvme1
gpart add -i 1 -t freebsd-zfs -s 20g nvd1
gptid/opt.slog2
gpart add -i 2 -t freebsd-zfs -s 240g nvd1
gptid/opt.nfs2

Where the means to that end = ESXi datastores passed through as virtual disks (for lack of a better alternative), and I end up with the following (rough zpool sketch after the list):
  • Storage Pool = Existing HDDs + opt.slog1 + opt.slog2 (from above) [20GB + 20GB, striped]
  • New NFS Pool = opt.nfs1 + opt.nfs2 (from above) [240GB + 240GB, striped]
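In zpool terms that end state would be roughly as follows ("tank" and "optane-nfs" are placeholder pool names, and the device names will be whatever the vdisks actually show up as):

Code:
# two separate log vdevs on the existing pool = striped SLOG
zpool add tank log gptid/opt.slog1 gptid/opt.slog2
# new striped pool from the two big partitions, for the NFS datastore
zpool create optane-nfs gptid/opt.nfs1 gptid/opt.nfs2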

Why?
  • Testing yesterday showed striped SLOGs to be much more performant than one, and with sync=always plus the SLOG I got within striking distance of sync=disabled with no SLOG / while one Optane should be able to handle two SLOGs, I think there is some merit to "load balancing" (if you will)
  • Regarding NFS / iSCSI: why not put the balance of the drive to use (alternatively I could effectively over-provision)? Testing also showed solid performance as an NFS datastore striped across 2 Optane drives. I know there is better stuff to do with Optane, but cache is out (encrypted), I'm already booting off it, etc. I know 480 GB isn't much, but it's enough for my needs, and I also plan to mirror the config in the second host.
@Stux's most excellent tutorial used one of the partitions for swap, but I believe the need for that has since been made obsolete, right? I've been following the guidance in his tutorial to include this command as a postinit (for a number of months):

swapoff -a ; grep -v -E 'none[[:blank:]]+swap[[:blank:]]' /etc/fstab > /etc/fstab.new && echo "md99 none swap sw,file=/usr/swap0,late 0 0" >> /etc/fstab.new && mv /etc/fstab.new /etc/fstab ; swapon -aL


Bonkers? Unsafe? A waste of Optane? Curious for some feedback from a fellow Optane owner (and others that know more than I).
 
Joined
Dec 29, 2014
Messages
1,135

svtkobra7

Patron
Joined
Jan 12, 2017
Messages
202
FYI, all the info on splitting the SSD to share it is in a different thread. https://forums.freenas.org/index.ph...striped-ssds-partitioned-for-two-pools.62787/

Thanks, I must have had that cached in my head (no pun intended) from reading it and then forgot (far too creative a scheme for me). I'll redirect there with some #s. 100% appropriate placement - thank you.

I still wish I could explain this - https://forums.freenas.org/index.ph...inding-the-best-slog.63521/page-3#post-468132
 

HoneyBadger

actually does care
Administrator
Moderator
iXsystems
Joined
Feb 6, 2014
Messages
5,110
such that it automagically assigned threads to both CPUs, and there was no need to pin:
[a] FreeNAS VM: 2 sockets, 4 cores per socket, and 224 GB of memory (87.5%)
[b] ESXi host: 2 sockets, 10 cores per socket, 40 threads, all DIMM slots full (16 x 16 GB), memory reserved
Is that a correct assumption?

Short answer: No.

Long answer:

ESXi tries to house all threads and memory from a VM on the same NUMA node (socket + memory) in order to avoid crossing the interlink unnecessarily. In your case, assigning 224 of 256GB of memory means you'll have to cross that link, and 96GB of that VM's RAM will be on a remote node (with the associated latency).

But ESXi also won't care about which processor it runs those threads on, unless you force it:

Edit Settings -> Resources -> Advanced CPU -> Scheduling Affinity

Assuming you want it to run on pCPU0, set the field to "0-7" and put your Optane cards in the CPU1 PCIe slots.

Anyone using a virtual FreeNAS should be aware of this.
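For reference, the same thing expressed as .vmx advanced settings would look something like the lines below (option names from memory - verify against your ESXi version before relying on them, and the trailing comments are just annotations, not .vmx syntax):

Code:
sched.cpu.affinity = "0-7"    # pin the VM's vCPUs to pCPU0's cores (0-7 in this example)
numa.nodeAffinity = "0"       # keep its memory on NUMA node 0 where possible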
 
Last edited:

svtkobra7

Patron
Joined
Jan 12, 2017
Messages
202
Short answer: No.
How about: "sorta"??? ;) I mean clearly that is precisely what I was trying to say!

OK, so VMware doesn't want to cross the QPI to reach remote memory due to the expense, which is much more meaningful for memory (by a factor of ~1000) than for a read on NVMe.

However, crossing the QPI to reach a PCIe device still costs 40 ns + 120 ns + 40 ns round trip. Crossing the QPI for remote memory access is ~5% of the Optane latency, so I'm going to suggest it should be avoided: 5% can't be perceived for a single hop at that threshold, but the hops start to add up, and it is certainly measurable, possibly discernible. And that is without interrupts or other things I know nothing about. So, ceteris paribus, I'm going to say there is merit to containment on CPU1 if possible + PCIe fans to keep the Optane alive in the SMCI airflow void from Hades (slot 1) ... but ...

I'm already crossing the QPI for the remote memory anyway, so what's the impact there? Removing the remote memory would result in a huge decrease in performance, compared to a barely measurable impact from PCIe QPI traversal. No option there.

So in my case I'm sure remote memory hits and instructions sent to PCIe are occurring in the same clock, but since we are working with a factor of 1000, I would assume the remote memory hit would be long complete by the time the PCIe read has occurred, and already traversing the QPI for remote memory wouldn't have a direct impact on discernible PCIe performance ... but ...

the processor isn't just doing two things, but rather millions, so under load, maybe it hurts ...

My head does now.

I can't tell from your phrasing whether you're certain or not, but I'm guessing you know the answer - which I sorta got right again?

[I like the long explanations btw = thanks] Econ was so worthless, we just drew lines; if I had taken CS, I could have learned something. ;)
 

HoneyBadger

actually does care
Administrator
Moderator
iXsystems
Joined
Feb 6, 2014
Messages
5,110
Generally speaking, your storage/file protocol masks the latency of a NUMA traversal, and it's less noticeable that way. Reads are also more easily queued up versus an SLOG write which wants that absolute minimum of latency, and a definite "I'm finished" answer before it proceeds.

As you said, that QPI link has to carry all of the communication that goes between the two - remote memory access, PCIe slot access, etc. And it can get clogged up, which would increase latency.
 

Chris Tobey

Contributor
Joined
Feb 11, 2014
Messages
114
Samsung 850 Pro 128GB SSD
Code:
root@freenas:~# diskinfo -wS /dev/ada0p2
/dev/ada0p2
		512			 # sectorsize
		125888106496	# mediasize in bytes (117G)
		245875208	   # mediasize in sectors
		4096			# stripesize
		0			   # stripeoffset
		243923		  # Cylinders according to firmware.
		16			  # Heads according to firmware.
		63			  # Sectors according to firmware.
		Samsung SSD 850 PRO 128GB	   # Disk descr.
		S24ZNxxxxxxxxxF # Disk ident.
Synchronous random writes:
		 0.5 kbytes:   2545.1 usec/IO =	  0.2 Mbytes/s
		   1 kbytes:   2520.4 usec/IO =	  0.4 Mbytes/s
		   2 kbytes:   2519.0 usec/IO =	  0.8 Mbytes/s
		   4 kbytes:   2489.1 usec/IO =	  1.6 Mbytes/s
		   8 kbytes:   2724.7 usec/IO =	  2.9 Mbytes/s
		  16 kbytes:   3189.4 usec/IO =	  4.9 Mbytes/s
		  32 kbytes:   4157.7 usec/IO =	  7.5 Mbytes/s
		  64 kbytes:   4209.5 usec/IO =	 14.8 Mbytes/s
		 128 kbytes:   4364.3 usec/IO =	 28.6 Mbytes/s
		 256 kbytes:   6288.8 usec/IO =	 39.8 Mbytes/s
		 512 kbytes:   8713.3 usec/IO =	 57.4 Mbytes/s
		1024 kbytes:  10168.6 usec/IO =	 98.3 Mbytes/s
		2048 kbytes:  11147.0 usec/IO =	179.4 Mbytes/s
		4096 kbytes:  15262.1 usec/IO =	262.1 Mbytes/s
		8192 kbytes:  29195.3 usec/IO =	274.0 Mbytes/s

Intel Optane SSD DC P4800X PCIe NVMe
Code:
root@freenas:~ # diskinfo -wS /dev/nvd0
/dev/nvd0
		4096			# sectorsize
		375083606016	# mediasize in bytes (349G)
		91573146		# mediasize in sectors
		0			   # stripesize
		0			   # stripeoffset
		INTEL SSDPED1K375GA	 # Disk descr.
		PHKS7xxxxxxxxxxAGN	  # Disk ident.
Synchronous random writes:
		   4 kbytes:	 12.9 usec/IO =	302.0 Mbytes/s
		   8 kbytes:	 16.1 usec/IO =	485.7 Mbytes/s
		  16 kbytes:	 21.1 usec/IO =	741.6 Mbytes/s
		  32 kbytes:	 30.2 usec/IO =   1035.7 Mbytes/s
		  64 kbytes:	 49.1 usec/IO =   1272.4 Mbytes/s
		 128 kbytes:	 97.3 usec/IO =   1284.2 Mbytes/s
		 256 kbytes:	166.4 usec/IO =   1502.5 Mbytes/s
		 512 kbytes:	294.5 usec/IO =   1697.9 Mbytes/s
		1024 kbytes:	543.6 usec/IO =   1839.7 Mbytes/s
		2048 kbytes:   1063.5 usec/IO =   1880.6 Mbytes/s
		4096 kbytes:   2107.0 usec/IO =   1898.4 Mbytes/s
		8192 kbytes:   4231.5 usec/IO =   1890.6 Mbytes/s
 