
NVDIMMS and TrueNAS 12


Rand

Neophyte Sage
Joined
Dec 30, 2013
Messages
891
How would an NVDIMM's performance compare to Optane Persistent RAM..?
In fact, what's the performance (comparatively) between Optane Persistent RAM vs say a P4800X or 905p ..?
NVDIMM-N > NVDIMM-P > P4800X
I have only tested first-gen Optane memory, though, and that was significantly slower than -N.

With the 300s things might be better, but I doubt they will reach -N's performance levels.

Here are the benches (4800X, NVDimm-N from https://www.truenas.com/community/t...nding-the-best-slog.63521/page-11#post-540176 )

Code:
=== START OF INFORMATION SECTION ===
Model Number:                       INTEL SSDPE21K375GA
Serial Number:                      PHKE7510005K375AGN
Firmware Version:                   E2010324
PCI Vendor/Subsystem ID:            0x8086
IEEE OUI Identifier:                0x5cd2e4
Controller ID:                      0
Number of Namespaces:               1
Namespace 1 Size/Capacity:          375,083,606,016 [375 GB]
Namespace 1 Formatted LBA Size:     512
Namespace 1 IEEE EUI-64:            e4d25c 6a8a070100
Local Time is:                      Mon Jul 15 23:27:01 2019 PDT
Firmware Updates (0x02):            1 Slot
Optional Admin Commands (0x0007):   Security Format Frmw_DL
Optional NVM Commands (0x0006):     Wr_Unc DS_Mngmt
Maximum Data Transfer Size:         32 Pages

Supported Power States
St Op     Max   Active     Idle   RL RT WL WT  Ent_Lat  Ex_Lat
 0 +    18.00W       -        -    0  0  0  0        0       0

Supported LBA Sizes (NSID 0x1)
Id Fmt  Data  Metadt  Rel_Perf
 0 +     512       0         2
 1 -     512       8         2
 2 -     512      16         2
 3 -    4096       0         0
 4 -    4096       8         0
 5 -    4096      64         0
 6 -    4096     128         0

=== START OF SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

SMART/Health Information (NVMe Log 0x02, NSID 0xffffffff)
Critical Warning:                   0x00
Temperature:                        44 Celsius
Available Spare:                    100%
Available Spare Threshold:          0%
Percentage Used:                    0%
Data Units Read:                    522,565 [267 GB]
Data Units Written:                 4,452,989 [2.27 TB]
Host Read Commands:                 19,271,381
Host Write Commands:                107,530,152
Controller Busy Time:               25
Power Cycles:                       1,120
Power On Hours:                     2,320
Unsafe Shutdowns:                   1,076
Media and Data Integrity Errors:    0
Error Information Log Entries:      0

Error Information (NVMe Log 0x01, max 64 entries)
No Errors Logged



diskinfo -citvwS /dev/nvd0
512 # sectorsize
375083606016 # mediasize in bytes (349G)
732585168 # mediasize in sectors
0 # stripesize
0 # stripeoffset
INTEL SSDPE21K375GA # Disk descr.
PHKE7510005K375AGN # Disk ident.
Yes # TRIM/UNMAP support
0 # Rotation rate in RPM

I/O command overhead:
time to read 10MB block 0.010853 sec = 0.001 msec/sector
time to read 20480 sectors 0.432072 sec = 0.021 msec/sector
calculated command overhead = 0.021 msec/sector

Seek times:
Full stroke: 250 iter in 0.026141 sec = 0.105 msec
Half stroke: 250 iter in 0.011957 sec = 0.048 msec
Quarter stroke: 500 iter in 0.018267 sec = 0.037 msec
Short forward: 400 iter in 0.016424 sec = 0.041 msec
Short backward: 400 iter in 0.018157 sec = 0.045 msec
Seq outer: 2048 iter in 0.060869 sec = 0.030 msec
Seq inner: 2048 iter in 0.046046 sec = 0.022 msec

Transfer rates:
outside: 102400 kbytes in 0.105981 sec = 966211 kbytes/sec
middle: 102400 kbytes in 0.090489 sec = 1131629 kbytes/sec
inside: 102400 kbytes in 0.111131 sec = 921435 kbytes/sec

Asynchronous random reads:
sectorsize: 1341848 ops in 3.000216 sec = 447250 IOPS
4 kbytes: 1343147 ops in 3.000109 sec = 447699 IOPS
32 kbytes: 178116 ops in 3.002087 sec = 59331 IOPS
128 kbytes: 46179 ops in 3.008889 sec = 15348 IOPS

Synchronous random writes:
0.5 kbytes: 32.2 usec/IO = 15.2 Mbytes/s
1 kbytes: 32.4 usec/IO = 30.1 Mbytes/s
2 kbytes: 33.4 usec/IO = 58.4 Mbytes/s
4 kbytes: 25.8 usec/IO = 151.4 Mbytes/s
8 kbytes: 33.2 usec/IO = 235.3 Mbytes/s
16 kbytes: 42.2 usec/IO = 370.4 Mbytes/s
32 kbytes: 56.2 usec/IO = 556.2 Mbytes/s
64 kbytes: 86.7 usec/IO = 720.8 Mbytes/s
128 kbytes: 137.1 usec/IO = 911.6 Mbytes/s
256 kbytes: 215.2 usec/IO = 1161.6 Mbytes/s
512 kbytes: 360.2 usec/IO = 1388.0 Mbytes/s
1024 kbytes: 667.9 usec/IO = 1497.3 Mbytes/s
2048 kbytes: 1221.1 usec/IO = 1637.8 Mbytes/s
4096 kbytes: 2388.8 usec/IO = 1674.5 Mbytes/s
8192 kbytes: 4719.0 usec/IO = 1695.3 Mbytes/s


Code:
diskinfo -citvwS /dev/pmem0

512 # sectorsize
17179865088 # mediasize in bytes (16G)
33554424 # mediasize in sectors
0 # stripesize
0 # stripeoffset
PMEM region 16GB # Disk descr.
9548ADD1D6FC0231 # Disk ident.
No # TRIM/UNMAP support
0 # Rotation rate in RPM

I/O command overhead:
time to read 10MB block 0.002227 sec = 0.000 msec/sector
time to read 20480 sectors 0.026084 sec = 0.001 msec/sector
calculated command overhead = 0.001 msec/sector

Seek times:
Full stroke: 250 iter in 0.000439 sec = 0.002 msec
Half stroke: 250 iter in 0.000425 sec = 0.002 msec
Quarter stroke: 500 iter in 0.000830 sec = 0.002 msec
Short forward: 400 iter in 0.000622 sec = 0.002 msec
Short backward: 400 iter in 0.000692 sec = 0.002 msec
Seq outer: 2048 iter in 0.002606 sec = 0.001 msec
Seq inner: 2048 iter in 0.002542 sec = 0.001 msec

Transfer rates:
outside: 102400 kbytes in 0.014434 sec = 7094361 kbytes/sec
middle: 102400 kbytes in 0.013545 sec = 7559985 kbytes/sec
inside: 102400 kbytes in 0.013614 sec = 7521669 kbytes/sec

Asynchronous random reads:
sectorsize: 1867310 ops in 3.000057 sec = 622425 IOPS
4 kbytes: 1589498 ops in 3.000047 sec = 529824 IOPS
32 kbytes: 935622 ops in 3.000054 sec = 311868 IOPS
128 kbytes: 328937 ops in 3.001158 sec = 109603 IOPS

Synchronous random writes:
0.5 kbytes: 1.6 usec/IO = 299.9 Mbytes/s
1 kbytes: 1.7 usec/IO = 589.9 Mbytes/s
2 kbytes: 1.7 usec/IO = 1143.4 Mbytes/s
4 kbytes: 1.8 usec/IO = 2135.6 Mbytes/s
8 kbytes: 2.4 usec/IO = 3244.6 Mbytes/s
16 kbytes: 3.7 usec/IO = 4192.4 Mbytes/s
32 kbytes: 9.3 usec/IO = 3344.5 Mbytes/s
64 kbytes: 12.3 usec/IO = 5088.3 Mbytes/s
128 kbytes: 17.6 usec/IO = 7119.2 Mbytes/s
256 kbytes: 27.7 usec/IO = 9021.8 Mbytes/s
512 kbytes: 46.6 usec/IO = 10731.7 Mbytes/s
1024 kbytes: 84.4 usec/IO = 11853.0 Mbytes/s
2048 kbytes: 159.5 usec/IO = 12535.5 Mbytes/s
4096 kbytes: 314.3 usec/IO = 12726.1 Mbytes/s
8192 kbytes: 621.4 usec/IO = 12873.4 Mbytes/s 


NVDimm -P (copied from https://jira.ixsystems.com/browse/NAS-108510) @nasbdh9 's numbers

Code:
/dev/pmem0
512 # sectorsize
270582935552 # mediasize in bytes (252G)
528482296 # mediasize in sectors
0 # stripesize
0 # stripeoffset
PMEM region 252GB # Disk descr.
E60A9A2579EB399E # Disk ident.
No # TRIM/UNMAP support
0 # Rotation rate in RPM

I/O command overhead:
time to read 10MB block 0.002218 sec = 0.000 msec/sector
time to read 20480 sectors 0.016406 sec = 0.001 msec/sector
calculated command overhead = 0.001 msec/sector

Seek times:
Full stroke: 250 iter in 0.000500 sec = 0.002 msec
Half stroke: 250 iter in 0.000486 sec = 0.002 msec
Quarter stroke: 500 iter in 0.000997 sec = 0.002 msec
Short forward: 400 iter in 0.000766 sec = 0.002 msec
Short backward: 400 iter in 0.000822 sec = 0.002 msec
Seq outer: 2048 iter in 0.001842 sec = 0.001 msec
Seq inner: 2048 iter in 0.001862 sec = 0.001 msec

Transfer rates:
outside: 102400 kbytes in 0.013630 sec = 7512839 kbytes/sec
middle: 102400 kbytes in 0.013790 sec = 7425671 kbytes/sec
inside: 102400 kbytes in 0.013994 sec = 7317422 kbytes/sec

Asynchronous random reads:
sectorsize: 1425829 ops in 3.000028 sec = 475272 IOPS
4 kbytes: 864612 ops in 3.000029 sec = 288201 IOPS
32 kbytes: 945080 ops in 3.000400 sec = 314985 IOPS
128 kbytes: 164956 ops in 3.002401 sec = 54941 IOPS

Synchronous random writes:
0.5 kbytes: 2.0 usec/IO = 241.0 Mbytes/s
1 kbytes: 2.1 usec/IO = 466.8 Mbytes/s
2 kbytes: 2.3 usec/IO = 848.2 Mbytes/s
4 kbytes: 2.6 usec/IO = 1510.5 Mbytes/s
8 kbytes: 5.1 usec/IO = 1544.3 Mbytes/s
16 kbytes: 10.8 usec/IO = 1452.0 Mbytes/s
32 kbytes: 20.8 usec/IO = 1503.5 Mbytes/s
64 kbytes: 37.7 usec/IO = 1658.9 Mbytes/s
128 kbytes: 79.2 usec/IO = 1577.3 Mbytes/s
256 kbytes: 157.2 usec/IO = 1590.8 Mbytes/s
512 kbytes: 310.1 usec/IO = 1612.5 Mbytes/s
1024 kbytes: 665.0 usec/IO = 1503.7 Mbytes/s
2048 kbytes: 1364.2 usec/IO = 1466.0 Mbytes/s
4096 kbytes: 2800.3 usec/IO = 1428.4 Mbytes/s
8192 kbytes: 5639.6 usec/IO = 1418.5 Mbytes/s
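
To make the three result sets easier to compare, here is a minimal Python sketch that takes a few of the synchronous random write latencies posted above and derives the implied throughput and the latency ratio against the P4800X. The dictionary labels and function names are made up for this sketch; only the usec/IO figures come from the diskinfo output. It assumes that diskinfo's "Mbytes/s" column is simply block size divided by latency, in MiB/s, which matches the posted numbers to within rounding.

```python
# Synchronous random write latencies (usec/IO) copied from the diskinfo
# output above. The dictionary keys are labels chosen for this sketch.
SYNC_WRITE_USEC = {
    "P4800X":   {4: 25.8, 64: 86.7, 1024: 667.9},
    "NVDIMM-N": {4: 1.8, 64: 12.3, 1024: 84.4},
    "NVDIMM-P": {4: 2.6, 64: 37.7, 1024: 665.0},
}


def throughput_mib_s(block_kib: float, usec_per_io: float) -> float:
    """Throughput implied by one outstanding write of block_kib KiB."""
    bytes_per_second = block_kib * 1024 / (usec_per_io * 1e-6)
    return bytes_per_second / 2**20


def speedup_vs(baseline: str, device: str, block_kib: int) -> float:
    """How many times lower the device's latency is than the baseline's."""
    return SYNC_WRITE_USEC[baseline][block_kib] / SYNC_WRITE_USEC[device][block_kib]


if __name__ == "__main__":
    for device, rows in SYNC_WRITE_USEC.items():
        for block_kib, usec in rows.items():
            print(f"{device:9s} {block_kib:5d} KiB  {usec:7.1f} usec/IO  "
                  f"~{throughput_mib_s(block_kib, usec):8.1f} MiB/s")
    for block_kib in (4, 64, 1024):
        print(f"{block_kib:5d} KiB: NVDIMM-N latency is "
              f"{speedup_vs('P4800X', 'NVDIMM-N', block_kib):.1f}x lower than P4800X")
```

On these numbers the NVDIMM-N has roughly 14x lower latency than the P4800X at 4 KiB and roughly 7x lower at 64 KiB, while the NVDIMM-P loses most of its advantage at large block sizes.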
 

TrumanHW

Member
Joined
Apr 17, 2018
Messages
111
Can't thoroughly examine your post (takes me time to digest and research) ... but for clarification, you mean in True or FreeNAS as either:
  • SLOG
  • L2ARC
  • MetaData Pool ...yes..?
Where do you draw the respective-lines of 'utility' when pairing these devices up between 'role & performance' ...?
Just takin' S.W.A.G. to give you a framework for my question; I still KNOW I don't know a damned thing.

As a SLOG:
  • NV-DIMM ..? I'm guessing ..? But, do they work in the free ver. of TrueNAS Core ..??
  • Optanes are either massively oversized or have far from adequate performance (M10 or M20 versions) ..?
  • Radian RMS-200 (8GB, though it seems the 16GB version would maybe more useful as 8GB may be exceeded) ...
  • ZeusRAM 8GB (which is slower than a Radian RMS-200 ... and costs like 10-20x as much ..???)
Order of Performance for L2ARC:
  • Optane Persistent RAM (100-series) though TrueNAS Core: says not supported
  • Optanes: smallest available in (presumably) this order: P5800X, P4800X, 905P, 900P, lastly, P4801X ..?
  • NV-DIMM: though I'm not sure the free versions of TrueNAS core supports this..?
TrueNAS Support for Optane Persistent DIMMs (100-series):
TrueNAS Core:Not supported
TrueNAS Scale:If supported, is it temporary until officially released..?

Thank you VERY much...

I also have some questions about dRAID to bounce off you...
Basically, how dRAID + ALL FLASH array will change the calculus of qty of parity drives ...
(That's assuming that ALL FLASH ARRAYS don't already get pretty close to changing that even with RAIDzX)
 

TrumanHW

Member
Joined
Apr 17, 2018
Messages
111
Note that using either:
  • DUAL CONTROLLER storage
  • NV-DIMMs
...are both listed as 'Enterprise Only'


TrueNAS Free vs Paid Features.png
 

Ericloewe

Not-very-passive-but-aggressive
Moderator
Joined
Feb 15, 2014
Messages
17,233
If you really wanted to, you could probably get some NVDIMM support working in Core. You'd be missing mostly the GUI/middleware bits.
 

Rand

Neophyte Sage
Joined
Dec 30, 2013
Messages
891
Can't thoroughly examine your post (takes me time to digest and research) ... but for clarification, you mean in True or FreeNAS as either:
  • SLOG
  • L2ARC
  • MetaData Pool ...yes..?
Where do you draw the respective-lines of 'utility' when pairing these devices up between 'role & performance' ...?
You know, all 3 device types have very different requirements (size, io performance, latency) that make them either a good or bad choice for their role.
NVDimm-N are only really worth it for SLOG due to the severely limited size (max 32G modules, limiting total memory) but there they shine as they are at near memory performance levels.
Both others are usable as L2ARC and metadata devices (and also, to a lesser extent, as SLOG) since they have the necessary size (several hundred GB, although depending on your requirements you might want even larger devices for L2ARC).

But then we should be looking at the whole picture (HW) and especially the desired capabilities of your server (i.e. use case), since at this point we also need to consider actual pool hw (disks vs ssds vs nvmes) to see what is really worthwhile...


Just takin' S.W.A.G. to give you a framework for my question; I still KNOW I don't know a damned thing.

As a SLOG:
  • NV-DIMM ..? I'm guessing ..? But, do they work in the free ver. of TrueNAS Core ..??
  • Optanes are either massively oversized or have far from adequate performance (M10 or M20 versions) ..?
  • Radian RMS-200 (8GB, though it seems the 16GB version would maybe more useful as 8GB may be exceeded) ...
  • ZeusRAM 8GB (which is slower than a Radian RMS-200 ... and costs like 10-20x as much ..???)
You should have a look at https://www.truenas.com/community/threads/slog-benchmarking-and-finding-the-best-slog.63521/, many of the devices are listed there to give you an idea.
What is good (enough) and what is bad always depends on requirements, and while NVDIMM-N is *the* best SLOG I know of [though I have not looked at newer stuff like the 5800X], it's not necessarily the best for your use case (compatibility, price, complexity, support...)

Optanes are cheap (900p; even 4800X's can be had for some 300 bucks used)
Radian RMS are not bad; they just don't have the numbers *I* was looking for when I had a quick peek at the SLOG thread the other day (prompted by a great deal at StH), but ymmv
Zeus: it had its time, but that is over now.

Order of Performance for L2ARC:
  • Optane Persistent RAM (100-series) though TrueNAS Core: says not supported
  • Optanes: smallest available in (presumably) this order: P5800X, P4800X, 905P, 900P, lastly, P4801X ..?
  • NV-DIMM: though I'm not sure the free versions of TrueNAS core supports this..?
TrueNAS Support for Optane Persistent DIMMs (100-series):
TrueNAS Core:Not supported
TrueNAS Scale:If supported, is it temporary until officially released..?
I have not run NVDIMM-P on FreeNAS, that's why I didn't provide my own numbers but @nasbdh9's. I don't run L2ARC since I used to have massive amounts of memory (which I have since changed, so it might be a thing now, but I run an all-NVMe pool so the expected gain is rather small).
You can't mix and match NVDIMM-N and -P on one board, so pick either based on your use case.

I would assume they run, as the primary concerns are BIOS and then general pmem support. @nasbdh9's numbers were from 100s iirc and it was on TNC12, so at least at some point they did.

My -Ns still run fine on Core (as they did on 11). As @Ericloewe said, they have added a framework around them to make management simpler (monitoring, firmware upgrades); that does not work with mine as I have no access to the firmware they use, and as such the modules don't have the necessary capabilities. But they work fine as native pmem modules (yes, they are persisting o/c).

I also have some questions about dRAID to bounce off you...
Basically, how dRAID + ALL FLASH array will change the calculus of qty of parity drives ...
(That's assuming that ALL FLASH ARRAYS don't already get pretty close to changing that even with RAIDzX)
Sorry, have not looked into draid at all yet, not my use case;)


Edit: When I look at https://sc18.supercomputing.org/proceedings/tech_poster/poster_files/post185s2-file2.pdf
then it does not seem it changes the # of parity drives at all (for a single vdev) but will provide increased performance for the same amount of drives.
The question should be if larger vdevs (> 12 drives per vdev) are possible with draid and if a draid vdev still has the single disk IOPS performance limitation ...
 

Rand

Neophyte Sage
Joined
Dec 30, 2013
Messages
891
Note that using either:
  • DUAL CONTROLLER storage
  • NV-DIMMs
...are both listed as 'Enterprise Only'
Dual-controller storage basically depends on the hardware that is provided with the Enterprise arrays. You can run your own, but it won't be easily managed via the GUI/framework, so unfortunately it is not really usable/practical.

NVDIMMs, as mentioned above, run fine and are perfectly OK for a single box, just without the GUI/framework.
 

TrumanHW

Member
Joined
Apr 17, 2018
Messages
111
...you could probably get NVDIMM support working in Core. You'd lack the GUI.
AH, so this would basically be what using it in ZoL would be like ..? (assuming ZoL is all via CLI ..?)
Thus, the meaning of "unsupported" just refers to the GUI aspect, yes.?
 

Rand

Neophyte Sage
Joined
Dec 30, 2013
Messages
891
There is not much to do anyway: you enable them once and then have them as a regular device in your system, which you can handle like any other drive. Just add it as SLOG, L2ARC or whatever you want, and then you won't touch it any more.

I could have mirrored o/c, but decided to use the slot for non-persistent memory ;)

1629579659343.png


What you don't have is health checks, monitoring and firmware updates
 

morganL

Captain Morgan
Administrator
Moderator
iXsystems
Joined
Mar 10, 2018
Messages
883
There's nothing in CORE that stops NVDIMMs working. Health checks, monitoring and firmware updates can be done from CLI
Enterprise adds management features and HA with webUI support which are tied to specific NVDIMM models that are used in TrueNAS M-Series. With HA, NVDIMM updates can be done while the system is operating. HA is quite complicated and requires a high performance PCIe bridge between the NVDIMMs.

AFAIK, Intel has not supported Optane DIMMs in FreeBSD 12.x... this is a pre-requisite for TrueNAS CORE.

Based on our own internal testing, SLOG size of 12-16GB is about optimum. Larger SLOGs and TXGs are more complicated and may slow down systems. For that reason, we haven't pursued Optane DIMMs. We don't see a lot of benefit.
 

Rand

Neophyte Sage
Joined
Dec 30, 2013
Messages
891
Hi @morganL,

thanks for chiming in and sharing that insight, quite interesting.

When you say that health checks and monitoring should work from cli - is that limited to the iX nvdimms (firmware support) or should it work with non iX modules too?

There is basically no documentation on the Hub re NVDimms so I did not find much to work with, I o/c found ixnvdimm, but that's not doing anything on my modules...

Code:
root@freenas[~]# ixnvdimm
usage: ixnvdimm {nvdimm}
       ixnvdimm -F {nvdimm}
       ixnvdimm -d {nvdimm}
       ixnvdimm -f {firmware} {nvdimm}
       ixnvdimm -r [-h] {nvdimm} [{page} [{off}] | {reg}]
       ixnvdimm -w [-h] {nvdimm} ({page} {off} | {reg}) {val}
root@freenas[~]# man ixnvdimm
No manual entry for ixnvdimm
root@freenas[~]# ixnvdimm /dev/pmem0
ixnvdimm: Can't get info from /dev/pmem0: Inappropriate ioctl for device
root@freenas[~]# ixnvdimm /dev/pmem0p1
ixnvdimm: Can't get info from /dev/pmem0p1: Inappropriate ioctl for device


Also, are there any special tweaks that you set for nvdimm based boxes that you can share (eg TXG size)? ;)
 

morganL

Captain Morgan
Administrator
Moderator
iXsystems
Joined
Mar 10, 2018
Messages
883
You would use the FreeBSD tools for the vendor you have... we don't test or support generic NVDIMMs; the technology is too complex for CORE. We would recommend PCIe/NVMe to most CORE users.

The 12-16GB SLOG size and 5s timeout dictate the TXG size... that works well for most workloads. Anything else requires its own testing for a specific workload.
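
As a rough back-of-envelope illustration of how SLOG size, the 5s timeout and TXG size relate, here is a Python sketch. It is not iXsystems' sizing method. It assumes the SLOG has to hold sync writes for a small number of TXGs that have not yet been committed to the pool; the default of three (`txgs_in_flight`) and both function names are assumptions made for this sketch, and real systems may also close TXGs early when dirty-data limits are hit.

```python
def max_sustained_ingest_gib_s(slog_gib: float,
                               txg_timeout_s: float = 5.0,
                               txgs_in_flight: int = 3) -> float:
    """Sync write rate (GiB/s) a SLOG of slog_gib can absorb.

    Assumes the SLOG must hold txgs_in_flight transaction groups,
    each collecting data for txg_timeout_s seconds.
    """
    return slog_gib / (txg_timeout_s * txgs_in_flight)


def slog_needed_gib(ingest_gib_s: float,
                    txg_timeout_s: float = 5.0,
                    txgs_in_flight: int = 3) -> float:
    """SLOG capacity (GiB) needed for a given sustained sync write rate."""
    return ingest_gib_s * txg_timeout_s * txgs_in_flight


if __name__ == "__main__":
    for slog_gib in (12, 16):
        rate = max_sustained_ingest_gib_s(slog_gib)
        print(f"{slog_gib} GiB SLOG -> about {rate:.2f} GiB/s sustained sync writes")
    print(f"1 GiB/s of sync writes -> about {slog_needed_gib(1.0):.0f} GiB of SLOG")
```

Under these assumptions a 12-16GB SLOG corresponds to roughly 0.8-1.1 GiB/s of sustained sync writes; anything beyond that rate would need its own testing, as noted above.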
 

nasbdh9

Junior Member
Joined
Oct 23, 2020
Messages
12
Because of the lack of DAX support in FreeBSD, the actual performance of NVDIMM-N and NVDIMM-P (Optane Persistent Memory) is not as expected.

If your ZFS pool needs strictly synchronous writes, then I recommend you try the P5800X. In FreeBSD, you will get larger capacity and similar performance to NVDIMM.
 

Rand

Neophyte Sage
Joined
Dec 30, 2013
Messages
891
You would use the FreeBSD tools for the vendor you have... we don't test or support generic NVDIMMs; the technology is too complex for CORE. We would recommend PCIe/NVMe to most CORE users.

The 12-16GB SLOG size and 5s timeout dictate the TXG size... that works well for most workloads. Anything else requires its own testing for a specific workload.
Ah OK, thought so. I just wondered since you had said the monitoring would work at CLI level, and I never managed to get much data out of them.
O/c all the management aspects exist; but worst case these (module configuration) carry over from Linux as well, so that's not a big deal.


Because of the lack of DAX support in FreeBSD, the actual performance of NVDIMM-N and NVDIMM-P (Optane Persistent Memory) is not as expected.

If your ZFS pool needs strictly synchronous writes, then I recommend you try the P5800X. In FreeBSD, you will get larger capacity and similar performance to NVDIMM.
Do you have diskinfo results for the P5800X?

I don't have a PCIe 4 box at this point (intending to wait another year or two till 5 becomes a thing), but would be quite interested to see if the 5800 really matches NVDimms (-N) on a 64K blocksize...

And does Scale have DAX support? Might be a compelling reason to switch - in addition to RDMA traffic o/c
 

TrumanHW

Member
Joined
Apr 17, 2018
Messages
111
...HA is quite complicated and requires a high performance PCIe bridge between the NVDIMMs.
Sorry .... what are these two abbreviations..?

HA

(I'd asked, but reading further I see what I can do to look it up)
TXGs (I see, this is a setting or performance metric of using an NV-DIMM, sorry)


Thanks.
 

Ericloewe

Not-very-passive-but-aggressive
Moderator
Joined
Feb 15, 2014
Messages
17,233
High Availability. A TXG is a transaction group, which is the group of data that gets written to the pool at once. Data is gathered for a TXG in the ZIL until it is flushed to the pool.
 