Mirrored SLOG - 2x Optane 900p OR 900p and P3700?

SMnasMAN

Contributor
Joined
Dec 2, 2018
Messages
177
Currently I have a single Optane 900p as a SLOG on an NFS share connected to VMware. This is not a high disk-IO-load setup (it's my home lab), but I would like my SLOG to be mirrored.

Can anyone offer input on whether I should buy another 900p as the second half of the mirror, or go with a P3700 as the second mirror member instead?

The answer would clearly be the 900p (it has higher performance, and I would prefer mirror members to match exactly); however, we all know the 900p does not (officially) have power-loss protection. (Intel is vague on this, so I will assume it does NOT.) The P3700, on the other hand, has slightly worse performance than the 900p, but DOES have PLP.

What do you guys think? (Or is it a very bad idea not to use identical devices in a mirror, and thus my question is moot?)

thanks!

Here are the current stats on the 900p in my X9 box (it is not assigned as a SLOG yet in these test results).

With FreeNAS-11.2-U3:
Code:
Build     FreeNAS-11.2-U3
Platform     Intel(R) Xeon(R) CPU E5-2637 v2 @ 3.50GHz
Memory     262067MB
System Time     Sun, 14 Jul 2019 03:11:51 -0500
Uptime     3:11AM up 32 days, 22:34, 5 users
Load Average     0.25, 0.42, 0.43

Synchronous random writes:
         0.5 kbytes:     19.1 usec/IO =     25.6 Mbytes/s
           1 kbytes:     20.6 usec/IO =     47.3 Mbytes/s
           2 kbytes:     20.1 usec/IO =     97.1 Mbytes/s
           4 kbytes:     15.7 usec/IO =    249.2 Mbytes/s
           8 kbytes:     19.4 usec/IO =    403.6 Mbytes/s
          16 kbytes:     25.3 usec/IO =    618.7 Mbytes/s
          32 kbytes:     38.0 usec/IO =    823.1 Mbytes/s
          64 kbytes:     57.8 usec/IO =   1081.9 Mbytes/s
         128 kbytes:     97.8 usec/IO =   1277.8 Mbytes/s
         256 kbytes:    170.5 usec/IO =   1465.9 Mbytes/s
         512 kbytes:    300.8 usec/IO =   1662.5 Mbytes/s
        1024 kbytes:    561.1 usec/IO =   1782.1 Mbytes/s
        2048 kbytes:   1084.0 usec/IO =   1844.9 Mbytes/s
        4096 kbytes:   2114.6 usec/IO =   1891.6 Mbytes/s
        8192 kbytes:   4177.8 usec/IO =   1914.9 Mbytes/s


And with FreeNAS-11.2-U5 (which I'm currently running and will continue to run):

Code:
Hostname     freenas.local
Build     FreeNAS-11.2-U5
Platform     Intel(R) Xeon(R) CPU E5-2637 v2 @ 3.50GHz
Memory     262067MB
System Time     Sun, 25 Aug 2019 15:40:19 -0500
Uptime     3:40PM up 1 day, 16:50, 4 users
Load Average     1.35, 1.62, 1.59

root@freenas:/mnt # diskinfo -wS /dev/nvd0
/dev/nvd0
        512             # sectorsize
        280065171456    # mediasize in bytes (261G)
        547002288       # mediasize in sectors
        0               # stripesize
        0               # stripeoffset
        INTEL SSDPED1D280GA     # Disk descr.
        PHMB74220077280CGN      # Disk ident.
        Yes             # TRIM/UNMAP support
        0               # Rotation rate in RPM

Synchronous random writes:
         0.5 kbytes:     18.4 usec/IO =     26.5 Mbytes/s
           1 kbytes:     19.0 usec/IO =     51.4 Mbytes/s
           2 kbytes:     19.9 usec/IO =     98.3 Mbytes/s
           4 kbytes:     15.2 usec/IO =    257.5 Mbytes/s
           8 kbytes:     18.6 usec/IO =    419.9 Mbytes/s
          16 kbytes:     25.0 usec/IO =    625.3 Mbytes/s
          32 kbytes:     36.4 usec/IO =    859.3 Mbytes/s
          64 kbytes:     52.2 usec/IO =   1197.3 Mbytes/s
         128 kbytes:     92.2 usec/IO =   1356.2 Mbytes/s
         256 kbytes:    161.0 usec/IO =   1553.2 Mbytes/s
         512 kbytes:    293.8 usec/IO =   1701.6 Mbytes/s
        1024 kbytes:    558.3 usec/IO =   1791.0 Mbytes/s
        2048 kbytes:   1073.4 usec/IO =   1863.2 Mbytes/s
        4096 kbytes:   2124.3 usec/IO =   1883.0 Mbytes/s
        8192 kbytes:   4233.9 usec/IO =   1889.5 Mbytes/s
 

jenksdrummer

Patron
Joined
Jun 7, 2011
Messages
250
If you're going to skip PLP, you might as well not bother with a SLOG, since the intent of a SLOG is basically to make sync writes land somewhere reliable. RAM is not reliable. If your SLOG is also not reliable, then why not stick with much faster RAM?

Also, a SLOG will be slower than writing to RAM+spindles once you have a certain number of spindles, as your sync-write bandwidth is effectively reduced to the max of a single SLOG device (considering it's being mirrored).



Here's my (bigger) system using DD

sync writes off/compression off!

Write: dd if=/dev/zero of=/mnt/zfs.pool.z60/speedtest/test.dat bs=2048k count=10000
10000+0 records in
10000+0 records out
20971520000 bytes transferred in 23.395022 secs (896409484 bytes/sec)

Read: dd of=/dev/null if=/mnt/zfs.pool.z60/speedtest/test.dat bs=2048k count=10000
10000+0 records in
10000+0 records out
20971520000 bytes transferred in 5.359299 secs (3913108641 bytes/sec)

Sync writes on, after deleting the file; no other changes:
Write: dd if=/dev/zero of=/mnt/zfs.pool.z60/speedtest/test.dat bs=2048k count=10000
10000+0 records in
10000+0 records out
20971520000 bytes transferred in 83.695268 secs (250569960 bytes/sec)

Read: dd of=/dev/null if=/mnt/zfs.pool.z60/speedtest/test.dat bs=2048k count=10000
10000+0 records in
10000+0 records out
20971520000 bytes transferred in 5.519853 secs (3799289755 bytes/sec)
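
(For reference, the sync/compression toggles for a test like this are just dataset properties. A rough sketch below - I'm assuming the dataset name matches the mount path above; set sync back to standard when done.)

Code:
zfs set compression=off zfs.pool.z60/speedtest   # keep ZFS from compressing the /dev/zero data away
zfs set sync=disabled zfs.pool.z60/speedtest     # first run: sync writes off
zfs set sync=always zfs.pool.z60/speedtest       # second run: force every write through the ZIL/SLOG
zfs set sync=standard zfs.pool.z60/speedtest     # restore the default afterwards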

Specs:
SLOG: 2* Intel DC 3200 SATA 240GB (not as fast as PCIe or NVMe, but has PLP!)
Spindles: 12* WD 10TB Gold, 2 VDEV Z2
Cache: Samsung M.2 NVMe 1TB 970 Pro
CPU: Intel Xeon Silver 4210
RAM: 192GB ECC 2400
Controller: Supermicro 3108 in JBOD Mode
Board: Supermicro X11-something...


Again, my point about the SLOG is that if you're chasing speed, that's not where to spend the effort. Make those cache devices and enjoy the faster access for data that is cached. If you're chasing reliability, chase something that has PLP and enjoy the trade-off. Bonus: the cache is not mirrored but striped, so more disks (more or less) = faster cached data access, and if you drop a cache disk, there's not as much risk you'll drop the whole array. SLOG...well...from what I've read that's not as much of an issue...but as it's an integral stepping stone for data being written, I imagine dropping a SLOG would be detrimental...
 

SMnasMAN

Contributor
Joined
Dec 2, 2018
Messages
177
Thanks for the reply / great info.

This is for an HDD pool being used as an NFS share on VMware - so I do need a SLOG, and I want it to be a mirrored SLOG.
(The FreeNAS box has 256GB of RAM.)

So the question is just, in terms of that SLOG (currently a single Optane 900p):

Do I go with 2x 900p Optanes, or do I use a 900p and a P3700? (I have a spare 400GB PCIe P3700 sitting around; the 900p I would have to buy, but I'm fine with buying another one.)

The P3700 has PLP.
The 900p does not.
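
Either way, I'm assuming that turning the current single log device into a mirror is just an attach; a rough sketch below (the pool name "tank" and the nvd0/nvd1 device names are placeholders, not my actual names):

Code:
zpool status tank             # confirm which device is currently the log vdev
zpool attach tank nvd0 nvd1   # attach the new device to the existing log device, making it a mirror
zpool status tank             # the log should now show up as a mirror vdev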

thanks!
 

HoneyBadger

actually does care
Administrator
Moderator
iXsystems
Joined
Feb 6, 2014
Messages
5,110
Optane by design writes directly to the persistent media; there is no volatile DRAM buffer, even on the tiny 16/32GB M.2 consumer "Optane Memory" cards.

The 900p is a perfectly safe SLOG; just be aware that Intel won't honor your warranty if they discover it was used in a "server" environment:

Additionally, the Product will not be subject to this Limited Warranty if used in:
(i) any compute, networking or storage system that supports workloads or data needs of more than one concurrent user or one
or more remote client device concurrently;
(ii) any server, networking or storage system that is capable of supporting more than one CPU per device; or
(iii) any device that is designed, marketed or sold to support or be incorporated into systems covered in clauses (i) or (ii).
 

HoneyBadger

actually does care
Administrator
Moderator
iXsystems
Joined
Feb 6, 2014
Messages
5,110
With regard to your question about using two different types of SLOG device: the desire to have them be identical is generally more about uniform and predictable performance, as well as only needing a single model on hand for replacement later on. Given the relatively similar performance characteristics of the two devices, you shouldn't see any major ill effects. I might suggest overprovisioning the P3700 down to a smaller size just to help with its endurance.

The third option would be "buy a second 900p for mirrored SLOG, and use the P3700 for L2ARC, because 256GB of RAM is well into the point where you will have meaningful content in ARC to load onto it." But that depends on available PCIe slots.
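
For illustration, either layout might look roughly like this on the command line (the pool name "tank", the nvdX device names, and the 100G size are placeholders, not taken from your system). Partitioning only part of the P3700 is one simple way to overprovision it:

Code:
# Placeholders: existing 900p = nvd0, P3700 = nvd1, second 900p = nvd2, pool = tank
gpart create -s gpt nvd1
gpart add -t freebsd-zfs -s 100G nvd1   # use ~100G of the 400GB P3700, leaving the rest unallocated

# Option A: mixed mirror - attach the overprovisioned P3700 partition to the existing 900p log
zpool attach tank nvd0 nvd1p1

# Option B: second 900p as the log mirror, P3700 partition as L2ARC
zpool attach tank nvd0 nvd2
zpool add tank cache nvd1p1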
 

jenksdrummer

Patron
Joined
Jun 7, 2011
Messages
250
The third option would be "buy a second 900p for mirrored SLOG, and use the P3700 for L2ARC, because 256GB of RAM is well into the point where you will have meaningful content in ARC to load onto it." But that depends on available PCIe slots.

I find this statement interesting; it's my understanding that ZFS flushes RAM to disk every 5 sec (or less), so what stays in ARC would be cached data until that data ages out by the caching algorithm (least recently used, I believe, vs FIFO). I found this to be consistent when I took my NVMe cache disk offline: performance only suffered once my data churn was enough to push "recent" data out of ARC and reads were forced to come from spindles instead of RAM, but writes never really suffered.

In theory, with a 40Gbit NIC, there could be 25GB of data landing in RAM per 5-second segment. Every step from RAM to the spindles can handle that bandwidth with ease, but once it reaches the spindles you'd basically need to stripe that data across roughly 40 disks, assuming an actual 125MB/sec transfer rate per disk (could be more than that, but I recall that being about par for SATA disks). That's not considering the different-sized workloads that could shift things either higher in terms of pure IOPS or in terms of pure throughput, which a real workload would have. Copy files over a network and watch the bandwidth tank as the files get smaller, but hit some large incompressible data and it'll peg out most networks :)
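
Rough math behind that, with assumed (not measured) numbers; the 5-second figure is the default transaction group interval:

Code:
#   40 Gbit/s / 8              = ~5 GB/s of incoming writes
#   5 GB/s * 5 s               = ~25 GB buffered per transaction group
#   5 GB/s / 125 MB/s per disk = ~40 data disks to flush it in time
sysctl vfs.zfs.txg.timeout     # default txg interval on FreeBSD/FreeNAS: 5 seconds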

Realistically, if I were looking at having to handle that much data throughput, I'd not even consider FreeNAS as a solution, as it would mean my ability to feed my family would depend on that solution.

So, on to my lab system... I've got a ridiculous amount of money dumped into it, even by my own equally ridiculous spending standards, my justification being that some day I may turn my lab into a side business; ideally within the next year or two. Maybe. MAYBE. I have 4x 10Gb ports on both of my FreeNAS boxes: 2x for iSCSI storage, one 10Gb for replication to another FreeNAS, and the 4th for management. (The 2nd box is much smaller; it's used for replication for now.)

The only time I've *ever* tapped out a single interface was during testing with IOMeter when I first stood it up, setting it to test with some ridiculously large datasets just to see it peg out the NIC throughput. IOPS-wise, the best I got was fairly mediocre compared to the QNAP I had been using before this box. TBH and IMO, ZFS sucks for performance once you get outside of RAM/ARC, and any throughput test where the test file is cached in RAM gives a false idea of what the underlying system can do (my test file was 250GB vs 192GB of RAM). It's OK until then, but it's still not as fast as an MD array using EXT4. With a smaller test file, once it's in RAM it's just as fast as RAM on a network cable will allow... enter the L2ARC, which ideally is faster than your spindles but never as fast as RAM (see the experimental posts about setting up a RAMdrive for L2ARC); or, when it comes to writes, the SLOG, which takes the brunt of a crazy workload until it diminishes enough for the spindle subsystem to eat it up...

So back to the point I was attempting to make; in what scenario would you envision the scenario quoted? Not being hostile or anything of the sort; I'm curious! :)
 

jenksdrummer

Patron
Joined
Jun 7, 2011
Messages
250
Apologies on the model I have for my SLOG: it's an Intel D3-S4510 240GB. I was thinking it was a different model when I posted, but thought I'd check.

Not the most write endurance, nor the fastest, but also not terribly expensive; it has PLP and is relatively quick. I only notice it when I'm moving VM workloads from storage A to storage B, and that could be my lesser-sized storage-B box slowing things down more than the SLOG.
 

HoneyBadger

actually does care
Administrator
Moderator
iXsystems
Joined
Feb 6, 2014
Messages
5,110
So back to the point I was attempting to make; in what scenario would you envision the scenario quoted? Not being hostile or anything of the sort; I'm curious! :)
I'm still trying to pick out the question here. Do you mean the scenario where L2ARC is actually useful and viable? The short answer on that is "when your working set is too large for RAM, and pool read performance is inadequate." Adding more RAM is always the better option, but eventually you run out of DIMM slots.

There are two queues for ARC, the MRU and the MFU - most recently and most frequently used. To oversimplify, data "ages out" from the tail end of those queues into L2ARC, and is put there based on the feed rate tunables (which are hilariously low out of the box; 8MB/s isn't nearly enough). Other things like metadata can live there as well (several users with a large number of files on a spinning-disk pool have seen benefits from using L2ARC for metadata only) - but it really boils down to a chunk of records that can be read faster than from your vdevs but slower than from ARC.
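
For reference, the relevant knobs look something like this on FreeBSD/FreeNAS (the 64MB/s value and the dataset name are purely illustrative, not a recommendation for any particular box):

Code:
sysctl vfs.zfs.l2arc_write_max            # feed rate cap, defaults to 8388608 (8MB/s)
sysctl vfs.zfs.l2arc_write_boost          # extra feed rate allowed while ARC is still warming up
sysctl vfs.zfs.l2arc_write_max=67108864   # example: raise the cap to 64MB/s

# Restrict a dataset's L2ARC usage to metadata only:
zfs set secondarycache=metadata tank/dataset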

With the dropping cost of NAND nowadays though, anyone who's chasing big performance numbers is rightfully being steered towards an all-flash setup.
 