Very slow writes on mirrored vdev SSD Pool

notrox

Cadet
Joined
Jan 10, 2021
Messages
9
System specifications:

Chassis: SuperMicro CSE-847BE1C-R1K28LPB
CPU: 2 x Intel(R) Xeon(R) CPU E5-2620 v3 @ 2.40GHz
Motherboard: Super X10DRi-T4+
Memory: 128GB ECC DDR4
HBA: LSI SAS3008
Pool1: 10 x 14TB Seagate (ST14000NM0288) 12Gb/s SAS Drives
SSD-Pool: 12 x 2TB Leven (JAJS600M2TB) 6Gb/s SATA Drives

I am having some issues with very slow write speeds on my SSD pool, which is 12 disks arranged as 6 mirrored vdevs.

root@FS[/]# zpool list -v SSD-Pool
NAME SIZE ALLOC FREE CKPOINT EXPANDSZ FRAG CAP DEDUP HEALTH ALTROOT
SSD-Pool 11.2T 65.5G 11.1T - - 0% 0% 1.00x ONLINE /mnt
  mirror-0 1.86T 11.3G 1.85T - - 0% 0.59% - ONLINE
    gptid/a672ba8a-4f04-11ed-89ac-ac1f6b8aacc4 - - - - - - - - ONLINE
    gptid/9a7f158c-4f04-11ed-89ac-ac1f6b8aacc4 - - - - - - - - ONLINE
  mirror-1 1.86T 9.99G 1.85T - - 0% 0.52% - ONLINE
    gptid/a6587cce-4f04-11ed-89ac-ac1f6b8aacc4 - - - - - - - - ONLINE
    gptid/a78122d6-4f04-11ed-89ac-ac1f6b8aacc4 - - - - - - - - ONLINE
  mirror-2 1.86T 10.2G 1.85T - - 0% 0.53% - ONLINE
    gptid/a63c1cbf-4f04-11ed-89ac-ac1f6b8aacc4 - - - - - - - - ONLINE
    gptid/a1c9e504-4f04-11ed-89ac-ac1f6b8aacc4 - - - - - - - - ONLINE
  mirror-3 1.86T 13.6G 1.85T - - 0% 0.71% - ONLINE
    gptid/a0c86f05-4f04-11ed-89ac-ac1f6b8aacc4 - - - - - - - - ONLINE
    gptid/a61f0003-4f04-11ed-89ac-ac1f6b8aacc4 - - - - - - - - ONLINE
  mirror-4 1.86T 12.5G 1.85T - - 0% 0.65% - ONLINE
    gptid/a92ea93f-4f04-11ed-89ac-ac1f6b8aacc4 - - - - - - - - ONLINE
    gptid/a68bfba3-4f04-11ed-89ac-ac1f6b8aacc4 - - - - - - - - ONLINE
  mirror-5 1.86T 7.96G 1.85T - - 0% 0.41% - ONLINE
    gptid/a6a66341-4f04-11ed-89ac-ac1f6b8aacc4 - - - - - - - - ONLINE
    gptid/a798fc7d-4f04-11ed-89ac-ac1f6b8aacc4 - - - - - - - - ONLINE

# Write Test #

root@FS[/mnt/SSD-Pool]# fio --directory=/mnt/SSD-Pool --name=write_test --rw=write --bs=1M --size=10G
write_test: (g=0): rw=write, bs=(R) 1024KiB-1024KiB, (W) 1024KiB-1024KiB, (T) 1024KiB-1024KiB, ioengine=psync, iodepth=1
fio-3.28
Starting 1 process
write_test: Laying out IO file (1 file / 10240MiB)
Jobs: 1 (f=1): [W(1)][100.0%][w=42.2MiB/s][w=42 IOPS][eta 00m:00s]
write_test: (groupid=0, jobs=1): err= 0: pid=21604: Tue Oct 18 13:28:13 2022
write: IOPS=55, BW=55.2MiB/s (57.8MB/s)(10.0GiB/185608msec); 0 zone resets
clat (usec): min=199, max=64236, avg=18070.67, stdev=7978.17
lat (usec): min=206, max=64292, avg=18121.06, stdev=7992.63
clat percentiles (usec):
| 1.00th=[ 206], 5.00th=[ 247], 10.00th=[ 2671], 20.00th=[13173],
| 30.00th=[19530], 40.00th=[19792], 50.00th=[20055], 60.00th=[20579],
| 70.00th=[21103], 80.00th=[22152], 90.00th=[25035], 95.00th=[28181],
| 99.00th=[34866], 99.50th=[38011], 99.90th=[45876], 99.95th=[50070],
| 99.99th=[60031]
bw ( KiB/s): min=28054, max=1637173, per=100.00%, avg=56669.51, stdev=89194.88, samples=364
iops : min= 27, max= 1598, avg=54.76, stdev=87.10, samples=364
lat (usec) : 250=5.06%, 500=0.34%, 750=0.01%
lat (msec) : 2=3.17%, 4=3.52%, 10=5.39%, 20=25.71%, 50=56.74%
lat (msec) : 100=0.06%
cpu : usr=0.38%, sys=3.25%, ctx=77624, majf=0, minf=0
IO depths : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
issued rwts: total=0,10240,0,0 short=0,0,0,0 dropped=0,0,0,0
latency : target=0, window=0, percentile=100.00%, depth=1

Run status group 0 (all jobs):
WRITE: bw=55.2MiB/s (57.8MB/s), 55.2MiB/s-55.2MiB/s (57.8MB/s-57.8MB/s), io=10.0GiB (10.7GB), run=185608-185608msec

# Read Test #

root@FS[/mnt/SSD-Pool]# fio --directory=/mnt/SSD-Pool --name=read --rw=read --bs=1M --size=10G
read: (g=0): rw=read, bs=(R) 1024KiB-1024KiB, (W) 1024KiB-1024KiB, (T) 1024KiB-1024KiB, ioengine=psync, iodepth=1
fio-3.28
Starting 1 process
read: Laying out IO file (1 file / 10240MiB)

Jobs: 1 (f=1): [R(1)][100.0%][r=2092MiB/s][r=2091 IOPS][eta 00m:00s]
read: (groupid=0, jobs=1): err= 0: pid=21780: Tue Oct 18 13:35:27 2022
read: IOPS=2257, BW=2258MiB/s (2368MB/s)(10.0GiB/4535msec)
clat (usec): min=250, max=101389, avg=441.99, stdev=1437.49
lat (usec): min=250, max=101389, avg=442.07, stdev=1437.50
clat percentiles (usec):
| 1.00th=[ 285], 5.00th=[ 285], 10.00th=[ 289], 20.00th=[ 293],
| 30.00th=[ 297], 40.00th=[ 306], 50.00th=[ 330], 60.00th=[ 461],
| 70.00th=[ 490], 80.00th=[ 510], 90.00th=[ 635], 95.00th=[ 701],
| 99.00th=[ 955], 99.50th=[ 1270], 99.90th=[ 2212], 99.95th=[ 3097],
| 99.99th=[89654]
bw ( MiB/s): min= 1500, max= 3315, per=100.00%, avg=2304.27, stdev=698.38, samples=8
iops : min= 1500, max= 3315, avg=2304.00, stdev=698.23, samples=8
lat (usec) : 500=74.58%, 750=23.41%, 1000=1.15%
lat (msec) : 2=0.70%, 4=0.13%, 100=0.02%, 250=0.01%
cpu : usr=0.33%, sys=94.24%, ctx=179, majf=0, minf=257
IO depths : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
issued rwts: total=10240,0,0,0 short=0,0,0,0 dropped=0,0,0,0
latency : target=0, window=0, percentile=100.00%, depth=1

Run status group 0 (all jobs):
READ: bw=2258MiB/s (2368MB/s), 2258MiB/s-2258MiB/s (2368MB/s-2368MB/s), io=10.0GiB (10.7GB), run=4535-4535msec

Any help would be greatly appreciated.
 

HoneyBadger

actually does care
Administrator
Moderator
iXsystems
Joined
Feb 6, 2014
Messages
5,112
SSD-Pool: 12 x 2TB Leven (JAJS600M2TB) 6Gb/s SATA Drives
I've never heard of this brand - do you have a manufacturer link or spec sheet?

~58MB/s is indeed very poor. I suspect the SSDs are QLC and/or have no DRAM cache, but even then those are the kind of numbers I'd expect from a single SSD, not twelve of them in mirrors.

Are you using compression or deduplication on the dataset you're testing against?
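You can check with something like this (adjust the dataset name if you're testing against a child dataset rather than the top-level SSD-Pool):

zfs get compression,dedup SSD-Pool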

Is it possible to run the fio command with the --eta-newline=10s parameter to see if it starts off quick and then rapidly crashes down to that slow speed?
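Something along these lines, assuming the same dataset path as your original test:

fio --directory=/mnt/SSD-Pool --name=write_test --rw=write --bs=1M --size=10G --eta-newline=10s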
 

NugentS

MVP
Joined
Apr 16, 2020
Messages
2,949
A discussion on Reddit seems to indicate that, at this capacity, these are cheap QLC drives with no DRAM cache.
Also: https://www.amazon.com/dp/B08DTMYDR9?ref=myi_title_dp&th=1
The listing doesn't even mention drives larger than 1TB, or the JA model number.

I also found another discussion that seems to indicate that performance tanks rapidly with use.

I got all sorts of warning signals from a minimum of research.
 

notrox

Cadet
Joined
Jan 10, 2021
Messages
9
Looks like I didn't do enough research before purchasing these drives. What would be a suitable 2TB replacement SSD?

They will be used for VMs running on VMware.
 

NugentS

MVP
Joined
Apr 16, 2020
Messages
2,949
As you are in the USA, you probably have better choices than most.
Question: is this home use, commercial use, or commercial lab use? 24TB of raw storage for VMs is an awful lot of VMs, even once mirrored (now 12TB) and kept below 50% full (now 6TB).

6TB of actual VMs is still an awful lot for home use.

An example of what I would consider: Ebay Link to used SSD
These are mixed-load, enterprise-grade SSDs with (according to the vendor) 100% life left. They are rated at 5 DWPD, which is an awful lot of data for VMs, so it depends on your use case.

Ahh - I just spotted they are U.2 - not such a good idea. Damn.
 

notrox

Cadet
Joined
Jan 10, 2021
Messages
9
I am located in the USA. This is for home use for my personal lab. I currently have close to 40 VMs of various sizes.
 

NugentS

MVP
Joined
Apr 16, 2020
Messages
2,949
Can I assume you have a good SLOG - or are using async writes?
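If you're not sure, something like this will show the current setting (I'm assuming the top-level dataset here):

zfs get sync SSD-Pool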

I would try to find some used enterprise SSDs - but make sure they have a reasonable DWPD rating. For example:
Intel DC S35xx - 1 DWPD, marketed as read-intensive drives
Intel DC S36xx - 5 DWPD, marketed as mixed-load drives
Intel DC S37xx - 10 DWPD, marketed as write-intensive drives

I use 3 mirrored pairs of 1.6TB DC S36xx drives for my iSCSI pool for VMware. I also use an Optane 900p as a SLOG on the pool.

These are the ones I know about - there are others - just be careful to categorise them by DWPD.

In general, enterprise drives will just perform correctly as per their specs and will keep performing until they burn out. Consumer drives, OTOH, are a crap shoot. Some are utter shit (Samsung QVO, and apparently Leven) and some seem to perform quite well in an iSCSI-type use case (I had reasonable success with Crucial MX500s).

Some users have reported a write amplification bug with MX500s. I imagine this has been fixed in firmware by now; I didn't get hit by it. There is also a bug that randomly reports a bad sector - but the pool is still good. There is a workaround, but it's not a good one.
 

notrox

Cadet
Joined
Jan 10, 2021
Messages
9
Can I assume you have a good SLOG - or are using async writes?


I was planning on getting an Optane drive to use as a SLOG on the pool. I've been looking at https://www.newegg.com/intel-optane-ssd-900p-series-280gb/p/N82E16820167437

Thank you very much for the recommendations on those enterprise SSDs.
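From what I've read, attaching it would be something along the lines of the command below (or the equivalent in the TrueNAS UI) - the device name is just a placeholder for however the Optane shows up on my system:

# add a single SLOG device to the existing pool (nvd0 is a placeholder device name)
zpool add SSD-Pool log nvd0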
 

NugentS

MVP
Joined
Apr 16, 2020
Messages
2,949
If you do get an Optane, get a U.2 or M.2 NVMe one (but not the 16/32GB HDD-accelerator ones). The one you linked to uses a whole PCIe slot. If you use U.2 or M.2 NVMe you can fit more than one in a slot (bifurcation or PCIe switch dependent, of course).
The P4800X drives are better SLOGs as they definitely have PLP.
 

notrox

Cadet
Joined
Jan 10, 2021
Messages
9
If you do get an Optane, get a U.2 or M.2 NVMe one (but not the 16/32GB HDD-accelerator ones). The one you linked to uses a whole PCIe slot. If you use U.2 or M.2 NVMe you can fit more than one in a slot (bifurcation or PCIe switch dependent, of course).
The P4800X drives are better SLOGs as they definitely have PLP.

That makes sense. I'm having a hard time finding M.2 NVMe Optane drives that aren't the 16/32GB HDD-accelerator ones. I'm only seeing U.2 and the full-size PCIe add-in cards. I have a 2-slot PCIe x8 Gen3 card that I can use with 4x4 bifurcation for 2 M.2 NVMe drives.
 

HoneyBadger

actually does care
Administrator
Moderator
iXsystems
Joined
Feb 6, 2014
Messages
5,112
That makes sense. I'm having a hard time finding M.2 NVMe Optane drives that aren't the 16/32GB HDD-accelerator ones. I'm only seeing U.2 and the full-size PCIe add-in cards. I have a 2-slot PCIe x8 Gen3 card that I can use with 4x4 bifurcation for 2 M.2 NVMe drives.
Look for the P4801X if you're hunting for M.2 form-factor drives specifically, but bear in mind that those drives have less ability to dissipate heat than the U.2 or PCIe add-in card formats.

I'm personally using the S3500 units, but I only have about 1TB of active load on the SSD.
 