Another slow pool thread.

alexhore

Explorer
Joined
Sep 24, 2014
Messages
52
Built a new TrueNAS box:
  • Supermicro X9SCM-F
  • Intel Xeon E3-1230 v2 (SR0P4), 8 MB L3 cache, 3.3 GHz, 69 W, LGA1155
  • 32GB (4 x 8GB) 2Rx8 PC3-14900E DDR3 1866MHz DIMMs (Mac Pro 2013)
  • LSI SAS 9201-16E
  • 10 x Seagate Enterprise 4TB 7.2K 128MB 6Gbps 3.5" HDD ST4000NM0023
  • 2 x Corsair Force Series MP510, 240GB, M.2 NVMe PCIe x4
Connected the 10 SAS drives to the LSI controller in the x8 PCIe 3.0 slot and created a single RAIDZ1 pool.
Connected the two NVMe drives to the second and third PCIe slots (one of which is only Gen 2), then mirrored them for the System pool.
Stuck another SSD in there for boot.

Noticed I can saturate the LAN when downloading from the RAIDZ1 pool, but uploads are stuck at 30MB/s. Tried another laptop and a PC, both hard wired; no dice.

I assumed that writing across 10 drives would be fast one way or another. A quick Google and I've come to the realisation that RAIDZ1 is apparently not fast?

As a test I transferred a large file, way bigger than my RAM, from the RAIDZ1 pool to the mirrored NVMes and then transferred it back.

SAS -> NVMe: SAS read 100MB/s (10 x 10MB/s), NVMe write 100MB/s (x1, as mirrored)
NVMe -> SAS: NVMe read 200MB/s (2 x 100MB/s), SAS write 200MB/s (10 x 20MB/s)

Considering the SAS drives should manage ~170MB/s each, I'm guessing this is not normal? Where on earth the 30MB/s comes from when uploading via SMB, I'm stumped.
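
For what it's worth, one way to rule the network path in or out would be an iperf3 run in both directions (iperf3 ships with TrueNAS; the IP below is a placeholder):

Code:
# on the TrueNAS box
iperf3 -s
# on the client: upload direction first, then download (-R reverses the test)
iperf3 -c <truenas-ip> -t 30
iperf3 -c <truenas-ip> -t 30 -R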

1644862690718.png

1644863323057.png


1644863365360.png

root@truenas[~]# pciconf -lvc mps0
mps0@pci0:1:0:0: class=0x010700 card=0x30d01000 chip=0x00641000 rev=0x02 hdr=0x00
vendor = 'Broadcom / LSI'
device = 'SAS2116 PCI-Express Fusion-MPT SAS-2 [Meteor]'
class = mass storage
subclass = SAS
cap 01[50] = powerspec 3 supports D0 D1 D2 D3 current D0
cap 10[68] = PCI-Express 2 endpoint max data 256(4096) FLR NS
link x8(x8) speed 5.0(5.0) ASPM disabled(L0s)

cap 03[d0] = VPD
cap 05[a8] = MSI supports 1 message, 64 bit
cap 11[c0] = MSI-X supports 15 messages, enabled
Table in map 0x14[0x2000], PBA in map 0x14[0x3800]
ecap 0001[100] = AER 1 0 fatal 0 non-fatal 5 corrected
ecap 0004[138] = Power Budgeting 1
ecap 0010[150] = SR-IOV 1 IOV disabled, Memory Space disabled, ARI disabled
0 VFs configured out of 7 supported
First VF RID Offset 0x0001, VF RID Stride 0x0001
VF Device ID 0x0064
Page Sizes: 4096 (enabled), 8192, 65536, 262144, 1048576, 4194304
ecap 000e[190] = ARI 1

root@truenas[~]# pciconf -lvc nvme0
nvme0@pci0:2:0:0: class=0x010802 card=0x50121987 chip=0x50121987 rev=0x01 hdr=0x00
vendor = 'Phison Electronics Corporation'
device = 'E12 NVMe Controller'
class = mass storage
subclass = NVM
cap 10[80] = PCI-Express 2 endpoint max data 128(256) FLR NS
link x4(x4) speed 8.0(8.0) ASPM L1(L1)

cap 11[d0] = MSI-X supports 9 messages, enabled
Table in map 0x10[0x2000], PBA in map 0x10[0x3000]
cap 05[e0] = MSI supports 8 messages, 64 bit, vector masks
cap 01[f8] = powerspec 3 supports D0 D3 current D0
ecap 0018[100] = LTR 1
ecap 001e[110] = L1 PM Substates 1
ecap 000e[128] = ARI 1
ecap 0001[200] = AER 2 0 fatal 0 non-fatal 1 corrected
ecap 0019[300] = PCIe Sec 1 lane errors 0xf

root@truenas[~]# pciconf -lvc nvme1
nvme1@pci0:3:0:0: class=0x010802 card=0x50121987 chip=0x50121987 rev=0x01 hdr=0x00
vendor = 'Phison Electronics Corporation'
device = 'E12 NVMe Controller'
class = mass storage
subclass = NVM
cap 10[80] = PCI-Express 2 endpoint max data 128(256) FLR NS
link x4(x4) speed 2.5(8.0) ASPM L1(L1)

cap 11[d0] = MSI-X supports 9 messages, enabled
Table in map 0x10[0x2000], PBA in map 0x10[0x3000]
cap 05[e0] = MSI supports 8 messages, 64 bit, vector masks
cap 01[f8] = powerspec 3 supports D0 D3 current D0
ecap 0018[100] = LTR 1
ecap 001e[110] = L1 PM Substates 1
ecap 000e[128] = ARI 1
ecap 0001[200] = AER 2 0 fatal 0 non-fatal 3 corrected
ecap 0019[300] = PCIe Sec 1 lane errors 0xd
 

HoneyBadger

actually does care
Administrator
Moderator
iXsystems
Joined
Feb 6, 2014
Messages
5,112
Based on what I see so far in the unedited post:

  • Hardware so far doesn't look bad for general file services.
  • Undersized for VM support.
  • Likely to need more RAM than LGA1155 supports if you want to push performance further.

Random shots in the dark:
  • Enable compression (a quick check from the shell is below)
  • Don't use deduplication
  • No, an SLOG won't help
  • An L2ARC might
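
If it helps, those settings can be confirmed from the shell; a minimal check, assuming the RAIDZ1 pool is named saspool (substitute your actual pool name):

Code:
zfs get compression,compressratio,dedup saspool
zpool list -v saspool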

Please make sure to describe your workload in the expanded version.

Cheers!
 

alexhore

Explorer
Joined
Sep 24, 2014
Messages
52
Thanks for the initial stab. Workload: it's doing next to nothing right now. I store personal files on there and the family watches Plex content off it during the evening. Stuck with 32GB of RAM, but for what I'm using it for I can live with that.

I used default settings, so it's LZ4, showing 1% compression.
Deduplication is not enabled.
No SLOG installed.
No L2ARC.

Dug this out; something is clearly not right, I should be somewhere in the middle of these figures...

5x 4TB, raidz1 (raid5), 15.0 TB, w=469MB/s , rw=79MB/s , r=598MB/s
12x 4TB, raidz (raid5), 41.3 TB, w=689MB/s , rw=118MB/s , r=993MB/s
 

alexhore

Explorer
Joined
Sep 24, 2014
Messages
52
Stumbled across a command elsewhere for enabling the write-back cache on these specific drives. The same thing is posted here:

Code:
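# set "WCE: 1" (write cache enable) in the SCSI caching mode page (0x08) on da0 through da9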
for drive in {0..9}; do echo "WCE: 1" | camcontrol modepage /dev/da$drive -m 0x08 -6 -e; done


Added to my post init list.
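
For anyone following along, the setting can be confirmed per drive after boot with smartctl (which ships with TrueNAS), with a quick loop along these lines:

Code:
# report the current write cache state for da0 through da9
for drive in {0..9}; do smartctl -g wcache /dev/da$drive | grep -i cache; done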

I'm now able to write NVMe -> SAS at 800MB/s (10 x 80MB/s).

Unfortunately I'm still stuck reading at ~10MB/s per drive, and the read cache is already enabled...

root@truenas[~]# smartctl -g rcache /dev/da0 | grep Read
Read Cache is: Enabled

1644879824737.png
 

HoneyBadger

actually does care
Administrator
Moderator
iXsystems
Joined
Feb 6, 2014
Messages
5,112
Glad to see the write cache fix resolved the ingest speed. For the read testing, try using dd to copy a file to /dev/null rather than using the copy command, which could bottleneck on the target.

eg: dd if=/mnt/saspool/folder/file.mkv of=/dev/null bs=1M
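
While the dd runs, watching per-disk throughput from a second shell will also show whether one drive is lagging the rest (pool name here is assumed, adjust to yours):

Code:
zpool iostat -v saspool 5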

You are using a very wide RAIDZ1 by your description (10 drives wide), and with the default 128K record size split across the 9 data disks, eight of those disks need to seek/read (assuming it's a media file, which is already compressed) to return each record - it could just be an alignment/timing issue. What does your ARC hit% look like during a read?
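
On CORE the ARC counters can be pulled from sysctl while the read is in flight; they're cumulative, so compare before/after deltas rather than the raw values:

Code:
sysctl kstat.zfs.misc.arcstats.hits kstat.zfs.misc.arcstats.misses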
 

alexhore

Explorer
Joined
Sep 24, 2014
Messages
52
Thanks! Getting closer to the issue, I think.

127MB/s per drive with dd, so clearly the issue is writing to my mirrored NVMe drives. They are being used for the System pool, but it's doing nothing right now.

Removed one NVMe from the pool and created a single-disk pool with it.

Both the System pool (now a single NVMe) and the new pool (single NVMe) are stuck at 200-250MB/s, that's ~25MB/s per SAS drive; a slight boost, perhaps because I also found the enable-trim button.

Now I'm wondering if I messed something up creating the System pool: I stupidly created a single-NVMe striped pool while the drive was still in the old server, then later converted it to a mirror using zpool commands in the CLI. Wish they would introduce the convert-to-mirror function that's available against the boot pool...
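
For reference, the CLI conversion is normally just attaching the second device to the existing one; roughly this, with placeholder device names rather than the actual gptids:

Code:
# turn the existing single-disk vdev into a mirror by attaching a second device
zpool attach System <existing-device> <new-device>
# the pool should then show a mirror vdev resilvering
zpool status System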

There is also the chance the NVMe write speed is OK and it's something to do with moving data between pools; a system bottleneck, perhaps.

Screenshot 2022-02-15 at 18.27.38.png
 

alexhore

Explorer
Joined
Sep 24, 2014
Messages
52
Recreated the NVMe System pool after discovering that the Status -> Extend option in the GUI converts it to a mirror for you; not completely obvious, but hey.

Auto-trim was probably/maybe a gradual thing that would have improved over time, so I issued a "zpool trim System".
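
Progress of the manual trim can be checked with zpool status (the -t flag adds per-device trim state):

Code:
zpool status -t System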

The below shows two dd attempts back to back, then the same file being copied with cp to the NVMe pool.

NVMe drives levelling off at about 300MB/s, locking the SAS drives down to 30MB/s perhaps?

1644956083045.png


I reversed your dd command; Google says /dev/random is CPU intensive, so I used /dev/zero.

Code:
dd if=/dev/zero of=/mnt/System/ddfile bs=1M


Left side of the chart shows the 300MB/s levelling off, right side the dd command; really struggling, CPU only at 60%.

1644956683773.png
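
One caveat on the /dev/zero test: with LZ4 enabled on the dataset, zeroes compress to almost nothing, so that dd measures CPU/compression at least as much as the NVMe write path. A rough alternative would be to pre-generate an incompressible file once and time copies of that instead (paths here are examples):

Code:
# generate ~8GB of incompressible data once (generation itself may be slow)
dd if=/dev/random of=/mnt/saspool/testfile bs=1M count=8192
# then time the write to the NVMe pool from that file
dd if=/mnt/saspool/testfile of=/mnt/System/ddfile bs=1M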


Looks like my SAS read problem is actually an NVMe write problem. Not sure what else to try; drive info below...

[COLLAPSE]
root@truenas[~]# smartctl -a /dev/nvme0
smartctl 7.2 2020-12-30 r5155 [FreeBSD 12.2-RELEASE-p12 amd64] (local build)
Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Number: Force MP510
Serial Number: 202882180FF128090005
Firmware Version: ECFM12.3
PCI Vendor/Subsystem ID: 0x1987
IEEE OUI Identifier: 0x6479a7
Total NVM Capacity: 240,057,409,536 [240 GB]
Unallocated NVM Capacity: 0
Controller ID: 1
NVMe Version: 1.3
Number of Namespaces: 1
Namespace 1 Size/Capacity: 240,057,409,536 [240 GB]
Namespace 1 Formatted LBA Size: 512
Namespace 1 IEEE EUI-64: 6479a7 fff0003035
Local Time is: Tue Feb 15 20:29:57 2022 GMT
Firmware Updates (0x12): 1 Slot, no Reset required
Optional Admin Commands (0x0007): Security Format Frmw_DL
Optional NVM Commands (0x0054): DS_Mngmt Sav/Sel_Feat Timestmp
Log Page Attributes (0x08): Telmtry_Lg
Maximum Data Transfer Size: 512 Pages
Warning Comp. Temp. Threshold: 75 Celsius
Critical Comp. Temp. Threshold: 80 Celsius

Supported Power States
St Op Max Active Idle RL RT WL WT Ent_Lat Ex_Lat
0 + 6.80W - - 0 0 0 0 0 0
1 + 5.74W - - 1 1 1 1 0 0
2 + 5.21W - - 2 2 2 2 0 0
3 - 0.0490W - - 3 3 3 3 2000 2000
4 - 0.0018W - - 4 4 4 4 25000 25000

Supported LBA Sizes (NSID 0x1)
Id Fmt Data Metadt Rel_Perf
0 + 512 0 2
1 - 4096 0 1

=== START OF SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

SMART/Health Information (NVMe Log 0x02)
Critical Warning: 0x00
Temperature: 40 Celsius
Available Spare: 100%
Available Spare Threshold: 5%
Percentage Used: 5%
Data Units Read: 1,885,497 [965 GB]
Data Units Written: 19,356,573 [9.91 TB]
Host Read Commands: 38,686,273
Host Write Commands: 616,715,641
Controller Busy Time: 628
Power Cycles: 193
Power On Hours: 12,078
Unsafe Shutdowns: 175
Media and Data Integrity Errors: 0
Error Information Log Entries: 0
Warning Comp. Temperature Time: 0
Critical Comp. Temperature Time: 0

Error Information (NVMe Log 0x01, 16 of 63 entries)
No Errors Logged
[/COLLAPSE]

[COLLAPSE]
root@truenas[~]# smartctl -a /dev/nvme1
smartctl 7.2 2020-12-30 r5155 [FreeBSD 12.2-RELEASE-p12 amd64] (local build)
Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Number: Force MP510
Serial Number: 192582180FF128090008
Firmware Version: ECFM12.1
PCI Vendor/Subsystem ID: 0x1987
IEEE OUI Identifier: 0x6479a7
Total NVM Capacity: 240,057,409,536 [240 GB]
Unallocated NVM Capacity: 0
Controller ID: 1
NVMe Version: 1.3
Number of Namespaces: 1
Namespace 1 Size/Capacity: 240,057,409,536 [240 GB]
Namespace 1 Formatted LBA Size: 512
Namespace 1 IEEE EUI-64: 6479a7 fff0003038
Local Time is: Tue Feb 15 20:31:43 2022 GMT
Firmware Updates (0x12): 1 Slot, no Reset required
Optional Admin Commands (0x0017): Security Format Frmw_DL Self_Test
Optional NVM Commands (0x005d): Comp DS_Mngmt Wr_Zero Sav/Sel_Feat Timestmp
Log Page Attributes (0x0c): Ext_Get_Lg Telmtry_Lg
Maximum Data Transfer Size: 512 Pages
Warning Comp. Temp. Threshold: 70 Celsius
Critical Comp. Temp. Threshold: 90 Celsius

Supported Power States
St Op Max Active Idle RL RT WL WT Ent_Lat Ex_Lat
0 + 6.80W - - 0 0 0 0 0 0
1 + 5.74W - - 1 1 1 1 0 0
2 + 5.21W - - 2 2 2 2 0 0
3 - 0.0490W - - 3 3 3 3 2000 2000
4 - 0.0018W - - 4 4 4 4 25000 25000

Supported LBA Sizes (NSID 0x1)
Id Fmt Data Metadt Rel_Perf
0 + 512 0 2
1 - 4096 0 1

=== START OF SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

SMART/Health Information (NVMe Log 0x02)
Critical Warning: 0x00
Temperature: 36 Celsius
Available Spare: 100%
Available Spare Threshold: 5%
Percentage Used: 6%
Data Units Read: 2,107,773 [1.07 TB]
Data Units Written: 23,682,517 [12.1 TB]
Host Read Commands: 41,950,552
Host Write Commands: 765,131,221
Controller Busy Time: 924
Power Cycles: 47
Power On Hours: 15,793
Unsafe Shutdowns: 25
Media and Data Integrity Errors: 0
Error Information Log Entries: 0
Warning Comp. Temperature Time: 0
Critical Comp. Temperature Time: 0

Error Information (NVMe Log 0x01, 16 of 63 entries)
No Errors Logged
[/COLLAPSE]
 