How to find a slow disk

Zappo | Cadet | Joined: Nov 2, 2020 | Messages: 5
Hello,

I'm new to the forum but not new to ZFS. I've been using it for a long time on Solaris, later Nexenta and FreeNAS, and have now done a fresh install of TrueNAS 12 GA on a Supermicro server I had lying around (a decommissioned Nexenta box, specs below).

I'm having an issue I've never had to troubleshoot before, so I'm lacking experience here. My box has three RAIDZ2 vdevs in one pool, and one of the three is much slower than the other two, dragging total performance down.

I have 24x 10k RPM SAS2 disks (all the same model) divided over three 6+2 RAIDZ2 vdevs. Two very fast, write-intensive enterprise SLC flash drives are used as a mirrored ZIL (SLOG).
As you can see below, the write throughput of the 3rd RAIDZ2 vdev is much lower than that of the other two: roughly 28 MB/s (about 129 write ops/s) versus roughly 73-84 MB/s (about 300 ops/s) for each of the other two at the moment I captured the output of "zpool iostat -v 3".

Code:
                                                  capacity     operations     bandwidth
pool                                            alloc   free   read  write   read  write
----------------------------------------------  -----  -----  -----  -----  -----  -----
Pool01                                          5.82T  7.21T      0  3.98K  1.20K   464M
  raidz2                                        2.43T  1.91T      0    303      0  83.7M
    gptid/199def9f-1c52-11eb-ad4b-a0369f19e510      -      -      0     40      0  10.5M
    gptid/1a91b995-1c52-11eb-ad4b-a0369f19e510      -      -      0     36      0  10.5M
    gptid/1a57856d-1c52-11eb-ad4b-a0369f19e510      -      -      0     37      0  10.5M
    gptid/1a72dc44-1c52-11eb-ad4b-a0369f19e510      -      -      0     39      0  10.5M
    gptid/1a923c8f-1c52-11eb-ad4b-a0369f19e510      -      -      0     37      0  10.5M
    gptid/1ac7ebfc-1c52-11eb-ad4b-a0369f19e510      -      -      0     38      0  10.5M
    gptid/1b12dc45-1c52-11eb-ad4b-a0369f19e510      -      -      0     36      0  10.5M
    gptid/1cdd56f2-1c52-11eb-ad4b-a0369f19e510      -      -      0     38      0  10.5M
  raidz2                                        2.44T  1.90T      0    299    818  72.8M
    gptid/1909a4a5-1c52-11eb-ad4b-a0369f19e510      -      -      0     36      0  9.10M
    gptid/19be4b1d-1c52-11eb-ad4b-a0369f19e510      -      -      0     38      0  9.10M
    gptid/1b5b4c71-1c52-11eb-ad4b-a0369f19e510      -      -      0     39    409  9.10M
    gptid/1b7d57e8-1c52-11eb-ad4b-a0369f19e510      -      -      0     33      0  9.10M
    gptid/1b6c8d69-1c52-11eb-ad4b-a0369f19e510      -      -      0     35      0  9.10M
    gptid/1bf27e07-1c52-11eb-ad4b-a0369f19e510      -      -      0     39    409  9.09M
    gptid/1c3b860c-1c52-11eb-ad4b-a0369f19e510      -      -      0     36      0  9.10M
    gptid/1c7422ff-1c52-11eb-ad4b-a0369f19e510      -      -      0     38      0  9.10M
  raidz2                                         964G  3.40T      0    129    409  27.6M
    gptid/1ca9f7d8-1c52-11eb-ad4b-a0369f19e510      -      -      0     26      0  3.44M
    gptid/1ee056a8-1c52-11eb-ad4b-a0369f19e510      -      -      0     15      0  3.45M
    gptid/1f1d6847-1c52-11eb-ad4b-a0369f19e510      -      -      0     15      0  3.45M
    gptid/1f2a0236-1c52-11eb-ad4b-a0369f19e510      -      -      0     14      0  3.45M
    gptid/1fb46ddb-1c52-11eb-ad4b-a0369f19e510      -      -      0     14    409  3.44M
    gptid/1fc89694-1c52-11eb-ad4b-a0369f19e510      -      -      0     13      0  3.44M
    gptid/1fe0f77d-1c52-11eb-ad4b-a0369f19e510      -      -      0     13      0  3.44M
    gptid/20015769-1c52-11eb-ad4b-a0369f19e510      -      -      0     16      0  3.45M
logs                                                -      -      -      -      -      -
  mirror                                        1.95G  91.1G      0  3.26K      0   280M
    gptid/1d5bded5-1c52-11eb-ad4b-a0369f19e510      -      -      0  1.63K      0   140M
    gptid/1d7f605a-1c52-11eb-ad4b-a0369f19e510      -      -      0  1.63K      0   140M
----------------------------------------------  -----  -----  -----  -----  -----  -----
boot-pool                                       2.13G   354G      0      0      0      0
  mirror                                        2.13G   354G      0      0      0      0
    ada0p2                                          -      -      0      0      0      0
    ada1p2                                          -      -      0      0      0      0


The 3rd RAIDZ2 also holds a lot less data, which puzzles me (vdev 1 = 2.43T, vdev 2 = 2.44T, vdev 3 only 964G).
What is going on? I've never encountered such a grave imbalance in both performance and amount of data stored between vdevs. The pool was created in one go with all 24 disks (all three vdevs present from the start) and everything went well. Performance during the first couple of days was great, with the ZIL ingesting data at around 550 MB/s for hours on end (a large copy job). Today I noticed a collapse in performance, sinking and stabilizing at around 135 MB/s, and it has been like that ever since.

So maybe the 3rd RAIDZ2 has a slow disk in it. But how do I find it?
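My first thought was to watch per-disk latency and SMART data while the pool is under write load, with something like the commands below (zpool iostat's latency flags and gstat should both be available on TrueNAS 12; da0 is just a placeholder for whichever drives sit in that vdev). I'm not sure this is the right approach, or what numbers would count as "slow", though:

Code:
# Per-disk busy% and latency from the FreeBSD GEOM layer (physical providers only)
gstat -p -I 5s

# Average per-vdev / per-disk latencies reported by ZFS itself
zpool iostat -v -l Pool01 5

# Latency histograms for the pool
zpool iostat -w Pool01

# SMART health and error counters per drive (da0 is a placeholder, adjust to the real device names)
smartctl -a /dev/da0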

The problem is only with writes.
Reads are still fast and can easily saturate a 10 GbE link for extended periods (hours-long benchmarks and the like).

Pool settings: sync=always, compression=off, dedup=off, atime=off (datasets inherit everything).
Protocol used: NFS 4.1
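
For completeness, this is roughly how those pool-level properties were applied on the pool's root dataset (from memory, so treat it as a sketch; the datasets simply inherit them):

Code:
# Set on the root dataset of Pool01; all child datasets inherit
zfs set sync=always Pool01
zfs set compression=off Pool01
zfs set dedup=off Pool01
zfs set atime=off Pool01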


Kind regards,
Zappo
________________________________________________
System specs:
Xeon E3-1270, 3.4 GHz
32 GB RAM
2x LSI SAS9207-8i (firmware P20)
Backplane 1: 24x WD 10k RPM 2.5" enterprise dual-ported SAS2 drives
Backplane 2: 2x HGST SLC write-intensive SSDs for the mirrored ZIL
Backplane 3: 2x SATA SSDs for the mirrored boot pool
10 GbE network connectivity
 