Device /dev/gptid/ is causing slow I/O on pool (CMR Drives, LSI SAS 9300-16I)

rebo00

Dabbler
Joined
May 29, 2020
Messages
24
Hi,

New TrueNAS Core build, hopefully with just teething problems.
I was browsing a Windows share (SMB) when it stopped responding. I could still access the server web GUI and noted the error notification below:
"Device /dev/gptid/b0357e0a-0d31-11ee-89b4-9c6b00059457 is causing slow I/O on pool Media.
2023-07-21 18:38:47 (Europe/London)"

I hadn't seen this on my previous server, and it's the first time I've seen it on the new one.

Hard drive burn-in testing was completed following this guide, and after many days no errors were found on the final SMART test.

I've seen other posts with this error; in 99% of them the cause was SMR drives being used, and the posts specifically involving CMR drives don't seem to come to any resolution.
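If it happens again, I plan to watch per-disk activity from the console while the share is hanging. My understanding (not yet tested on this box) is that something like the following should show whether one disk is sitting at much higher busy%/latency than the rest:

Code:
# Per-vdev / per-disk throughput and average latency, refreshed every 5 seconds
root@truenas[~]# zpool iostat -v -l Media 5
# Per-disk busy % and ms/transaction for the physical providers only
root@truenas[~]# gstat -p -I 5s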

OS: TrueNAS-13.0-U5.2
CPU: 13th Gen Intel(R) Core(TM) i5-13500
Motherboard: ASRock Z790 PG Lightning/D4
RAM: 2 x 16GB Corsair Vengeance LPX DDR4-3200 C16
Onboard NIC: Realtek 2.5Gbps, not in use
Additional NIC: Solarflare SFN6122F SF329-9021-R7
HBA: LSI SAS 9300-16I

Pool1: Media - 10 x 18TB CMR disks
Pool2: Plugins - 2 x 500GB NVMe M.2

Code:
root@truenas[~]# camcontrol devlist
<ATA TOSHIBA MG09ACA1 4004>        at scbus0 target 0 lun 0 (pass0,da0)
<ATA TOSHIBA MG09ACA1 4004>        at scbus0 target 1 lun 0 (pass1,da1)
<ATA TOSHIBA MG09ACA1 4004>        at scbus0 target 2 lun 0 (pass2,da2)
<ATA TOSHIBA MG09ACA1 4004>        at scbus0 target 3 lun 0 (pass3,da3)
<ATA TOSHIBA MG09ACA1 0105>        at scbus0 target 4 lun 0 (pass4,da4)
<ATA TOSHIBA MG09ACA1 4004>        at scbus0 target 5 lun 0 (pass5,da5)
<ATA TOSHIBA MG09ACA1 4004>        at scbus0 target 6 lun 0 (pass6,da6)
<ATA TOSHIBA MG09ACA1 4004>        at scbus0 target 7 lun 0 (pass7,da7)
<ATA TOSHIBA MG09ACA1 4004>        at scbus1 target 4 lun 0 (pass8,da8)
<ATA TOSHIBA MG09ACA1 4004>        at scbus1 target 5 lun 0 (pass9,da9)
<AHCI SGPIO Enclosure 2.00 0001>   at scbus6 target 0 lun 0 (ses0,pass10)



Code:
root@truenas[~]# zpool status
  pool: Media
 state: ONLINE
config:

        NAME                                            STATE     READ WRITE CKSUM
        Media                                           ONLINE       0     0 0
          raidz2-0                                      ONLINE       0     0 0
            gptid/b0357e0a-0d31-11ee-89b4-9c6b00059457  ONLINE       0     0 0
            gptid/afc7d6f1-0d31-11ee-89b4-9c6b00059457  ONLINE       0     0 0
            gptid/b0324641-0d31-11ee-89b4-9c6b00059457  ONLINE       0     0 0
            gptid/b02808cd-0d31-11ee-89b4-9c6b00059457  ONLINE       0     0 0
            gptid/b02daaae-0d31-11ee-89b4-9c6b00059457  ONLINE       0     0 0
            gptid/b03d1135-0d31-11ee-89b4-9c6b00059457  ONLINE       0     0 0
            gptid/afc211dd-0d31-11ee-89b4-9c6b00059457  ONLINE       0     0 0
            gptid/b0393f62-0d31-11ee-89b4-9c6b00059457  ONLINE       0     0 0
            gptid/afa06d7b-0d31-11ee-89b4-9c6b00059457  ONLINE       0     0 0
            gptid/afcd90fc-0d31-11ee-89b4-9c6b00059457  ONLINE       0     0 0

errors: No known data errors

  pool: Plugins
 state: ONLINE
  scan: scrub repaired 0B in 00:00:20 with 0 errors on Sun Jul  9 00:00:20 2023
config:

        NAME                                            STATE     READ WRITE CKSUM
        Plugins                                         ONLINE       0     0 0
          mirror-0                                      ONLINE       0     0 0
            gptid/28dbd92f-00b2-11ee-99ab-9c6b00059457  ONLINE       0     0 0
            gptid/28de1862-00b2-11ee-99ab-9c6b00059457  ONLINE       0     0 0

errors: No known data errors

  pool: boot-pool
 state: ONLINE
  scan: scrub repaired 0B in 00:00:02 with 0 errors on Fri Jul 21 03:45:02 2023
config:

        NAME        STATE     READ WRITE CKSUM
        boot-pool   ONLINE       0     0     0
          nvd0p2    ONLINE       0     0     0

errors: No known data errors



Code:
root@truenas[~]# glabel status
                                      Name  Status  Components
gptid/32185ae9-0235-11ee-9cb9-138f8fd94ae5     N/A  nvd0p1
gptid/28dbd92f-00b2-11ee-99ab-9c6b00059457     N/A  nvd1p2
gptid/28de1862-00b2-11ee-99ab-9c6b00059457     N/A  nvd2p2
gptid/b0357e0a-0d31-11ee-89b4-9c6b00059457     N/A  da0p2
gptid/afc7d6f1-0d31-11ee-89b4-9c6b00059457     N/A  da1p2
gptid/b02808cd-0d31-11ee-89b4-9c6b00059457     N/A  da3p2
gptid/afa06d7b-0d31-11ee-89b4-9c6b00059457     N/A  da8p2
gptid/b0324641-0d31-11ee-89b4-9c6b00059457     N/A  da2p2
gptid/b03d1135-0d31-11ee-89b4-9c6b00059457     N/A  da5p2
gptid/afcd90fc-0d31-11ee-89b4-9c6b00059457     N/A  da9p2
gptid/b0393f62-0d31-11ee-89b4-9c6b00059457     N/A  da7p2
gptid/b02daaae-0d31-11ee-89b4-9c6b00059457     N/A  da4p2
gptid/afc211dd-0d31-11ee-89b4-9c6b00059457     N/A  da6p2
gptid/ae94c21b-0d31-11ee-89b4-9c6b00059457     N/A  da0p1
gptid/321b6143-0235-11ee-9cb9-138f8fd94ae5     N/A  nvd0p3
 
Joined
Jan 7, 2015
Messages
1,155
I have seen this before. I tend to just replace the disk, and/or RMA it if it's new. There is an array testing script around called Solnet Array Test that will expose underperforming disks. TrueNAS is essentially telling you that disk da0 is slowing your pool down.
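If you want a quick sanity check alongside that script, a rough per-disk sequential read comparison is easy enough to run by hand; it only reads, so it's safe on a live pool, but double-check the device names against your own camcontrol output first:

Code:
# Read 8 GiB from the start of each pool disk and compare the MB/s figures.
# Device names taken from the camcontrol output above; adjust as needed.
# (Bourne-style loop; run it under sh if your login shell is csh.)
for d in da0 da1 da2 da3 da4 da5 da6 da7 da8 da9
do
  echo "=== $d ==="
  dd if=/dev/$d of=/dev/null bs=1m count=8192
done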
 

rebo00

Dabbler
Joined
May 29, 2020
Messages
24
Thanks John,

Running it now and added to my list of testing.

Would the burn-in testing I mentioned above not trigger the alert?

I'm also wondering if the HBA card was running hot. I presume it would throttle down, but I'd expect that to trigger the alert on all disks rather than just one.
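I don't believe the mpr driver exposes a temperature sensor for the 9300, but I can at least confirm the firmware/driver versions and look for controller resets or command timeouts in the kernel log. A rough sketch, assuming the card attaches as mpr0:

Code:
# Firmware and driver version strings reported by the mpr driver
root@truenas[~]# sysctl dev.mpr.0 | grep -i version
# Any resets, timeouts or aborted commands logged against the controller
root@truenas[~]# dmesg | grep -iE 'mpr0|timeout|abort'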

Regards,

Andy
 

rebo00

Dabbler
Joined
May 29, 2020
Messages
24
Ran one pass, but the web GUI timed out so I didn't see any results; it didn't generate an alert again either.
Running it again on all drives from the actual server.
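To avoid losing the output to another GUI timeout, I'm running it inside a tmux session this time (tmux ships with TrueNAS Core as far as I know; the script filename below is just whatever I saved it as):

Code:
# Start a named session so the test survives a dropped web shell or SSH connection
root@truenas[~]# tmux new -s arraytest
root@truenas[~]# sh solnet-array-test.sh
# ...later, re-attach from any console/SSH session to see the results:
root@truenas[~]# tmux attach -t arraytest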

Regards,

Andy
 
Joined
Jan 7, 2015
Messages
1,155
I'm not sure if your other burn-in would expose this or not; that was mainly badblocks combined with SMART testing, wasn't it? This script is written for exactly this purpose: to see what I/O speeds your array and each individual disk are capable of. I suspect it will tell you what TrueNAS already has, namely that da0 is abnormally slow compared to the other disks. Whether that is down to the disk itself or some other environmental problem is something you'll have to figure out. Replacing the disk in question fixed the issue for me, and I haven't seen it since.
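It's also worth kicking off a fresh long SMART self-test on the suspect disk and comparing its output against one of the healthy ones, e.g.:

Code:
# Start an extended (long) offline self-test on the suspect disk
root@truenas[~]# smartctl -t long /dev/da0
# Once it completes (smartctl prints the expected duration), review the results
root@truenas[~]# smartctl -a /dev/da0 | less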
 

rebo00

Dabbler
Joined
May 29, 2020
Messages
24
da0 seems fine. I've also RMA'd da0 once already because I didn't like how it sounded.

[Attached screenshot: 1690319719622.jpeg]

Regards,

Andy
 
Joined
Jan 7, 2015
Messages
1,155
Interesting, keep us posted.
 

rebo00

Dabbler
Joined
May 29, 2020
Messages
24
I can't remember why now, but I had to rerun the test; again I selected all disks and one pass.

Parallel array read seems fine, 99-100% speed compared to serial.

It drops to ~65% during the parallel seek-stress array read, which I guess is expected as it tries to do multiple seeks at once.

Completion is 27 hours from 11:15 (BST), so around 14:30 tomorrow; I'll update if it finishes by then.

[Attached photo of the console output: PXL_20230730_172012145.MP.jpg]


Regards,

Andy
 

rebo00

Dabbler
Joined
May 29, 2020
Messages
24
Since running these tests I've installed some additional fans, and I've not seen the issue reported in TrueNAS since. I'm still trying to replicate it, but I think it was a cooling issue on the LSI SAS 9300-16I, which should be resolved with the additional fans.
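To confirm it stays that way, I'm keeping an eye on the per-disk slow I/O counters, which as I understand it is what feeds that alert:

Code:
# -s adds a slow I/O column for each disk in the pool
root@truenas[~]# zpool status -s Media
# Recent ZFS events; delay events correspond to the slow I/O reports
root@truenas[~]# zpool events | grep -i delay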

Regards,

Andy
 