All Disks in VDEV Faulted all at once? But no other drives on Backplane?

im.thatoneguy

Dabbler
Joined
Nov 4, 2022
Messages
37
So I have a pool of 4 VDEVs with 7 drives each in RAIDZ2.

Two of the drives were suddenly marked FAULTED yesterday, but all of the drives in that VDEV are reporting read/write errors. It seems really strange for two disks to fail simultaneously, and even stranger that all 7 drives in one VDEV reported R/W errors while none of the drives in any of the other VDEVs did.

There are 21 other disks on the same HBA/backplane, connected using a single 8x minisas, and none of them errored, so it doesn't seem like an issue with the HBA or backplane. All the errors are in this one VDEV.

Code:
Full ZFS Status
 pool: Poolio
 state: DEGRADED
status: One or more devices are faulted in response to persistent errors.
        Sufficient replicas exist for the pool to continue functioning in a
        degraded state.
  scan: scrub in progress since Wed May 17 13:58:05 2023
        22.0T scanned at 6.19G/s, 4.66T issued at 1.31G/s, 251T total
        0B repaired, 1.86% done, 2 days 05:26:53 to go
config:
        NAME                                      STATE     READ WRITE CKSUM
        Poolio                                    DEGRADED     0     0     0
          raidz2-0                                ONLINE       0     0     0
            acf34ef7-f12f-495f-9868-a374d86a2648  ONLINE       0     0     0
            db1c6594-cd2f-454b-9419-210731e65be0  ONLINE       0     0     0
            6f44012b-0e59-4112-a80c-4a77c588fb47  ONLINE       0     0     0
            67c4a45d-9ec2-4e74-8e79-918736e88ea9  ONLINE       0     0     0
            95d6603d-cb13-4163-9c51-af488936ea25  ONLINE       0     0     0
            c50fdb2a-3444-41f1-a4fe-2cd9bd453fc9  ONLINE       0     0     0
            9e77ad26-3db9-4665-b595-c5b55dc1afc5  ONLINE       0     0     0
          raidz2-1                                ONLINE       0     0     0
            0cfe57fd-446a-47c9-b405-f98472c77254  ONLINE       0     0     0
            1ab0c8ba-245c-499c-9bc7-aa88119d21c2  ONLINE       0     0     0
            a814a4b8-92bc-42b9-9699-29133bf58fbf  ONLINE       0     0     0
            ca62c03c-4515-409d-bbba-fc81823b9d1b  ONLINE       0     0     0
            a414e34d-0a6b-40b0-923e-f3b7be63d99e  ONLINE       0     0     0
            390d360f-34e9-41e0-974c-a45e86d6e5c5  ONLINE       0     0     0
            28cf8f48-b201-4602-9667-3890317a98ba  ONLINE       0     0     0
          raidz2-2                                DEGRADED 1.13K     8     0
            68c02eb0-9ddd-4af3-b010-6b0da2e79a8f  DEGRADED   185     4     0  too many errors
            904f837f-0c13-453f-a1e7-81901c9ac05c  DEGRADED   219     4     0  too many errors
            20d31e9b-1136-44d9-b17e-d88ab1c2450b  FAULTED    169     2     0  too many errors
            5f6d8664-c2b6-4214-a78f-b17fe4f35b57  DEGRADED   114     2     0  too many errors
            4337a24c-375b-4e4f-8d1d-c4d33a7f5c5c  DEGRADED   297     4     0  too many errors
            ec890270-6644-409e-b076-712ccdb666f7  FAULTED    103     0     0  too many errors
            03704d2e-7555-4d2f-8d51-db97b02a7827  DEGRADED   408     8     0  too many errors
          raidz2-3                                ONLINE       0     0     0
            4454bfc4-f3b5-40ad-9a75-ff53c4d3cc15  ONLINE       0     0     0
            705e7dbb-1fd2-4cef-9d64-40f4fa50aafb  ONLINE       0     0     0
            c138c2f3-8fc3-4238-b0a8-998869392dde  ONLINE       0     0     0
            8e4672ab-a3f0-4fa9-8839-dd36a727348b  ONLINE       0     0     0
            37a34809-ad1a-4c7b-a4eb-464bf2b16dae  ONLINE       0     0     0
            a497afec-a002-47a9-89ff-1d5ecdd5035d  ONLINE       0     0     0
            21a5e250-e204-4cb6-8ac7-9cda0b69c965  ONLINE       0     0     0
        cache
          a1c57168-e139-4052-87c2-8737c6f74e8a    ONLINE       0     0     0
          8d48eae7-4285-4f74-962e-7db7491355fa    ONLINE       0     0     0

errors: No known data errors
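
To make the "only one VDEV is affected" observation easy to re-check, here is a rough sketch that groups the non-zero error counters from zpool status text by VDEV. The function names and the sample device names (disk-a to disk-d) are placeholders made up for illustration. It also assumes abbreviated counters such as 1.13K use a 1024 multiplier, so expanded values should be treated as approximate.

```python
import re

# zpool abbreviates large counters (e.g. "1.13K"). The multiplier is ASSUMED
# to be a power of 1024 here, so expanded values are approximate.
_SUFFIX = {"K": 1024, "M": 1024**2, "G": 1024**3, "T": 1024**4}

# Rows look like: NAME STATE READ WRITE CKSUM [optional note]
ROW = re.compile(
    r"^\s*(?P<name>\S+)\s+(?P<state>[A-Z]+)\s+"
    r"(?P<read>[\d.]+[KMGT]?)\s+(?P<write>[\d.]+[KMGT]?)\s+(?P<cksum>[\d.]+[KMGT]?)"
)
VDEV_PREFIXES = ("raidz", "mirror", "draid")
AUX_SECTIONS = {"cache", "logs", "spares", "special", "dedup"}


def expand(counter):
    """Turn '185' or '1.13K' into an integer."""
    if counter[-1] in _SUFFIX:
        return int(float(counter[:-1]) * _SUFFIX[counter[-1]])
    return int(counter)


def errors_by_vdev(status_text):
    """Return {vdev: [(disk, state, read, write, cksum), ...]} for disks with errors."""
    groups = {}
    current = None
    for line in status_text.splitlines():
        stripped = line.strip()
        if stripped in AUX_SECTIONS:      # e.g. the "cache" heading
            current = stripped
            continue
        m = ROW.match(line)
        if not m:
            continue
        name = m["name"]
        if name.startswith(VDEV_PREFIXES):
            current = name                # following rows belong to this VDEV
            continue
        if current is None:
            continue                      # pool root row, not a disk
        counts = tuple(expand(m[k]) for k in ("read", "write", "cksum"))
        if any(counts):
            groups.setdefault(current, []).append((name, m["state"], *counts))
    return groups


# Placeholder excerpt in the same layout as the status output above.
SAMPLE = """
        NAME          STATE     READ WRITE CKSUM
        Poolio        DEGRADED     0     0     0
          raidz2-0    ONLINE       0     0     0
            disk-a    ONLINE       0     0     0
          raidz2-2    DEGRADED 1.13K     8     0
            disk-b    DEGRADED   185     4     0  too many errors
            disk-c    FAULTED    103     0     0  too many errors
        cache
          disk-d      ONLINE       0     0     0
"""

if __name__ == "__main__":
    for vdev, disks in errors_by_vdev(SAMPLE).items():
        print(vdev, disks)
```

If every entry in the result falls under a single VDEV key, as it would for the status output above, the errors are confined to that group of disks.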


I don't see anything in SMART data that would suggest those two drives that it seemed to pick at random are the actual problems. One of the two FAULTED drives is one of the only drives without a write error.

Good news, no data errors... yet.

/dev/sdac is one of the "FAULTED" drives. Weirdly, it looks like the errors started a couple of days ago and my email notifications were just broken until now. But I'm a newb at logs... /dev/sdy is the other drive it reported as faulted, but it's not even in the logs for an error at all. (And it had the fewest errors of all of the drives in the zpool status.) So I'm confused why ZFS decided it was FAULTED while another drive had 4x as many read/write errors.

Code:
May 14 17:35:15 SERVER kernel: blk_update_request: I/O error, dev sdac, sector 16329244720 op 0x0:(READ) flags 0x700 phys_seg 83 prio class 0
May 14 17:35:15 SERVER kernel: blk_update_request: I/O error, dev sdac, sector 16329242688 op 0x0:(READ) flags 0x700 phys_seg 114 prio class 0
May 14 17:35:15 SERVER kernel: blk_update_request: I/O error, dev sdac, sector 16329246384 op 0x0:(READ) flags 0x700 phys_seg 126 prio class 0

May 16 11:15:00 SERVER kernel: blk_update_request: I/O error, dev sdy, sector 15889534048 op 0x0:(READ) flags 0x700 phys_seg 1 prio class 0
May 16 11:15:00 SERVER kernel: blk_update_request: I/O error, dev sdy, sector 15889544072 op 0x1:(WRITE) flags 0x700 phys_seg 1 prio class 0
May 16 11:15:00 SERVER kernel: blk_update_request: I/O error, dev sdy, sector 15889534160 op 0x0:(READ) flags 0x700 phys_seg 2 prio class 0
May 16 11:15:00 SERVER kernel: blk_update_request: I/O error, dev sdy, sector 15979235688 op 0x1:(WRITE) flags 0x700 phys_seg 1 prio class 0
May 16 11:15:00 SERVER kernel: blk_update_request: I/O error, dev sdy, sector 15970654656 op 0x0:(READ) flags 0x700 phys_seg 2 prio class 0
May 16 11:15:00 SERVER kernel: blk_update_request: I/O error, dev sdy, sector 15971153888 op 0x0:(READ) flags 0x700 phys_seg 1 prio class 0
May 16 11:15:00 SERVER kernel: blk_update_request: I/O error, dev sdy, sector 14258487064 op 0x0:(READ) flags 0x700 phys_seg 126 prio class 0
May 16 11:15:00 SERVER kernel: blk_update_request: I/O error, dev sdy, sector 14181792576 op 0x0:(READ) flags 0x700 phys_seg 123 prio class 0
May 16 11:15:00 SERVER kernel: blk_update_request: I/O error, dev sdy, sector 15889533568 op 0x0:(READ) flags 0x700 phys_seg 1 prio class 0
May 16 11:15:00 SERVER kernel: blk_update_request: I/O error, dev sdy, sector 14301338720 op 0x0:(READ) flags 0x700 phys_seg 123 prio class 0

May 16 12:09:36 SERVER kernel: blk_update_request: I/O error, dev sdq, sector 16141863408 op 0x1:(WRITE) flags 0x700 phys_seg 4 prio class 0
May 16 12:09:36 SERVER kernel: blk_update_request: I/O error, dev sdq, sector 16132242856 op 0x0:(READ) flags 0x700 phys_seg 1 prio class 0
May 16 12:09:36 SERVER kernel: blk_update_request: I/O error, dev sdq, sector 16086185648 op 0x0:(READ) flags 0x700 phys_seg 26 prio class 0
May 16 12:09:36 SERVER kernel: blk_update_request: I/O error, dev sdq, sector 16091533752 op 0x0:(READ) flags 0x700 phys_seg 7 prio class 0
May 16 12:09:36 SERVER kernel: blk_update_request: I/O error, dev sdq, sector 16103682208 op 0x1:(WRITE) flags 0x700 phys_seg 7 prio class 0
May 16 12:09:36 SERVER kernel: blk_update_request: I/O error, dev sdq, sector 15812198816 op 0x0:(READ) flags 0x700 phys_seg 111 prio class 0
May 16 12:09:36 SERVER kernel: blk_update_request: I/O error, dev sdq, sector 15039759112 op 0x0:(READ) flags 0x700 phys_seg 87 prio class 0
May 16 12:09:36 SERVER kernel: blk_update_request: I/O error, dev sdq, sector 15840024312 op 0x0:(READ) flags 0x700 phys_seg 20 prio class 0
May 16 12:09:36 SERVER kernel: blk_update_request: I/O error, dev sdq, sector 5686139760 op 0x0:(READ) flags 0x700 phys_seg 72 prio class 0
May 16 12:09:36 SERVER kernel: blk_update_request: I/O error, dev sdq, sector 30347783096 op 0x0:(READ) flags 0x700 phys_seg 1 prio class 0

May 16 12:24:59 SERVER kernel: blk_update_request: I/O error, dev sdq, sector 16129087712 op 0x0:(READ) flags 0x700 phys_seg 28 prio class 0
May 16 12:24:59 SERVER kernel: blk_update_request: I/O error, dev sdq, sector 16128914496 op 0x0:(READ) flags 0x700 phys_seg 1 prio class 0
May 16 12:24:59 SERVER kernel: blk_update_request: I/O error, dev sdq, sector 16128920792 op 0x1:(WRITE) flags 0x700 phys_seg 1 prio class 0
May 16 12:24:59 SERVER kernel: blk_update_request: I/O error, dev sdq, sector 15898154400 op 0x0:(READ) flags 0x700 phys_seg 107 prio class 0
May 16 12:24:59 SERVER kernel: blk_update_request: I/O error, dev sdq, sector 15883729120 op 0x0:(READ) flags 0x700 phys_seg 13 prio class 0
May 16 12:24:59 SERVER kernel: blk_update_request: I/O error, dev sdq, sector 16066709600 op 0x0:(READ) flags 0x700 phys_seg 2 prio class 0
May 16 12:24:59 SERVER kernel: blk_update_request: I/O error, dev sdq, sector 16125574408 op 0x1:(WRITE) flags 0x700 phys_seg 1 prio class 0
May 16 12:24:59 SERVER kernel: blk_update_request: I/O error, dev sdq, sector 15704211400 op 0x0:(READ) flags 0x700 phys_seg 83 prio class 0
May 16 12:24:59 SERVER kernel: blk_update_request: I/O error, dev sdq, sector 15883729336 op 0x0:(READ) flags 0x700 phys_seg 7 prio class 0

May 16 12:41:20 SERVER kernel: blk_update_request: I/O error, dev sdaa, sector 16125577720 op 0x0:(READ) flags 0x700 phys_seg 1 prio class 0
May 16 12:41:20 SERVER kernel: blk_update_request: I/O error, dev sdaa, sector 16500033272 op 0x1:(WRITE) flags 0x700 phys_seg 1 prio class 0
May 16 12:41:20 SERVER kernel: blk_update_request: I/O error, dev sdaa, sector 15638449376 op 0x0:(READ) flags 0x700 phys_seg 87 prio class 0
May 16 12:41:20 SERVER kernel: blk_update_request: I/O error, dev sdaa, sector 15988544080 op 0x0:(READ) flags 0x700 phys_seg 94 prio class 0
May 16 12:41:20 SERVER kernel: blk_update_request: I/O error, dev sdaa, sector 16503061064 op 0x1:(WRITE) flags 0x700 phys_seg 26 prio class 0
May 16 12:41:20 SERVER kernel: blk_update_request: I/O error, dev sdaa, sector 15653095928 op 0x0:(READ) flags 0x700 phys_seg 7 prio class 0
May 16 12:41:20 SERVER kernel: blk_update_request: I/O error, dev sdaa, sector 15988545776 op 0x0:(READ) flags 0x700 phys_seg 16 prio class 0
May 16 12:41:20 SERVER kernel: blk_update_request: I/O error, dev sdaa, sector 15049485504 op 0x0:(READ) flags 0x700 phys_seg 117 prio class 0
May 16 12:41:20 SERVER kernel: blk_update_request: I/O error, dev sdaa, sector 15988545832 op 0x0:(READ) flags 0x700 phys_seg 7 prio class 0

May 16 12:48:25 SERVER kernel: blk_update_request: I/O error, dev sdab, sector 16523058512 op 0x1:(WRITE) flags 0x700 phys_seg 1 prio class 0
May 16 12:48:25 SERVER kernel: blk_update_request: I/O error, dev sdab, sector 16523058528 op 0x1:(WRITE) flags 0x700 phys_seg 1 prio class 0
May 16 12:48:25 SERVER kernel: blk_update_request: I/O error, dev sdab, sector 16004334320 op 0x0:(READ) flags 0x700 phys_seg 2 prio class 0
May 16 12:48:25 SERVER kernel: blk_update_request: I/O error, dev sdab, sector 15308686376 op 0x0:(READ) flags 0x700 phys_seg 1 prio class 0
May 16 12:48:25 SERVER kernel: blk_update_request: I/O error, dev sdab, sector 15308685552 op 0x0:(READ) flags 0x700 phys_seg 43 prio class 0
May 16 12:48:25 SERVER kernel: blk_update_request: I/O error, dev sdab, sector 15308683520 op 0x0:(READ) flags 0x700 phys_seg 107 prio class 0
May 16 12:48:25 SERVER kernel: blk_update_request: I/O error, dev sdab, sector 14115548680 op 0x0:(READ) flags 0x700 phys_seg 26 prio class 0
May 16 12:48:25 SERVER kernel: blk_update_request: I/O error, dev sdab, sector 14842675464 op 0x0:(READ) flags 0x700 phys_seg 124 prio class 0
May 16 12:48:25 SERVER kernel: blk_update_request: I/O error, dev sdab, sector 15860176576 op 0x0:(READ) flags 0x700 phys_seg 113 prio class 0
May 16 12:48:25 SERVER kernel: blk_update_request: I/O error, dev sdab, sector 15221176296 op 0x0:(READ) flags 0x700 phys_seg 110 prio class 0

May 16 12:58:26 SERVER kernel: blk_update_request: I/O error, dev sdaa, sector 16453230912 op 0x1:(WRITE) flags 0x700 phys_seg 26 prio class 0
May 16 12:58:26 SERVER kernel: blk_update_request: I/O error, dev sdaa, sector 16453230712 op 0x1:(WRITE) flags 0x700 phys_seg 13 prio class 0
May 16 12:58:26 SERVER kernel: blk_update_request: I/O error, dev sdaa, sector 14542071760 op 0x0:(READ) flags 0x700 phys_seg 6 prio class 0
May 16 12:58:26 SERVER kernel: blk_update_request: I/O error, dev sdaa, sector 14542070560 op 0x0:(READ) flags 0x700 phys_seg 67 prio class 0
May 16 12:58:26 SERVER kernel: blk_update_request: I/O error, dev sdaa, sector 4303269296 op 0x0:(READ) flags 0x700 phys_seg 1 prio class 0
May 16 12:58:26 SERVER kernel: blk_update_request: I/O error, dev sdaa, sector 15514354896 op 0x0:(READ) flags 0x700 phys_seg 112 prio class 0
May 16 12:58:26 SERVER kernel: blk_update_request: I/O error, dev sdaa, sector 15120899568 op 0x0:(READ) flags 0x700 phys_seg 119 prio class 0
May 16 12:58:26 SERVER kernel: blk_update_request: I/O error, dev sdaa, sector 14532293056 op 0x0:(READ) flags 0x700 phys_seg 23 prio class 0

May 16 13:01:11 SERVER kernel: blk_update_request: I/O error, dev sdad, sector 246998200 op 0x0:(READ) flags 0x700 phys_seg 1 prio class 0
May 16 13:01:11 SERVER kernel: blk_update_request: I/O error, dev sdad, sector 16523733160 op 0x1:(WRITE) flags 0x700 phys_seg 1 prio class 0
May 16 13:01:12 SERVER kernel: blk_update_request: I/O error, dev sdad, sector 16523047176 op 0x1:(WRITE) flags 0x700 phys_seg 7 prio class 0
May 16 13:01:12 SERVER kernel: blk_update_request: I/O error, dev sdad, sector 15672752792 op 0x0:(READ) flags 0x700 phys_seg 21 prio class 0
May 16 13:01:12 SERVER kernel: blk_update_request: I/O error, dev sdad, sector 15672753176 op 0x0:(READ) flags 0x700 phys_seg 2 prio class 0
May 16 13:01:12 SERVER kernel: blk_update_request: I/O error, dev sdad, sector 15672753288 op 0x0:(READ) flags 0x700 phys_seg 19 prio class 0
May 16 13:01:12 SERVER kernel: blk_update_request: I/O error, dev sdad, sector 16507952136 op 0x0:(READ) flags 0x700 phys_seg 91 prio class 0
May 16 13:01:12 SERVER kernel: blk_update_request: I/O error, dev sdad, sector 14635088448 op 0x0:(READ) flags 0x700 phys_seg 108 prio class 0
May 16 13:01:12 SERVER kernel: blk_update_request: I/O error, dev sdad, sector 15108801064 op 0x0:(READ) flags 0x700 phys_seg 51 prio class 0
May 16 13:01:12 SERVER kernel: blk_update_request: I/O error, dev sdad, sector 14726504232 op 0x0:(READ) flags 0x700 phys_seg 93 prio class 0

May 16 13:05:28 SERVER kernel: blk_update_request: I/O error, dev sdz, sector 18656640 op 0x0:(READ) flags 0x700 phys_seg 1 prio class 0
May 16 13:05:28 SERVER kernel: blk_update_request: I/O error, dev sdz, sector 18629544 op 0x0:(READ) flags 0x700 phys_seg 1 prio class 0
May 16 13:05:28 SERVER kernel: blk_update_request: I/O error, dev sdz, sector 18663720 op 0x0:(READ) flags 0x700 phys_seg 1 prio class 0
May 16 13:05:28 SERVER kernel: blk_update_request: I/O error, dev sdz, sector 17712410784 op 0x0:(READ) flags 0x700 phys_seg 1 prio class 0
May 16 13:05:28 SERVER kernel: blk_update_request: I/O error, dev sdz, sector 15916423672 op 0x0:(READ) flags 0x700 phys_seg 6 prio class 0
May 16 13:05:28 SERVER kernel: blk_update_request: I/O error, dev sdz, sector 15916422792 op 0x0:(READ) flags 0x700 phys_seg 46 prio class 0
May 16 13:05:28 SERVER kernel: blk_update_request: I/O error, dev sdz, sector 12523793008 op 0x0:(READ) flags 0x700 phys_seg 1 prio class 0
May 16 13:05:28 SERVER kernel: blk_update_request: I/O error, dev sdz, sector 16523067784 op 0x1:(WRITE) flags 0x700 phys_seg 1 prio class 0
May 16 13:05:28 SERVER kernel: blk_update_request: I/O error, dev sdz, sector 15916423832 op 0x0:(READ) flags 0x700 phys_seg 7 prio class 0
May 16 13:05:28 SERVER kernel: blk_update_request: I/O error, dev sdz, sector 15846016624 op 0x0:(READ) flags 0x700 phys_seg 114 prio class 0
May 16 13:11:22 SERVER kernel: blk_update_request: I/O error, dev sdz, sector 16402744408 op 0x1:(WRITE) flags 0x700 phys_seg 1 prio class 0
May 16 13:11:22 SERVER kernel: blk_update_request: I/O error, dev sdz, sector 16452521272 op 0x1:(WRITE) flags 0x700 phys_seg 26 prio class 0
May 16 13:11:22 SERVER kernel: blk_update_request: I/O error, dev sdz, sector 15063795464 op 0x0:(READ) flags 0x700 phys_seg 112 prio class 0
May 16 13:11:22 SERVER kernel: blk_update_request: I/O error, dev sdz, sector 15063793440 op 0x0:(READ) flags 0x700 phys_seg 125 prio class 0
May 16 13:11:22 SERVER kernel: blk_update_request: I/O error, dev sdz, sector 30224769768 op 0x0:(READ) flags 0x700 phys_seg 1 prio class 0
May 16 13:11:22 SERVER kernel: blk_update_request: I/O error, dev sdz, sector 14944089232 op 0x0:(READ) flags 0x700 phys_seg 2 prio class 0
May 16 13:11:22 SERVER kernel: blk_update_request: I/O error, dev sdz, sector 15556958312 op 0x0:(READ) flags 0x700 phys_seg 122 prio class 0
May 16 13:11:22 SERVER kernel: blk_update_request: I/O error, dev sdz, sector 16079798680 op 0x0:(READ) flags 0x700 phys_seg 119 prio class 0

May 16 13:13:43 SERVER kernel: blk_update_request: I/O error, dev sdab, sector 16524896984 op 0x1:(WRITE) flags 0x700 phys_seg 2 prio class 0
May 16 13:13:43 SERVER kernel: blk_update_request: I/O error, dev sdab, sector 223512720 op 0x0:(READ) flags 0x700 phys_seg 1 prio class 0
May 16 13:13:43 SERVER kernel: blk_update_request: I/O error, dev sdab, sector 16524896976 op 0x1:(WRITE) flags 0x700 phys_seg 1 prio class 0
May 16 13:13:43 SERVER kernel: blk_update_request: I/O error, dev sdab, sector 14717936552 op 0x0:(READ) flags 0x700 phys_seg 115 prio class 0
May 16 13:13:43 SERVER kernel: blk_update_request: I/O error, dev sdab, sector 15936469040 op 0x0:(READ) flags 0x700 phys_seg 35 prio class 0
May 16 13:13:43 SERVER kernel: blk_update_request: I/O error, dev sdab, sector 14536612480 op 0x0:(READ) flags 0x700 phys_seg 68 prio class 0
May 16 13:13:43 SERVER kernel: blk_update_request: I/O error, dev sdab, sector 15351605624 op 0x0:(READ) flags 0x700 phys_seg 44 prio class 0
May 16 13:27:59 SERVER kernel: blk_update_request: I/O error, dev sdab, sector 16266149504 op 0x1:(WRITE) flags 0x700 phys_seg 1 prio class 0
May 16 13:27:59 SERVER kernel: blk_update_request: I/O error, dev sdab, sector 16266149496 op 0x1:(WRITE) flags 0x700 phys_seg 1 prio class 0
May 16 13:28:00 SERVER kernel: blk_update_request: I/O error, dev sdab, sector 11755737840 op 0x0:(READ) flags 0x700 phys_seg 1 prio class 0
May 16 13:28:00 SERVER kernel: blk_update_request: I/O error, dev sdab, sector 14781656968 op 0x0:(READ) flags 0x700 phys_seg 124 prio class 0
May 16 13:28:00 SERVER kernel: blk_update_request: I/O error, dev sdab, sector 14681192736 op 0x0:(READ) flags 0x700 phys_seg 118 prio class 0
May 16 13:28:00 SERVER kernel: blk_update_request: I/O error, dev sdab, sector 14891633560 op 0x0:(READ) flags 0x700 phys_seg 20 prio class 0
May 16 13:28:00 SERVER kernel: blk_update_request: I/O error, dev sdab, sector 14891633832 op 0x0:(READ) flags 0x700 phys_seg 1 prio class 0
May 16 13:28:00 SERVER kernel: blk_update_request: I/O error, dev sdab, sector 14891632352 op 0x0:(READ) flags 0x700 phys_seg 67 prio class 0
May 16 13:28:00 SERVER kernel: blk_update_request: I/O error, dev sdab, sector 14939260112 op 0x0:(READ) flags 0x700 phys_seg 112 prio class 0
May 16 13:28:00 SERVER kernel: blk_update_request: I/O error, dev sdab, sector 16402787984 op 0x0:(READ) flags 0x700 phys_seg 1 prio class 0

OS Version:TrueNAS-SCALE-22.12.2
Product:SSG-540P-E1CTR45L
Model:Intel(R) Xeon(R) Silver 4314 CPU @ 2.40GHz
Memory:251 GiB
HBA: AOC-S3808L-L8IT-P
Cable: Slimline x8 (STR) to 2x Slimline x4 (STR),60/60CM,100 O
Backplane: BPN-SAS3-946LEL1 1 45-port 4U SC946L Top-load SAS3 12Gbps expander backplane, support up to 45x 3.5-inch SAS3/SATA3 HDD/SSD
 

Davvo

MVP
Joined
Jul 12, 2022
Messages
3,222
When a drive spits out too many errors, the system might make the call to consider it faulted.

Can you tell us about the drive models and your PSU(s)?

The first thing I would do is make sure you have a backup available, check the power and data connections, and then run zpool clear Poolio and zpool scrub Poolio to see if the errors come back.
If they do, then the backplane, the HBA, or one of the cables is faulty and you have to hunt down the bad part.

Edit: it appears that a scrub is already in progress, either stop it or let it finish before doing anything.
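
The order of operations suggested above can be sketched as a small helper that only builds the command list and runs nothing. The function names are made up for illustration. It assumes that the text "scrub in progress" in the zpool status output is a reliable sign of a running scrub, as in the status output posted earlier.

```python
def _clear_and_rescrub(pool):
    # Reset the error counters, start a fresh scrub, then look at the result.
    return [
        f"zpool clear {pool}",
        f"zpool scrub {pool}",
        f"zpool status -v {pool}",
    ]


def next_steps(status_text, pool, stop_running_scrub=False):
    """Suggest commands based on `zpool status` text. Nothing is executed."""
    if "scrub in progress" in status_text:
        if stop_running_scrub:
            # `zpool scrub -s` stops the running scrub before starting over.
            return [f"zpool scrub -s {pool}"] + _clear_and_rescrub(pool)
        # Otherwise just keep checking until the running scrub finishes.
        return [f"zpool status {pool}"]
    return _clear_and_rescrub(pool)


if __name__ == "__main__":
    running = "scan: scrub in progress since Wed May 17 13:58:05 2023"
    for cmd in next_steps(running, "Poolio"):
        print(cmd)
```

Returning the commands instead of running them keeps the helper safe to try on a pool that is already degraded.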
 

im.thatoneguy

Scrub in progress. Should take a couple days.

Also, I'm trying to run LSIGet at the recommendation of Supermicro.

When I run LSIGetlinux_101222.sh:
Capture_Script_Version_101222
OS_LSI is not set, either not a valid OS for this script or...

Is it possible to run the LSI utilities to check for HBA errors on the HBA side of things?

(Supermicro says they don't support ZFS and blames it on a possible software issue. In this instance I'm sort of inclined to agree with them, since a VDEV is a software, not hardware, logical unit, and the problem seems to be limited to the single VDEV.)
 

Davvo

A possible cause could also be the HBA overheating.

I don't see why ZFS would have issues with that particular model, but I'm no expert on HBAs.

I find it hard to believe that a software bug is at play here.
 

im.thatoneguy

So I realized a couple of things. The scrub was just finishing when it errored; I thought it had finished on Sunday, not just started on Sunday.

New Scrub did find parity errors and supposedly corrected them (and is about half done). So there were data integrity issues.

I dug into the logs and it looks like there would be a string of errors, and then it would flag a device reset/power-on. That smells a lot like an HBA/cable issue, maybe under peak load from the scrub.
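
For picking that pattern out of a long kernel log, a rough sketch like the following counts the SCSI hostbyte results per device and decodes Read(16)/Write(16) CDB lines into a starting LBA and block count. The function names are made up. The regular expressions assume the mpt3sas/sd message format seen in this thread, and the byte offsets assume the standard 16-byte READ/WRITE layout (LBA in bytes 2 to 9, transfer length in bytes 10 to 13).

```python
import re
from collections import Counter

RESULT = re.compile(
    r"\[(?P<dev>\w+)\] tag#\d+ FAILED Result: hostbyte=(?P<host>\w+)"
)
CDB = re.compile(
    r"CDB: (?P<op>Read|Write)\(16\) (?P<bytes>(?:[0-9a-f]{2} ?){16})"
)


def hostbyte_counts(lines):
    """Count failed commands per (device, hostbyte), e.g. ('sdab', 'DID_TIME_OUT')."""
    counts = Counter()
    for line in lines:
        m = RESULT.search(line)
        if m:
            counts[(m["dev"], m["host"])] += 1
    return counts


def decode_rw16(line):
    """Return (op, lba, blocks) from a Read(16)/Write(16) CDB log line, else None."""
    m = CDB.search(line)
    if not m:
        return None
    raw = bytes.fromhex(m["bytes"])
    lba = int.from_bytes(raw[2:10], "big")      # 8-byte logical block address
    blocks = int.from_bytes(raw[10:14], "big")  # 4-byte transfer length
    return m["op"], lba, blocks


# Lines copied from the kernel log in this post.
SAMPLE_LINES = [
    "May 16 12:48:24 sfs-truenas kernel: sd 0:0:25:0: [sdab] tag#6743 CDB: "
    "Read(16) 88 00 00 00 00 03 90 78 04 28 00 00 00 30 00 00",
    "May 16 12:48:25 sfs-truenas kernel: sd 0:0:25:0: [sdab] tag#6527 FAILED "
    "Result: hostbyte=DID_SOFT_ERROR driverbyte=DRIVER_OK cmd_age=29s",
    "May 16 12:48:25 sfs-truenas kernel: sd 0:0:25:0: [sdab] tag#6743 FAILED "
    "Result: hostbyte=DID_TIME_OUT driverbyte=DRIVER_OK cmd_age=30s",
]

if __name__ == "__main__":
    print(hostbyte_counts(SAMPLE_LINES))
    print(decode_rw16(SAMPLE_LINES[0]))
```

For tag#6743 the decoded LBA is 15308686376, which matches the sector in the blk_update_request line logged for the same command, so the two message types can be correlated by sector as well as by tag.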


Code:
May 16 12:48:24 sfs-truenas kernel: sd 0:0:25:0: attempting task abort!scmd(0x000000006d2cf258), outstanding for 30136 ms & timeout 30000 ms
May 16 12:48:24 sfs-truenas kernel: sd 0:0:25:0: [sdab] tag#6743 CDB: Read(16) 88 00 00 00 00 03 90 78 04 28 00 00 00 30 00 00
May 16 12:48:24 sfs-truenas kernel: scsi target0:0:25: handle(0x0028), sas_address(0x500304802168b0c3), phy(3)
May 16 12:48:25 sfs-truenas kernel: scsi target0:0:25: enclosure logical id(0x500304802168b0ff), slot(26)
May 16 12:48:25 sfs-truenas kernel: scsi target0:0:25: enclosure level(0x0001), connector name( C0.1)
May 16 12:48:25 sfs-truenas kernel: sd 0:0:25:0: [sdab] tag#6527 FAILED Result: hostbyte=DID_SOFT_ERROR driverbyte=DRIVER_OK cmd_age=29s
May 16 12:48:25 sfs-truenas kernel: sd 0:0:25:0: [sdab] tag#6527 CDB: Write(16) 8a 00 00 00 00 03 d8 d9 dd 50 00 00 00 08 00 00
May 16 12:48:25 sfs-truenas kernel: blk_update_request: I/O error, dev sdab, sector 16523058512 op 0x1:(WRITE) flags 0x700 phys_seg 1 prio class 0
May 16 12:48:25 sfs-truenas kernel: zio pool=SFS-ZFS vdev=/dev/disk/by-partuuid/03704d2e-7555-4d2f-8d51-db97b02a7827 error=5 type=2 offset=8457658408960 size=4096 flags=180880
May 16 12:48:25 sfs-truenas kernel: sd 0:0:25:0: [sdab] tag#6532 FAILED Result: hostbyte=DID_SOFT_ERROR driverbyte=DRIVER_OK cmd_age=29s
May 16 12:48:25 sfs-truenas kernel: sd 0:0:25:0: [sdab] tag#6532 CDB: Write(16) 8a 00 00 00 00 03 d8 d9 dd 60 00 00 00 08 00 00
May 16 12:48:25 sfs-truenas kernel: blk_update_request: I/O error, dev sdab, sector 16523058528 op 0x1:(WRITE) flags 0x700 phys_seg 1 prio class 0
May 16 12:48:25 sfs-truenas kernel: zio pool=SFS-ZFS vdev=/dev/disk/by-partuuid/03704d2e-7555-4d2f-8d51-db97b02a7827 error=5 type=2 offset=8457658417152 size=4096 flags=180880
May 16 12:48:25 sfs-truenas kernel: sd 0:0:25:0: [sdab] tag#6669 FAILED Result: hostbyte=DID_SOFT_ERROR driverbyte=DRIVER_OK cmd_age=21s
May 16 12:48:25 sfs-truenas kernel: sd 0:0:25:0: [sdab] tag#6669 CDB: Read(16) 88 00 00 00 00 03 b9 ee c2 f0 00 00 00 d0 00 00
May 16 12:48:25 sfs-truenas kernel: blk_update_request: I/O error, dev sdab, sector 16004334320 op 0x0:(READ) flags 0x700 phys_seg 2 prio class 0
May 16 12:48:25 sfs-truenas kernel: zio pool=SFS-ZFS vdev=/dev/disk/by-partuuid/03704d2e-7555-4d2f-8d51-db97b02a7827 error=5 type=1 offset=8192071622656 size=106496 flags=180880
May 16 12:48:25 sfs-truenas kernel: sd 0:0:25:0: task abort: SUCCESS scmd(0x000000006d2cf258)
May 16 12:48:25 sfs-truenas kernel: sd 0:0:25:0: [sdab] tag#6743 FAILED Result: hostbyte=DID_TIME_OUT driverbyte=DRIVER_OK cmd_age=30s
May 16 12:48:25 sfs-truenas kernel: sd 0:0:25:0: [sdab] tag#6743 CDB: Read(16) 88 00 00 00 00 03 90 78 04 28 00 00 00 30 00 00
May 16 12:48:25 sfs-truenas kernel: blk_update_request: I/O error, dev sdab, sector 15308686376 op 0x0:(READ) flags 0x700 phys_seg 1 prio class 0
May 16 12:48:25 sfs-truenas kernel: zio pool=SFS-ZFS vdev=/dev/disk/by-partuuid/03704d2e-7555-4d2f-8d51-db97b02a7827 error=5 type=1 offset=7835899875328 size=24576 flags=180980
May 16 12:48:25 sfs-truenas kernel: sd 0:0:25:0: attempting task abort!scmd(0x00000000717eddca), outstanding for 30652 ms & timeout 30000 ms
May 16 12:48:25 sfs-truenas kernel: sd 0:0:25:0: [sdab] tag#6737 CDB: Read(16) 88 00 00 00 00 03 90 78 00 f0 00 00 02 f8 00 00
May 16 12:48:25 sfs-truenas kernel: scsi target0:0:25: handle(0x0028), sas_address(0x500304802168b0c3), phy(3)
May 16 12:48:25 sfs-truenas kernel: scsi target0:0:25: enclosure logical id(0x500304802168b0ff), slot(26)
May 16 12:48:25 sfs-truenas kernel: scsi target0:0:25: enclosure level(0x0001), connector name( C0.1)
May 16 12:48:25 sfs-truenas kernel: sd 0:0:25:0: No reference found at driver, assuming scmd(0x00000000717eddca) might have completed
May 16 12:48:25 sfs-truenas kernel: sd 0:0:25:0: task abort: SUCCESS scmd(0x00000000717eddca)
May 16 12:48:25 sfs-truenas kernel: sd 0:0:25:0: [sdab] tag#6737 FAILED Result: hostbyte=DID_TIME_OUT driverbyte=DRIVER_OK cmd_age=30s
May 16 12:48:25 sfs-truenas kernel: sd 0:0:25:0: [sdab] tag#6737 CDB: Read(16) 88 00 00 00 00 03 90 78 00 f0 00 00 02 f8 00 00
May 16 12:48:25 sfs-truenas kernel: blk_update_request: I/O error, dev sdab, sector 15308685552 op 0x0:(READ) flags 0x700 phys_seg 43 prio class 0
May 16 12:48:25 sfs-truenas kernel: zio pool=SFS-ZFS vdev=/dev/disk/by-partuuid/03704d2e-7555-4d2f-8d51-db97b02a7827 error=5 type=1 offset=7835899453440 size=389120 flags=40080c80
May 16 12:48:25 sfs-truenas kernel: sd 0:0:25:0: attempting task abort!scmd(0x00000000d9dcc7ba), outstanding for 30676 ms & timeout 30000 ms
May 16 12:48:25 sfs-truenas kernel: sd 0:0:25:0: [sdab] tag#6815 CDB: Read(16) 88 00 00 00 00 03 90 77 f9 00 00 00 07 e8 00 00
May 16 12:48:25 sfs-truenas kernel: scsi target0:0:25: handle(0x0028), sas_address(0x500304802168b0c3), phy(3)
May 16 12:48:25 sfs-truenas kernel: scsi target0:0:25: enclosure logical id(0x500304802168b0ff), slot(26)
May 16 12:48:25 sfs-truenas kernel: scsi target0:0:25: enclosure level(0x0001), connector name( C0.1)
May 16 12:48:25 sfs-truenas kernel: sd 0:0:25:0: No reference found at driver, assuming scmd(0x00000000d9dcc7ba) might have completed
May 16 12:48:25 sfs-truenas kernel: sd 0:0:25:0: task abort: SUCCESS scmd(0x00000000d9dcc7ba)
May 16 12:48:25 sfs-truenas kernel: sd 0:0:25:0: [sdab] tag#6815 FAILED Result: hostbyte=DID_TIME_OUT driverbyte=DRIVER_OK cmd_age=30s
May 16 12:48:25 sfs-truenas kernel: sd 0:0:25:0: [sdab] tag#6815 CDB: Read(16) 88 00 00 00 00 03 90 77 f9 00 00 00 07 e8 00 00
May 16 12:48:25 sfs-truenas kernel: blk_update_request: I/O error, dev sdab, sector 15308683520 op 0x0:(READ) flags 0x700 phys_seg 107 prio class 0
May 16 12:48:25 sfs-truenas kernel: zio pool=SFS-ZFS vdev=/dev/disk/by-partuuid/03704d2e-7555-4d2f-8d51-db97b02a7827 error=5 type=1 offset=7835898413056 size=1036288 flags=40080c80
May 16 12:48:25 sfs-truenas kernel: sd 0:0:25:0: attempting task abort!scmd(0x00000000007a576c), outstanding for 30536 ms & timeout 30000 ms
May 16 12:48:25 sfs-truenas kernel: sd 0:0:25:0: [sdab] tag#6518 CDB: Read(16) 88 00 00 00 00 03 49 5a 2e 08 00 00 00 d0 00 00
May 16 12:48:25 sfs-truenas kernel: scsi target0:0:25: handle(0x0028), sas_address(0x500304802168b0c3), phy(3)
May 16 12:48:25 sfs-truenas kernel: scsi target0:0:25: enclosure logical id(0x500304802168b0ff), slot(26)
May 16 12:48:25 sfs-truenas kernel: scsi target0:0:25: enclosure level(0x0001), connector name( C0.1)
May 16 12:48:25 sfs-truenas kernel: sd 0:0:25:0: No reference found at driver, assuming scmd(0x00000000007a576c) might have completed
May 16 12:48:25 sfs-truenas kernel: sd 0:0:25:0: task abort: SUCCESS scmd(0x00000000007a576c)
May 16 12:48:25 sfs-truenas kernel: sd 0:0:25:0: [sdab] tag#6518 FAILED Result: hostbyte=DID_TIME_OUT driverbyte=DRIVER_OK cmd_age=30s
May 16 12:48:25 sfs-truenas kernel: sd 0:0:25:0: [sdab] tag#6518 CDB: Read(16) 88 00 00 00 00 03 49 5a 2e 08 00 00 00 d0 00 00
May 16 12:48:25 sfs-truenas kernel: blk_update_request: I/O error, dev sdab, sector 14115548680 op 0x0:(READ) flags 0x700 phys_seg 26 prio class 0
May 16 12:48:25 sfs-truenas kernel: zio pool=SFS-ZFS vdev=/dev/disk/by-partuuid/03704d2e-7555-4d2f-8d51-db97b02a7827 error=5 type=1 offset=7225013374976 size=106496 flags=180980
May 16 12:48:25 sfs-truenas kernel: sd 0:0:25:0: attempting task abort!scmd(0x00000000325c0abb), outstanding for 30460 ms & timeout 30000 ms
May 16 12:48:25 sfs-truenas kernel: sd 0:0:25:0: [sdab] tag#6734 CDB: Read(16) 88 00 00 00 00 03 74 b1 41 08 00 00 07 f0 00 00
May 16 12:48:25 sfs-truenas kernel: scsi target0:0:25: handle(0x0028), sas_address(0x500304802168b0c3), phy(3)
May 16 12:48:25 sfs-truenas kernel: scsi target0:0:25: enclosure logical id(0x500304802168b0ff), slot(26)
May 16 12:48:25 sfs-truenas kernel: scsi target0:0:25: enclosure level(0x0001), connector name( C0.1)
May 16 12:48:25 sfs-truenas kernel: sd 0:0:25:0: No reference found at driver, assuming scmd(0x00000000325c0abb) might have completed
May 16 12:48:25 sfs-truenas kernel: sd 0:0:25:0: task abort: SUCCESS scmd(0x00000000325c0abb)
May 16 12:48:25 sfs-truenas kernel: sd 0:0:25:0: [sdab] tag#6734 FAILED Result: hostbyte=DID_TIME_OUT driverbyte=DRIVER_OK cmd_age=30s
May 16 12:48:25 sfs-truenas kernel: sd 0:0:25:0: [sdab] tag#6734 CDB: Read(16) 88 00 00 00 00 03 74 b1 41 08 00 00 07 f0 00 00
May 16 12:48:25 sfs-truenas kernel: blk_update_request: I/O error, dev sdab, sector 14842675464 op 0x0:(READ) flags 0x700 phys_seg 124 prio class 0
May 16 12:48:25 sfs-truenas kernel: zio pool=SFS-ZFS vdev=/dev/disk/by-partuuid/03704d2e-7555-4d2f-8d51-db97b02a7827 error=5 type=1 offset=7597302288384 size=1040384 flags=40080c80
May 16 12:48:25 sfs-truenas kernel: sd 0:0:25:0: attempting task abort!scmd(0x00000000ac8d9e08), outstanding for 30436 ms & timeout 30000 ms
May 16 12:48:25 sfs-truenas kernel: sd 0:0:25:0: [sdab] tag#6703 CDB: Read(16) 88 00 00 00 00 03 b1 57 16 c0 00 00 07 e8 00 00
May 16 12:48:25 sfs-truenas kernel: scsi target0:0:25: handle(0x0028), sas_address(0x500304802168b0c3), phy(3)
May 16 12:48:25 sfs-truenas kernel: scsi target0:0:25: enclosure logical id(0x500304802168b0ff), slot(26)
May 16 12:48:25 sfs-truenas kernel: scsi target0:0:25: enclosure level(0x0001), connector name( C0.1)
May 16 12:48:25 sfs-truenas kernel: sd 0:0:25:0: No reference found at driver, assuming scmd(0x00000000ac8d9e08) might have completed
May 16 12:48:25 sfs-truenas kernel: sd 0:0:25:0: task abort: SUCCESS scmd(0x00000000ac8d9e08)
May 16 12:48:25 sfs-truenas kernel: sd 0:0:25:0: [sdab] tag#6703 FAILED Result: hostbyte=DID_TIME_OUT driverbyte=DRIVER_OK cmd_age=30s
May 16 12:48:25 sfs-truenas kernel: sd 0:0:25:0: [sdab] tag#6703 CDB: Read(16) 88 00 00 00 00 03 b1 57 16 c0 00 00 07 e8 00 00
May 16 12:48:25 sfs-truenas kernel: blk_update_request: I/O error, dev sdab, sector 15860176576 op 0x0:(READ) flags 0x700 phys_seg 113 prio class 0
May 16 12:48:25 sfs-truenas kernel: zio pool=SFS-ZFS vdev=/dev/disk/by-partuuid/03704d2e-7555-4d2f-8d51-db97b02a7827 error=5 type=1 offset=8118262857728 size=1036288 flags=40080c80
May 16 12:48:25 sfs-truenas kernel: sd 0:0:25:0: attempting task abort!scmd(0x0000000095bb3842), outstanding for 30408 ms & timeout 30000 ms
May 16 12:48:25 sfs-truenas kernel: sd 0:0:25:0: [sdab] tag#6667 CDB: Read(16) 88 00 00 00 00 03 8b 40 b7 e8 00 00 07 f0 00 00
May 16 12:48:25 sfs-truenas kernel: scsi target0:0:25: handle(0x0028), sas_address(0x500304802168b0c3), phy(3)
May 16 12:48:25 sfs-truenas kernel: scsi target0:0:25: enclosure logical id(0x500304802168b0ff), slot(26)
May 16 12:48:25 sfs-truenas kernel: scsi target0:0:25: enclosure level(0x0001), connector name( C0.1)
May 16 12:48:25 sfs-truenas kernel: sd 0:0:25:0: No reference found at driver, assuming scmd(0x0000000095bb3842) might have completed
May 16 12:48:25 sfs-truenas kernel: sd 0:0:25:0: task abort: SUCCESS scmd(0x0000000095bb3842)
May 16 12:48:25 sfs-truenas kernel: sd 0:0:25:0: [sdab] tag#6667 FAILED Result: hostbyte=DID_TIME_OUT driverbyte=DRIVER_OK cmd_age=30s
May 16 12:48:25 sfs-truenas kernel: sd 0:0:25:0: [sdab] tag#6667 CDB: Read(16) 88 00 00 00 00 03 8b 40 b7 e8 00 00 07 f0 00 00
May 16 12:48:25 sfs-truenas kernel: blk_update_request: I/O error, dev sdab, sector 15221176296 op 0x0:(READ) flags 0x700 phys_seg 110 prio class 0
May 16 12:48:25 sfs-truenas kernel: zio pool=SFS-ZFS vdev=/dev/disk/by-partuuid/03704d2e-7555-4d2f-8d51-db97b02a7827 error=5 type=1 offset=7791094714368 size=1040384 flags=40080c80
May 16 12:48:25 sfs-truenas kernel: sd 0:0:25:0: attempting task abort!scmd(0x00000000a28c8862), outstanding for 30264 ms & timeout 30000 ms
May 16 12:48:25 sfs-truenas kernel: sd 0:0:25:0: [sdab] tag#6521 CDB: Read(16) 88 00 00 00 00 00 fc 66 35 28 00 00 00 08 00 00
May 16 12:48:25 sfs-truenas kernel: scsi target0:0:25: handle(0x0028), sas_address(0x500304802168b0c3), phy(3)
May 16 12:48:25 sfs-truenas kernel: scsi target0:0:25: enclosure logical id(0x500304802168b0ff), slot(26)
May 16 12:48:25 sfs-truenas kernel: scsi target0:0:25: enclosure level(0x0001), connector name( C0.1)
May 16 12:48:25 sfs-truenas kernel: sd 0:0:25:0: No reference found at driver, assuming scmd(0x00000000a28c8862) might have completed
May 16 12:48:25 sfs-truenas kernel: sd 0:0:25:0: task abort: SUCCESS scmd(0x00000000a28c8862)
May 16 12:48:25 sfs-truenas kernel: zio pool=SFS-ZFS vdev=/dev/disk/by-partuuid/03704d2e-7555-4d2f-8d51-db97b02a7827 error=5 type=1 offset=2165945487360 size=4096 flags=180880
May 16 12:48:25 sfs-truenas kernel: sd 0:0:25:0: Power-on or device reset occurred
May 16 12:58:25 sfs-truenas kernel: sd 0:0:23:0: attempting task abort!scmd(0x000000005efd3b62), outstanding for 30220 ms & timeout 30000 ms
May 16 12:58:25 sfs-truenas kernel: sd 0:0:23:0: [sdaa] tag#6650 CDB: Write(16) 8a 00 00 00 00 03 d4 b0 61 40 00 00 00 d0 00 00
May 16 12:58:25 sfs-truenas kernel: scsi target0:0:23: handle(0x0026), sas_address(0x500304802168b0c1), phy(1)
May 16 12:58:25 sfs-truenas kernel: scsi target0:0:23: enclosure logical id(0x500304802168b0ff), slot(24)
May 16 12:58:25 sfs-truenas kernel: scsi target0:0:23: enclosure level(0x0001), connector name( C0.1)
May 16 12:58:26 sfs-truenas kernel: sd 0:0:23:0: task abort: SUCCESS scmd(0x000000005efd3b62)
May 16 12:58:26 sfs-truenas kernel: scsi_io_completion_action: 1 callbacks suppressed
May 16 12:58:26 sfs-truenas kernel: sd 0:0:23:0: [sdaa] tag#6650 FAILED Result: hostbyte=DID_TIME_OUT driverbyte=DRIVER_OK cmd_age=30s
May 16 12:58:26 sfs-truenas kernel: sd 0:0:23:0: [sdaa] tag#6650 CDB: Write(16) 8a 00 00 00 00 03 d4 b0 61 40 00 00 00 d0 00 00
May 16 12:58:26 sfs-truenas kernel: print_req_error: 1 callbacks suppressed
May 16 12:58:26 sfs-truenas kernel: blk_update_request: I/O error, dev sdaa, sector 16453230912 op 0x1:(WRITE) flags 0x700 phys_seg 26 prio class 0
May 16 12:58:26 sfs-truenas kernel: zio pool=SFS-ZFS vdev=/dev/disk/by-partuuid/904f837f-0c13-453f-a1e7-81901c9ac05c error=5 type=2 offset=8421906677760 size=106496 flags=180880
May 16 12:58:26 sfs-truenas kernel: sd 0:0:23:0: attempting task abort!scmd(0x000000004548a8c7), outstanding for 30628 ms & timeout 30000 ms
May 16 12:58:26 sfs-truenas kernel: sd 0:0:23:0: [sdaa] tag#6631 CDB: Write(16) 8a 00 00 00 00 03 d4 b0 60 78 00 00 00 c8 00 00
May 16 12:58:26 sfs-truenas kernel: scsi target0:0:23: handle(0x0026), sas_address(0x500304802168b0c1), phy(1)
May 16 12:58:26 sfs-truenas kernel: scsi target0:0:23: enclosure logical id(0x500304802168b0ff), slot(24)
May 16 12:58:26 sfs-truenas kernel: scsi target0:0:23: enclosure level(0x0001), connector name( C0.1)
May 16 12:58:26 sfs-truenas kernel: sd 0:0:23:0: No reference found at driver, assuming scmd(0x000000004548a8c7) might have completed
May 16 12:58:26 sfs-truenas kernel: sd 0:0:23:0: task abort: SUCCESS scmd(0x000000004548a8c7)
May 16 12:58:26 sfs-truenas kernel: sd 0:0:23:0: [sdaa] tag#6631 FAILED Result: hostbyte=DID_TIME_OUT driverbyte=DRIVER_OK cmd_age=30s
May 16 12:58:26 sfs-truenas kernel: sd 0:0:23:0: [sdaa] tag#6631 CDB: Write(16) 8a 00 00 00 00 03 d4 b0 60 78 00 00 00 c8 00 00
May 16 12:58:26 sfs-truenas kernel: blk_update_request: I/O error, dev sdaa, sector 16453230712 op 0x1:(WRITE) flags 0x700 phys_seg 13 prio class 0
May 16 12:58:26 sfs-truenas kernel: zio pool=SFS-ZFS vdev=/dev/disk/by-partuuid/904f837f-0c13-453f-a1e7-81901c9ac05c error=5 type=2 offset=8421906575360 size=102400 flags=180880
May 16 12:58:26 sfs-truenas kernel: sd 0:0:23:0: attempting task abort!scmd(0x000000004369f4dd), outstanding for 32424 ms & timeout 30000 ms
May 16 12:58:26 sfs-truenas kernel: sd 0:0:23:0: [sdaa] tag#6667 CDB: Read(16) 88 00 00 00 00 03 62 c6 67 d0 00 00 00 38 00 00
May 16 12:58:26 sfs-truenas kernel: scsi target0:0:23: handle(0x0026), sas_address(0x500304802168b0c1), phy(1)
May 16 12:58:26 sfs-truenas kernel: scsi target0:0:23: enclosure logical id(0x500304802168b0ff), slot(24)
May 16 12:58:26 sfs-truenas kernel: scsi target0:0:23: enclosure level(0x0001), connector name( C0.1)
May 16 12:58:26 sfs-truenas kernel: sd 0:0:23:0: No reference found at driver, assuming scmd(0x000000004369f4dd) might have completed
May 16 12:58:26 sfs-truenas kernel: sd 0:0:23:0: task abort: SUCCESS scmd(0x000000004369f4dd)
May 16 12:58:26 sfs-truenas kernel: sd 0:0:23:0: [sdaa] tag#6667 FAILED Result: hostbyte=DID_TIME_OUT driverbyte=DRIVER_OK cmd_age=32s
May 16 12:58:26 sfs-truenas kernel: sd 0:0:23:0: [sdaa] tag#6667 CDB: Read(16) 88 00 00 00 00 03 62 c6 67 d0 00 00 00 38 00 00
May 16 12:58:26 sfs-truenas kernel: blk_update_request: I/O error, dev sdaa, sector 14542071760 op 0x0:(READ) flags 0x700 phys_seg 6 prio class 0
May 16 12:58:26 sfs-truenas kernel: zio pool=SFS-ZFS vdev=/dev/disk/by-partuuid/904f837f-0c13-453f-a1e7-81901c9ac05c error=5 type=1 offset=7443393191936 size=28672 flags=180980
May 16 12:58:26 sfs-truenas kernel: sd 0:0:23:0: attempting task abort!scmd(0x00000000956844ba), outstanding for 32440 ms & timeout 30000 ms
May 16 12:58:26 sfs-truenas kernel: sd 0:0:23:0: [sdaa] tag#6666 CDB: Read(16) 88 00 00 00 00 03 62 c6 63 20 00 00 04 b0 00 00
May 16 12:58:26 sfs-truenas kernel: scsi target0:0:23: handle(0x0026), sas_address(0x500304802168b0c1), phy(1)
May 16 12:58:26 sfs-truenas kernel: scsi target0:0:23: enclosure logical id(0x500304802168b0ff), slot(24)
May 16 12:58:26 sfs-truenas kernel: scsi target0:0:23: enclosure level(0x0001), connector name( C0.1)
May 16 12:58:26 sfs-truenas kernel: sd 0:0:23:0: No reference found at driver, assuming scmd(0x00000000956844ba) might have completed
May 16 12:58:26 sfs-truenas kernel: sd 0:0:23:0: task abort: SUCCESS scmd(0x00000000956844ba)
May 16 12:58:26 sfs-truenas kernel: sd 0:0:23:0: [sdaa] tag#6666 FAILED Result: hostbyte=DID_TIME_OUT driverbyte=DRIVER_OK cmd_age=32s
May 16 12:58:26 sfs-truenas kernel: sd 0:0:23:0: [sdaa] tag#6666 CDB: Read(16) 88 00 00 00 00 03 62 c6 63 20 00 00 04 b0 00 00
May 16 12:58:26 sfs-truenas kernel: blk_update_request: I/O error, dev sdaa, sector 14542070560 op 0x0:(READ) flags 0x700 phys_seg 67 prio class 0
May 16 12:58:26 sfs-truenas kernel: zio pool=SFS-ZFS vdev=/dev/disk/by-partuuid/904f837f-0c13-453f-a1e7-81901c9ac05c error=5 type=1 offset=7443392577536 size=614400 flags=40080c80
May 16 12:58:26 sfs-truenas kernel: sd 0:0:23:0: attempting task abort!scmd(0x0000000077310cc6), outstanding for 30680 ms & timeout 30000 ms
May 16 12:58:26 sfs-truenas kernel: sd 0:0:23:0: [sdaa] tag#6709 CDB: Read(16) 88 00 00 00 00 01 00 7e ad b0 00 00 00 08 00 00
May 16 12:58:26 sfs-truenas kernel: scsi target0:0:23: handle(0x0026), sas_address(0x500304802168b0c1), phy(1)
May 16 12:58:26 sfs-truenas kernel: scsi target0:0:23: enclosure logical id(0x500304802168b0ff), slot(24)
May 16 12:58:26 sfs-truenas kernel: scsi target0:0:23: enclosure level(0x0001), connector name( C0.1)
May 16 12:58:26 sfs-truenas kernel: sd 0:0:23:0: No reference found at driver, assuming scmd(0x0000000077310cc6) might have completed
May 16 12:58:26 sfs-truenas kernel: sd 0:0:23:0: task abort: SUCCESS scmd(0x0000000077310cc6)
May 16 12:58:26 sfs-truenas kernel: sd 0:0:23:0: [sdaa] tag#6709 FAILED Result: hostbyte=DID_TIME_OUT driverbyte=DRIVER_OK cmd_age=30s
May 16 12:58:26 sfs-truenas kernel: sd 0:0:23:0: [sdaa] tag#6709 CDB: Read(16) 88 00 00 00 00 01 00 7e ad b0 00 00 00 08 00 00
May 16 12:58:26 sfs-truenas kernel: blk_update_request: I/O error, dev sdaa, sector 4303269296 op 0x0:(READ) flags 0x700 phys_seg 1 prio class 0
May 16 12:58:26 sfs-truenas kernel: zio pool=SFS-ZFS vdev=/dev/disk/by-partuuid/904f837f-0c13-453f-a1e7-81901c9ac05c error=5 type=1 offset=2201126330368 size=4096 flags=180880
May 16 12:58:26 sfs-truenas kernel: sd 0:0:23:0: attempting task abort!scmd(0x00000000581bbac9), outstanding for 32004 ms & timeout 30000 ms
May 16 12:58:26 sfs-truenas kernel: sd 0:0:23:0: [sdaa] tag#6662 CDB: Read(16) 88 00 00 00 00 03 9c ba 44 d0 00 00 07 f0 00 00
May 16 12:58:26 sfs-truenas kernel: scsi target0:0:23: handle(0x0026), sas_address(0x500304802168b0c1), phy(1)
May 16 12:58:26 sfs-truenas kernel: scsi target0:0:23: enclosure logical id(0x500304802168b0ff), slot(24)
May 16 12:58:26 sfs-truenas kernel: scsi target0:0:23: enclosure level(0x0001), connector name( C0.1)
May 16 12:58:26 sfs-truenas kernel: sd 0:0:23:0: No reference found at driver, assuming scmd(0x00000000581bbac9) might have completed
May 16 12:58:26 sfs-truenas kernel: sd 0:0:23:0: task abort: SUCCESS scmd(0x00000000581bbac9)
May 16 12:58:26 sfs-truenas kernel: sd 0:0:23:0: [sdaa] tag#6662 FAILED Result: hostbyte=DID_TIME_OUT driverbyte=DRIVER_OK cmd_age=32s
May 16 12:58:26 sfs-truenas kernel: sd 0:0:23:0: [sdaa] tag#6662 CDB: Read(16) 88 00 00 00 00 03 9c ba 44 d0 00 00 07 f0 00 00
May 16 12:58:26 sfs-truenas kernel: blk_update_request: I/O error, dev sdaa, sector 15514354896 op 0x0:(READ) flags 0x700 phys_seg 112 prio class 0
May 16 12:58:26 sfs-truenas kernel: zio pool=SFS-ZFS vdev=/dev/disk/by-partuuid/904f837f-0c13-453f-a1e7-81901c9ac05c error=5 type=1 offset=7941202157568 size=1040384 flags=40080c80
May 16 12:58:26 sfs-truenas kernel: sd 0:0:23:0: attempting task abort!scmd(0x000000007b61033f), outstanding for 31960 ms & timeout 30000 ms
May 16 12:58:26 sfs-truenas kernel: sd 0:0:23:0: [sdaa] tag#6678 CDB: Read(16) 88 00 00 00 00 03 85 46 9d f0 00 00 07 e8 00 00
May 16 12:58:26 sfs-truenas kernel: scsi target0:0:23: handle(0x0026), sas_address(0x500304802168b0c1), phy(1)
May 16 12:58:26 sfs-truenas kernel: scsi target0:0:23: enclosure logical id(0x500304802168b0ff), slot(24)
May 16 12:58:26 sfs-truenas kernel: scsi target0:0:23: enclosure level(0x0001), connector name( C0.1)
May 16 12:58:26 sfs-truenas kernel: sd 0:0:23:0: No reference found at driver, assuming scmd(0x000000007b61033f) might have completed
May 16 12:58:26 sfs-truenas kernel: sd 0:0:23:0: task abort: SUCCESS scmd(0x000000007b61033f)
May 16 12:58:26 sfs-truenas kernel: sd 0:0:23:0: [sdaa] tag#6678 FAILED Result: hostbyte=DID_TIME_OUT driverbyte=DRIVER_OK cmd_age=31s
May 16 12:58:26 sfs-truenas kernel: sd 0:0:23:0: [sdaa] tag#6678 CDB: Read(16) 88 00 00 00 00 03 85 46 9d f0 00 00 07 e8 00 00
May 16 12:58:26 sfs-truenas kernel: blk_update_request: I/O error, dev sdaa, sector 15120899568 op 0x0:(READ) flags 0x700 phys_seg 119 prio class 0
May 16 12:58:26 sfs-truenas kernel: zio pool=SFS-ZFS vdev=/dev/disk/by-partuuid/904f837f-0c13-453f-a1e7-81901c9ac05c error=5 type=1 offset=7739753029632 size=1036288 flags=40080c80
May 16 12:58:26 sfs-truenas kernel: sd 0:0:23:0: attempting task abort!scmd(0x0000000051a9bc7d), outstanding for 32500 ms & timeout 30000 ms
May 16 12:58:26 sfs-truenas kernel: sd 0:0:23:0: [sdaa] tag#6635 CDB: Read(16) 88 00 00 00 00 03 62 31 31 c0 00 00 02 20 00 00
May 16 12:58:26 sfs-truenas kernel: scsi target0:0:23: handle(0x0026), sas_address(0x500304802168b0c1), phy(1)
May 16 12:58:26 sfs-truenas kernel: scsi target0:0:23: enclosure logical id(0x500304802168b0ff), slot(24)
May 16 12:58:26 sfs-truenas kernel: scsi target0:0:23: enclosure level(0x0001), connector name( C0.1)
May 16 12:58:26 sfs-truenas kernel: sd 0:0:23:0: No reference found at driver, assuming scmd(0x0000000051a9bc7d) might have completed
May 16 12:58:26 sfs-truenas kernel: sd 0:0:23:0: task abort: SUCCESS scmd(0x0000000051a9bc7d)
May 16 12:58:26 sfs-truenas kernel: sd 0:0:23:0: [sdaa] tag#6635 FAILED Result: hostbyte=DID_TIME_OUT driverbyte=DRIVER_OK cmd_age=32s
May 16 12:58:26 sfs-truenas kernel: sd 0:0:23:0: [sdaa] tag#6635 CDB: Read(16) 88 00 00 00 00 03 62 31 31 c0 00 00 02 20 00 00
May 16 12:58:26 sfs-truenas kernel: blk_update_request: I/O error, dev sdaa, sector 14532293056 op 0x0:(READ) flags 0x700 phys_seg 23 prio class 0
May 16 12:58:26 sfs-truenas kernel: zio pool=SFS-ZFS vdev=/dev/disk/by-partuuid/904f837f-0c13-453f-a1e7-81901c9ac05c error=5 type=1 offset=7438386495488 size=278528 flags=40080c80
May 16 12:58:26 sfs-truenas kernel: sd 0:0:23:0: Power-on or device reset occurred
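
For anyone trying to read a dump like this, a rough Python sketch of how the errors could be tallied per device and matched to backplane slots. It is not something I ran at the time; the regexes are written against the exact message formats pasted above and are an assumption for any other kernel version. The function name `summarize` and the `SAMPLE` excerpt are just for illustration.

```python
import re
from collections import Counter

# Patterns match the message formats quoted in this post; other kernels may differ.
IO_ERROR = re.compile(r"I/O error, dev (\w+), sector \d+ op 0x[0-9a-f]+:\((\w+)\)")
SD_NAME = re.compile(r"sd (\d+:\d+:\d+):\d+: \[(\w+)\]")
TARGET_SLOT = re.compile(
    r"scsi target(\d+:\d+:\d+): enclosure logical id\(0x[0-9a-f]+\), slot\((\d+)\)"
)


def summarize(lines):
    """Return (error count per (device, op), backplane slot per device)."""
    errors = Counter()
    dev_by_target = {}
    slot_by_target = {}
    for line in lines:
        if m := IO_ERROR.search(line):
            errors[(m.group(1), m.group(2))] += 1
        if m := SD_NAME.search(line):
            dev_by_target[m.group(1)] = m.group(2)
        if m := TARGET_SLOT.search(line):
            slot_by_target[m.group(1)] = int(m.group(2))
    # Join device names to slots through the shared host:channel:target id.
    slots = {
        dev: slot_by_target[target]
        for target, dev in dev_by_target.items()
        if target in slot_by_target
    }
    return errors, slots


# A few lines copied from the log above.
SAMPLE = """\
May 16 12:48:25 sfs-truenas kernel: sd 0:0:25:0: [sdab] tag#6703 CDB: Read(16) 88 00 00 00 00 03 b1 57 16 c0 00 00 07 e8 00 00
May 16 12:48:25 sfs-truenas kernel: scsi target0:0:25: enclosure logical id(0x500304802168b0ff), slot(26)
May 16 12:48:25 sfs-truenas kernel: blk_update_request: I/O error, dev sdab, sector 15860176576 op 0x0:(READ) flags 0x700 phys_seg 113 prio class 0
May 16 12:58:25 sfs-truenas kernel: sd 0:0:23:0: [sdaa] tag#6650 CDB: Write(16) 8a 00 00 00 00 03 d4 b0 61 40 00 00 00 d0 00 00
May 16 12:58:25 sfs-truenas kernel: scsi target0:0:23: enclosure logical id(0x500304802168b0ff), slot(24)
May 16 12:58:26 sfs-truenas kernel: blk_update_request: I/O error, dev sdaa, sector 16453230912 op 0x1:(WRITE) flags 0x700 phys_seg 26 prio class 0
May 16 12:58:26 sfs-truenas kernel: blk_update_request: I/O error, dev sdaa, sector 16453230712 op 0x1:(WRITE) flags 0x700 phys_seg 13 prio class 0
May 16 12:58:26 sfs-truenas kernel: blk_update_request: I/O error, dev sdaa, sector 14542071760 op 0x0:(READ) flags 0x700 phys_seg 6 prio class 0
""".splitlines()

errors, slots = summarize(SAMPLE)
for (dev, op), count in sorted(errors.items()):
    print(f"{dev} slot {slots.get(dev, '?')}: {count} {op} error(s)")
```

Feeding it the full log instead of `SAMPLE` should make it easier to see whether the timeouts cluster on particular slots.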
 

im.thatoneguy

Dabbler
Joined
Nov 4, 2022
Messages
37
I also pulled the LSI logs using LSIUtil, and every drive is listed as being connected to both cables, so the backplane must do some sort of switching itself.
 

Davvo

MVP
Joined
Jul 12, 2022
Messages
3,222
Both cables? Are you using multipath?

Also, please tell us the drive model.
 

im.thatoneguy

Dabbler
Joined
Nov 4, 2022
Messages
37
As I mentioned, the HBA uses a SlimSAS 8x breakout to two 4x SlimSAS cables into the backplane.

Like this:

Drives are 16TB HC550
WDC WUH721816ALE6L4

According to IPMI, all drives are at about 36C and the backplane is at about 34C, so temperatures seem good there. I can't find a way to read HBA temps.

Also, according to the IPMI stats, all 7 of the drives are in "Enclosure:1/EID29" while all the other 21 drives are somehow assigned to "Enclosure:0/EID21".

So it would appear that they ARE physically somehow segmented out from the rest!
 

im.thatoneguy

Dabbler
Joined
Nov 4, 2022
Messages
37
EDIT: *Except for one of the drives

RAIDZ2
  • /dev/sdaa | Enclosure:1/EID:29
  • /dev/sdab | Enclosure:1/EID:29
  • /dev/sdac | Enclosure:1/EID:29
  • /dev/sdad | Enclosure:1/EID:29
  • /dev/sdq | Enclosure:0/EID:21
  • /dev/sdy | Enclosure:1/EID:29
  • /dev/sdz | Enclosure:1/EID:29
... 21 more drives in "Enclosure:0/EID:21"
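
A tiny Python sketch of the same comparison, in case the list gets longer: it groups the vdev members by enclosure and reports whichever ones don't match the majority. The dictionary is just the list above typed in by hand; `odd_ones_out` is a made-up helper name, not part of any tool.

```python
from collections import Counter

# Enclosure assignments for the seven drives in the affected vdev, as listed above.
VDEV_ENCLOSURES = {
    "sdaa": "Enclosure:1/EID:29",
    "sdab": "Enclosure:1/EID:29",
    "sdac": "Enclosure:1/EID:29",
    "sdad": "Enclosure:1/EID:29",
    "sdq": "Enclosure:0/EID:21",
    "sdy": "Enclosure:1/EID:29",
    "sdz": "Enclosure:1/EID:29",
}


def odd_ones_out(enclosure_by_dev):
    """Return the devices whose enclosure differs from the most common one."""
    most_common, _ = Counter(enclosure_by_dev.values()).most_common(1)[0]
    return sorted(d for d, e in enclosure_by_dev.items() if e != most_common)


print(odd_ones_out(VDEV_ENCLOSURES))
```

Since all seven drives reported errors, it may be worth checking whether the errors on the odd one out look the same as the errors on the other six.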
 

Davvo

MVP
Joined
Jul 12, 2022
Messages
3,222
Try swapping the two cable ends and see if the errors move to other drives... or just replace the cable outright.
 

im.thatoneguy

Dabbler
Joined
Nov 4, 2022
Messages
37
Unfortunately, it looks like accessing said plugs would be a massive PITA. But what I did do was:

1) Moved all of the drives to consecutive slots on the other side of the chassis.
2) Updated the firmware on the HBA to v25.
3) Updated the driver to v45 (I'll need to remember to check again next time TrueNAS updates, since it'll probably revert).
4) Unplugged the accessible 8x cable end and reseated it, just to be sure.
5) Added a hot spare for good measure.

Now I guess I wait, hopefully for a few months, to see if anything happens.
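
For the "check again after the next TrueNAS update" part of step 3, a minimal Python sketch of what that check could look like. Big assumptions here: that the HBA is driven by the `mpt3sas` kernel module (I haven't named the driver in this thread) and that the module exposes its version at `/sys/module/mpt3sas/version`. The function names are hypothetical.

```python
from pathlib import Path

# Assumption: the HBA uses the mpt3sas kernel module; the thread does not name the driver.
VERSION_FILE = Path("/sys/module/mpt3sas/version")
EXPECTED_MAJOR = "45"  # from "Updated the driver to v45" in step 3


def read_driver_version(path=VERSION_FILE):
    """Return the version string from sysfs, or None if it cannot be read."""
    try:
        return path.read_text().strip()
    except OSError:
        return None


def driver_reverted(version, expected_major=EXPECTED_MAJOR):
    """True if a version was read and its major component is not the expected one."""
    return version is not None and version.split(".")[0] != expected_major


version = read_driver_version()
if version is None:
    print(f"{VERSION_FILE} not readable; is the module loaded?")
elif driver_reverted(version):
    print(f"driver is {version}, expected major version {EXPECTED_MAJOR}")
else:
    print(f"driver is {version}, still at the expected major version")
```

Running something like this after each update would at least flag a silent rollback before the timeouts come back.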
 

im.thatoneguy

Dabbler
Joined
Nov 4, 2022
Messages
37

Also, this sounds very similar. Different HBA (9500 series vs 9300 series) but same overall symptom.
 

Davvo

MVP
Joined
Jul 12, 2022
Messages
3,222