zerothaught
Hello Everyone,
I am quite new to TrueNAS, so excuse me if this is a dumb question, but I couldn't seem to find any existing threads that covered what I am experiencing. I am currently running TrueNAS Scale on a Storinator Q30.
OS Version: TrueNAS-SCALE-22.12.3.2
Product: Q30
Model: Intel(R) Xeon(R) Bronze 3204 CPU @ 1.90GHz
Memory: 125 GiB
3 x RAIDZ2 | 10 wide | 16.37 TiB
Drive Model: ST18000NM000J-2TV103 x30
I woke up today to the following alerts from my TrueNAS system:
• Pool Storinator state is DEGRADED: One or more devices has been removed by the administrator. Sufficient replicas exist for the pool to continue functioning in a degraded state.
The following devices are not healthy:
o Disk ST18000NM000J-2TV103 ZR5D35C0 is REMOVED
o Disk ST18000NM000J-2TV103 ZR5D4FD9 is REMOVED
About a minute after the alert, I received another email saying that the alert had been cleared.
Then another minute later I got another alert for one of the same disks:
• Pool Storinator state is DEGRADED: One or more devices has been removed by the administrator. Sufficient replicas exist for the pool to continue functioning in a degraded state.
The following devices are not healthy:
o Disk ST18000NM000J-2TV103 ZR5D35C0 is REMOVED
Then finally I got this email:
• Pool Storinator state is ONLINE: One or more devices is currently being resilvered. The pool will continue to function, possibly in a degraded state.
Upon looking at the dashboard, it shows that I have 0 disks with errors and that my pool is online.
I did a bit more digging and ran a dmesg command and saw the following errors:
Disk failure on sd01, disabling devices
md/raid1:md122: Operation continuing on 1 devices
blk_update_request: I/O error, dev sdo, sector 19604053776 op 0x11:(WRITE) flags 0x700 phys_seg 15 prio class 0
blk_update_request: I/O error, dev sde, sector 1920752848 op 0x0: (READ) flags 0x0 phys_seg 1 prio class 0
I see several of these for different sectors, all on sde.
Buffer I/O error on dev sde2, logical block 4097, async page read
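To get a clearer picture of which devices the errors land on, lines like the ones above could be tallied per device. This is only a minimal sketch: the regular expression is an assumption based on the two `blk_update_request` lines quoted here, and `tally_io_errors` is a hypothetical helper name, not part of any TrueNAS tooling.

```python
import re
from collections import Counter

# Assumed pattern, derived from the quoted lines, e.g.:
#   blk_update_request: I/O error, dev sde, sector 1920752848 op 0x0: (READ) ...
IO_ERROR = re.compile(
    r"I/O error, dev (\w+), sector (\d+) op 0x[0-9a-fA-F]+:\s*\((\w+)\)"
)

def tally_io_errors(lines):
    """Count block-layer I/O errors per (device, operation) pair."""
    counts = Counter()
    for line in lines:
        match = IO_ERROR.search(line)
        if match:
            device, _sector, op = match.groups()
            counts[(device, op)] += 1
    return counts

# Sample input: the lines quoted in this post.
sample = [
    "blk_update_request: I/O error, dev sdo, sector 19604053776 op 0x11:(WRITE) flags 0x700 phys_seg 15 prio class 0",
    "blk_update_request: I/O error, dev sde, sector 1920752848 op 0x0: (READ) flags 0x0 phys_seg 1 prio class 0",
    "Buffer I/O error on dev sde2, logical block 4097, async page read",
]

for (device, op), n in sorted(tally_io_errors(sample).items()):
    print(f"{device} {op}: {n}")
```

The "Buffer I/O error" line is deliberately not counted, since it has a different format. For a real log, the list could be replaced with the lines of saved dmesg output.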
I ran sudo smartctl -a /dev/sdo1 and I don't see any Reallocated Sector Count or any CRC Error Counts. I am also not seeing any errors in the SMART testing for that drive.
I ran a scrub on the pool, and after it completed I got an alert saying that resilvering was in progress and that 18 MB had been copied over.
To my untrained eye, it looks as if a disk is failing or there is some other hardware error, but I would expect to see SMART errors or something similar on the disk. Are there any other commands I should run, or should I look at contacting the manufacturer?