Sprint · Explorer · Joined: Mar 30, 2019 · Messages: 72
Hi all
This is an odd one. I found a few threads with similar issues, but they didn't quite line up with my use case, as they either had errors against specific drives or the pool was marked "Offline unhealthy" (mine is still showing "Online unhealthy").
So one of my TrueNAS boxes (this one is my primary backup; I have a secondary backup offsite too, so the data's safe) keeps stopping. I get an error, normally about "IO failure" or "IO suspended", and the pool goes offline, yet everything looks healthy and there are no errors against any drives.
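In case it's useful, next time it happens I'll also dump the ZFS event log before doing anything else. If I've read the OpenZFS docs right, something like this should list the I/O failure events (the exact event class names are my best guess):

# list recent ZFS events, oldest first; I/O problems should show up
# as classes like ereport.fs.zfs.io or ereport.fs.zfs.io_failure
zpool events
zpool events -v | tail -n 200    # full details for the most recent ones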
The pool consists of:
vdev1 (RAIDZ1): 4 × 8 TB WD Reds
vdev2 (RAIDZ1): 4 × 8 TB WD Reds
vdev3 (RAIDZ1): 4 × 4 TB WD Reds
The server is virtualised, sitting on a Proxmox host, with two HBAs passed through and 64 GB of RAM; the boot medium is a pair of mirrored SSDs within Proxmox. It has 10 threads of a 10-core Xeon assigned, and all other VMs are running without issue.
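For what it's worth, this is how I've been sanity-checking that the passed-through HBAs and their disks are actually visible inside the VM (standard FreeBSD commands, so they should apply on CORE; the grep filter is just what works for my cards):

# list PCI devices the guest can see; the HBAs show up under "mass storage"
pciconf -lv | grep -B 3 -i storage

# list the disks CAM has attached through those HBAs
camcontrol devlist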
root@Plutonium[~]# zpool status -v
  pool: Backup_Array
 state: ONLINE
status: One or more devices are faulted in response to IO failures.
action: Make sure the affected devices are connected, then run 'zpool clear'.
   see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-JQ
  scan: scrub repaired 0B in 22:23:24 with 0 errors on Wed Jun  1 22:23:28 2022
config:

        NAME                                            STATE     READ WRITE CKSUM
        Backup_Array                                    ONLINE       0     0     0
          raidz1-0                                      ONLINE       0     0     0
            gptid/ba485b25-63af-11eb-b69a-a0369f17c294  ONLINE       0     0     0
            gptid/ba7f43b0-63af-11eb-b69a-a0369f17c294  ONLINE       0     0     0
            gptid/ba84c80f-63af-11eb-b69a-a0369f17c294  ONLINE       0     0     0
            gptid/bb2a0f9d-63af-11eb-b69a-a0369f17c294  ONLINE       0     0     0
          raidz1-1                                      ONLINE       0     0     0
            gptid/0176dd85-9676-11ec-ba47-739745ebe144  ONLINE       0     0     0
            gptid/bb00a0dc-63af-11eb-b69a-a0369f17c294  ONLINE       0     0     0
            gptid/bb17532b-63af-11eb-b69a-a0369f17c294  ONLINE       0     0     0
            gptid/bb272b43-63af-11eb-b69a-a0369f17c294  ONLINE       0     0     0
          raidz1-2                                      ONLINE       0     0     0
            gptid/f808d018-63fb-11eb-a09d-a0369f17c294  ONLINE       0     0     0
            gptid/fad5a68a-63fb-11eb-a09d-a0369f17c294  ONLINE       0     0     0
            gptid/faff21cb-63fb-11eb-a09d-a0369f17c294  ONLINE       0     0     0
            gptid/faf320e9-63fb-11eb-a09d-a0369f17c294  ONLINE       0     0     0

errors: List of errors unavailable: pool I/O is currently suspended

  pool: boot-pool
 state: ONLINE
  scan: scrub repaired 0B in 00:01:27 with 0 errors on Sat Jun 18 03:46:27 2022
config:

        NAME        STATE     READ WRITE CKSUM
        boot-pool   ONLINE       0     0     0
          ada0p2    ONLINE       0     0     0

errors: No known data errors
The error I get in the GUI is:

CRITICAL
Pool Backup_Array state is ONLINE: One or more devices are faulted in response to IO failures.

Looking for some guidance as to what my next steps should be. I'm confident a reboot will bring it back online like it has in the past, but I want to get to the bottom of this, as this is at least the third time this machine's thrown its toys out of the pram.
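For reference, here's roughly what I'm planning to capture next time it happens, before rebooting (these are the standard FreeBSD / smartmontools / ZFS tools; da0 is just an example device name):

# kernel messages around the failure; mps/mpr are the usual LSI HBA drivers
dmesg | tail -n 100
grep -iE "mps|mpr|da[0-9]" /var/log/messages | tail -n 100

# SMART health for each data disk (da0, da1, ... in my case)
smartctl -a /dev/da0

# only after capturing the above: clear the suspended state and re-check
zpool clear Backup_Array
zpool status -v Backup_Array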
Appreciate any and all feedback :)