freenas-supero
Hello,
My FreeNAS server has just crashed in a way I've never seen before. First of all, I got an email from the box saying that a drive had faulted and that the array was degraded.
The volume zpool (ZFS) state is DEGRADED: One or more devices are faulted in response to persistent errors. Sufficient replicas exist for the pool to continue functioning in a degraded state.
Then I tried logging in via web UI and I get this error page:
An error occurred.
Sorry, the page you are looking for is currently unavailable.
Please try again later.
If you are the system administrator of this resource then you should check the error log for details.
Faithfully yours, nginx.
Logging in over SSH, dmesg shows:
Code:
(da3:mps0:0:4:0): SYNCHRONIZE CACHE(10). CDB: 35 00 00 00 00 00 00 00 00 00 length 0 SMID 677 command timeout cm 0xfffffe00009f1890 ccb 0xfffff8010c140800
(noperiph:mps0:0:4294967295:0): SMID 1 Aborting command 0xfffffe00009f1890
mps0: Sending reset from mpssas_send_abort for target ID 4
(da3:mps0:0:4:0): WRITE(10). CDB: 2a 00 ba 72 af 48 00 00 10 00 length 8192 SMID 633 terminated ioc 804b scsi 0 state c xfer 0
mps0: (da3:mps0:0:4:0): WRITE(10). CDB: 2a 00 ba 72 af 48 00 00 10 00
Unfreezing devq for target ID 4
(da3:mps0:0:4:0): CAM status: CCB request completed with an error
(da3:mps0:0:4:0): Retrying command
(da3:mps0:0:4:0): SYNCHRONIZE CACHE(10). CDB: 35 00 00 00 00 00 00 00 00 00
(da3:mps0:0:4:0): CAM status: Command timeout
(da3:mps0:0:4:0): Retrying command
(da3:mps0:0:4:0): SYNCHRONIZE CACHE(10). CDB: 35 00 00 00 00 00 00 00 00 00
(da3:mps0:0:4:0): CAM status: SCSI Status Error
(da3:mps0:0:4:0): SCSI status: Check Condition
(da3:mps0:0:4:0): SCSI sense: UNIT ATTENTION asc:29,0 (Power on, reset, or bus device reset occurred)
(da3:mps0:0:4:0): Error 6, Retries exhausted
(da3:mps0:0:4:0): Invalidating pack
GEOM_ELI: g_eli_read_done() failed (error=6) da3p1.eli[READ(offset=27906048, length=4096)]
swap_pager: I/O error - pagein failed; blkno 3152547,size 4096, error 6
vm_fault: pager read error, pid 4099 (zfsd)
GEOM_ELI: g_eli_read_done() failed (error=6) da3p1.eli[READ(offset=27906048, length=4096)]
swap_pager: I/O error - pagein failed; blkno 3152547,size 4096, error 6
vm_fault: pager read error, pid 4099 (zfsd)
Failed to write core file for process zfsd (error 14)
pid 4099 (zfsd), uid 0: exited on signal 11
GEOM_ELI: g_eli_read_done() failed (error=6) da3p1.eli[READ(offset=5980160, length=4096)]
swap_pager: I/O error - pagein failed; blkno 3147194,size 4096, error 6
vm_fault: pager read error, pid 1253 (devd)
GEOM_ELI: g_eli_read_done() failed (error=6) da3p1.eli[READ(offset=5943296, length=28672)]
swap_pager: I/O error - pagein failed; blkno 3147185,size 28672, error 6
vm_fault: pager read error, pid 1253 (devd)
Failed to write core file for process devd (error 14)
pid 1253 (devd), uid 0: exited on signal 11
GEOM_ELI: g_eli_read_done() failed (error=6) da3p1.eli[READ(offset=19120128, length=8192)]
swap_pager: I/O error - pagein failed; blkno 3150402,size 8192, error 6
vm_fault: pager read error, pid 3064 (python2.7)
pid 3064 (python2.7), uid 0: exited on signal 11
GEOM_ELI: g_eli_read_done() failed (error=6) da3p1.eli[READ(offset=589824, length=32768)]
swap_pager: I/O error - pagein failed; blkno 3145878,size 32768, error 6
vm_fault: pager read error, pid 3087 (python2.7)
GEOM_ELI: g_eli_read_done() failed (error=6) da3p1.eli[READ(offset=16900096, length=4096)]
swap_pager: I/O error - pagein failed; blkno 3149860,size 4096, error 6
vm_fault: pager read error, pid 3087 (python2.7)
Failed to write core file for process python2.7 (error 14)
pid 3087 (python2.7), uid 0: exited on signal 11
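Looking at that output, the GEOM_ELI/swap_pager errors are coming from the swap slice on the failing disk (da3p1.eli), which is what seems to be killing zfsd, devd, and the python2.7 processes. I'm assuming I can stop further daemons from crashing until the disk is replaced by taking that swap device offline; a rough sketch with standard FreeBSD tools (the device name is taken from the output above):
Code:
swapinfo                  # confirm da3p1.eli is still listed as an active swap device
swapoff /dev/da3p1.eli    # stop paging to the failing disk; this may fail if pages
                          # already swapped out there can no longer be read back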
zpool status zpool:
Code:
[root@freenas] ~# zpool status zpool
  pool: zpool
 state: DEGRADED
status: One or more devices are faulted in response to persistent errors.
        Sufficient replicas exist for the pool to continue functioning in a
        degraded state.
action: Replace the faulted device, or use 'zpool clear' to mark the device
        repaired.
  scan: scrub in progress since Thu Jun 28 03:00:05 2018
        11.0T scanned out of 11.1T at 175M/s, 0h16m to go
        0 repaired, 98.52% done
config:

        NAME                                            STATE     READ WRITE CKSUM
        zpool                                           DEGRADED     0     0     0
          raidz3-0                                      DEGRADED     0     0     0
            gptid/4a751424-5a4a-11e5-82f2-0030487f11ba  ONLINE       0     0     0
            gptid/7231ce76-0fb8-11e4-9267-0030487f11ba  ONLINE       0     0     0
            gptid/74010031-0fb8-11e4-9267-0030487f11ba  ONLINE       0     0     0
            gptid/3010e8b6-1d80-11e7-ac2f-0025907ad3a1  ONLINE       0     0     0
            gptid/7577d07e-0fb8-11e4-9267-0030487f11ba  ONLINE       0     0     0
            gptid/7799b692-0fb8-11e4-9267-0030487f11ba  ONLINE       0     0     0
            gptid/7979c1c6-0fb8-11e4-9267-0030487f11ba  FAULTED      6   616     0  too many errors
            gptid/7ba4673f-0fb8-11e4-9267-0030487f11ba  ONLINE       0     0     0
I do not have a spare hard drive at hand and need to order one. Should I shut the server down until I have a replacement handy? The array uses RAID-Z3, so technically I would have to lose three drives to be in real danger, but who knows...
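For reference, my understanding of the raw ZFS side of the replacement, once the new drive arrives, is roughly the following (the faulted gptid is copied from the zpool status output above; /dev/da8 is just a placeholder for whatever device name the new disk gets, and on FreeNAS the usual route is the web UI's Volume Status → Replace, which also takes care of partitioning and swap):
Code:
# Take the faulted member out of service (gptid from zpool status above)
zpool offline zpool gptid/7979c1c6-0fb8-11e4-9267-0030487f11ba

# ...physically swap the disk, then resilver onto the new device
# (/dev/da8 is a placeholder for the new disk's device node)
zpool replace zpool gptid/7979c1c6-0fb8-11e4-9267-0030487f11ba /dev/da8

# Watch the resilver progress
zpool status zpool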
Best practices?
Thanks a bunch!!
EDIT: OK, I managed to restart the web UI with
Code:
service django stop
service django start
The web UI is back. Strange that it crashed at the same time the HDD faulted. I also narrowed down which one of the HDDs failed, and it's indeed a Seagate drive. So far, all six Seagate drives I initially had have failed. Nice.
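(For anyone wondering how I narrowed it down: roughly the following, with da3 taken from the dmesg output above; glabel and smartctl ship with FreeNAS, and the grep patterns are just convenient filters:)
Code:
glabel status | grep 7979c1c6                    # map the faulted gptid label to a device node (presumably da3p2 here)
smartctl -a /dev/da3 | grep -Ei 'model|serial'   # read the drive model and serial number to locate it in the chassis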
I ordered two WD Red drives, but they likely won't be here before next Wednesday or Thursday (so 6 or 7 days from now). Would it be advisable to shut down the server until then, or let it run on the degraded array? I do have backups, but I won't need the server until at least Sunday afternoon, so...