The failing disk didn't seem too surprising:
Code:
ahcich7: Timeout on slot 11 port 0
ahcich7: is 00000000 cs 00000800 ss 00000000 rs 00000800 tfd c0 serr 00000000 cmd 0004cb17
(ada6:ahcich7:0:0:0): FLUSHCACHE48. ACB: ea 00 00 00 00 40 00 00 00 00 00 00
(ada6:ahcich7:0:0:0): CAM status: Command timeout
(ada6:ahcich7:0:0:0): Retrying command
ahcich7: AHCI reset: device not ready after 31000ms (tfd = 00000080)
ahcich7: Timeout on slot 12 port 0
ahcich7: is 00000000 cs 00001000 ss 00000000 rs 00001000 tfd 80 serr 00000000 cmd 0004cc17
(aprobe0:ahcich7:0:0:0): ATA_IDENTIFY. ACB: ec 00 00 00 00 40 00 00 00 00 00 00
(aprobe0:ahcich7:0:0:0): CAM status: Command timeout
(aprobe0:ahcich7:0:0:0): Retrying command
ahcich7: Timeout on slot 25 port 0
ahcich7: is 00000000 cs 02000000 ss 00000000 rs 02000000 tfd d0 serr 00000000 cmd 0004d917
ahcich7: AHCI reset: device not ready after 31000ms (tfd = 00000080)
ahcich7: Timeout on slot 26 port 0
ahcich7: is 00000000 cs 04000000 ss 00000000 rs 04000000 tfd 80 serr 00000000 cmd 0004da17
(aprobe0:ahcich7:0:0:0): ATA_IDENTIFY. ACB: ec 00 00 00 00 40 00 00 00 00 00 00
(aprobe0:ahcich7:0:0:0): CAM status: Command timeout
(aprobe0:ahcich7:0:0:0): Retrying command
ahcich7: AHCI reset: device not ready after 31000ms (tfd = 00000080)
ahcich7: Timeout on slot 27 port 0
ahcich7: is 00000000 cs 08000000 ss 00000000 rs 08000000 tfd 80 serr 00000000 cmd 0004db17
(aprobe0:ahcich7:0:0:0): ATA_IDENTIFY. ACB: ec 00 00 00 00 40 00 00 00 00 00 00
(aprobe0:ahcich7:0:0:0): CAM status: Command timeout
(aprobe0:ahcich7:0:0:0): Error 5, Retries exhausted
swap_pager: indefinite wait buffer: bufobj: 0, blkno: 2100740, size: 49152
ahcich7: AHCI reset: device not ready after 31000ms (tfd = 00000080)
swap_pager: indefinite wait buffer: bufobj: 0, blkno: 2100740, size: 49152
ahcich7: Timeout on slot 28 port 0
ahcich7: is 00000000 cs 10000000 ss 00000000 rs 10000000 tfd 80 serr 00000000 cmd 0004dc17
(aprobe0:ahcich7:0:0:0): ATA_IDENTIFY. ACB: ec 00 00 00 00 40 00 00 00 00 00 00
(aprobe0:ahcich7:0:0:0): CAM status: Command timeout
(aprobe0:ahcich7:0:0:0): Error 5, Retry was blocked
ada6 at ahcich7 bus 0 scbus7 target 0 lun 0
ada6: <ST32000542AS CC34> s/n 5XW28KWS detached
What started to concern me next is that zfsd appears to have core dumped. Did ZFS crash?
Code:
GEOM_ELI: g_eli_read_done() failed (error=6) ada6p1.eli[READ(offset=14680064, length=49152)]
swap_pager: I/O error - pagein failed; blkno 2100740,size 49152, error 6
vm_fault: pager read error, pid 2940 (python3.6)
pid 2940 (python3.6), uid 0: exited on signal 11
GEOM_ELI: Device ada6p1.eli destroyed.
GEOM_ELI: Detached ada6p1.eli on last close.
swap_pager: I/O error - pagein failed; blkno 2101950,size 8192, error 6
vm_fault: pager read error, pid 3769 (zfsd)
swap_pager: I/O error - pagein failed; blkno 2098787,size 4096, error 6
vm_fault: pager read error, pid 3769 (zfsd)
Failed to fully fault in a core file segment at VA 0x800679000 with size 0x2f000 to be written at offset 0x10000 for process zfsd
swap_pager: I/O error - pagein failed; blkno 2098818,size 4096, error 6
vm_fault: pager read error, pid 3769 (zfsd)
Failed to fully fault in a core file segment at VA 0x801dfa000 with size 0x7000 to be written at offset 0x4f000 for process zfsd
swap_pager: I/O error - pagein failed; blkno 2098819,size 4096, error 6
vm_fault: pager read error, pid 3769 (zfsd)
Failed to fully fault in a core file segment at VA 0x80303a000 with size 0x1000 to be written at offset 0x84000 for process zfsd
swap_pager: I/O error - pagein failed; blkno 2098820,size 4096, error 6
vm_fault: pager read error, pid 3769 (zfsd)
Failed to fully fault in a core file segment at VA 0x803400000 with size 0x400000 to be written at offset 0x92000 for process zfsd
swap_pager: I/O error - pagein failed; blkno 2111670,size 8192, error 6
vm_fault: pager read error, pid 3769 (zfsd)
Failed to fully fault in a core file segment at VA 0x7ffffffdf000 with size 0x20000 to be written at offset 0x492000 for process zfsd
pid 3769 (zfsd), uid 0: exited on signal 11 (core dumped)
ahcich7: AHCI reset: device not ready after 31000ms (tfd = 00000080)
ahcich7: Timeout on slot 29 port 0
ahcich7: is 00000000 cs 20000000 ss 00000000 rs 20000000 tfd 80 serr 00000000 cmd 0004dd17
(aprobe0:ahcich7:0:0:0): ATA_IDENTIFY. ACB: ec 00 00 00 00 40 00 00 00 00 00 00
(aprobe0:ahcich7:0:0:0): CAM status: Command timeout
(aprobe0:ahcich7:0:0:0): Retrying command
ahcich7: AHCI reset: device not ready after 31000ms (tfd = 00000080)
ahcich7: Timeout on slot 30 port 0
ahcich7: is 00000000 cs 40000000 ss 00000000 rs 40000000 tfd 80 serr 00000000 cmd 0004de17
(aprobe0:ahcich7:0:0:0): ATA_IDENTIFY. ACB: ec 00 00 00 00 40 00 00 00 00 00 00
(aprobe0:ahcich7:0:0:0): CAM status: Command timeout
(aprobe0:ahcich7:0:0:0): Error 5, Retries exhausted
(ada6:ahcich7:0:0:0): Periph destroyed
ada6 at ahcich7 bus 0 scbus7 target 0 lun 0
ada6: <ST32000542AS CC34> ATA8-ACS SATA 2.x device
ada6: Serial Number 5XW28KWS
ada6: 300.000MB/s transfers (SATA 2.x, UDMA6, PIO 8192bytes)
ada6: Command Queueing enabled
ada6: 1907729MB (3907029168 512 byte sectors)
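To see which processes were hit by the swap read errors, a rough Python sketch like the one below could tally the `vm_fault` lines per process and note which ones then exited on a signal. It only uses a few lines copied from the output above; the function name `summarize` and the regular expressions are made up for illustration and assume the message formats shown in this log. Judging by the `GEOM_ELI` and `swap_pager` lines, the crashes look like processes (including the userland `zfsd` daemon) failing to page their memory back in from the swap partition on `ada6`, but that reading is an interpretation of the log, not a confirmed diagnosis.

```python
import re
from collections import Counter

# A few lines copied from the console output above (not the full log).
LOG = """\
swap_pager: I/O error - pagein failed; blkno 2100740,size 49152, error 6
vm_fault: pager read error, pid 2940 (python3.6)
pid 2940 (python3.6), uid 0: exited on signal 11
swap_pager: I/O error - pagein failed; blkno 2101950,size 8192, error 6
vm_fault: pager read error, pid 3769 (zfsd)
swap_pager: I/O error - pagein failed; blkno 2098787,size 4096, error 6
vm_fault: pager read error, pid 3769 (zfsd)
pid 3769 (zfsd), uid 0: exited on signal 11 (core dumped)
"""

# Patterns assume the exact message formats seen in this log.
FAULT_RE = re.compile(r"^vm_fault: pager read error, pid (\d+) \(([^)]+)\)")
EXIT_RE = re.compile(r"^pid (\d+) \(([^)]+)\), uid \d+: exited on signal (\d+)")


def summarize(log_text):
    """Return (fault counts per (pid, name), exit signal per (pid, name))."""
    faults = Counter()
    exits = {}
    for line in log_text.splitlines():
        m = FAULT_RE.match(line)
        if m:
            faults[(int(m.group(1)), m.group(2))] += 1
            continue
        m = EXIT_RE.match(line)
        if m:
            exits[(int(m.group(1)), m.group(2))] = int(m.group(3))
    return faults, exits


if __name__ == "__main__":
    faults, exits = summarize(LOG)
    for (pid, name), count in sorted(faults.items()):
        signal = exits.get((pid, name))
        status = f"exited on signal {signal}" if signal is not None else "still running"
        print(f"pid {pid} ({name}): {count} pager read error(s), {status}")
```

Feeding it the full `dmesg` text instead of the sample would give the complete list of affected processes.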
This is the rest of the noise that continued. The web GUI stopped responding and started reporting that the HTTP sessions were aborted (HTTP 499), which they were: when a page didn't open after 60 seconds, I closed the tab.
Code:
swap_pager: I/O error - pagein failed; blkno 2098667,size 8192, error 6
vm_fault: pager read error, pid 215 (python3.6)
swap_pager: I/O error - pagein failed; blkno 2098916,size 65536, error 6
vm_fault: pager read error, pid 215 (python3.6)
swap_pager: I/O error - pagein failed; blkno 2098931,size 4096, error 6
vm_fault: pager read error, pid 215 (python3.6)
swap_pager: I/O error - pagein failed; blkno 2098908,size 4096, error 6
vm_fault: pager read error, pid 215 (python3.6)
Failed to fully fault in a core file segment at VA 0x800622000 with size 0x2c000 to be written at offset 0x25000 for process python3.6
swap_pager: I/O error - pagein failed; blkno 2104739,size 4096, error 6
vm_fault: pager read error, pid 215 (python3.6)
Failed to fully fault in a core file segment at VA 0x800665000 with size 0x18e000 to be written at offset 0x68000 for process python3.6
swap_pager: I/O error - pagein failed; blkno 2104961,size 4096, error 6
vm_fault: pager read error, pid 215 (python3.6)
Failed to fully fault in a core file segment at VA 0x8007f3000 with size 0x9000 to be written at offset 0x1f6000 for process python3.6
swap_pager: I/O error - pagein failed; blkno 2099745,size 4096, error 6
vm_fault: pager read error, pid 215 (python3.6)
I tried to restart nginx and that failed, so I started to assume the boot drive is having issues, but it's mirrored with no pool errors. I couldn't find a way to run SMART on those USB sticks with the device parameter. The SATA2TB pool no longer shows the failed disk after I ran a scrub on that pool, which was weird.
Code:
  pool: SATA2TB
 state: ONLINE
  scan: scrub repaired 0 in 0h0m with 0 errors on Tue Oct 17 12:40:43 2017
config:

        NAME                                            STATE     READ WRITE CKSUM
        SATA2TB                                         ONLINE       0     0     0
          raidz1-0                                      ONLINE       0     0     0
            gptid/6a9fd7d6-a913-11e7-80cd-bc5ff4a85b10  ONLINE       0     0     0
            gptid/6bc35ce7-a913-11e7-80cd-bc5ff4a85b10  ONLINE       0     0     0
            gptid/6ccf6f39-a913-11e7-80cd-bc5ff4a85b10  ONLINE       0     0     0
            gptid/6dc0a9a4-a913-11e7-80cd-bc5ff4a85b10  ONLINE       0     0     0

errors: No known data errors

  pool: freenas-boot
 state: ONLINE
  scan: scrub repaired 0 in 0h3m with 0 errors on Tue Oct 17 12:05:21 2017
config:

        NAME          STATE     READ WRITE CKSUM
        freenas-boot  ONLINE       0     0     0
          mirror-0    ONLINE       0     0     0
            da0p2     ONLINE       0     0     0
            da1p2     ONLINE       0     0     0

errors: No known data errors

  pool: small
 state: ONLINE
  scan: scrub repaired 0 in 0h4m with 0 errors on Tue Oct 17 12:07:05 2017
config:

        NAME                                          STATE     READ WRITE CKSUM
        small                                         ONLINE       0     0     0
          gptid/25640af8-a7e0-11e7-86f7-bc5ff4a85b10  ONLINE       0     0     0

errors: No known data errors
Here is the SMART output for the failing drive. A scrub shouldn't add this drive back in, should it?
Code:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   118   100   006    Pre-fail  Always       -       193403480
  3 Spin_Up_Time            0x0003   100   100   000    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0032   100   100   020    Old_age   Always       -       22
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000f   051   051   030    Pre-fail  Always       -       141738809141
  9 Power_On_Hours          0x0032   069   069   000    Old_age   Always       -       27667
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   020    Old_age   Always       -       22
183 Runtime_Bad_Block       0x0032   100   100   000    Old_age   Always       -       0
184 End-to-End_Error        0x0032   100   100   099    Old_age   Always       -       0
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
188 Command_Timeout         0x0032   098   098   000    Old_age   Always       -       8590065667
189 High_Fly_Writes         0x003a   100   100   000    Old_age   Always       -       0
190 Airflow_Temperature_Cel 0x0022   071   058   045    Old_age   Always       -       29 (Min/Max 29/35)
194 Temperature_Celsius     0x0022   029   042   000    Old_age   Always       -       29 (0 13 0 0 0)
195 Hardware_ECC_Recovered  0x001a   026   024   000    Old_age   Always       -       193403480
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       1
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       1
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0
240 Head_Flying_Hours       0x0000   100   253   000    Old_age   Offline      -       18856 (176 241 0)
241 Total_LBAs_Written      0x0000   100   253   000    Old_age   Offline      -       1548236095
242 Total_LBAs_Read         0x0000   100   253   000    Old_age   Offline      -       1946007858
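For picking the interesting rows out of a table like this, here is a minimal Python sketch that flags non-zero raw values on a small watch-list of attributes. The watch-list (IDs 5, 187, 197, 198, 199) and the helper name `nonzero_watched` are assumptions chosen for illustration, not an official rule; the sample rows are copied from the table above. Vendor-specific raw values such as `Seek_Error_Rate` are deliberately left out because their encoding is not documented here.

```python
# Rows copied from the SMART attribute table above (subset only).
SMART = """\
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       0
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       1
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       1
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0
"""

# Assumption: attribute IDs worth flagging when their raw value is non-zero.
WATCH = {5, 187, 197, 198, 199}


def nonzero_watched(text, watch=WATCH):
    """Return {attribute_name: raw_value} for watched attributes with raw > 0."""
    hits = {}
    for line in text.splitlines():
        fields = line.split()
        # Attribute rows have at least 10 columns and start with a numeric ID.
        if len(fields) < 10 or not fields[0].isdigit():
            continue
        attr_id = int(fields[0])
        raw = fields[9]  # first token of the RAW_VALUE column
        if attr_id in watch and raw.isdigit() and int(raw) > 0:
            hits[fields[1]] = int(raw)
    return hits


if __name__ == "__main__":
    for name, raw in nonzero_watched(SMART).items():
        print(f"{name}: raw value {raw}")
```

On the rows above, this flags `Current_Pending_Sector` and `Offline_Uncorrectable`, each with a raw value of 1.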
I've got a replacement drive coming in today, but I'm more interested in what the next steps should be, as the web GUI isn't working. I would rather not reboot with a failing disk if I don't have to.