Random server crashes (IronWolf Pro 10TB / ST10000VN0004)

Status
Not open for further replies.

krazos

Dabbler
Joined
Nov 11, 2017
Messages
15
Hi,

So I recently bought a SuperMicro 6047R-E1R36L server with X9DRD-7LN4F-JBOD motherboard. I have 5x 10TB disks in it. I'm running FreeNAS-11.0-U4.

My server is crashing randomly, sometimes after 2 days and sometimes after 4. It seems to be some kind of timeout error: https://i.imgur.com/2wkzuIS.png . The system can't read or write, and then the server hangs, so I have to hard-reset it each time. I first thought it was the disks, but I don't think it can be: after a reboot everything is fine again, there are no SMART errors, all 5 disks are new, and it's not always the same disk that spits out errors or makes ZFS degrade the pool. I've also run tests on the disks.

My guess is that the backplane or the cables are the problem. I first thought it was the RAID controller (SAS2308), but I have already updated it to newer firmware and BIOS because those were outdated (I'm running 20.00.07.00 now).

I just need some help, because this is so frustrating and I need to get it fixed in a hurry. What do you think the problem could be? Any tips?
I would be very happy if I didn't have to replace the backplane, because I can't afford a new one at the moment. Cables, maybe.

Thanks in advance.
 

krazos

Dabbler
Joined
Nov 11, 2017
Messages
15
I do have a spare LSI 9211-8i lying around. Should I try moving the cables to it instead? Or would that require me to rebuild my pool?
 

krazos

Dabbler
Joined
Nov 11, 2017
Messages
15

Good, that's what I thought, but I wanted to be sure. That's because the pool is based on GPT ID/GUID rather than something like a port number, right?

I have now switched to the other RAID card and flashed it with IT firmware; that worked perfectly. I still don't know whether this will resolve my issue, but I sure hope so. We'll see. I'll post again if there's a crash.
 

krazos

Dabbler
Joined
Nov 11, 2017
Messages
15
And now it crashed. Same error as before.

Even though I changed to the other RAID card (LSI 9211-8i) with the latest firmware and all.

What should I do? :/
 

krazos

Dabbler
Joined
Nov 11, 2017
Messages
15
How can I diagnose this further? Is there any way to get more info from the mps driver about which drive, slot, or other component is failing, and where can I even find these logs? They only show up on the console, not in any log file as far as I can see.

Maybe there are some diagnostic tools in the mps driver...

Okay, I found some errors. Could anyone help me make sense of what goes wrong? (It crashed later the same day at 17:52, but no logs were written; the last line below is the last log entry before boot-up.)

Dec 3 09:05:17 freenas (da0:mps2:0:10:0): SYNCHRONIZE CACHE(10). CDB: 35 00 00 00 00 00 00 00 00 00 length 0 SMID 474 command timeout cm 0xfffffe0002183e20 ccb 0xfffff80a375a7000
Dec 3 09:05:17 freenas (noperiph:mps2:0:4294967295:0): SMID 1 Aborting command 0xfffffe0002183e20
Dec 3 09:05:17 freenas mps2: Sending reset from mpssas_send_abort for target ID 10
Dec 3 09:05:17 freenas (da0:mps2:0:10:0): READ(16). CDB: 88 00 00 00 00 01 f6 52 11 90 00 00 00 10 00 00 length 8192 SMID 851 terminated ioc 804b scsi 0 state c xfer 0
Dec 3 09:05:17 freenas (da0:mps2:0:10:0): WRITE(16). CDB: 8a 00 00 00 00 02 99 a7 5e 10 00 00 00 08 00 00 length 4096 SMID 766 terminated ioc 804b s(da0:mps2:0:10:0): READ(16). CDB: 88 00 00 00 00 01 f6 52 11 90 00 00 00 10 00 00
Dec 3 09:05:17 freenas csi 0 state c xfer 0
Dec 3 09:05:17 freenas (da0:mps2:0:10:0): CAM status: CCB request completed with an error
Dec 3 09:05:17 freenas (da0:mps2:0: (da0:mps2:0:10:0): WRITE(16). CDB: 8a 00 00 00 00 02 99 a7 67 78 00 00 00 18 00 00 length 12288 SMID 456 terminated ioc 804b 10:scsi 0 state c xfer 0
Dec 3 09:05:17 freenas 0): Retrying command
Dec 3 09:05:17 freenas (da0:mps2:0:10:0): SYNCHRONIZE CACHE(10). CDB: 35 00 00 00 00 00 00 00 00 00 length 0 SMID 89 terminated ioc 804b scsi 0 state c xfer 0
Dec 3 09:05:17 freenas (da0:mps2:0:10:0): WRITE(16). CDB: 8a 00 00 00 00 02 99 a7 5e 10 00 00 00 08 00 00
Dec 3 09:05:17 freenas (da0:mps2:0:10:0): READ(16). CDB: 88 00 00 00 00 01 cd 7f 3e a0 00 00 00 40 00 00 length 32768 SMID 917 terminated ioc 804b s(da0:mps2:0:10:0): CAM status: CCB request completed with an error
Dec 3 09:05:17 freenas (da0:mps2:0:10:0): Retrying command
Dec 3 09:05:17 freenas (da0:mps2:0:10:0): WRITE(16). CDB: 8a 00 00 00 00 02 99 a7 67 78 00 00 00 18 00 00
Dec 3 09:05:17 freenas (da0:mps2:0:10:0): CAM status: CCB request completed with an error
Dec 3 09:05:17 freenas (da0:mps2:0:10:0): Retrying command
Dec 3 09:05:17 freenas (da0:mps2:0:10:0): SYNCHRONIZE CACHE(10). CDB: 35 00 00 00 00 00 00 00 00 00
Dec 3 09:05:17 freenas (da0:mps2:0:10:0): CAM status: CCB request completed with an error
Dec 3 09:05:17 freenas (da0:mps2:0:10:0): Retrying command
Dec 3 09:05:17 freenas csi 0 state c xfer 0
Dec 3 09:05:17 freenas (da0:mps2:0:10:0): WRITE(16). CDB: 8a 00 00 00 00 02 99 a7 67 a0 00 00 00 08 00 00 length 4096 SMID 420 terminated ioc 804b s(da0:mps2:0:10:0): READ(16). CDB: 88 00 00 00 00 01 cd 7f 3e a0 00 00 00 40 00 00
Dec 3 09:05:17 freenas csi 0 state c xfer 0
Dec 3 09:05:17 freenas mps2: Unfreezing devq for target ID 10
Dec 3 09:05:17 freenas (da0:mps2:0:10:0): CAM status: CCB request completed with an error
Dec 3 09:05:17 freenas (da0:mps2:0:10:0): Retrying command
Dec 3 09:05:17 freenas (da0:mps2:0:10:0): WRITE(16). CDB: 8a 00 00 00 00 02 99 a7 67 a0 00 00 00 08 00 00
Dec 3 09:05:17 freenas (da0:mps2:0:10:0): CAM status: CCB request completed with an error
Dec 3 09:05:17 freenas (da0:mps2:0:10:0): Retrying command
Dec 3 09:05:17 freenas (da0:mps2:0:10:0): SYNCHRONIZE CACHE(10). CDB: 35 00 00 00 00 00 00 00 00 00
Dec 3 09:05:17 freenas (da0:mps2:0:10:0): CAM status: Command timeout
Dec 3 09:05:17 freenas (da0:mps2:0:10:0): Retrying command
Dec 3 09:05:18 freenas (da0:mps2:0:10:0): SYNCHRONIZE CACHE(10). CDB: 35 00 00 00 00 00 00 00 00 00
Dec 3 09:05:18 freenas (da0:mps2:0:10:0): CAM status: SCSI Status Error
Dec 3 09:05:18 freenas (da0:mps2:0:10:0): SCSI status: Check Condition
Dec 3 09:05:18 freenas (da0:mps2:0:10:0): SCSI sense: UNIT ATTENTION asc:29,0 (Power on, reset, or bus device reset occurred)
Dec 3 09:05:18 freenas (da0:mps2:0:10:0): Error 6, Retries exhausted
Dec 3 09:05:18 freenas (da0:mps2:0:10:0): Invalidating pack
Dec 3 09:05:18 freenas GEOM_ELI: g_eli_read_done() failed (error=6) da0p1.eli[READ(offset=56487936, length=4096)]
Dec 3 09:05:18 freenas swap_pager: I/O error - pagein failed; blkno 538080,size 4096, error 6
Dec 3 09:05:18 freenas GEOM_ELI: vm_fault: pager read error, pid 1823 (devd)
Dec 3 09:05:18 freenas g_eli_read_done() failed (error=6) da0p1.eli[READ(offset=91082752, length=4096)]
Dec 3 09:05:18 freenas swap_pager: I/O error - pagein failed; blkno 546526,size 4096, error 6
Dec 3 09:05:18 freenas vm_fault: pager read error, pid 4168 (zfsd)
Dec 3 09:05:18 freenas GEOM_ELI: g_eli_write_done() failed (error=6) gptid/bda22312-c626-11e7-bce5-0cc47a12fd2c.eli[WRITE(offset=5715776118784, length=8192)]
Dec 3 09:05:18 freenas GEOM_ELI: g_eli_read_done() failed (error=6) gptid/bda22312-c626-11e7-bce5-0cc47a12fd2c.eli[READ(offset=270336, length=8192)]
Dec 3 09:05:18 freenas GEOM_ELI: g_eli_read_done() failed (error=6) gptid/bda22312-c626-11e7-bce5-0cc47a12fd2c.eli[READ(offset=9998683086848, length=8192)]
Dec 3 09:05:18 freenas GEOM_ELI: g_eli_read_done() failed (error=6) gptid/bda22312-c626-11e7-bce5-0cc47a12fd2c.eli[READ(offset=9998683348992, length=8192)]
Dec 3 09:05:18 freenas GEOM_ELI: g_eli_read_done() failed (error=6) da0p1.eli[READ(offset=44548096, length=16384)]
Dec 3 09:05:18 freenas swap_pager: I/O error - pagein failed; blkno 535165,size 16384, error 6
Dec 3 09:05:18 freenas vm_fault: pager read error, pid 1823 (devd)
Dec 3 09:05:18 freenas kernel: Failed to fully fault in a core file segment at VA 0x720000 with size 0x14000 to be written at offset 0x6000 for process devd
Dec 3 09:05:18 freenas kernel: Failed to fully fault in a core file segment at VA 0x720000 with size 0x14000 to be written at offset 0x6000 for process devd
Dec 3 09:05:18 freenas GEOM_ELI: g_eli_read_done() failed (error=6) da0p1.eli[READ(offset=58597376, length=32768)]
Dec 3 09:05:18 freenas swap_pager: I/O error - pagein failed; blkno 538595,size 32768, error 6
Dec 3 09:05:18 freenas vm_fault: pager read error, pid 4168 (zfsd)
Dec 3 09:05:18 freenas kernel: Failed to fully fault in a core file segment at VA 0x800679000 with size 0x2f000 to be written at offset 0xf000 for process zfsd
Dec 3 09:05:18 freenas kernel: Failed to fully fault in a core file segment at VA 0x800679000 with size 0x2f000 to be written at offset 0xf000 for process zfsd
Dec 3 09:05:18 freenas GEOM_ELI: g_eli_read_done() failed (error=6) da0p1.eli[READ(offset=91029504, length=8192)]
Dec 3 09:05:18 freenas swap_pager: I/O error - pagein failed; blkno 546513,size 8192, error 6
Dec 3 09:05:18 freenas vm_fault: pager read error, pid 4168 (zfsd)
Dec 3 09:05:18 freenas kernel: Failed to fully fault in a core file segment at VA 0x80086f000 with size 0x2000 to be written at offset 0x3e000 for process zfsd
Dec 3 09:05:18 freenas kernel: Failed to fully fault in a core file segment at VA 0x80086f000 with size 0x2000 to be written at offset 0x3e000 for process zfsd
Dec 3 09:05:18 freenas GEOM_ELI: g_eli_read_done() failed (error=6) da0p1.eli[READ(offset=90959872, length=4096)]
Dec 3 09:05:18 freenas swap_pager: I/O error - pagein failed; blkno 546496,size 4096, error 6
Dec 3 09:05:18 freenas vm_fault: pager read error, pid 4168 (zfsd)
Dec 3 09:05:18 freenas kernel: Failed to fully fault in a core file segment at VA 0x8010e4000 with size 0x1000 to be written at offset 0x47000 for process zfsd
Dec 3 09:05:18 freenas kernel: Failed to fully fault in a core file segment at VA 0x8010e4000 with size 0x1000 to be written at offset 0x47000 for process zfsd
Dec 3 09:05:18 freenas GEOM_ELI: g_eli_read_done() failed (error=6) da0p1.eli[READ(offset=42987520, length=4096)]
Dec 3 09:05:18 freenas swap_pager: I/O error - pagein failed; blkno 534784,size 4096, error 6
Dec 3 09:05:18 freenas vm_fault: pager read error, pid 1823 (devd)
Dec 3 09:05:18 freenas kernel: Failed to fully fault in a core file segment at VA 0x800800000 with size 0x800000 to be written at offset 0x1a000 for process devd
Dec 3 09:05:18 freenas kernel: Failed to fully fault in a core file segment at VA 0x800800000 with size 0x800000 to be written at offset 0x1a000 for process devd
Dec 3 09:05:18 freenas kernel: pid 1823 (devd), uid 0: exited on signal 11 (core dumped)
Dec 3 09:05:18 freenas GEOM_ELI: g_eli_read_done() failed (error=6) da0p1.eli[READ(offset=91037696, length=4096)]
Dec 3 09:05:18 freenas swap_pager: I/O error - pagein failed; blkno 546515,size 4096, error 6
Dec 3 09:05:18 freenas vm_fault: pager read error, pid 4168 (zfsd)
Dec 3 09:05:18 freenas kernel: Failed to fully fault in a core file segment at VA 0x80171a000 with size 0x1000 to be written at offset 0x4b000 for process zfsd
Dec 3 09:05:18 freenas kernel: Failed to fully fault in a core file segment at VA 0x80171a000 with size 0x1000 to be written at offset 0x4b000 for process zfsd
Dec 3 09:05:19 freenas GEOM_ELI: g_eli_read_done() failed (error=6) da0p1.eli[READ(offset=58589184, length=8192)]
Dec 3 09:05:19 freenas swap_pager: I/O error - pagein failed; blkno 538593,size 8192, error 6
Dec 3 09:05:19 freenas vm_fault: pager read error, pid 4168 (zfsd)
Dec 3 09:05:19 freenas kernel: Failed to fully fault in a core file segment at VA 0x80201b000 with size 0x2000 to be written at offset 0x57000 for process zfsd
Dec 3 09:05:19 freenas kernel: Failed to fully fault in a core file segment at VA 0x80201b000 with size 0x2000 to be written at offset 0x57000 for process zfsd
Dec 3 09:05:19 freenas GEOM_ELI: g_eli_read_done() failed (error=6) da0p1.eli[READ(offset=52162560, length=4096)]
Dec 3 09:05:19 freenas swap_pager: I/O error - pagein failed; blkno 537024,size 4096, error 6
Dec 3 09:05:19 freenas vm_fault: pager read error, pid 4168 (zfsd)
Dec 3 09:05:19 freenas kernel: Failed to fully fault in a core file segment at VA 0x802e21000 with size 0x1000 to be written at offset 0x82000 for process zfsd
Dec 3 09:05:19 freenas kernel: Failed to fully fault in a core file segment at VA 0x802e21000 with size 0x1000 to be written at offset 0x82000 for process zfsd
Dec 3 09:05:19 freenas GEOM_ELI: g_eli_read_done() failed (error=6) gptid/bda22312-c626-11e7-bce5-0cc47a12fd2c.eli[READ(offset=270336, length=8192)]
Dec 3 09:05:19 freenas GEOM_ELI: g_eli_read_done() failed (error=6) gptid/bda22312-c626-11e7-bce5-0cc47a12fd2c.eli[READ(offset=9998683086848, length=8192)]
Dec 3 09:05:19 freenas GEOM_ELI: g_eli_read_done() failed (error=6) gptid/bda22312-c626-11e7-bce5-0cc47a12fd2c.eli[READ(offset=9998683348992, length=8192)]
Dec 3 09:05:19 freenas GEOM_ELI: g_eli_read_done() failed (error=6) da0p1.eli[READ(offset=91074560, length=8192)]
Dec 3 09:05:19 freenas swap_pager: I/O error - pagein failed; blkno 546524,size 8192, error 6
Dec 3 09:05:19 freenas vm_fault: pager read error, pid 4168 (zfsd)
Dec 3 09:05:19 freenas kernel: Failed to fully fault in a core file segment at VA 0x803400000 with size 0x400000 to be written at offset 0x91000 for process zfsd
Dec 3 09:05:19 freenas kernel: Failed to fully fault in a core file segment at VA 0x803400000 with size 0x400000 to be written at offset 0x91000 for process zfsd
Dec 3 09:05:19 freenas kernel: pid 4168 (zfsd), uid 0: exited on signal 11 (core dumped)
Dec 3 09:05:36 freenas GEOM_ELI: g_eli_read_done() failed (error=6) da0p1.eli[READ(offset=96337920, length=12288)]
Dec 3 09:05:36 freenas swap_pager: I/O error - pagein failed; blkno 547809,size 12288, error 6
Dec 3 09:05:36 freenas vm_fault: pager read error, pid 3492 (python3.6)
Dec 3 09:05:36 freenas kernel: pid 3492 (python3.6), uid 0: exited on signal 11
Dec 3 09:32:18 freenas GEOM_ELI: g_eli_read_done() failed (error=6) da0p1.eli[READ(offset=95301632, length=4096)]
Dec 3 09:32:18 freenas swap_pager: I/O error - pagein failed; blkno 547556,size 4096, error 6
Dec 3 09:32:18 freenas vm_fault: pager read error, pid 3536 (consul)
Dec 3 09:32:18 freenas GEOM_ELI: g_eli_read_done() failed (error=6) da0p1.eli[READ(offset=95199232, length=32768)]
Dec 3 09:32:18 freenas swap_pager: I/O error - pagein failed; blkno 547531,size 32768, error 6
Dec 3 09:32:18 freenas vm_fault: pager read error, pid 4395 (consul)
Dec 3 09:32:18 freenas GEOM_ELI: g_eli_read_done() failed (error=6) da0p1.eli[READ(offset=58458112, length=12288)]
Dec 3 09:32:18 freenas swap_pager: I/O error - pagein failed; blkno 538561,size 12288, error 6
Dec 3 09:32:18 freenas vm_fault: pager read error, pid 3535 (daemon)
Dec 3 09:32:18 freenas GEOM_ELI: g_eli_read_done() failed (error=6) da0p1.eli[READ(offset=58466304, length=4096)]
Dec 3 09:32:18 freenas swap_pager: I/O error - pagein failed; blkno 538563,size 4096, error 6
Dec 3 09:32:18 freenas vm_fault: pager read error, pid 3535 (daemon)
Dec 3 09:32:18 freenas GEOM_ELI: g_eli_read_done() failed (error=6) da0p1.eli[READ(offset=58466304, length=4096)]
Dec 3 09:32:18 freenas swap_pager: I/O error - pagein failed; blkno 538563,size 4096, error 6
Dec 3 09:32:18 freenas vm_fault: pager read error, pid 3535 (daemon)
Dec 3 09:32:18 freenas GEOM_ELI: g_eli_read_done() failed (error=6) da0p1.eli[READ(offset=58466304, length=4096)]
Dec 3 09:32:18 freenas swap_pager: I/O error - pagein failed; blkno 538563,size 4096, error 6
Dec 3 09:32:18 freenas vm_fault: pager read error, pid 3535 (daemon)
Dec 3 09:32:18 freenas GEOM_ELI: g_eli_read_done() failed (error=6) da0p1.eli[READ(offset=94666752, length=32768)]
Dec 3 09:32:18 freenas swap_pager: I/O error - pagein failed; blkno 547401,size 32768, error 6
Dec 3 09:32:18 freenas vm_fault: pager read error, pid 4395 (consul)
Dec 3 09:32:18 freenas GEOM_ELI: g_eli_read_done() failed (error=6) da0p1.eli[READ(offset=85229568, length=8192)]
Dec 3 09:32:18 freenas swap_pager: I/O error - pagein failed; blkno 545097,size 8192, error 6
Dec 3 09:32:18 freenas vm_fault: pager read error, pid 3535 (daemon)
Dec 3 09:32:18 freenas kernel: Failed to fully fault in a core file segment at VA 0x800821000 with size 0x2000 to be written at offset 0x27000 for process daemon
Dec 3 09:32:18 freenas kernel: Failed to fully fault in a core file segment at VA 0x800821000 with size 0x2000 to be written at offset 0x27000 for process daemon
Dec 3 09:32:18 freenas GEOM_ELI: g_eli_read_done() failed (error=6) da0p1.eli[READ(offset=58494976, length=65536)]
Dec 3 09:32:18 freenas swap_pager: I/O error - pagein failed; blkno 538570,size 65536, error 6
Dec 3 09:32:18 freenas vm_fault: pager read error, pid 3539 (daemon)
Dec 3 09:32:18 freenas GEOM_ELI: g_eli_read_done() failed (error=6) da0p1.eli[READ(offset=58470400, length=24576)]
Dec 3 09:32:18 freenas swap_pager: I/O error - pagein failed; blkno 538564,size 24576, error 6
Dec 3 09:32:18 freenas vm_fault: pager read error, pid 3535 (daemon)
Dec 3 09:32:18 freenas kernel: Failed to fully fault in a core file segment at VA 0x800e00000 with size 0x400000 to be written at offset 0x4f000 for process daemon
Dec 3 09:32:18 freenas kernel: Failed to fully fault in a core file segment at VA 0x800e00000 with size 0x400000 to be written at offset 0x4f000 for process daemon
Dec 3 09:32:43 freenas GEOM_ELI: g_eli_read_done() failed (error=6) da0p1.eli[READ(offset=95264768, length=8192)]
Dec 3 09:32:43 freenas swap_pager: I/O error - pagein failed; blkno 547547,size 8192, error 6
Dec 3 09:32:43 freenas vm_fault: pager read error, pid 4396 (consul)
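
To make the CDB fields in the log above easier to read, here is a minimal Python sketch that decodes a READ(16)/WRITE(16) command descriptor block into its starting LBA and block count. The function name `decode_rw16` and the 512-byte logical block size are assumptions for illustration; the block size is only inferred from the logged lengths (0x10 blocks logged as length 8192).

```python
def decode_rw16(cdb_hex, block_size=512):
    """Decode a READ(16)/WRITE(16) CDB given as space-separated hex bytes.

    block_size is an assumption (512-byte logical sectors), not something
    the log states directly.
    """
    cdb = bytes.fromhex(cdb_hex)
    if len(cdb) != 16 or cdb[0] not in (0x88, 0x8A):
        raise ValueError("not a 16-byte READ(16)/WRITE(16) CDB")
    # Bytes 2..9 hold the big-endian logical block address.
    lba = int.from_bytes(cdb[2:10], "big")
    # Bytes 10..13 hold the big-endian transfer length in blocks.
    blocks = int.from_bytes(cdb[10:14], "big")
    return {
        "op": "READ(16)" if cdb[0] == 0x88 else "WRITE(16)",
        "lba": lba,
        "blocks": blocks,
        "bytes": blocks * block_size,
    }


if __name__ == "__main__":
    # First READ(16) from the log above.
    print(decode_rw16("88 00 00 00 00 01 f6 52 11 90 00 00 00 10 00 00"))
```

The decoded byte count can be compared with the `length` value the driver prints on the same log line as a sanity check.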


I notice that "da0p1" shows up a lot; however, my disks are:
jTco3cF.png
And what device is "asc:29,0"?
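
For what it's worth, the `asc:` field in the `SCSI sense` log line is not a device name. It is the SCSI additional sense code and qualifier (ASC, ASCQ), which the kernel prints in hex next to the sense key and a description. A rough Python sketch for pulling those fields out of such a line follows; the regex and the `parse_sense` name are my own and only cover the line format seen in this log.

```python
import re

# Matches lines such as:
#   ... SCSI sense: UNIT ATTENTION asc:29,0 (Power on, reset, ...)
SENSE_RE = re.compile(
    r"SCSI sense: (?P<key>[A-Z ]+?) asc:(?P<asc>[0-9a-f]+),(?P<ascq>[0-9a-f]+) "
    r"\((?P<text>.*)\)"
)


def parse_sense(line):
    """Return sense key, ASC, ASCQ and description, or None if no match."""
    m = SENSE_RE.search(line)
    if m is None:
        return None
    return {
        "sense_key": m.group("key"),
        "asc": int(m.group("asc"), 16),    # values are printed in hex
        "ascq": int(m.group("ascq"), 16),
        "text": m.group("text"),
    }


if __name__ == "__main__":
    sample = ("Dec 3 09:05:18 freenas (da0:mps2:0:10:0): SCSI sense: "
              "UNIT ATTENTION asc:29,0 (Power on, reset, or bus device "
              "reset occurred)")
    print(parse_sense(sample))
```

In this log the description says the drive reported a power-on or reset event, which matches the reset the mps driver sent after the command timeout.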

Edit: da0p2 is da0, so I will try changing the slot to see if that's the problem, or take the drive offline to see if that's what is failing.

However, how can ONE failing disk in a RAIDZ1 make the whole server freeze? Even the web GUI? The OS is on USB.
 

krazos

Dabbler
Joined
Nov 11, 2017
Messages
15
I moved da0 to a different hard-drive slot, and no disk uses the old slot now, but this time da4p2 failed anyway, so it can't be the disks...
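
One way to check whether the errors follow a disk, a slot, or a controller is to tally the CAM status messages per device, controller, and target ID across several crashes. Below is a small Python sketch of that idea. The regex only covers the `(daN:mpsN:bus:target:lun)` format seen in the log earlier in this thread, and the log path in the comment is an assumption about where the kernel messages end up.

```python
import re
from collections import Counter

# Matches e.g. "(da0:mps2:0:10:0): CAM status: Command timeout"
CAM_RE = re.compile(
    r"\((?P<dev>da\d+):(?P<ctrl>mps\d+):\d+:(?P<target>\d+):\d+\): "
    r"CAM status: (?P<status>.+)$"
)


def tally_cam_errors(lines):
    """Count CAM status messages per (device, controller, target, status)."""
    counts = Counter()
    for line in lines:
        m = CAM_RE.search(line.rstrip("\n"))
        if m:
            key = (m.group("dev"), m.group("ctrl"),
                   int(m.group("target")), m.group("status").strip())
            counts[key] += 1
    return counts


if __name__ == "__main__":
    # Assumed location of the kernel log; adjust to wherever yours is.
    # with open("/var/log/messages") as fh:
    #     print(tally_cam_errors(fh).most_common())
    sample = [
        "Dec 3 09:05:17 freenas (da0:mps2:0:10:0): CAM status: CCB request completed with an error",
        "Dec 3 09:05:17 freenas (da0:mps2:0:10:0): CAM status: Command timeout",
    ]
    for key, n in tally_cam_errors(sample).most_common():
        print(n, key)
```

If the target ID stays the same while the daN name changes, that would point at the slot or cabling; if the errors move with the drive, it would point back at the drive.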
 
Joined
Jan 18, 2017
Messages
524
Just out of curiosity, what model number are your drives?
 
Joined
Jan 18, 2017
Messages
524
I'm concerned you may be experiencing a similar problem to the one @miip started a thread about, but I don't know. I believe the crashes happen when your swap is in use and one of the drives throws too many errors and fails out. There is a script on here by @Stux that can flush the swap, and you can set up a cron job to run it regularly. I'm still trying to learn the ins and outs of FreeBSD/FreeNAS, though, so I could be way off lol
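
To see whether the log supports that, here is a rough Python sketch that counts the `vm_fault: pager read error` lines per process and lists the processes that then exited on a signal. It only recognises the two message formats that appear in the log posted above; the function name `summarize_swap_failures` is made up for this example.

```python
import re
from collections import Counter

# "vm_fault: pager read error, pid 4168 (zfsd)"
FAULT_RE = re.compile(
    r"vm_fault: pager read error, pid (?P<pid>\d+) \((?P<name>[^)]+)\)"
)
# "pid 1823 (devd), uid 0: exited on signal 11 (core dumped)"
KILLED_RE = re.compile(
    r"pid (?P<pid>\d+) \((?P<name>[^)]+)\), uid \d+: "
    r"exited on signal (?P<sig>\d+)"
)


def summarize_swap_failures(lines):
    """Return (fault counts per process name, {pid: (name, signal)})."""
    faults = Counter()
    killed = {}
    for line in lines:
        m = FAULT_RE.search(line)
        if m:
            faults[m.group("name")] += 1
        m = KILLED_RE.search(line)
        if m:
            killed[int(m.group("pid"))] = (m.group("name"), int(m.group("sig")))
    return faults, killed


if __name__ == "__main__":
    sample = [
        "Dec 3 09:05:18 freenas vm_fault: pager read error, pid 1823 (devd)",
        "Dec 3 09:05:18 freenas kernel: pid 1823 (devd), uid 0: exited on signal 11 (core dumped)",
    ]
    print(summarize_swap_failures(sample))
```

Processes that show up in both results are ones that had pages swapped out to the failed drive and died when those pages could not be read back.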
 

krazos

Dabbler
Joined
Nov 11, 2017
Messages
15
Yes, those are exactly the same type of errors I get! Interesting... I will post in that thread as well.
 