Freenas 11.1 - Disk failure -> FN reboots

remonv76 · Jan 22, 2019

I thought this was solved in Freenas 11.1, but it is not, and this has nothing to do with SWAP.

First my config:
Dell R710 - 96GB RAM
2x 100GB Intel DC3500 (ZIL)
2x 240GB Intel DC S3510 (as cache and also holds swap partition)
Dell H200E crossflashed to LSI 9211-8i
Dell MD1220 Array connected as multipath to Dell R710
24x 300GB SAS Seagate
The array is configured as 3x 7-disk RAIDZ2 vdevs. The other 3 disks are kept as cold spares.
iSCSI 2x 4x1Gbps interfaces (gonna upgrade soon to 2x10Gbps)
Hosts: VMWare iscsi 2x2x1Gbps round robin multipath

So I had a disk failure. It started with multipath disk 7, where one path (da32) went offline while the other stayed active. I started replacing this disk. After 8 hours disk 7 totally failed. It resulted in a reboot of Freenas and data corruption on many virtual servers. Because I have a second Freenas storage box receiving replication (snapshots every 4 hours), I was able to recover most of the data.

But this should never happen! Who can help me figure out why I get data corruption and why Freenas reboots? If you need logs or configs, let me know.

Here is the last log:
Jan 21 19:38:20 san01 (da32:mps1:0:68:0): READ(10). CDB: 28 00 22 ec ac 90 00 00 10 00
Jan 21 19:38:20 san01 (da32:mps1:0:68:0): CAM status: SCSI Status Error
Jan 21 19:38:20 san01 (da32:mps1:0:68:0): SCSI status: Check Condition
Jan 21 19:38:20 san01 (da32:mps1:0:68:0): SCSI sense: MEDIUM ERROR asc:11,0 (Unrecovered read error)
Jan 21 19:38:20 san01 (da32:mps1:0:68:0): Info: 0x22ecac90
Jan 21 19:38:20 san01 (da32:mps1:0:68:0): Field Replaceable Unit: 129
Jan 21 19:38:20 san01 (da32:mps1:0:68:0): Actual Retry Count: 107
Jan 21 19:38:20 san01 (da32:mps1:0:68:0): Error 5, Unretryable error
Jan 21 19:38:20 san01 GEOM_MULTIPATH: Error 5, da32 in disk7 marked FAIL
Jan 21 19:38:20 san01 GEOM_MULTIPATH: all paths in disk7 were marked FAIL, restore da31
Jan 21 19:38:20 san01 GEOM_MULTIPATH: da31 is now active path in disk7
Jan 21 19:38:22 san01 (da32:mps1:0:68:0): READ(10). CDB: 28 00 22 ec ae 90 00 00 10 00
Jan 21 19:38:22 san01 (da32:mps1:0:68:0): CAM status: SCSI Status Error
Jan 21 19:38:22 san01 (da32:mps1:0:68:0): SCSI status: Check Condition
Jan 21 19:38:22 san01 (da32:mps1:0:68:0): SCSI sense: MEDIUM ERROR asc:11,0 (Unrecovered read error)
Jan 21 19:38:22 san01 (da32:mps1:0:68:0): Info: 0x22ecae90
Jan 21 19:38:22 san01 (da32:mps1:0:68:0): Field Replaceable Unit: 129
Jan 21 19:38:22 san01 (da32:mps1:0:68:0): Actual Retry Count: 107
Jan 21 19:38:22 san01 (da32:mps1:0:68:0): Error 5, Unretryable error
Jan 21 19:38:22 san01 (da31:mps1:0:69:0): SYNCHRONIZE CACHE(10). CDB: 35 00 00 00 00 00 00 00 00 00
Jan 21 19:38:22 san01 (da31:mps1:0:69:0): CAM status: SCSI Status Error
Jan 21 19:38:22 san01 (da31:mps1:0:69:0): SCSI status: Check Condition
Jan 21 19:38:22 san01 (da31:mps1:0:69:0): SCSI sense: Deferred error: MEDIUM ERROR asc:c,3 (Write error - recommend reassignment)
Jan 21 19:38:22 san01 (da31:mps1:0:69:0): Info: 0x1b8f4830
Jan 21 19:38:22 san01 (da31:mps1:0:69:0): Retrying command (per sense data)
Jan 21 19:56:26 san01 syslog-ng[2236]: syslog-ng starting up; version='3.7.3'
Jan 21 19:56:26 san01 Copyright (c) 1992-2017 The FreeBSD Project.
Jan 21 19:56:26 san01 Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
Jan 21 19:56:26 san01 The Regents of the University of California. All rights reserved.
Jan 21 19:56:26 san01 FreeBSD is a registered trademark of The FreeBSD Foundation.
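
For anyone reading along, here is a minimal sketch of how the SCSI sense errors in a log like the one above could be tallied per device, to see which path reports which kind of error. The sample lines are copied from the log above. The function name `summarize_sense_errors` and the regular expression are my own illustration and assume the exact line layout shown here, so treat it as a starting point and not a general syslog parser.

```python
import re
from collections import Counter

# Sample lines copied from the log above. A real run would read the
# system log file instead (its path is not stated in this thread).
SAMPLE = """\
Jan 21 19:38:20 san01 (da32:mps1:0:68:0): SCSI sense: MEDIUM ERROR asc:11,0 (Unrecovered read error)
Jan 21 19:38:20 san01 GEOM_MULTIPATH: Error 5, da32 in disk7 marked FAIL
Jan 21 19:38:22 san01 (da32:mps1:0:68:0): SCSI sense: MEDIUM ERROR asc:11,0 (Unrecovered read error)
Jan 21 19:38:22 san01 (da31:mps1:0:69:0): SCSI sense: Deferred error: MEDIUM ERROR asc:c,3 (Write error - recommend reassignment)
"""

# Hypothetical pattern: device name, sense key, and the description in
# parentheses. Assumes the "SCSI sense:" layout seen in this log only.
SENSE_RE = re.compile(
    r"\((da\d+):[^)]*\): SCSI sense: (?:Deferred error: )?(.+?) asc:\S+ \((.+)\)$"
)


def summarize_sense_errors(log_text):
    """Count (device, sense key, description) triples in the log text."""
    counts = Counter()
    for line in log_text.splitlines():
        match = SENSE_RE.search(line)
        if match:
            counts[match.groups()] += 1
    return counts


if __name__ == "__main__":
    for (device, key, desc), n in sorted(summarize_sense_errors(SAMPLE).items()):
        print(f"{device}: {key} ({desc}) x{n}")
```

On the excerpt above this separates the read errors on da32 from the deferred write error on da31, which are the two paths of disk7.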

Chris Moore · Jan 22, 2019

remonv76 said:
After 8 hours disk 7 totally failed. It resulted in a reboot of Freenas

Why did FreeNAS reboot? A simple disk failure should not have caused that. I have ripped live disks out and FreeNAS doesn't miss a beat unless it is using that disk for swap space. I think your configuration might need some scrutiny.

remonv76 said:
and this has nothing to do with SWAP.

What makes you say that?

remonv76 said:
I thought this was solved in Freenas 11.1

Exactly what software version are you using?

remonv76 · Jan 23, 2019

Swap is on the 2 Intel SSDs and they work fine. There is no swap partition on the 24 disks, so it could not be swap. I knew about the swap issue in 9.x.

I have absolutely no other software running besides Freenas 11.1 on the storage, VMware on the hosts and CentOS/Red Hat on the virtual servers. So I find it very strange that Freenas crashed. I run it off a USB stick mirrored with another USB stick.

remonv76 · Jan 23, 2019

I use Freenas version 11.1-U1 STABLE

Also, when I get an alert that a disk may fail, should I take it offline and then replace it, or is it better to just replace it in place? The latter is probably faster, because it just copies the data off the disk rather than rebuilding the array.
On the other hand, taking it offline is probably safer. Because I did not do that, it is most likely the reason why data corruption occurred when the disk totally failed. But I'm scared that Freenas will crash when I do this with 30 VPS servers running in VMware.
