remonv76
Dabbler
- Joined
- Dec 27, 2014
- Messages
- 49
I thought this was solved in FreeNAS 11.1, but it is not, and it has nothing to do with swap.
First, my config:
Dell R710 - 96GB RAM
2x 100GB Intel DC3500 (ZIL)
2x 240GB Intel DC S3510 (used as cache; also hold the swap partition)
Dell H200E crossflashed to LSI 9211-8i
Dell MD1220 array connected as multipath to the Dell R710
24x 300GB SAS Seagate
The array is configured as 3x 7-disk RAIDZ2. The other 3 disks are kept as cold spares.
iSCSI 2x 4x1Gbps interfaces (going to upgrade to 2x10Gbps soon)
Hosts: VMware iSCSI 2x2x1Gbps round-robin multipath
So I had a disk failure. It started with multipath disk 7, where one path (da32) went offline while the other stayed active. I started replacing this disk. After 8 hours, disk 7 failed completely. The result was a reboot of FreeNAS and data corruption on many virtual servers. Because I have a second FreeNAS storage system in replication (snapshots every 4 hours), I was able to recover most of the data.
But this should never happen! Who can help me figure out why I got data corruption and why FreeNAS rebooted? If you need logs or configs, let me know.
Here is the last log before the reboot:
Jan 21 19:38:20 san01 (da32:mps1:0:68:0): READ(10). CDB: 28 00 22 ec ac 90 00 00 10 00
Jan 21 19:38:20 san01 (da32:mps1:0:68:0): CAM status: SCSI Status Error
Jan 21 19:38:20 san01 (da32:mps1:0:68:0): SCSI status: Check Condition
Jan 21 19:38:20 san01 (da32:mps1:0:68:0): SCSI sense: MEDIUM ERROR asc:11,0 (Unrecovered read error)
Jan 21 19:38:20 san01 (da32:mps1:0:68:0): Info: 0x22ecac90
Jan 21 19:38:20 san01 (da32:mps1:0:68:0): Field Replaceable Unit: 129
Jan 21 19:38:20 san01 (da32:mps1:0:68:0): Actual Retry Count: 107
Jan 21 19:38:20 san01 (da32:mps1:0:68:0): Error 5, Unretryable error
Jan 21 19:38:20 san01 GEOM_MULTIPATH: Error 5, da32 in disk7 marked FAIL
Jan 21 19:38:20 san01 GEOM_MULTIPATH: all paths in disk7 were marked FAIL, restore da31
Jan 21 19:38:20 san01 GEOM_MULTIPATH: da31 is now active path in disk7
Jan 21 19:38:22 san01 (da32:mps1:0:68:0): READ(10). CDB: 28 00 22 ec ae 90 00 00 10 00
Jan 21 19:38:22 san01 (da32:mps1:0:68:0): CAM status: SCSI Status Error
Jan 21 19:38:22 san01 (da32:mps1:0:68:0): SCSI status: Check Condition
Jan 21 19:38:22 san01 (da32:mps1:0:68:0): SCSI sense: MEDIUM ERROR asc:11,0 (Unrecovered read error)
Jan 21 19:38:22 san01 (da32:mps1:0:68:0): Info: 0x22ecae90
Jan 21 19:38:22 san01 (da32:mps1:0:68:0): Field Replaceable Unit: 129
Jan 21 19:38:22 san01 (da32:mps1:0:68:0): Actual Retry Count: 107
Jan 21 19:38:22 san01 (da32:mps1:0:68:0): Error 5, Unretryable error
Jan 21 19:38:22 san01 (da31:mps1:0:69:0): SYNCHRONIZE CACHE(10). CDB: 35 00 00 00 00 00 00 00 00 00
Jan 21 19:38:22 san01 (da31:mps1:0:69:0): CAM status: SCSI Status Error
Jan 21 19:38:22 san01 (da31:mps1:0:69:0): SCSI status: Check Condition
Jan 21 19:38:22 san01 (da31:mps1:0:69:0): SCSI sense: Deferred error: MEDIUM ERROR asc:c,3 (Write error - recommend reassignment)
Jan 21 19:38:22 san01 (da31:mps1:0:69:0): Info: 0x1b8f4830
Jan 21 19:38:22 san01 (da31:mps1:0:69:0): Retrying command (per sense data)
Jan 21 19:56:26 san01 syslog-ng[2236]: syslog-ng starting up; version='3.7.3'
Jan 21 19:56:26 san01 Copyright (c) 1992-2017 The FreeBSD Project.
Jan 21 19:56:26 san01 Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
Jan 21 19:56:26 san01 The Regents of the University of California. All rights reserved.
Jan 21 19:56:26 san01 FreeBSD is a registered trademark of The FreeBSD Foundation.
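For anyone who wants to dig through the log, here is a rough Python sketch that pulls the failing block addresses out of the READ(10) CDBs and counts the sense errors per device. It assumes the standard READ(10) layout (opcode 0x28, 4-byte big-endian LBA in bytes 2-5, 2-byte transfer length in bytes 7-8). The names `LOG`, `decode_read10` and `summarize` are made up for this example, and `LOG` only holds a few lines copied from the log above.

```python
import re
from collections import Counter

# A few lines copied from the log above (not the full log).
LOG = """\
Jan 21 19:38:20 san01 (da32:mps1:0:68:0): READ(10). CDB: 28 00 22 ec ac 90 00 00 10 00
Jan 21 19:38:20 san01 (da32:mps1:0:68:0): SCSI sense: MEDIUM ERROR asc:11,0 (Unrecovered read error)
Jan 21 19:38:22 san01 (da32:mps1:0:68:0): READ(10). CDB: 28 00 22 ec ae 90 00 00 10 00
Jan 21 19:38:22 san01 (da32:mps1:0:68:0): SCSI sense: MEDIUM ERROR asc:11,0 (Unrecovered read error)
Jan 21 19:38:22 san01 (da31:mps1:0:69:0): SYNCHRONIZE CACHE(10). CDB: 35 00 00 00 00 00 00 00 00 00
Jan 21 19:38:22 san01 (da31:mps1:0:69:0): SCSI sense: Deferred error: MEDIUM ERROR asc:c,3 (Write error - recommend reassignment)
"""

CDB_RE = re.compile(r"\((da\d+):[^)]*\): (.+?)\. CDB: ([0-9a-f ]+)$")
SENSE_RE = re.compile(r"\((da\d+):[^)]*\): SCSI sense: (.*)$")


def decode_read10(cdb_hex):
    """Return (lba, block_count) for a READ(10) CDB given as hex text."""
    cdb = bytes.fromhex(cdb_hex.replace(" ", ""))
    if len(cdb) != 10 or cdb[0] != 0x28:
        raise ValueError("not a READ(10) CDB")
    lba = int.from_bytes(cdb[2:6], "big")      # bytes 2-5: logical block address
    blocks = int.from_bytes(cdb[7:9], "big")   # bytes 7-8: transfer length
    return lba, blocks


def summarize(log_text):
    """Collect failed READ(10) ranges and count sense messages per device."""
    reads = []
    sense_counts = Counter()
    for line in log_text.splitlines():
        line = line.strip()
        m = CDB_RE.search(line)
        if m and m.group(2) == "READ(10)":
            lba, blocks = decode_read10(m.group(3))
            reads.append((m.group(1), lba, blocks))
            continue
        m = SENSE_RE.search(line)
        if m:
            sense_counts[(m.group(1), m.group(2))] += 1
    return reads, sense_counts


if __name__ == "__main__":
    reads, sense_counts = summarize(LOG)
    for dev, lba, blocks in reads:
        print(f"{dev}: read of {blocks} blocks at LBA {lba:#x} failed")
    for (dev, sense), n in sense_counts.items():
        print(f"{dev}: {n}x {sense}")
```

The decoded LBAs match the `Info:` values in the log (0x22ecac90 and 0x22ecae90). Keep in mind that, going by the GEOM_MULTIPATH lines, da31 and da32 are two paths to the same disk7, so the read errors on da32 and the write error on da31 point at one physical drive.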