SSD suddenly detached while running in production

blckhm

Dabbler
Joined
Sep 24, 2018
Messages
42
I remember seeing this when we tried to use SMR drives under heavy write loads. It looks like the same issue, but this time it is happening on SSDs. How is that possible?

Here is the dmesg output:

Code:
nfsd: can't register svc name
    (da12:mpr0:0:20:0): WRITE(10). CDB: 2a 00 33 ad 4e 70 00 01 00 00 length 131072 SMID 214 terminated ioc 804b loginfo 31170000 scsi 0 state c xfer 0
(da12:mpr0:0:20:0): WRITE(10). CDB: 2a 00 33 ad 4e 70 00 01 00 00
(da12:mpr0:0:20:0): CAM status: CCB request completed with an error
(da12:mpr0:0:20:0): Retrying command
    (da12:mpr0:0:20:0): WRITE(10). CDB: 2a 00 33 ad 4e 70 00 01 00 00 length 131072 SMID 964 terminated ioc 804b loginfo 31170000 scsi 0 state 0 xfer 0
(da12:mpr0:0:20:0): WRITE(10). CDB: 2a 00 33 ad 4e 70 00 01 00 00
(da12:mpr0:0:20:0): CAM status: CCB request completed with an error
(da12:mpr0:0:20:0): Retrying command
(da12:mpr0:0:20:0): WRITE(10). CDB: 2a 00 33 ad 4e 70 00 01 00 00
(da12:mpr0:0:20:0): CAM status: SCSI Status Error
(da12:mpr0:0:20:0): SCSI status: Check Condition
(da12:mpr0:0:20:0): SCSI sense: UNIT ATTENTION asc:29,0 (Power on, reset, or bus device reset occurred)
(da12:mpr0:0:20:0): Retrying command (per sense data)
pid 2595 (winacl), jid 0, uid 0: exited on signal 11
    (da12:mpr0:0:20:0): WRITE(10). CDB: 2a 00 03 42 32 f8 00 00 88 00 length 69632 SMID 564 Aborting command 0xfffffe000119fac0
mpr0: Sending reset from mprsas_send_abort for target ID 20
    (da12:mpr0:0:20:0): WRITE(10). CDB: 2a 00 00 e5 1e 50 00 00 18 00 length 12288 SMID 569 terminated ioc 804b loginfo 31130000 scsi 0 state c xfer 0
    (da12:mpr0:0:20:0): ATA COMMAND PASS THROUGH(16). CDB: 85 0d 06 00 01 00 01 00 00 00 00 00 00 40 06 00 length 512 SMID 337 terminated ioc 804b loginfo 31130000 scsi 0 state c xfer 0
    (da12:mpr0:0:20:0): WRITE(10). CDB: 2a 00 03 42 31 f8 00 01 00 00 length 131072 SMID 327 terminated ioc 804b loginfo 31140000(da12:mpr0:0:20:0): WRITE(10). CDB: 2a 00 00 e5 1e 50 00 00 18 00
 scsi 0 state c xfer 0
    (da12:mpr0:0:20:0): WRITE(10). CDB: 2a 00 00 e5 1d 78 00 00 08 00 length 4096 SMID 983 terminated ioc 804b loginfo 31140000 s(da12:mpr0:0:20:0): CAM status: CCB request completed with an error
(da12:mpr0:0:20:0): Retrying command
csi 0 state c xfer 0
    (da12:mpr0:0:20:0): WRITE(10). CDB: 2a 00 00 e5 1e 08 00 00 20 00 length 16384 SMID 935 terminated ioc 804b loginfo 31140000 (da12:mpr0:0:20:0): ATA COMMAND PASS THROUGH(16). CDB: 85 0d 06 00 01 00 01 00 00 00 00 00 00 40 06 00
scsi 0 state c xfer 0
    (da12:mpr0:0:20:0): WRITE(10). CDB: 2a 00 00 e5 1d c0 00 00 10 00 length 8192 SMID 165 terminated ioc 804b loginfo 31140000 s(da12:mpr0:0:20:0): CAM status: CCB request completed with an error
(da12:mpr0:0:20:0): Retrying command
csi 0 state c xfer 0
mpr0: Unfreezing devq for target ID 20
(da12:mpr0:0:20:0): WRITE(10). CDB: 2a 00 03 42 31 f8 00 01 00 00
(da12:mpr0:0:20:0): CAM status: CCB request completed with an error
(da12:mpr0:0:20:0): Retrying command
(da12:mpr0:0:20:0): WRITE(10). CDB: 2a 00 03 42 32 f8 00 00 88 00
(da12:mpr0:0:20:0): CAM status: Command timeout
(da12:mpr0:0:20:0): Retrying command
(da12:mpr0:0:20:0): WRITE(10). CDB: 2a 00 00 e5 1d 78 00 00 08 00
(da12:mpr0:0:20:0): CAM status: CCB request completed with an error
(da12:mpr0:0:20:0): Retrying command
(da12:mpr0:0:20:0): WRITE(10). CDB: 2a 00 00 e5 1e 08 00 00 20 00
(da12:mpr0:0:20:0): CAM status: CCB request completed with an error
(da12:mpr0:0:20:0): Retrying command
(da12:mpr0:0:20:0): WRITE(10). CDB: 2a 00 00 e5 1d c0 00 00 10 00
(da12:mpr0:0:20:0): CAM status: CCB request completed with an error
(da12:mpr0:0:20:0): Retrying command
mpr0: mprsas_prepare_remove: Sending reset for target ID 20
da12 at mpr0 bus 0 scbus32 target 20 lun 0
da12: <ATA MTFDDAK480TDN F003> s/n 200426191BAF detached
    (da12:mpr0:0:20:0): WRITE(10). CDB: 2a 00 00 e5 1d c0 00 00 10 00 length 8192 SMID 732 terminated ioc 804b loginfo 31130000 scsi 0 state c xfer 0
    (da12:mpr0:0:20:0): WRITE(10). CDB: 2a 00 00 e5 1e 50 00 00 18 00 length 12288 SMID 446 terminated ioc 804b loginfo 31130000 scsi 0 state c xfer 0
(da12:mpr0:0:20:0): WRITE(10). CDB: 2a 00 00 e5 1d c0 00 00 10 00
    (da12:mpr0:0:20:0): ATA COMMAND PASS THROUGH(16). CDB: 85 0d 06 00 01 00 01 00 00 00 00 00 00 40 06 00 length 512 SMID 1132 t(da12:mpr0:0:20:0): CAM status: CCB request completed with an error
(da12:mpr0:0:20:0): Error 5, Periph was invalidated
erminated ioc 804b loginfo 31130000 scsi 0 state c xfer 0
    (da12:mpr0:0:20:0): WRITE(10). CDB: 2a 00 00 e5 1e 08 00 00 20 00 length 16384 SMID 288 terminated ioc 804b loginfo 31130000 (da12:mpr0:0:20:0): WRITE(10). CDB: 2a 00 00 e5 1e 50 00 00 18 00
scsi 0 state c xfer 0
    (da12:mpr0:0:20:0): WRITE(10). CDB: 2a 00 03 42 32 f8 00 00 88 00 length 69632 SMID 142 terminated ioc 804b loginfo 31130000 scsi 0 state c xfer 0
    (da12:mpr0:0:20:0): WRITE(10). CDB: 2a 00 00 e5 1d 78 00 00 08 00 length 4096 SMID 481 terminated ioc 804b loginfo 31130000 s(da12:mpr0:0:20:0): CAM status: CCB request completed with an error
(da12:mpr0:0:20:0): Error 5, Periph was invalidated
(da12:mpr0:0:20:0): ATA COMMAND PASS THROUGH(16). CDB: 85 0d 06 00 01 00 01 00 00 00 00 00 00 40 06 00
csi 0 state c xfer 0
(da12:mpr0:0:20:0): CAM status: CCB request completed with an error
(da12:mpr0:0:20:0): Error 5, Periph was invalidated
    (da12:mpr0:0:20:0): WRITE(10). CDB: 2a 00 03 42 31 f8 00 01 00 00 length 131072 SMID 565 terminated ioc 804b loginfo 31130000(da12:mpr0:0:20:0): WRITE(10). CDB: 2a 00 00 e5 1e 08 00 00 20 00
 scsi 0 state c xfer 0
mpr0: clearing target 20 handle 0x0016
mpr0: At enclosure level 1, slot 12, connector name (    )
mpr0: (da12:mpr0:0:20:0): CAM status: CCB request completed with an error
Unfreezing devq for target ID 20
(da12:mpr0:0:20:0): Error 5, Periph was invalidated
(da12:mpr0:0:20:0): WRITE(10). CDB: 2a 00 03 42 32 f8 00 00 88 00
(da12:mpr0:0:20:0): CAM status: CCB request completed with an error
(da12:mpr0:0:20:0): Error 5, Periph was invalidated
(da12:mpr0:0:20:0): WRITE(10). CDB: 2a 00 00 e5 1d 78 00 00 08 00
(da12:mpr0:0:20:0): CAM status: CCB request completed with an error
(da12:mpr0:0:20:0): Error 5, Periph was invalidated
(da12:mpr0:0:20:0): WRITE(10). CDB: 2a 00 03 42 31 f8 00 01 00 00
(da12:mpr0:0:20:0): CAM status: CCB request completed with an error
(da12:mpr0:0:20:0): Error 5, Periph was invalidated
(da12:mpr0:0:20:0): Periph destroyed
mpr0: SAS Address for SATA device = 44181d0e68778c83
mpr0: SAS Address from SAS device page0 = 500056b30d9a13cc
mpr0: SAS Address from SATA device = 44181d0e68778c83
mpr0: Found device <881<SataDev,Direct>,End Device> <12.0Gbps> handle<0x0016> enclosureHandle<0x0002> slot 12
mpr0: At enclosure level 1 and connector name (    )
ses0: da12,pass13 in 'Drive Slot 12', SAS Slot: 2 phys at slot 12
ses0:  phy 0: SATA device
ses0:  phy 0: parent 500056b30d9a13ff addr 500056b30d9a13cc
ses0:  phy 1: SAS device type 0 phy 0
ses0:  phy 1: parent 0 addr 0
da12 at mpr0 bus 0 scbus32 target 20 lun 0
da12: <ATA MTFDDAK480TDN F003> Fixed Direct Access SPC-4 SCSI device
da12: Serial Number 200426191BAF
da12: 1200.000MB/s transfers
da12: Command Queueing enabled
da12: 457862MB (937703088 512 byte sectors)
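
For anyone reading the dump above: the hex bytes after "CDB:" on the WRITE(10) lines can be decoded to see which blocks the failed writes were aimed at. The following is a minimal Python sketch, not part of the original post. The function name `decode_write10` is made up for illustration, and the 512-byte block size is taken from the "512 byte sectors" line in the probe output.

```python
def decode_write10(cdb_hex: str, block_size: int = 512) -> dict:
    """Decode a WRITE(10) CDB given as space-separated hex bytes.

    Layout used here: byte 0 is the opcode (0x2a), bytes 2-5 are the
    big-endian logical block address, bytes 7-8 are the big-endian
    transfer length in blocks.
    """
    cdb = bytes.fromhex(cdb_hex)
    if len(cdb) != 10 or cdb[0] != 0x2A:
        raise ValueError("not a WRITE(10) CDB")
    lba = int.from_bytes(cdb[2:6], "big")
    blocks = int.from_bytes(cdb[7:9], "big")
    return {"lba": lba, "blocks": blocks, "bytes": blocks * block_size}


# Two CDBs copied from the dmesg output above.
for cdb in ("2a 00 33 ad 4e 70 00 01 00 00",
            "2a 00 03 42 32 f8 00 00 88 00"):
    print(cdb, "->", decode_write10(cdb))
```

The decoded byte counts can be compared with the "length" field that the mpr driver prints on the same line (131072 and 69632 for the two CDBs used here), which is a quick sanity check that the lines are being read correctly.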
 

sretalla

Powered by Neutrality
Moderator
Joined
Jan 1, 2016
Messages
9,703
I think there was a bug in 11.3 which was worked around in 11.3-U2... maybe you're impacted by that?

You haven't given enough information about your system for anyone to help any further.
 

blckhm

Oh crap, so how can I revert my FreeNAS version to 11.2?

Version: FreeNAS-11.3-U2
 

sretalla

how can I revert my FreeNAS version to 11.2?
Usually the best way is to go to System | Boot and activate the previous version (assuming you didn't remove the previous boot environment).
 

blckhm

Great! But I have another problem now. I upgraded my pools when I switched from 11.2. If I now activate the 11.2-U8 boot environment again, what happens to my pools?
 

sretalla

You won't be able to mount them in read-write mode (read-only will be possible, but I guess that's no good for you).

Looks like you're stuck working through the problem.
 

blckhm

That is not an acceptable solution for now, so we are trying to live with the issue until it is resolved in the next update.
 

blckhm

Btw, it dropped our SSD again last night :mad:

But this time, the OS tried to read the temperature from the drive at 03:20:04, and after that timed out, the mpr driver ran mprsas_prepare_remove (sending a reset to the target) at 03:20:12.
After all the chaos, I only ran zpool clear tank4, and after that everything looks fine :D
Scrub looks good.
SMART data looks good.


Here are the log messages:

Code:
Apr 17 00:00:00 freenas2 syslog-ng[1045]: Configuration reload finished;
Apr 17 03:20:04 freenas2 collectd[1782]: Traceback (most recent call last):
  File "/usr/local/lib/collectd_pyplugins/disktemp.py", line 61, in read
    temperatures = c.call('disk.temperatures', self.disks, self.powermode, self.smartctl_args)
  File "/usr/local/lib/collectd_pyplugins/disktemp.py", line 61, in read
    temperatures = c.call('disk.temperatures', self.disks, self.powermode, self.smartctl_args)
  File "/usr/local/lib/python3.7/site-packages/middlewared/client/client.py", line 386, in call
    raise CallTimeout("Call timeout")
middlewared.client.client.CallTimeout: Call timeout
Apr 17 03:20:04 freenas2     (pass13:mpr0:0:20:0): LOG SENSE. CDB: 4d 00 4d 00 00 00 00 00 40 00 length 64 SMID 848 Aborting command 0xfffffe00011c1300
Apr 17 03:20:04 freenas2 mpr0: Sending reset from mprsas_send_abort for target ID 20
Apr 17 03:20:08 freenas2 mpr0: Unfreezing devq for target ID 20
Apr 17 03:20:12 freenas2 mpr0: mprsas_prepare_remove: Sending reset for target ID 20
Apr 17 03:20:12 freenas2 da12 at mpr0 bus 0 scbus32 target 20 lun 0
Apr 17 03:20:12 freenas2 da12: <ATA MTFDDAK480TDN F003> s/n 200426191BAF detached
Apr 17 03:20:12 freenas2 GEOM_MIRROR: Device swap0: provider da12p1 disconnected.
Apr 17 03:20:12 freenas2 ZFS: vdev state changed, pool_guid=3615194417707962538 vdev_guid=2330578963568157548
Apr 17 03:20:12 freenas2 ZFS: vdev is removed, pool_guid=3615194417707962538 vdev_guid=2330578963568157548
Apr 17 03:20:13 freenas2     (pass13:mpr0:0:20:0): LOG SENSE. CDB: 4d 00 6f 00 00 00 00 00 40 00 length 64 SMID 653 terminated ioc 804b loginfo 31130000 scsi 0 state c xfer 0
Apr 17 03:20:13 freenas2 mpr0: clearing target 20 handle 0x0016
Apr 17 03:20:13 freenas2 mpr0: At enclosure level 1, slot 12, connector name (    )
Apr 17 03:20:13 freenas2 mpr0: Unfreezing devq for target ID 20
Apr 17 03:20:13 freenas2 (da12:mpr0:0:20:0): Periph destroyed
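
The timing described above (abort at 03:20:04, removal at 03:20:12) can be checked directly from the syslog timestamps. This is a minimal Python sketch rather than anything FreeNAS ships. The helper names are made up for illustration, and because syslog lines carry no year, the year 2020 is an assumption used only to make the timestamps parseable.

```python
from datetime import datetime

# Lines copied from the log above.
LOG = """\
Apr 17 03:20:04 freenas2 mpr0: Sending reset from mprsas_send_abort for target ID 20
Apr 17 03:20:08 freenas2 mpr0: Unfreezing devq for target ID 20
Apr 17 03:20:12 freenas2 mpr0: mprsas_prepare_remove: Sending reset for target ID 20
Apr 17 03:20:12 freenas2 da12: <ATA MTFDDAK480TDN F003> s/n 200426191BAF detached
"""


def parse_ts(line: str, year: int = 2020) -> datetime:
    """Parse the 15-character syslog prefix, e.g. 'Apr 17 03:20:04'.

    The year is not in the log; 2020 is an assumption.
    """
    return datetime.strptime(f"{year} {line[:15]}", "%Y %b %d %H:%M:%S")


def first_ts(log: str, needle: str) -> datetime:
    """Return the timestamp of the first line containing `needle`."""
    for line in log.splitlines():
        if needle in line:
            return parse_ts(line)
    raise LookupError(f"no line contains {needle!r}")


def elapsed_seconds(log: str, start: str, end: str) -> float:
    """Seconds between the first `start` line and the first `end` line."""
    return (first_ts(log, end) - first_ts(log, start)).total_seconds()


print(elapsed_seconds(LOG, "mprsas_send_abort", "detached"))
```

Run against a longer log, the same two search strings give the gap between the first aborted command and the detach for each incident, which makes it easier to compare one night's drop with the next.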
 