SCSI Cam errors

AcFli · Oct 1, 2018

Hi,

I'm using the 11.1-U4 release as an NFS filesystem for 3 ESXi hosts (2 x v5.5 and one v6.5) and ~30 VMs.

Lately my security log is exploding with errors like the ones below, and during the weekend there was a checksum error on one of the disks. Searching, I found some possible causes and fixes: bad cabling, connectivity problems for the ESXi hosts, or changing the SCSI controller type (all VMs are currently using LSI Logic Parallel).

Changing cables didn't change anything, and I can't see any connectivity issues. Is changing the SCSI controller type the next step, or is there something else I should be doing?

TIA

Code:

kernel log messages:
> (da11:mpr1:0:3:0): READ(10). CDB: 28 00 36 a3 4b 90 00 01 00 00
> (da11:mpr1:0:3:0): CAM status: SCSI Status Error
> (da11:mpr1:0:3:0): SCSI status: Check Condition
> (da11:mpr1:0:3:0): SCSI sense: ABORTED COMMAND asc:4b,4 (NAK received)
> (da11:mpr1:0:3:0): Info: 0x36a34c8f
> (da11:mpr1:0:3:0): Retrying command (per sense data)
> (da11:mpr1:0:3:0): READ(10). CDB: 28 00 36 a3 55 18 00 01 00 00
> (da11:mpr1:0:3:0): CAM status: SCSI Status Error
> (da11:mpr1:0:3:0): SCSI status: Check Condition
> (da11:mpr1:0:3:0): SCSI sense: ABORTED COMMAND asc:4b,4 (NAK received)
> (da11:mpr1:0:3:0): Info: 0x36a355b1
> (da11:mpr1:0:3:0): Retrying command (per sense data)
> (da11:mpr1:0:3:0): READ(10). CDB: 28 00 36 a3 4b 90 00 01 00 00
> (da11:mpr1:0:3:0): CAM status: SCSI Status Error
> (da11:mpr1:0:3:0): SCSI status: Check Condition
> (da11:mpr1:0:3:0): SCSI sense: ABORTED COMMAND asc:4b,4 (NAK received)
> (da11:mpr1:0:3:0): Info: 0x36a34c27
> (da11:mpr1:0:3:0): Retrying command (per sense data)
> (da11:mpr1:0:3:0): READ(10). CDB: 28 00 36 a3 55 18 00 01 00 00
> (da11:mpr1:0:3:0): CAM status: SCSI Status Error
> (da11:mpr1:0:3:0): SCSI status: Check Condition
...
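
For anyone reading the log above: the CDB bytes of a READ(10) can be decoded by hand. The sketch below does that in Python. It assumes the standard 10-byte READ(10) layout (opcode 0x28, big-endian LBA in bytes 2-5, big-endian transfer length in bytes 7-8) and assumes the Info value is the LBA associated with the error, which is the usual meaning for direct-access devices. The function names are made up for illustration.

```python
def decode_read10(cdb_hex):
    """Return (starting LBA, block count) from a READ(10) CDB given as hex bytes."""
    cdb = bytes.fromhex(cdb_hex)  # spaces between bytes are skipped
    if len(cdb) != 10 or cdb[0] != 0x28:
        raise ValueError("not a 10-byte READ(10) CDB")
    lba = int.from_bytes(cdb[2:6], "big")     # bytes 2-5: logical block address
    blocks = int.from_bytes(cdb[7:9], "big")  # bytes 7-8: transfer length in blocks
    return lba, blocks


def info_in_request(cdb_hex, info):
    """Check whether the sense Info value falls inside the range the CDB asked for."""
    lba, blocks = decode_read10(cdb_hex)
    return lba <= info < lba + blocks


if __name__ == "__main__":
    # First failing command from the log above.
    lba, blocks = decode_read10("28 00 36 a3 4b 90 00 01 00 00")
    print(hex(lba), blocks)
    print(info_in_request("28 00 36 a3 4b 90 00 01 00 00", 0x36a34c8f))  # True
```

Each command in the log is a 256-block read, and the reported Info value lies inside the requested range. The same two regions keep being retried, so the errors are not spread randomly across the disk.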

kdragon75 · Oct 1, 2018

AcFli said:
connectivity problems for the ESXi hosts, or changing the SCSI controller type (all VMs are currently using LSI Logic Parallel).

It doesn't work like that. The ESXi host acts as a middleman: it translates guest SCSI/ATA commands for the underlying storage layer, or, in the NFS case, just modifies the files on the NFS share. The VM's storage is fully abstracted from the ESXi storage. Also, for non-OS disks you should be using the PVSCSI controller. It's a thinner layer and is typically faster (in latency, not throughput), with less CPU overhead.
https://kb.vmware.com/s/article/1010398
You have a bad disk, cable, or controller. Try swapping two disks around and test. Also, do you have SMART tests set up?

AcFli · Oct 1, 2018

Aha, OK. I was under the impression that overly long delays between reads/writes to/from the ESXi hosts could have an effect. Thanks.

Most if not all are OS disks, so in that case I should stick with LSI Logic Parallel?

A short SMART test runs once a week. So I should run it more frequently, add a long test as well, and try some disk swapping?

kdragon75 · Oct 1, 2018

AcFli said:
Aha, OK. I was under the impression that overly long delays between reads/writes to/from the ESXi hosts could have an effect.

Yes, it could, but INSIDE the VM, not on the backing storage.

AcFli said:
Most if not all are OS disks, so in that case I should stick with LSI Logic Parallel?

If you selected the correct OS type when the VM was set up, you should leave it as it is.

AcFli said:
A short SMART test runs once a week. So I should run it more frequently, add a long test as well, and try some disk swapping?

I do a short test twice a week and a long test twice a month. The long test is the only one that really TESTS the disk, since it scans the entire surface. The short one does little more than a quick check and pulls the numbers. Yes, try swapping da11 with another drive and see if the issues follow the drive or the bay/port/cable.
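
To make the before/after comparison of the swap easier, here is a rough Python sketch that counts the "CAM status" error lines per device and controller path in a chunk of kernel log text. The regex is written against the log format posted earlier in this thread, so treat it as an assumption about your log layout. The function name is made up for illustration.

```python
import re
from collections import Counter

# Matches lines such as "(da11:mpr1:0:3:0): CAM status: SCSI Status Error",
# with or without a leading "> " as in the pasted log.
CAM_LINE = re.compile(r"\((\w+):(\w+:\d+:\d+:\d+)\): CAM status: (.+)")


def count_cam_errors(log_text):
    """Count CAM status error lines, keyed by (device name, controller path)."""
    counts = Counter()
    for line in log_text.splitlines():
        match = CAM_LINE.search(line)
        if match:
            device, path, _status = match.groups()
            counts[(device, path)] += 1
    return counts


if __name__ == "__main__":
    sample = (
        "> (da11:mpr1:0:3:0): READ(10). CDB: 28 00 36 a3 4b 90 00 01 00 00\n"
        "> (da11:mpr1:0:3:0): CAM status: SCSI Status Error\n"
        "> (da11:mpr1:0:3:0): SCSI status: Check Condition\n"
        "> (da11:mpr1:0:3:0): CAM status: SCSI Status Error\n"
    )
    for (device, path), n in count_cam_errors(sample).items():
        print(device, path, n)
```

Run it on the log from before and after the swap. If the errors stay on the same controller path (mpr1:0:3:0 here) with a different drive in that slot, suspect the bay/port/cable; if they move to the path where the old drive now sits, suspect the drive. Device names like da11 can be reassigned on reboot, so identify the physical drive by serial number rather than by its daN name.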
