SCSI CAM errors

Status
Not open for further replies.

AcFli

Cadet
Joined
Oct 1, 2018
Messages
2
Hi,

I'm using the 11.1-U4 release as NFS storage for three ESXi hosts (two on v5.5 and one on v6.5) with ~30 VMs.

Lately my security log has been exploding with errors like the ones below, and over the weekend there was a checksum error on one of the disks. Searching, I found some possible solutions: bad cabling, connectivity problems for the ESXi hosts, or changing the SCSI controller type (all VMs are currently using LSI Logic Parallel).

Changing cables didn't change anything, and I can't notice any connectivity issues. Is changing the SCSI controller type the next step, or is there something else I should be doing?

TIA

Code:
kernel log messages:
> (da11:mpr1:0:3:0): READ(10). CDB: 28 00 36 a3 4b 90 00 01 00 00
> (da11:mpr1:0:3:0): CAM status: SCSI Status Error
> (da11:mpr1:0:3:0): SCSI status: Check Condition
> (da11:mpr1:0:3:0): SCSI sense: ABORTED COMMAND asc:4b,4 (NAK received)
> (da11:mpr1:0:3:0): Info: 0x36a34c8f
> (da11:mpr1:0:3:0): Retrying command (per sense data)
> (da11:mpr1:0:3:0): READ(10). CDB: 28 00 36 a3 55 18 00 01 00 00
> (da11:mpr1:0:3:0): CAM status: SCSI Status Error
> (da11:mpr1:0:3:0): SCSI status: Check Condition
> (da11:mpr1:0:3:0): SCSI sense: ABORTED COMMAND asc:4b,4 (NAK received)
> (da11:mpr1:0:3:0): Info: 0x36a355b1
> (da11:mpr1:0:3:0): Retrying command (per sense data)
> (da11:mpr1:0:3:0): READ(10). CDB: 28 00 36 a3 4b 90 00 01 00 00
> (da11:mpr1:0:3:0): CAM status: SCSI Status Error
> (da11:mpr1:0:3:0): SCSI status: Check Condition
> (da11:mpr1:0:3:0): SCSI sense: ABORTED COMMAND asc:4b,4 (NAK received)
> (da11:mpr1:0:3:0): Info: 0x36a34c27
> (da11:mpr1:0:3:0): Retrying command (per sense data)
> (da11:mpr1:0:3:0): READ(10). CDB: 28 00 36 a3 55 18 00 01 00 00
> (da11:mpr1:0:3:0): CAM status: SCSI Status Error
> (da11:mpr1:0:3:0): SCSI status: Check Condition
...
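The CDB fields in the log can be decoded by hand: for READ(10), byte 0 is the opcode (0x28), bytes 2-5 are the starting LBA and bytes 7-8 the transfer length in blocks, both big-endian. The minimal Python sketch below (function names are made up for illustration) shows that each failing read covers 256 blocks and that the Info value, which is usually the LBA the drive associates with the error, falls inside the requested range. Note that the 4b family of additional sense codes describes data-phase (transport) errors rather than medium errors, which fits the cable/bay/controller suspicion.

```python
# Decode READ(10) CDBs copied from the kernel log above.
# Layout: byte 0 = opcode 0x28, bytes 2-5 = LBA, bytes 7-8 = block count (big-endian).

def decode_read10(cdb_hex: str) -> tuple:
    """Return (starting LBA, number of blocks) for a READ(10) CDB given as hex text."""
    cdb = bytes.fromhex(cdb_hex)  # fromhex accepts spaces between byte pairs
    if len(cdb) != 10 or cdb[0] != 0x28:
        raise ValueError("not a READ(10) CDB")
    lba = int.from_bytes(cdb[2:6], "big")
    length = int.from_bytes(cdb[7:9], "big")
    return lba, length


def info_in_range(cdb_hex: str, info: int) -> bool:
    """True if the sense Info LBA lies within the blocks the CDB requested."""
    lba, length = decode_read10(cdb_hex)
    return lba <= info < lba + length


# (CDB, Info) pairs taken verbatim from the log.
samples = [
    ("28 00 36 a3 4b 90 00 01 00 00", 0x36A34C8F),
    ("28 00 36 a3 55 18 00 01 00 00", 0x36A355B1),
    ("28 00 36 a3 4b 90 00 01 00 00", 0x36A34C27),
]

for cdb_hex, info in samples:
    lba, length = decode_read10(cdb_hex)
    print(f"LBA {lba:#x}, {length} blocks, Info {info:#x} inside request: "
          f"{info_in_range(cdb_hex, info)}")
```

The same two ranges show up repeatedly because CAM retries the command "per sense data", as the log says.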
 

kdragon75

Wizard
Joined
Aug 7, 2016
Messages
2,457
> connectivity problems for the ESXi hosts, or changing the SCSI controller type (all VMs are currently using LSI Logic Parallel).
It doesn't work like that. The ESXi host acts as a middleman: it translates guest SCSI/ATA commands for the underlying storage layer, or just modifies the files on the NFS share. The VMs' storage is fully abstracted from the ESXi storage. Also, for non-OS disks you should be using the PVSCSI controller. It's a thinner layer and is typically faster (in latency, not throughput) with less CPU overhead.
https://kb.vmware.com/s/article/1010398
You have a bad disk, cable, or controller. Try swapping two disks around and test again. Also, do you have SMART tests set up?
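If you want to check which virtual controller each VM disk currently sits on before deciding whether to move non-OS disks to PVSCSI, that is recorded in each VM's .vmx file. The sketch below is a hypothetical helper, not a VMware tool; it assumes the usual key names `scsiN.virtualDev` and `scsiN:M.fileName` and the value-to-name mapping in `CONTROLLER_NAMES`, so verify both against your own files. As the reply says, this only affects the guests, not the disk errors on the storage side.

```python
import re

# Assumed display names for common scsiN.virtualDev values; check your own VMs.
CONTROLLER_NAMES = {
    "lsilogic": "LSI Logic Parallel",
    "lsisas1068": "LSI Logic SAS",
    "pvscsi": "VMware Paravirtual",
    "buslogic": "BusLogic Parallel",
}

# Matches lines of the form: key = "value"
LINE_RE = re.compile(r'^\s*([\w.:]+)\s*=\s*"(.*)"\s*$')
CTRL_RE = re.compile(r"scsi(\d+)\.virtualdev", re.IGNORECASE)
DISK_RE = re.compile(r"scsi(\d+):(\d+)\.filename", re.IGNORECASE)


def scsi_layout(vmx_text: str) -> dict:
    """Map controller number -> {"type": name, "disks": [vmdk files]} from .vmx text."""
    layout = {}
    for line in vmx_text.splitlines():
        m = LINE_RE.match(line)
        if not m:
            continue
        key, value = m.groups()
        ctrl = CTRL_RE.fullmatch(key)
        disk = DISK_RE.fullmatch(key)
        if ctrl:
            entry = layout.setdefault(int(ctrl.group(1)), {"type": None, "disks": []})
            entry["type"] = CONTROLLER_NAMES.get(value.lower(), value)
        elif disk:
            entry = layout.setdefault(int(disk.group(1)), {"type": None, "disks": []})
            entry["disks"].append(value)
    return layout


# Hypothetical .vmx excerpt, for illustration only.
example = """
scsi0.present = "TRUE"
scsi0.virtualDev = "lsilogic"
scsi0:0.fileName = "vm01.vmdk"
scsi1.present = "TRUE"
scsi1.virtualDev = "pvscsi"
scsi1:0.fileName = "vm01_1.vmdk"
"""

for num, info in sorted(scsi_layout(example).items()):
    print(f"scsi{num}: {info['type']} -> {', '.join(info['disks'])}")
```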
 

AcFli

Cadet
Joined
Oct 1, 2018
Messages
2
Aha, OK. I was under the impression that overly long delays in reads/writes to and from the ESXi hosts could have an effect. Thanks.

Most, if not all, are OS disks, so in that case should I stick with LSI Logic Parallel?

A short SMART test runs once a week. So I should run it more frequently, add a long test as well, and try some disk swapping?
 

kdragon75

Wizard
Joined
Aug 7, 2016
Messages
2,457
> Aha, OK. I was under the impression that overly long delays in reads/writes to and from the ESXi hosts could have an effect.
Yes, it could, but INSIDE the VM, not on the backing storage.
> Most, if not all, are OS disks, so in that case should I stick with LSI Logic Parallel?
If you selected the correct OS type when the VM was set up, you should leave it as it is.
> A short SMART test runs once a week. So I should run it more frequently, add a long test as well, and try some disk swapping?
I do a short test twice a week and a long test twice a month. The long test is the only one that TESTS the disk; the short one basically just pulls the numbers. Yes, try swapping da11 with another drive and see if the issues follow the drive or the bay/port/cable.
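To tell whether the errors follow the drive or the bay, it helps to count them per device path before and after the swap. Each log line starts with a path of the form device:controller:bus:target:lun (here da11 on mpr1, bus 0, target 3). The sketch below is a hypothetical helper that tallies "CAM status: SCSI Status Error" lines per path. Whether a daN name or target number stays with the bay or moves with the drive depends on how the HBA maps devices, so confirm which physical disk is which by serial number (for example with `smartctl -i /dev/da11`) before and after swapping.

```python
import re
from collections import Counter

# Matches the path prefix of a CAM message, e.g. "(da11:mpr1:0:3:0)":
# device name+unit, controller name+unit, bus, target, lun.
PATH_RE = re.compile(
    r"\((?P<dev>[a-z]+\d+):(?P<sim>[a-z]+\d+):(?P<bus>\d+):(?P<target>\d+):\w+\)"
)


def count_cam_errors(log_text: str) -> Counter:
    """Count 'CAM status: SCSI Status Error' lines per (dev, sim, bus, target)."""
    counts = Counter()
    for line in log_text.splitlines():
        if "CAM status: SCSI Status Error" not in line:
            continue
        m = PATH_RE.search(line)
        if m:
            counts[(m["dev"], m["sim"], int(m["bus"]), int(m["target"]))] += 1
    return counts


# Lines copied from the log in the first post.
sample = """\
> (da11:mpr1:0:3:0): READ(10). CDB: 28 00 36 a3 4b 90 00 01 00 00
> (da11:mpr1:0:3:0): CAM status: SCSI Status Error
> (da11:mpr1:0:3:0): SCSI sense: ABORTED COMMAND asc:4b,4 (NAK received)
> (da11:mpr1:0:3:0): CAM status: SCSI Status Error
"""

for (dev, sim, bus, target), n in count_cam_errors(sample).most_common():
    print(f"{dev} on {sim} bus {bus} target {target}: {n} errors")
```

You could feed it `dmesg` output or the contents of /var/log/messages, assuming the kernel messages are logged there on your system.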
 