FreeNAS 11.1 MPS driver downgrade (for LSI 2008)

Status
Not open for further replies.

Xo* (Cadet; joined Mar 13, 2018; 8 messages)

Hello,

I'm having trouble running FreeNAS 11.1 on an HP DL180 G6 with an LSI 2008 (IBM M1015) flashed to IT mode.
FreeNAS is installed as an ESXi VM with the LSI controller passed through.
Two 4TB WD Black drives are configured as a mirrored pool.

1. With LSI firmware 20.00.07, I get the following errors:
Code:
(da1:mps0:0:9:0): READ(10). CDB: 28 00 10 b0 2d 30 00 01 00 00
(da1:mps0:0:9:0): CAM status: CCB request completed with an error
(da1:mps0:0:9:0): Retrying command
(da1:mps0:0:9:0): READ(10). CDB: 28 00 10 b0 31 30 00 01 00 00
(da1:mps0:0:9:0): CAM status: SCSI Status Error
(da1:mps0:0:9:0): SCSI status: Check Condition
(da1:mps0:0:9:0): SCSI sense: UNIT ATTENTION asc:29,0 (Power on, reset, or bus device reset occurred)
(da1:mps0:0:9:0): Retrying command (per sense data)
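For reference, the CDB bytes in these CAM messages can be decoded by hand. Below is a minimal Python sketch, assuming the standard SCSI block-command layouts (READ(10): 32-bit LBA in bytes 2-5, 16-bit transfer length in bytes 7-8; READ(16): 64-bit LBA in bytes 2-9, 32-bit length in bytes 10-13). `decode_read_cdb` and `describe_asc` are hypothetical helpers, not part of FreeNAS or CAM, and only the opcodes and sense code seen in this thread are handled.

```python
# Hypothetical helpers: decode the READ(10)/READ(16) CDBs and the ASC/ASCQ
# pair printed by FreeBSD CAM. Only the values seen in this thread are handled.
READ_10 = 0x28
READ_16 = 0x88

# Tiny excerpt of the SCSI ASC/ASCQ table (just the code seen in these logs).
ASC_DESCRIPTIONS = {
    (0x29, 0x00): "Power on, reset, or bus device reset occurred",
}


def decode_read_cdb(cdb_hex: str) -> dict:
    """Return the command name, starting LBA and transfer length in blocks."""
    cdb = bytes.fromhex(cdb_hex)  # whitespace between hex pairs is ignored
    if len(cdb) == 10 and cdb[0] == READ_10:
        lba = int.from_bytes(cdb[2:6], "big")       # bytes 2-5: 32-bit LBA
        blocks = int.from_bytes(cdb[7:9], "big")    # bytes 7-8: 16-bit length
        name = "READ(10)"
    elif len(cdb) == 16 and cdb[0] == READ_16:
        lba = int.from_bytes(cdb[2:10], "big")      # bytes 2-9: 64-bit LBA
        blocks = int.from_bytes(cdb[10:14], "big")  # bytes 10-13: 32-bit length
        name = "READ(16)"
    else:
        raise ValueError(f"unsupported CDB: {cdb_hex!r}")
    return {"op": name, "lba": lba, "blocks": blocks}


def describe_asc(asc: int, ascq: int) -> str:
    """Look up the text for an ASC/ASCQ pair (tiny excerpt only)."""
    return ASC_DESCRIPTIONS.get((asc, ascq), "unknown ASC/ASCQ")


if __name__ == "__main__":
    for cdb in ("28 00 10 b0 2d 30 00 01 00 00",
                "88 00 00 00 00 01 b2 5a 00 c0 00 00 01 00 00 00"):
        print(decode_read_cdb(cdb))
    print(describe_asc(0x29, 0x00))
```

Both failing commands are 256-block reads (128 KiB, assuming 512-byte logical sectors). The READ(16) form that shows up after the firmware downgrade is used because its starting LBA (0x1B25A00C0) is beyond the 32-bit range READ(10) can address. The UNIT ATTENTION with ASC/ASCQ 29h/00h reports that the drive saw a reset, which suggests the retries follow a bus or device reset rather than a media error.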

After these errors repeat several times, another error appears; the LSI adapter then hangs and the whole ESXi host has to be restarted.
Code:
mps0: IOC Fault 0x40007e23, Resetting
mps0: Reinitializing controller,
mps0: Firmware: 20.00.07.00, Driver: 21.02.00.00-fbsd
mps0: IOCCapabilities: 1285c<ScsiTaskFull,DiagTrace,SnapBuf,EEDP,TransRetry,EventReplay,HostDisc>
mps0: Calling Reinit from mps_wait_command, timeout=60, elapsed=61
mps0: Reinitializing controller,
mps0: Firmware: 20.00.07.00, Driver: 21.02.00.00-fbsd
mps0: IOCCapabilities: 1285c<ScsiTaskFull,DiagTrace,SnapBuf,EEDP,TransRetry,EventReplay,HostDisc>
mps0: Calling Reinit from mps_wait_command, timeout=60, elapsed=61
mps0: Reinitializing controller,
mps0: Firmware: 20.00.07.00, Driver: 21.02.00.00-fbsd
mps0: IOCCapabilities: 1285c<ScsiTaskFull,DiagTrace,SnapBuf,EEDP,TransRetry,EventReplay,HostDisc>
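The fault value can be split into its fields. Below is a minimal Python sketch, assuming the printed value is the MPI 2.0 doorbell register: IOC state in the top nibble and firmware fault code in the low 16 bits, per the `MPI2_IOC_STATE_*` and `MPI2_DOORBELL_FAULT_CODE_MASK` definitions in LSI's `mpi2.h`. `decode_doorbell` is a hypothetical helper.

```python
# Hypothetical decoder for the value in "mps0: IOC Fault 0x40007e23, Resetting".
# Assumption: the value is the MPI 2.0 doorbell register.
IOC_STATE_MASK = 0xF0000000    # MPI2_IOC_STATE_MASK
FAULT_CODE_MASK = 0x0000FFFF   # MPI2_DOORBELL_FAULT_CODE_MASK

IOC_STATES = {
    0x00000000: "RESET",
    0x10000000: "READY",
    0x20000000: "OPERATIONAL",
    0x40000000: "FAULT",
}


def decode_doorbell(value: int) -> tuple[str, int]:
    """Split a doorbell value into (IOC state name, fault code)."""
    state = IOC_STATES.get(value & IOC_STATE_MASK, "UNKNOWN")
    return state, value & FAULT_CODE_MASK


if __name__ == "__main__":
    state, code = decode_doorbell(0x40007E23)
    print(f"state={state} fault_code=0x{code:04x}")  # state=FAULT fault_code=0x7e23
```

What a given fault code means is firmware-specific. The split only shows that the IOC itself entered the FAULT state, which is why the driver resets and reinitializes it.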

2. Downgrading the LSI firmware to 19.00.00 makes things a little better (at least the LSI no longer hangs):
Code:
(da2:mps0:0:9:0): READ(16). CDB: 88 00 00 00 00 01 b2 5a 00 c0 00 00 01 00 00 00
(da2:mps0:0:9:0): CAM status: CCB request completed with an error
(da2:mps0:0:9:0): Retrying command
(da2:mps0:0:9:0): READ(16). CDB: 88 00 00 00 00 01 b2 5a 05 c0 00 00 01 00 00 00
(da2:mps0:0:9:0): CAM status: SCSI Status Error
(da2:mps0:0:9:0): SCSI status: Check Condition
(da2:mps0:0:9:0): SCSI sense: UNIT ATTENTION asc:29,0 (Power on, reset, or bus device reset occurred)
(da2:mps0:0:9:0): Retrying command (per sense data)

This error occurs every couple of hours under heavy load (a scrub plus a couple of read/write threads).
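To put a number on "every couple of hours", the gaps between resets can be measured from the syslog. Below is a minimal Python sketch, assuming BSD-syslog-style lines ("Mon DD HH:MM:SS host message", as in /var/log/messages), chronological order and a single year, since syslog omits it. `reset_intervals` is a hypothetical helper. The sample lines are timestamped lines from the log posted later in this thread.

```python
from datetime import datetime, timedelta

# Sense text CAM prints for the reset events discussed in this thread.
MARKER = "UNIT ATTENTION asc:29,0"


def reset_intervals(lines, year=2018) -> list[timedelta]:
    """Return the time gaps between consecutive UNIT ATTENTION 29/00 lines."""
    stamps = []
    for line in lines:
        if MARKER not in line:
            continue
        # Syslog timestamps are 15 characters, e.g. "Mar 16 00:19:01" or
        # "Mar  6 00:19:01"; strptime treats the double space as whitespace.
        stamps.append(datetime.strptime(f"{year} {line[:15]}", "%Y %b %d %H:%M:%S"))
    return [later - earlier for earlier, later in zip(stamps, stamps[1:])]


if __name__ == "__main__":
    sample = [
        "Mar 16 00:19:01 glpvmnas2 (da7:mps0:0:11:0): SCSI sense: UNIT ATTENTION asc:29,0 (Power on, reset, or bus device reset occurred)",
        "Mar 16 00:19:02 glpvmnas2 (da7:mps0:0:11:0): SCSI status: Check Condition",
        "Mar 16 00:19:02 glpvmnas2 (da7:mps0:0:11:0): SCSI sense: UNIT ATTENTION asc:29,0 (Power on, reset, or bus device reset occurred)",
    ]
    print(reset_intervals(sample))  # [datetime.timedelta(seconds=1)]
```

Run over a longer stretch of logs, the returned gaps would show whether the resets really cluster around scrubs.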

3. Meanwhile, I found that with FreeNAS 9.10-U6 and LSI firmware 19.00.00 there are no errors at all.

4. Already tried, with no effect:
- HDD SMART checked
- LSI heatsink replaced
- HP firmware upgraded
- PSU replaced with one from a DL380 G6
- SFF-8087 cables replaced
- VM thin/thick provisioning

5. My next idea is to use the FreeBSD mps driver from FreeNAS 9.10 (version 21.01, if I'm correct) on FreeNAS 11.1.

Is that possible?
 

artlessknave (Wizard; joined Oct 29, 2016; 1,506 messages)

How much RAM does the VM have?
How many drives are in use? (Looks like up to 8?)
How much RAM is in your HP DL180 G6? (A DL180 G6 can take up to 96GB, for example, but that says nothing about how much RAM is actually installed.)
Which PSU option do you have? (I see 750W and 465W in the QuickSpecs.)
How many processors? (Looks like this model can take 1 or 2.)
A basic hardware configuration is one of the minimum requirements for a forum post.

FWIW, I had some similar issues. I gave the VM more RAM, and they stopped.
 

Xo* (Cadet; joined Mar 13, 2018; 8 messages)

The VM has 16GB of RAM (with the "Reserve all guest memory" option checked), a 16GB eager-zeroed virtual disk for FreeNAS, and 1 virtual socket with 2 cores per socket.
There are only two 4TB WD Black drives in a mirrored pool, connected to the LSI 2008 in IT mode.
DL180 G6: 48GB RAM installed, two 750W PSUs, two X5650 CPUs.

It seems to meet all the minimum requirements.

Which version of FreeNAS do you use? What is your LSI FW version?
 

artlessknave (Wizard; joined Oct 29, 2016; 1,506 messages)

Updated my signature with specs.
I believe the VM had those errors when it had 8GB of RAM.

Have you tried giving the VM more processor resources? When I added more RAM, I was following a suggestion I'd seen somewhere that a lack of resources could cause SAS errors.

It seems more likely that you have a bad LSI card than that the drivers that work for most people don't work for you. Is it a genuine branded card (IBM/Dell/etc.)? eBay seems to have a high number of knockoff "LSI" cards that claim to be M1015 / LSI 2008 cards but don't even have the right branding (I believe I have 2 of them; they don't work at all anymore: they show up, but no drives appear).

I forgot: my main FreeNAS used to have 2 Dell PERC H200 cards crossflashed to IT mode, and I was getting similar errors, mostly on reboot. Something went very wrong with my pool (I think I interrupted an operation), and on reboot it was taking ages to import. I got irritated and replaced the stupid things with the 9305: no more errors, and the pool imported in minutes. But that isn't a route many people can easily take.

(I plan to un-virtualize the backup because it's too annoying: too much RAM is unavailable to other VMs, which is the whole purpose of a hypervisor; doing anything with ESXi requires disabling replication and shutting down the backup; there aren't enough PCIe slots for experimenting with other passthrough devices on VMs; etc.)
 

Xo* (Cadet; joined Mar 13, 2018; 8 messages)

Yesterday I tried with 24GB of VM RAM; the same errors appear.
Now I've set 8 vCPUs as you suggested and launched a scrub plus a couple of read/write threads. We'll see...

The seller (a local one with a good reputation) assured me that this card is a genuine IBM-branded M1015 removed from an IBM System x3100 M4 server. Visually, all the surface-mount components are accurately placed and well soldered.
 

artlessknave (Wizard; joined Oct 29, 2016; 1,506 messages)

That is unfortunate. There seems to be a good chance the card itself has issues. Other than replacing the card, the only thing I can think of is trying another PCIe slot.

Incidentally, I now have a card spitting out similar errors, though I'm unsure whether it's the SAS card or the onboard SATA. I don't seem to have much luck with the el cheapo options: out of 5, I have 2 duds, 2 I replaced due to insane pool import times, and now this last one might be erroring out on at least one port. :/
Code:
Mar 16 00:19:01 glpvmnas2	   (da7:mps0:0:11:0): WRITE(10). CDB: 2a 00 35 57 e3 88 00 00 40 00 length 32768 SMID 156 terminated ioc 804b loginfo 31120303 scsi 0 state c xfer 0
Mar 16 00:19:01 glpvmnas2	   (da7:mps0:0:11:0): WRITE(10). CDB: 2a 00 35 57 e2 88 00 01 00 00 length 131072 SMID 88 terminated ioc 804b loginfo 31120303 s(da7:mps0:0:11:0): WRITE(10). CDB: 2a 00 35 57 e3 88 00 00 40 00
Mar 16 00:19:01 glpvmnas2 csi 0 state c xfer 0
Mar 16 00:19:01 glpvmnas2 (da7:mps0:0:11:0): CAM status: CCB request completed with an error
Mar 16 00:19:01 glpvmnas2 (da7:mps0:0:11:0): Retrying command
Mar 16 00:19:01 glpvmnas2 (da7:mps0:0:11:0): WRITE(10). CDB: 2a 00 35 57 e2 88 00 01 00 00
Mar 16 00:19:01 glpvmnas2 (da7:mps0:0:11:0): CAM status: CCB request completed with an error
Mar 16 00:19:01 glpvmnas2 (da7:mps0:0:11:0): Retrying command
Mar 16 00:19:01 glpvmnas2 (da7:mps0:0:11:0): WRITE(10). CDB: 2a 00 35 57 e2 88 00 01 00 00
Mar 16 00:19:01 glpvmnas2 (da7:mps0:0:11:0): CAM status: SCSI Status Error
Mar 16 00:19:01 glpvmnas2 (da7:mps0:0:11:0): SCSI status: Check Condition
Mar 16 00:19:01 glpvmnas2 (da7:mps0:0:11:0): SCSI sense: UNIT ATTENTION asc:29,0 (Power on, reset, or bus device reset occurred)
Mar 16 00:19:01 glpvmnas2 (da7:mps0:0:11:0): Retrying command (per sense data)
Mar 16 00:19:01 glpvmnas2	   (da7:mps0:0:11:0): WRITE(10). CDB: 2a 00 35 57 e2 88 00 01 00 00 length 131072 SMID 151 terminated ioc 804b loginfo 31120303 scsi 0 state c xfer 0
Mar 16 00:19:01 glpvmnas2 (da7:mps0:0:11:0): WRITE(10). CDB: 2a 00 35 57 e2 88 00 01 00 00
Mar 16 00:19:01 glpvmnas2 (da7:mps0:0:11:0): CAM status: CCB request completed with an error
Mar 16 00:19:01 glpvmnas2 (da7:mps0:0:11:0): Retrying command
Mar 16 00:19:02 glpvmnas2 (da7:mps0:0:11:0): WRITE(10). CDB: 2a 00 35 57 e2 88 00 01 00 00
Mar 16 00:19:02 glpvmnas2 (da7:mps0:0:11:0): CAM status: SCSI Status Error
Mar 16 00:19:02 glpvmnas2 (da7:mps0:0:11:0): SCSI status: Check Condition
Mar 16 00:19:02 glpvmnas2 (da7:mps0:0:11:0): SCSI sense: UNIT ATTENTION asc:29,0 (Power on, reset, or bus device reset occurred)
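The `ioc 804b loginfo 31120303` fields in the lines above can also be split apart. Below is a minimal Python sketch, assuming the MPI 2.0 conventions from LSI's headers (as also decoded by the Linux mpt2sas/mpt3sas drivers):

- Bit 15 of IOCStatus flags that a LogInfo word is present.
- 0x004B is `MPI2_IOCSTATUS_SCSI_IOC_TERMINATED`.
- LogInfo packs bus type, originator, code and subcode into 4/4/8/16 bits.

`decode_ioc_status` and `decode_loginfo` are hypothetical helpers, and only the values seen here are named.

```python
# Hypothetical decoder for the "ioc 804b loginfo 31120303" fields in the mps(4) lines.
LOG_INFO_AVAILABLE = 0x8000   # IOCStatus flag: a LogInfo word accompanies the status
IOC_STATUS_MASK = 0x7FFF      # remaining bits: the status code itself
IOC_STATUS_NAMES = {0x004B: "SCSI_IOC_TERMINATED"}

BUS_TYPES = {0x3: "SAS"}
ORIGINATORS = {0x0: "IOP", 0x1: "PL", 0x2: "IR"}


def decode_ioc_status(value: int) -> tuple[str, bool]:
    """Return (status name or hex code, whether LogInfo is flagged as present)."""
    status = value & IOC_STATUS_MASK
    name = IOC_STATUS_NAMES.get(status, f"0x{status:04X}")
    return name, bool(value & LOG_INFO_AVAILABLE)


def decode_loginfo(value: int) -> dict:
    # Layout, high to low: bus type (4 bits), originator (4), code (8), subcode (16).
    bus = (value >> 28) & 0xF
    origin = (value >> 24) & 0xF
    return {
        "bus_type": BUS_TYPES.get(bus, f"0x{bus:X}"),
        "originator": ORIGINATORS.get(origin, f"0x{origin:X}"),
        "code": (value >> 16) & 0xFF,
        "subcode": value & 0xFFFF,
    }


if __name__ == "__main__":
    print(decode_ioc_status(0x804B))   # ('SCSI_IOC_TERMINATED', True)
    print(decode_loginfo(0x31120303))  # {'bus_type': 'SAS', 'originator': 'PL', 'code': 18, 'subcode': 771}
```

This only identifies the entry as a SAS LogInfo from the PL originator with code 0x12 and subcode 0x0303. What that combination means is defined by the LSI firmware, so it is best used to search for matching reports rather than read as a diagnosis.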
 

Xo* (Cadet; joined Mar 13, 2018; 8 messages)

I've already tried both PCIe slots.
I recently talked to the seller, and he agreed to provide another IBM M1015 card for testing. The only catch is that I have to return one of the cards with the original MegaRAID firmware flashed back. Despite the many topics and guides on crossflashing an M1015 to an LSI 9211, so far I've found no way to turn an LSI card back into an M1015 :)
 

artlessknave (Wizard; joined Oct 29, 2016; 1,506 messages)

A little late, but all you should need is the M1015 firmware; the flash process ought to be the same. Crossflashing wouldn't work at all if it were very different. I'd be curious to know how this turns out: whether a new card solves the problem or it persists.
 

Xo* (Cadet; joined Mar 13, 2018; 8 messages)

Unfortunately, the other card isn't available right now; hopefully it will be later.
I've decided to leave well enough alone for now. Without the periodic scrub, it ran with no errors last week.
One option is to buy an LSI 2008-based card from another manufacturer (Dell, etc.) and use the M1015 in the backup NAS server.
 

Xo* (Cadet; joined Mar 13, 2018; 8 messages)

Got the other M1015 card. It looks exactly like the first one, apart from a 1-year difference in manufacturing date.
Flashed it to P20.00.07 and got the same errors. Tested it on ESXi 6.0 and 5.5 with FreeNAS 11 RELEASE and 24GB of RAM.
 

wreedps (Patron; joined Jul 22, 2015; 225 messages)

Did you ever fix this?
 

Xo* (Cadet; joined Mar 13, 2018; 8 messages)

No. I'm just trying to avoid heavy load, especially intensive writes with a scrub running in parallel.
 