SOLVED mps0: mpssas_prepare_remove: Sending reset for target ID 15 CAM status: CCB request aborted by the host

VioletDragon

Patron
Joined
Aug 6, 2017
Messages
251
Well that's quite likely sufficient, it tends to be more of a problem in tower cases. It's really the only other thing I could think of that would cause strange behaviours. Beyond that you move into the realm of things like undersized PSU's, bad connectors, etc.

I don't use tower cases, and the power supply I'm using is a Seasonic SS-600H2U. The server only pulls around 100 watts according to the UPS in my rack. https://seasonic.com/h2u
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
How many drives? That looked like a 12 drive backplane, and 12 drives on a 600W PSU comes up a little short. "Server only pulls around 100 watts" is nearly irrelevant in the grand scheme of things, because it's stuff like spinup current, wire thickness, whether there are multiple 12V rails, etc., that can get you. That backplane looks to only have three "Molex" connectors, suggesting each one is supporting four drives. Since spinup current can be as much as three amps, that could mean that during spinup, you're trying to feed somewhere between 8-12A through a Molex pin. A legitimate actual Molex high quality pin is only rated for 11A, but that could easily be less because 18 gauge wire commonly used on lower wattage PSU assemblies is rated 10A max, and lower quality "generic" Molex pins often start to cook around 4-6 amps because they don't make great contact with the mating socket.

This leads to brownouts during spinup, which can permanently damage drives, and can lead to erratic operation. I'm specifically talking about this because that seems to correspond with what you have observed, you have a consistent problem that remains through cable and controller changes. Are the problematic drives on the far side of the board from the PSU connectors? Just a guess.

We have a great thread that explains what's involved in proper power supply sizing, and why yours is probably a bit too small. That by itself isn't likely to be the issue, but it *could* be.

It's also possible that there are no smoothing capacitors on your backplane, which tends to make any PSU issues more obvious.
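
To put rough numbers on the spinup concern, here is a minimal sketch. The 2-3 A per-drive range is implied by the 8-12A figure above, and the 11 A, 10 A and 4 A limits are the ones quoted in this post. The function names, and the assumption that all four drives on a connector spin up at the same moment through a single 12V pin, are illustrative only.

```python
# Rough per-connector 12V current during simultaneous spinup.
# Assumption: every drive on a connector spins up at once and
# all of their 12V current flows through a single Molex pin.

DRIVES_PER_CONNECTOR = 4          # 12-bay backplane, three Molex connectors
SPINUP_AMPS_RANGE = (2.0, 3.0)    # per-drive 12V spinup current, low and high

# Limits quoted in the post, in amps.
LIMITS = {
    "genuine Molex pin": 11.0,
    "18 AWG wire": 10.0,
    "generic Molex pin (low end)": 4.0,
}


def connector_current(drives: int, amps_per_drive: float) -> float:
    """Total current through one connector's 12V pin."""
    return drives * amps_per_drive


def exceeded_limits(current: float) -> list[str]:
    """Names of the limits that the given current goes over."""
    return [name for name, limit in LIMITS.items() if current > limit]


if __name__ == "__main__":
    for amps in SPINUP_AMPS_RANGE:
        total = connector_current(DRIVES_PER_CONNECTOR, amps)
        print(f"{amps:.0f} A/drive -> {total:.0f} A per pin, over: {exceeded_limits(total)}")
```

Staggered spinup, if the controller and drives support it, would lower the simultaneous figure, so treat this as the worst case rather than a prediction.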
 

VioletDragon

Patron
Joined
Aug 6, 2017
Messages
251
I ran 12 disks on a 550-watt Corsair power supply for the past 4 years without any issues. The issue only seems to appear when using these RAID cards. I know people who are running 12-disk servers on 460-watt power supplies. The E3 hardware doesn't really use much: as I said, 100 watts according to my UPS, and 100 watts on the kilowatt meter I have.

I still believe this is a bug in FreeNAS, though.
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
Yeah, okay, fine, I don't have any clue what I'm talking about, and your counterexample proves that this couldn't possibly be a problem.

Wonder if I can get a refund on those electrical engineering courses.
 

VioletDragon

Patron
Joined
Aug 6, 2017
Messages
251
Then why does my UPS show it's using 100 watts? The kilowatt meter shows the same. If the PSU couldn't handle the system, then why doesn't the power supply go into over volt protect?
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
Read the linked article. Average usage isn't the issue. It's peak usage, especially during spinup.

If you normally drive your car around at 40MPH but you race it up to 130MPH into the red zone for five minutes every day, the mechanic who eventually has to fix your ruined engine isn't going to be interested in the fact that you "mostly" ran it at 40MPH. It's not the 40MPH that's the problem.

Relying on "over volt protect" (which isn't relevant here anyways) to keep you safe is foolish; the correct way to engineer a system is to run the numbers and build in the appropriate safety margins, because while parts may be rated to work at certain voltages and capacities, actually doing so on a continuous basis has a much higher chance of failure than a properly designed system with proper overhead.
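
As a rough illustration of why the average reading says little about the peak, here is a small sketch. It assumes all 12 drives spin up together at the 3 A worst case mentioned earlier in the thread; the 20% headroom and the function names are assumptions for illustration, not a sizing rule from the linked article.

```python
# Average wall draw versus worst-case 12V spinup load.
# Assumptions: all 12 drives spin up together at 3 A each on the 12V rail;
# the 20% headroom figure is illustrative.

RAIL_VOLTS = 12.0


def spinup_watts(drives: int, amps_per_drive: float) -> float:
    """Peak 12V power drawn by the drives alone during spinup."""
    return drives * amps_per_drive * RAIL_VOLTS


def required_rail_watts(peak_watts: float, headroom: float = 0.20) -> float:
    """12V capacity needed to carry the peak with a safety margin."""
    return peak_watts * (1.0 + headroom)


if __name__ == "__main__":
    average = 100.0                  # steady-state reading from the UPS
    peak = spinup_watts(12, 3.0)     # 432 W for the drives alone
    print(f"average: {average:.0f} W, spinup peak (drives only): {peak:.0f} W")
    print(f"12V capacity with headroom: {required_rail_watts(peak):.0f} W")
```

The drives-only peak is several times the steady-state reading, and it lasts only a few seconds, which is why a UPS display or wall meter is unlikely to show it.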
 

VioletDragon

Patron
Joined
Aug 6, 2017
Messages
251
The issue is, I can't find a 2U power supply rated above 600 watts, so I don't know what to do.
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
Srsly? Found several at the first distributor I looked at.

https://www.rackmountnet.com/

Click on EMACS

You'll find https://www.rackmountnet.com/product/emacs-2u-1200w-power-supply/ a 1200W PSU for 2U, plus a 760W and some others. I'm sure Sparkle has stuff too.

Now, I'm not encouraging you to run right out and buy a new PSU. There are enough other things that you could be, and should be, checking that have been mentioned above, that trying a different PSU is probably on the third tier of things I'd try, and that would probably be after hooking up a second PSU to take some of the load off the existing supply to see if that helped. Non-rackmount PSU's are cheap and easy to come by.

What's the wire gauge being used to hook up the backplane? What's the 12V voltage at the end of the backplane, is it sagging? There's lots of things that could be impacting this.

The simple fact is that this sounds like a hardware problem of some sort, because there are many millions of aggregate problem-free run-hours on the LSI HBA's. Everyone uses them.
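
For the wire gauge and sag questions, a back-of-the-envelope estimate can be sketched with Ohm's law. The resistance values are approximate figures for copper wire at room temperature, the 0.5 m cable length and 12 A load are hypothetical, and connector contact resistance (often the larger contributor) is ignored, so this is only a lower bound on the drop.

```python
# Estimated 12V sag along a backplane power cable (V = I * R).
# Assumptions: copper wire at room temperature, connector contact
# resistance ignored, current returns through an equal-length ground wire.

OHMS_PER_METRE = {18: 0.02095, 16: 0.01317}  # approximate resistance by AWG
ATX_12V_MIN = 11.4                            # 12 V minus the 5% ATX tolerance


def voltage_at_load(supply_volts: float, amps: float, length_m: float, awg: int) -> float:
    """Voltage left at the load after the drop in both conductors."""
    round_trip_ohms = 2 * length_m * OHMS_PER_METRE[awg]
    return supply_volts - amps * round_trip_ohms


if __name__ == "__main__":
    for awg in (18, 16):
        volts = voltage_at_load(12.0, amps=12.0, length_m=0.5, awg=awg)
        status = "in spec" if volts >= ATX_12V_MIN else "out of spec"
        print(f"{awg} AWG, 0.5 m, 12 A: {volts:.2f} V at the backplane ({status})")
```

A calculation like this cannot replace a measurement: a meter on the far end of the backplane during spinup is what actually answers whether the rail is sagging.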
 

VioletDragon

Patron
Joined
Aug 6, 2017
Messages
251
No stock in the UK, which is the problem. I've had the same issue on other hardware. I've tested with other boards and power supplies. My workstation has a 750-watt supply and the same problem occurs. I just find it funny that this only happens when I use these SAS controllers. When using onboard SATA I get no issues. The other thing is, why doesn't Ubuntu Server report this issue in dmesg?

Not sure about the AWG of the cable. The voltage at the backplane is not sagging.
 

VioletDragon

Patron
Joined
Aug 6, 2017
Messages
251
Just a quick update:

I've added a fan to the SAS controller, and I've also re-seated the controller and the SAS cables at both ends (controller and backplane). I've also swapped the drives in da11 and da10 to see where the errors show up. If the errors follow the drive to its new bay, that indicates a drive issue; if they stay with the original bay, that means a bad backplane, cable, or controller, I suppose.
 

VioletDragon

Patron
Joined
Aug 6, 2017
Messages
251
Just a quick update: performance has improved since adding the fan to the SAS controller and re-seating the cables and card.
 

VioletDragon

Patron
Joined
Aug 6, 2017
Messages
251
It's strange: going through the boot log, I now have the lines below, which I didn't have on the boot where I was getting the SCSI sense errors. They were missing from the previous boot.

Code:
GEOM_RAID5: Module loaded, version 1.3.20140711.62 (rev f91e28e40bf7)
GEOM_MIRROR: Device mirror/swap0 launched (2/2).
GEOM_MIRROR: Device mirror/swap1 launched (2/2).
GEOM_MIRROR: Device mirror/swap2 launched (2/2).
GEOM_MIRROR: Device mirror/swap3 launched (2/2).
GEOM_MIRROR: Device mirror/swap4 launched (2/2).
GEOM_ELI: Device mirror/swap0.eli created.
GEOM_ELI: Encryption: AES-XTS 128
GEOM_ELI:     Crypto: hardware
GEOM_ELI: Device mirror/swap1.eli created.
GEOM_ELI: Encryption: AES-XTS 128
GEOM_ELI:     Crypto: hardware
GEOM_ELI: Device mirror/swap2.eli created.
GEOM_ELI: Encryption: AES-XTS 128
GEOM_ELI:     Crypto: hardware
GEOM_ELI: Device mirror/swap3.eli created.
GEOM_ELI: Encryption: AES-XTS 128
GEOM_ELI:     Crypto: hardware
GEOM_ELI: Device mirror/swap4.eli created.
GEOM_ELI: Encryption: AES-XTS 128
GEOM_ELI:     Crypto: hardware
 

VioletDragon

Patron
Joined
Aug 6, 2017
Messages
251
So after 22 hours the error has not reappeared, but I will keep an eye on it. I've also done more research into camcontrol: it isn't on Linux (though there is a package you can install), which is why Linux does not report the CAM error.

I will keep an eye out for power supplies. I was looking at a redundant solution, as I run iSCSI for my VMs.

Thanks.
 

VioletDragon

Patron
Joined
Aug 6, 2017
Messages
251
I think it's the drive. I swapped it into another bay, and it's complaining about the same drive in a different bay.

Code:
mps0: mpssas_prepare_remove: Sending reset for target ID 15
da10 at mps0 bus 0 scbus0 target 15 lun 0
mps0: da10: <ATA ST1000DM010-2EP1 CC43>Unfreezing devq for target ID 15
 s/n Z9AR1MFP detached
(da10:mps0:0:15:0): Periph destroyed
mps0: SAS Address for SATA device = 3c2f56516485484e
mps0: SAS Address from SATA device = 3c2f56516485484e
da10 at mps0 bus 0 scbus0 target 15 lun 0
da10: <ATA ST1000DM010-2EP1 CC43> Fixed Direct Access SPC-4 SCSI device
da10: Serial Number Z9AR1MFP
da10: 600.000MB/s transfers
da10: Command Queueing enabled
da10: 953869MB (1953525168 512 byte sectors)
da10: quirks=0x8<4K>
ses0: da10,pass11: Element descriptor: 'ArrayDevice05'
ses0: da10,pass11: SAS Device Slot Element: 1 Phys at Slot 5
ses0:  phy 0: SATA device
ses0:  phy 0: parent 500605b0000274bf addr 500605b0000274a5
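
One way to make the swap test less error-prone is to pull the reset target and the drive serial number out of each log and compare them across boots. The sketch below matches only the three line shapes visible in the log above; the function names are hypothetical, and it assumes the target ID tracks the physical bay, which depends on how the controller maps devices.

```python
import re

# Patterns for three kinds of lines that appear in the log above.
RESET_RE = re.compile(r"mpssas_prepare_remove: Sending reset for target ID (\d+)")
ATTACH_RE = re.compile(r"^(da\d+) at \w+ bus \d+ scbus\d+ target (\d+) lun \d+")
SERIAL_RE = re.compile(r"^(da\d+): Serial Number (\S+)")

SAMPLE = """\
mps0: mpssas_prepare_remove: Sending reset for target ID 15
da10 at mps0 bus 0 scbus0 target 15 lun 0
da10: Serial Number Z9AR1MFP
"""


def reset_serials(log_text: str) -> dict[str, int]:
    """Map the serial of each drive whose target was reset to that target ID."""
    reset_targets: set[int] = set()
    device_target: dict[str, int] = {}
    device_serial: dict[str, str] = {}
    for raw in log_text.splitlines():
        line = raw.strip()
        if m := RESET_RE.search(line):
            reset_targets.add(int(m.group(1)))
        elif m := ATTACH_RE.match(line):
            device_target[m.group(1)] = int(m.group(2))
        elif m := SERIAL_RE.match(line):
            device_serial[m.group(1)] = m.group(2)
    return {
        serial: device_target[dev]
        for dev, serial in device_serial.items()
        if device_target.get(dev) in reset_targets
    }


def fault_follows_drive(before: dict[str, int], after: dict[str, int]) -> bool:
    """True if some serial was reset in both logs, on different targets."""
    return any(s in after and after[s] != t for s, t in before.items())


if __name__ == "__main__":
    print(reset_serials(SAMPLE))
```

If the same serial shows up against a different target after the swap, the fault followed the drive; if a different serial shows up against the same target, the bay, cable, or controller port is the more likely suspect.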
 

VioletDragon

Patron
Joined
Aug 6, 2017
Messages
251
I have offlined the drive, and the problem is not there when using onboard SATA. So it's either these SAS controllers or FreeNAS.
 

VioletDragon

Patron
Joined
Aug 6, 2017
Messages
251
Looks like I'm going to be looking for an alternative, because I've suffered with this issue for so long now. It looks to be a common problem, so common that most reports have not been solved, and it gets blamed on hardware more than software. I just don't get why it only happens on these SAS controllers, yet these errors don't show on onboard SATA.

It's either a software incompatibility or a firmware problem on these Dell PERC H310 controllers, as this is the second card and both suffer from the same problem in two different machines. I have RMA'd three 3TB drives because they all suffered from the same issue, and now these 1TB drives are showing it too. I give up on this!

So are you going to say 4 drives are bad? Urmmm!
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
Speaking as someone who does this stuff professionally, I find that most problems can be resolved but it often requires some time and effort, trial and error, to determine what's wrong. It's all very much harder when I have to work through your eyes and ears to proxy things through, and I don't actually get a chance to see your setup directly.
 

VioletDragon

Patron
Joined
Aug 6, 2017
Messages
251
All I'm saying is the problems come when the SAS controllers are being used. This da10 drive shows no issues on onboard SATA, only on the Dell PERC H310 controller. Whether I use SAS forward breakout cables or a proper SAS backplane, that's when the issue comes. I'm just fed up with RMAing drives which are not the problem according to Seagate.

Yet what I don't get is that the log shows the drive as detached, yet camcontrol devlist still lists the drive, and the drive is still in the array.

Code:
<ATA ST1000DM003-1ER1 CC45>        at scbus0 target 0 lun 0 (pass0,da0)
<ATA ST1000DM003-1ER1 CC45>        at scbus0 target 1 lun 0 (pass1,da1)
<ATA ST1000DM010-2EP1 CC43>        at scbus0 target 2 lun 0 (pass2,da2)
<ATA ST1000DM010-2EP1 CC43>        at scbus0 target 3 lun 0 (pass3,da3)
<ATA ST3000DM003-2AE1 0001>        at scbus0 target 8 lun 0 (pass4,da4)
<ATA ST3000VN007-2AH1 SC60>        at scbus0 target 9 lun 0 (pass5,da5)
<ATA ST3000VN007-2E41 SC60>        at scbus0 target 10 lun 0 (pass6,da6)
<ATA ST3000DM008-2DM1 CC26>        at scbus0 target 11 lun 0 (pass7,da7)
<ATA ST3000VN000-1HJ1 SC60>        at scbus0 target 12 lun 0 (pass8,da8)
<ATA ST3000VN007-2AH1 SC60>        at scbus0 target 13 lun 0 (pass9,da9)
<GOOXI Bobcat 0d00>                at scbus0 target 14 lun 0 (ses0,pass10)
<ATA ST1000DM010-2EP1 CC43>        at scbus0 target 15 lun 0 (da10,pass11)
<ATA ST1000DM010-2EP1 CC43>        at scbus0 target 16 lun 0 (pass12,da11)
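
To cross-check a target ID from the kernel log against this listing, the output can be indexed by target. The sketch below assumes the line layout shown above and nothing more; the function name is hypothetical, and the sample is three lines copied from this post. Note that the earlier log shows da10 being detached and then attached again straight away, which would be consistent with it still appearing here.

```python
import re

# Matches lines shaped like the `camcontrol devlist` output above.
DEVLIST_RE = re.compile(
    r"^<(?P<ident>[^>]+)>\s+at scbus(?P<bus>\d+) target (?P<target>\d+) "
    r"lun (?P<lun>\d+) \((?P<periphs>[^)]+)\)$"
)

SAMPLE = """\
<GOOXI Bobcat 0d00>                at scbus0 target 14 lun 0 (ses0,pass10)
<ATA ST1000DM010-2EP1 CC43>        at scbus0 target 15 lun 0 (da10,pass11)
<ATA ST1000DM010-2EP1 CC43>        at scbus0 target 16 lun 0 (pass12,da11)
"""


def parse_devlist(text: str) -> dict[int, dict]:
    """Index devlist lines by target ID."""
    devices: dict[int, dict] = {}
    for line in text.splitlines():
        m = DEVLIST_RE.match(line.strip())
        if not m:
            continue  # skip anything that is not a device line
        periphs = m.group("periphs").split(",")
        # The disk peripheral is the daN entry, in either position;
        # enclosure devices such as ses0 have none.
        disk = next((p for p in periphs if p.startswith("da")), None)
        devices[int(m.group("target"))] = {"ident": m.group("ident"), "disk": disk}
    return devices


if __name__ == "__main__":
    for target, info in sorted(parse_devlist(SAMPLE).items()):
        print(target, info["disk"], info["ident"])
```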
 

VioletDragon

Patron
Joined
Aug 6, 2017
Messages
251
This thing is annoying!

Screenshot_20210119_035337.jpg
 

VioletDragon

Patron
Joined
Aug 6, 2017
Messages
251
I'm out of ideas. I've tried different cables, motherboards, CPUs, RAM, SAS controllers and power supplies. I've also changed the backplane to the non-SAS-expander one, and as you can see, same problem.

I think it's time to throw it in the trash, because nothing works. I've spent over £400 on this and still have the same issues!
 