CAM status: SCSI status error - what does it mean?

Status
Not open for further replies.

BetYourBottom

Contributor
Joined
Nov 26, 2016
Messages
141
So I've managed to get past my previous problems and keep working towards finally getting my FreeNAS build up and running. I'm finally at the drive-testing stage.

So far I've run a short, conveyance and long test on all my drives as per https://forums.freenas.org/index.php?threads/how-to-hard-drive-burn-in-testing.21451/

Well, now I've come to the badblocks test and things were taking a little longer than they were supposed to; some drives were stuck at under 30% of the first write pass after 30 hours. So I stopped the tests and started running some basic dd commands to check the read/write speeds. For some reason my read speeds seem fine but my write speeds are in the toilet, like 10-20MB/s bad.

I've noticed that during the writes (especially during the slowest of them), I can see a bunch of weird errors popping up on FreeNAS' screen. Stuff like "CAM status: SCSI status error" and "CAM status: CCB request completed with an error".

Some images for reference:

YBGsGUB.jpg

y5NRHDe.jpg
 

tvsjr

Guru
Joined
Aug 29, 2015
Messages
959
Post the output of "smartctl -x /dev/da#", replacing the # with the number of the drives giving this error (1, 4, and 5 from your snapshot). Something tells me you bought a stack of cheap SAS drives off eBay and a significant number of them are toast.
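
For anyone following along, here is a minimal sketch of collecting that output for several drives in one go. It assumes smartmontools is installed and that the drives show up as /dev/da1, /dev/da4 and /dev/da5 as in the post above; the helper name `smartctl_commands` is made up for illustration.

```python
import shutil
import subprocess


def smartctl_commands(drive_numbers, prefix="/dev/da"):
    """Build one `smartctl -x` argument list per drive number (hypothetical helper)."""
    return [["smartctl", "-x", f"{prefix}{n}"] for n in drive_numbers]


if __name__ == "__main__":
    # Drives 1, 4 and 5 are the ones named in the post above.
    for argv in smartctl_commands([1, 4, 5]):
        print("$", " ".join(argv))
        # Only run the command when smartctl is actually installed.
        if shutil.which("smartctl"):
            result = subprocess.run(argv, capture_output=True, text=True)
            print(result.stdout)
```

Running it as root on the NAS itself should print the same text you would otherwise copy by hand for each drive.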
 

m0nkey_

MVP
Joined
Oct 27, 2015
Messages
2,739
That or the backplane/cable or HBA could also be faulty.
 
Joined
Jan 7, 2015
Messages
1,155
Power supply specs?
 

BetYourBottom

Contributor
Joined
Nov 26, 2016
Messages
141
Post the output of "smartctl -x /dev/da#", replacing the # with the number of the drives giving this error (1, 4, and 5 from your snapshot). Something tells me you bought a stack of cheap SAS drives off eBay and a significant number of them are toast.

All the drives were purchased directly from WD's store. They are all brand new and they have passed the short, conveyance and long tests with no errors. I'll grab the output of smartctl and post it in a few minutes.

That or the backplane/cable or HBA could also be faulty.

I purchased the SFF-8484 to SFF-8087 cables new from Monoprice; I'm not opposed to the idea that they are the cause of this problem (especially since that would be a super easy and free fix).

The backplane is the 3 SFF-8484 connector model for the Dell C2100. I'm not sure how to test this one properly. When a drive is moved to a slot on the backplane that corresponds to a breakout cable connected directly to my motherboard, there are no errors and the speed is good. It's only in slots that connect through the new cables to the HBA that there is an issue.

The HBA is a H200 Mezzanine model that I've attempted to crossflash. I've been using this tutorial for the process: https://techmattr.wordpress.com/201...-flashing-to-it-mode-dell-perc-h200-and-h310/.
The interesting thing about this is that I have had problems with the crossflash. Megarec could not properly run cleanflash on it; it would crash at 40-50% completion. The only way I could find around it was using
Code:
s2fp19.exe -o -e 6
on my card, which would erase the flash successfully (usually). Then somehow I managed to flash the 2118it.bin firmware directly without going through the 6GBPSAS.fw.

Power supply specs?
2x redundant 750W power supplies.
 

BetYourBottom

Contributor
Joined
Nov 26, 2016
Messages
141
Here are pastebins of the "smartctl -x " command that was requested.

da0 - http://pastebin.com/rApPBDnd
da1 (moved to ada0 for testing) - http://pastebin.com/mrtMWbe8
da2 - http://pastebin.com/7EvYXMGV
da3 - http://pastebin.com/iYEJRhYS
da4 - http://pastebin.com/dA1bNfbE
da5 - http://pastebin.com/JJ9ui7be

On moving da1 to ada0: both are on the backplane, but the difference is that one is on the HBA/new cables and the other is on the old cables and motherboard. There are quite a few errors thrown when it's on da1, but the same drive moved to ada0 had zero errors and wrote at a very respectable rate.

EDIT:

I feel I've eliminated the possibility of any issues with the backplane, by moving the breakout cable to each connector and testing speeds. Speeds are good on every connector using the breakout cable.

So that narrows it down to either an issue with the cables (I'm going to try to find a place to buy a new set to test with) or an issue with the H200, possibly caused by a bad crossflash.
 
Joined
Jan 7, 2015
Messages
1,155
2x Redundant 750W powersupplies.
I only ask because I chased my tail once with these very issues. In the end a new PSU, SFF cables and power splitters fixed them. I'm not sure which one it actually was, because I replaced them all at the same time, but it did indeed fix the issues. I think you are on the right track. Good luck!
 

BetYourBottom

Contributor
Joined
Nov 26, 2016
Messages
141
That's reasonable. I always appreciate anyone willing to help. I've had a lot of threads that don't get responded to.

So I think we don't need to worry about the PSU, thanks for the suggestion.
 
Joined
Jan 7, 2015
Messages
1,155
Agree.
 

BetYourBottom

Contributor
Joined
Nov 26, 2016
Messages
141
I've tested with another brand new 8484 to 8087 connector and I'm still getting SCSI errors on write. Also I noticed that occasionally while running the write test, all the drive lights attached to the HBA turn red, don't know what that's supposed to indicate though because it's not the drives that are bad.

My last test before declaring the HBA bad is going to be grabbing a cable to directly attach the drives to the HBA without the backplane. I know that I basically eliminated the backplane as an issue, but I want to make 100% sure before nagging the eBay seller I got it from. Here's hoping they'll be kind enough to help me fix or replace the HBA.

I don't think it's glitching due to a bad flash or anything; I've figured out how to consistently reflash the 9211 IT firmware, and I'd expect a bad flash to produce obvious errors during the flashing process. So if this doesn't work, I'm going to hope the eBay seller will take care of this.
 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194
SFF-8484? What kind of messed-up SAS2 HBA uses those?
 

BetYourBottom

Contributor
Joined
Nov 26, 2016
Messages
141
My backplane uses 3 SFF-8484 connectors. 1 is connected via a reverse breakout cable to the motherboard SATA ports. The other 2 are connected via SFF-8484 to SFF-8087 cables to my H200 that I've crossflashed to 9211 IT firmware.


Also, I'm now completely baffled.

I have 0 errors when connecting the drives through the backplane directly to the MB

I have 0 errors using my new breakout cable to directly attach 2 drives to the H200.

However, I get write errors when connecting through the backplane to the H200.


I have tried all 3 of my SFF-8484 ports with the breakout cable attached to my motherboard and had no errors, so I wouldn't think it's the backplane.
I have tried 3 different SFF-8484 to SFF-8087 cables and saw errors with each one, so I wouldn't think it's the cables.
I have tried directly attaching 2 drives to the H200 and saw no errors, so I wouldn't think it's the H200.

What the heck is going on?!? What could be causing this weird set of errors?
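
One way to keep this kind of elimination straight is to write the tests down as data. The following is only a rough sketch: the rows restate the three results listed above, the component labels and the function name `analyse` are made up for illustration, and it says nothing about the actual root cause.

```python
from itertools import combinations

# Each test: (components in the data path, whether write errors appeared).
# Rows restate the results reported above; the labels are illustrative only.
tests = [
    ({"backplane", "reverse breakout cable", "motherboard SATA"}, False),
    ({"direct breakout cable", "H200"}, False),
    ({"backplane", "8484-to-8087 cable", "H200"}, True),
]


def analyse(tests):
    """Return (components never seen in a passing test,
    pairs that each pass alone but have never passed together)."""
    passing = [path for path, failed in tests if not failed]
    failing = [path for path, failed in tests if failed]
    cleared = set().union(*passing)
    # Components present in every failing path.
    common = set.intersection(*failing) if failing else set()
    never_passed = sorted(common - cleared)
    suspect_pairs = [
        pair
        for pair in combinations(sorted(common & cleared), 2)
        if not any(set(pair) <= path for path in passing)
    ]
    return never_passed, suspect_pairs


never_passed, suspect_pairs = analyse(tests)
print("never in a passing test:", never_passed)
print("pass alone, never together:", suspect_pairs)
```

With these three rows, the only component that never appears in a passing test is the 8484-to-8087 cable type, and the only pair that works separately but has never worked together is the H200 and the backplane. That matches where the thread is stuck: the fault shows up only for a combination, not for any single part tested on its own.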
 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194
However, I get write errors when connecting through the backplane to the H200.
How long is the SFF-8087 to SFF-8484 cable? And the breakout cables?

SATA has a very strict 1m maximum cable length. Even that is pushing it. 1m+backplane would also go over the spec.
 

BetYourBottom

Contributor
Joined
Nov 26, 2016
Messages
141
2 of the cables are 1m long, and 1 is 0.5m long.

I'm currently retesting with the 8484 to 4xSATA cable to make sure that the errors aren't limited to specific slots on the backplane, which would confound the results.
 

BetYourBottom

Contributor
Joined
Nov 26, 2016
Messages
141
The 8484 to 4xSATA cable still didn't show errors. After talking with some people in the IRC, I've begun working under the assumption that the H200 is the issue. I've ordered a new 9211-8i off Amazon (surprisingly only $100) and will retest with that. Hopefully this will resolve the issue, because if it turns out to be the backplane, that will be horrible to resolve.
 

BetYourBottom

Contributor
Joined
Nov 26, 2016
Messages
141
I'm begging you guys, if you can think of any more suggestions for what to try or how to fix this that'd be fantastic.

I received and replaced the backplane and tried with all 3 cables, on both HBAs and I'm still getting the same SCSI errors.

I can only think of 2 possibilities (please let me know if you have another). Either the replacement backplane still has issues or the cables break in the same way whenever I bend them the slight amount they need to in order to attach to my backplane, since they aren't 90 degree cables.
 
Joined
Jan 7, 2015
Messages
1,155
Lol, now you know where I was once on this. I just went down the line and started replacing shit. I didn't have the extra variable of a backplane though, so your mileage will vary.

When the error shows "UNIT ATTENTION Device power, reset, or bus device reset", to me that clearly states that either the drive or the port is losing connection for a period of time, somehow, some way, whether it's data or power, cable or port, HBA or drive. It's also the usual advice to get new cables when this pops up.

Try the HBA in different MB ports: a wonky pin in the port maybe, a hunk of dirt, worn out a bit? I've never had it happen, but that doesn't mean it can't. It would explain why both would do it, and also why it doesn't do it when connecting right to the motherboard. Loose connection on a power cable or splitter somewhere? The pins on Molex splitters can get pushed out a little by the female end and make a very intermittent connection at best, sometimes without you noticing. Just spitballin' here.

I received and replaced the backplane and tried with all 3 cables, on both HBAs and I'm still getting the same SCSI errors.

I can only think of 2 possibilities (please let me know if you have another). Either the replacement backplane still has issues or the cables break in the same way whenever I bend them the slight amount they need to in order to attach to my backplane, since they aren't 90 degree cables.

I'd say breaking cables is not very likely. And one last thing: I used to get these errors almost immediately after my HBA lit up during boot. Are you experiencing this at all, or is it only after the system is up and the ZFS imports are complete?
 

BetYourBottom

Contributor
Joined
Nov 26, 2016
Messages
141
The 2 HBA/RAID cards that I've tried have been on 2 separate connectors. One on a proprietary motherboard connection and the other in a PCIe slot.

For power, that doesn't seem to make sense since they work fine when connected through the backplane to the motherboard but not through the backplane to the HBA. Also I don't believe any molex splitters are involved in this but I could be wrong.

I do agree about that error message, though I am unsure where the exact problem is.

Here's a screenshot of the cables. I am starting to lean towards the gap being too small and all the cables being bent too far, but that doesn't feel right since reading seems to work error-free. http://imgur.com/a/stItm
 
Joined
Jan 7, 2015
Messages
1,155
I'd say it would be very unlikely for that little of a bend in stranded cables to break one cable, never mind multiple sets. It's a different story if it's copper-clad aluminum or some sub-par single conductor, like crappy Cat5; then I could see it being a possibility. A connector crimped on a little too hard or something, causing a failure. I chased the bad (single) SATA cable angle for days, bro, and every time I thought it had worked, it hadn't. Again, I did not have the backplane to contend with. That adds a layer of complexity to the situation.

So my next question is: how much time passes before you consider it "working" or "not working", and how are you determining this? Badblocks or dd reads? Could it be that it is always doing it, just more frequently (not working) or less frequently (working)? What are the constants, and what are the variables? It seems you have tried changing the variables; now try changing some constants (if you can): board, power, RAM, etc. Maybe try booting into a Linux environment and seeing if the errors persist. Not sure what else to add here. Some unseen compatibility issue, maybe? It's a mystery when these show up sometimes.

I'm pulling for you. You will figure it out eventually, and when you do, please let us all know.
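
On the "more or less frequently" question, counting the errors per drive over a fixed test run gives a number you can compare between setups, instead of eyeballing the console. Below is a rough sketch. It assumes console lines shaped like "(da1:mps0:0:1:0): CAM status: ...", and the sample lines are made up for illustration, not copied from the screenshots. The `mps0` controller name and the function name `count_cam_errors` are assumptions too.

```python
import re
from collections import Counter

# Assumed line shape: "(da1:mps0:0:1:0): CAM status: <text>".
CAM_LINE = re.compile(
    r"\((?P<dev>a?da\d+):[^)]*\):\s*CAM status:\s*(?P<status>.+)"
)


def count_cam_errors(lines):
    """Count CAM status lines per (device, status text)."""
    counts = Counter()
    for line in lines:
        match = CAM_LINE.search(line)
        if match:
            key = (match.group("dev"), match.group("status").strip())
            counts[key] += 1
    return counts


# Made-up sample input, for illustration only.
sample = [
    "(da1:mps0:0:1:0): CAM status: SCSI Status Error",
    "(da1:mps0:0:1:0): CAM status: CCB request completed with an error",
    "(da4:mps0:0:4:0): CAM status: SCSI Status Error",
    "(da1:mps0:0:1:0): CAM status: SCSI Status Error",
    "some unrelated console line",
]

for (dev, status), n in sorted(count_cam_errors(sample).items()):
    print(f"{dev}: {n} x {status}")
```

Feed it the saved console or log output from a run of the same length on each setup (backplane to motherboard, direct to HBA, backplane to HBA). A setup that looks "working" but still shows a non-zero count is then easy to spot.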
 