Weird SCSI errors in the console

Octopuss

Patron
Joined
Jan 4, 2019
Messages
461
I am running badblocks on four new disks through SSH on my NAS, and the console is constantly spamming this:
1642073472048.png


The server: TrueNAS 12 (latest) running in ESXi, with an LSI 2308-based HBA card with passthrough properly configured.
Connected: three SATA disks as the current pool and four new disks that are being tested.
The new disks are SAS HGST HUS726040AL4210.

Badblocks seems to be happily running on all four new disks in parallel.
1642073686188.png


Does anyone know what these errors are and whether they are harmless or something is wrong?
 

sretalla

Powered by Neutrality
Moderator
Joined
Jan 1, 2016
Messages
9,703
From the dmesg output, it seems that da4 is not in good shape.

Stop badblocks and run a long SMART test on it.
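
For anyone following along, here is a minimal sketch of what that could look like from the shell, assuming smartctl (from smartmontools) is available and the disk in question really is /dev/da4. The two function names are made up for illustration; only the smartctl options are real.

```shell
#!/bin/sh
# Sketch only: wrap the two smartctl calls needed for an extended self-test.
# The device path is supplied by the caller (/dev/da4 is the disk named in this thread).

# Start the extended (long) self-test; the drive runs it in the background.
start_long_test() {
    smartctl -t long "$1"
}

# Print the drive's self-test log, to be read once the test has finished.
show_selftest_log() {
    smartctl -l selftest "$1"
}
```

Usage would be something like "start_long_test /dev/da4", then "show_selftest_log /dev/da4" later. On a 4 TB disk the long test takes several hours.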
 

Octopuss

I did that before starting badblocks.
This is the output when badblocks is running on just da4. When I run it on all disks, the console output is the same, only the error shows up on all of them. I highly doubt I bought FIVE faulty disks, this must be something else.
Actually I believe I saw errors like this in the past with the original SATA disks too.
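
A rough way to check whether the errors really hit every disk, or mostly one, is to tally them per device. This is only a sketch: the screenshot is not reproduced here, so the line format (FreeBSD CAM messages that begin with "(daN:" and contain "CAM status:") is an assumption, and tally_cam_errors is a made-up helper name.

```shell
#!/bin/sh
# Sketch only: count CAM error lines per disk from dmesg-style text on stdin.
# ASSUMPTION: lines look like "(da4:mps0:0:4:0): CAM status: SCSI Status Error".
# Adjust the patterns if the real console output differs.
tally_cam_errors() {
    awk '
        /^\(da[0-9]+:/ && /CAM status:/ {
            dev = substr($1, 2)      # drop the leading "("
            sub(/:.*/, "", dev)      # keep only the device name, e.g. da4
            count[dev]++
        }
        END { for (d in count) print d, count[d] }
    ' | sort
}
```

Run it as "dmesg | tally_cam_errors". If every disk on the same cable shows up with similar counts, the shared path (cable, backplane, HBA port) is probably a more likely suspect than the disks themselves.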
 

sretalla

It could be cabling, the controller or both... check those.

Consider controller firmware too.
 

Octopuss

I've been using the same HBA for a few years now, so I highly doubt that's the problem. The cable is new, and firmware doesn't matter either because I flashed it when I installed the card.

So far no one seems to be familiar with this error.
 

sretalla


Octopuss

Of course you can never be sure, but everything seems to be working just fine. Badblocks has been running for 4.5 hours without problems or errors, the disks are obviously operational, and SMART doesn't report any errors either... it's weird. I'm inclined to say it's a harmless error, possibly related to the virtualized environment, somehow. It's annoying not being able to clearly identify what's going on though.
 

sretalla

Octopuss said: "I'm inclined to say it's a harmless error, possibly related to the virtualized environment, somehow. It's annoying not being able to clearly identify what's going on though."
Maybe...

I have a TrueNAS VM in ESXi with a 9300-16i (which is a single card with 2x 3008 chips on it) and I see no such errors.

Your dmesg is certainly not free from errors, so I would feel uneasy about my data on that server if I were seeing that.
 

Octopuss

I guess I will experiment a little further once badblocks finishes (like in five days, bleh). Currently I have a mixture of SATA and SAS disks connected to the card, so I'll try with either type alone for starters. Everything should be compatible though, as far as I know.
 

Octopuss

I was trying to remember what card exactly I had, and I googled up this sas2flash program.
But this is weird....
1642095634838.png


Clearly there IS an adapter in the system, otherwise no disks would even show up.
 

Alecmascot

Guru
Joined
Mar 18, 2014
Messages
1,177
sas2flash -listall
 

Octopuss

That's the output of that command, duh.
 

Alecmascot

Code:
    root@freenas:~ # sas2flash -listall
    LSI Corporation SAS2 Flash Utility
    Version 16.00.00.00 (2013.03.01)
    Copyright (c) 2008-2013 LSI Corporation. All rights reserved

            Adapter Selected is a LSI SAS: SAS2008(B2)

    Num   Ctlr            FW Ver        NVDATA        x86-BIOS         PCI Addr
    ----------------------------------------------------------------------------

    0  SAS2008(B2)     20.00.07.00    14.01.00.08    07.11.10.00     00:02:00:00

            Finished Processing Commands Successfully.
            Exiting SAS2Flash.
 

Octopuss

Oh hold on, it has to be run with sudo. Dammit, this Linux stuff is annoying.
Yeah, it works now.
...but it still only says SAS2308_2(D1). That's not very helpful. But it's not important anyway. It's running the latest FW.
 

revengineer

Contributor
Joined
Oct 27, 2019
Messages
193
I used to get errors like this when my HBA was overheating. I put a 40mm fan on the heat sink covering the controller chip and the errors have not surfaced since.

EDIT: I should also note that this issue did not start on day one. The server ran fine without such a fan for a year. Not sure what changed over time; perhaps the thermal epoxy changed properties, or the airflow changed somehow.
 

Octopuss

It's not overheating. I've been running it without a fan for two years without problems, BUT, since I added four more disks and I am running tests, I temporarily added a fan blowing on the HBA.
 

rvassar

Guru
Joined
May 2, 2018
Messages
972
Since you're burning in those disks, the quick option is to swap a cable. If the errors follow the drive, it's likely a bad drive. If it sticks with the cable position...
 

irp21

Cadet
Joined
Jan 13, 2022
Messages
1
Is that MobaXterm I see you using? Nice! I've enjoyed using it on my work laptop, which runs Windows. Haven't found anything on my personal Linux machine that has all those capabilities in one.
 

Octopuss

rvassar said: "Since you're burning in those disks, the quick option is to swap a cable. If the errors follow the drive, it's likely a bad drive. If it sticks with the cable position..."
I plan to order one because of the molex reductions, but interestingly, cables like these are hard to obtain in my country. No idea why.
The current cable also looks fragile as hell.
I mean what the flying fak is this??? I'm afraid to even breathe on it.

1642147086808.png


irp21 said: "Is that MobaXterm I see you using? Nice! I've enjoyed using it on my work laptop, which runs Windows. Haven't found anything on my personal Linux machine that has all those capabilities in one."
Yes! I have no idea how I discovered it, but that day was a happy one.
 

Octopuss

I changed the cable and the errors are gone. That's super weird.
 