SAS-expander weird behavior

Status
Not open for further replies.
Joined
Apr 8, 2017
Messages
2
Hello all.
I recently moved from a normal internal SAS card to a SAS expander and an external chassis.
Here is the weird thing: if I turn on the external system (including the expander) before starting FreeNAS, it will see at most one or two drives.
To get it working I have to turn off the power to the storage system and turn it back on again.
Then it takes a while and nothing happens.
I then have to open an SSH connection and issue sas2ircu 0 display, which usually times out on the first try.
Running it a second time brings everything to life: all the disks are listed, and after a few seconds they also show up in the FreeNAS GUI.
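
The manual workaround above (run sas2ircu 0 display, let it time out, run it again) could be scripted roughly like this. This is only a sketch: it assumes sas2ircu is on the PATH and that a timed-out first attempt is harmless. The function name run_with_retry and the attempt count, timeout and delay are illustrative choices, not anything that comes from sas2ircu itself.

```python
import subprocess
import time


def run_with_retry(cmd, attempts=3, timeout=30.0, delay=5.0):
    """Run cmd until it exits with status 0.

    Returns the command's stdout, or None if every attempt fails.
    attempts, timeout and delay are illustrative defaults, not tuned values.
    """
    for attempt in range(1, attempts + 1):
        try:
            result = subprocess.run(
                cmd, capture_output=True, text=True, timeout=timeout
            )
        except subprocess.TimeoutExpired:
            # Matches the behaviour described above: the first call often hangs.
            print(f"attempt {attempt}: timed out after {timeout}s")
        except FileNotFoundError:
            print(f"{cmd[0]}: command not found")
            return None
        else:
            if result.returncode == 0:
                return result.stdout
            print(f"attempt {attempt}: exit status {result.returncode}")
        if attempt < attempts:
            time.sleep(delay)
    return None
```

Calling run_with_retry(["sas2ircu", "0", "display"]) would then repeat the command for controller index 0, the same index used above, until it answers or the attempts run out.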

Using FreeNAS 9.10.2-U2.
SAS2008 fw 20.00.04.00
HP 36 port SAS expander fw 2.08 (sierra)
The card and expander are running at 6 Gb/s, and the fastest HGST disks are also running at 6 Gb/s; the rest are running at either 3 Gb/s or 1.5 Gb/s (I have no idea why; I assume the disk manufacturer sets the bus speed depending on how fast the data can get off the drive).
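
For reference, the 150.000MB/s and 300.000MB/s figures in the kernel log below line up with 1.5 and 3 Gb/s links once 8b/10b line encoding is taken into account, which suggests they describe the negotiated link rate rather than how fast the drive itself can deliver data. A small sketch of that arithmetic (the function name is just an illustration):

```python
def payload_mb_per_s(line_rate_gbps):
    """Usable bandwidth in MB/s for a link that uses 8b/10b encoding."""
    bits_per_s = line_rate_gbps * 1e9
    data_bits_per_s = bits_per_s * 8 / 10  # 8 data bits per 10 line bits
    return data_bits_per_s / 8 / 1e6       # bits -> bytes -> MB


for rate in (1.5, 3.0, 6.0):
    print(f"{rate} Gb/s -> {payload_mb_per_s(rate):.0f} MB/s")
# 1.5 Gb/s -> 150 MB/s
# 3.0 Gb/s -> 300 MB/s
# 6.0 Gb/s -> 600 MB/s
```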

I have absolutely no idea what causes this, except possibly that the NAS is running on a hyperconverged main server and the SAS card is passed through. I never had a single problem with this config when the disks were attached directly to the SAS card.
I'm attaching the security log, as it shows exactly what happens:
first the startup at the top, then the detach after I turn off the power, and finally the point where all the disks are recognized.
Not a single hiccup (I/O error or such) after the disks are detected, even under heavy load.

Thanks.

Code:
freenas.local kernel log messages:
> CPU: AMD Opteron 240 (Gen 1 Class Opteron) (3500.46-MHz K8-class CPU)
> random: unblocking device.
> mps0: SAS Address from SATA device = 17443d6b92718268
> mps0: SAS Address from SATA device = 17533b6692718267
> failure at /freenas-9.10-releng/_BE/os/sys/dev/mps/mps_sas_lsi.c:675/mpssas_add_device()! Could not get ID for device with handle 0x000b
> mpssas_fw_work: failed to add device with handle 0xb
> mps0: SAS Address from SATA device = 49806163dab4ba74
> failure at /freenas-9.10-releng/_BE/os/sys/dev/mps/mps_sas_lsi.c:675/mpssas_add_device()! Could not get ID for device with handle 0x000c
> mpssas_fw_work: failed to add device with handle 0xc
> mps0: SAS Address from SATA device = 4965694bf9a3b997
> failure at /freenas-9.10-releng/_BE/os/sys/dev/mps/mps_sas_lsi.c:675/mpssas_add_device()! Could not get ID for device with handle 0x000d
> mpssas_fw_work: failed to add device with handle 0xd
> mps0: SAS Address from SATA device = 4e43455fc98e9c6d
> failure at /freenas-9.10-releng/_BE/os/sys/dev/mps/mps_sas_lsi.c:675/mpssas_add_device()! Could not get ID for device with handle 0x000e
> mpssas_fw_work: failed to add device with handle 0xe
> mps0: SAS Address from SATA device = 462d475fc98e9c6d
> failure at /freenas-9.10-releng/_BE/os/sys/dev/mps/mps_sas_lsi.c:675/mpssas_add_device()! Could not get ID for device with handle 0x000f
> mpssas_fw_work: failed to add device with handle 0xf
> mps0: SAS Address from SATA device = 4875495ac9cfd085
> failure at /freenas-9.10-releng/_BE/os/sys/dev/mps/mps_sas_lsi.c:675/mpssas_add_device()! Could not get ID for device with handle 0x0010
> mpssas_fw_work: failed to add device with handle 0x10
> mps0: SAS Address from SATA device = 4875495ac9b7ad94
> failure at /freenas-9.10-releng/_BE/os/sys/dev/mps/mps_sas_lsi.c:675/mpssas_add_device()! Could not get ID for device with handle 0x0011
> mpssas_fw_work: failed to add device with handle 0x11
> da0 at mps0 bus 0 scbus2 target 14 lun 0
> da0: <ATA HGST HDN726050AL T517> Fixed Direct Access SPC-4 SCSI device
> da0: Serial Number NAG3BK7P
> da0: 4769307MB (9767541168 512 byte sectors)
> ses0: da0,pass2: SAS Device Slot Element: 1 Phys at Slot 0
> ses0:  phy 0: parent 5001438022126226 addr 5001438022126200
> SMP: AP CPU #2 Launched!
> vboxdrv: fAsync=0 offMin=0xa5a offMax=0x50d2d
> GEOM_ELI: Device gptid/9453606d-be60-11e6-baf1-000c29af1675.eli created.
> mps0: mpssas_prepare_remove: Sending reset for target ID 14
> da0 at mps0 bus 0 scbus2 target 14 lun 0
> mps0: da0: Unfreezing devq for target ID 14
> <ATA HGST HDN726050AL T517>mps0:  s/n NAG3BK7P			Unfreezing devq for target ID 18
> GEOM_ELI: Device gptid/9453606d-be60-11e6-baf1-000c29af1675.eli destroyed.
> (da0:mps0:0:14:0): Periph destroyed
> mps0: SAS Address for SATA device = 4875495ac9cfd085
> mps0: SAS Address for SATA device = 4875495ac9b7ad94
> mps0: SAS Address from SATA device = 4875495ac9cfd085
> mps0: SAS Address from SATA device = 4875495ac9b7ad94
> da1 at mps0 bus 0 scbus2 target 14 lun 0
> da1: <ATA HGST HDN726050AL T517> Fixed Direct Access SPC-4 SCSI device
> da1: Serial Number NAG3BK7P
> da1: 4769307MB (9767541168 512 byte sectors)
> da2 at mps0 bus 0 scbus2 target 12 lun 0
> da2: <ATA WDC WD80EFZX-68U 0A83> Fixed Direct Access SPC-4 SCSI device
> da2: Serial Number VK0DB2VY
> da2: 7630885MB (15628053168 512 byte sectors)
> da3 at mps0 bus 0 scbus2 target 15 lun 0
> da3: <ATA HGST HDN726050AL T517> Fixed Direct Access SPC-4 SCSI device
> da3: Serial Number NAG2BZ5K
> da3: 4769307MB (9767541168 512 byte sectors)
> da4 at mps0 bus 0 scbus2 target 9 lun 0
> da4: Serial Number	  WD-WCC4ENPN3V37
> da5 at mps0 bus 0 scbus2 target 10 lun 0
> da5: <ATA WDC WD40EFRX-68W 0A80> Fixed Direct Access SPC-4 SCSI device
> da5: Serial Number	  WD-WCC4E3X6RE2Z
> da5: 300.000MB/s transfers
> da5: Command Queueing enabled
> da5: 3815447MB (7814037168 512 byte sectors)
> da5: quirks=0x8<4K>
> da6 at mps0 bus 0 scbus2 target 24 lun 0
> da6: <ATA WDC WD60EFRX-68L 0A82> Fixed Direct Access SPC-4 SCSI device
> da6: Serial Number	  WD-WX41DC6PF25Y
> da6: 150.000MB/s transfers
> da6: Command Queueing enabled
> da6: 5723166MB (11721045168 512 byte sectors)
> da6: quirks=0x8<4K>
> da7 at mps0 bus 0 scbus2 target 23 lun 0
> da7: <ATA WDC WD60EFRX-68L 0A82> Fixed Direct Access SPC-4 SCSI device
> da7: Serial Number	  WD-WX41DC6PFJXJ
> da7: 150.000MB/s transfers
> da7: Command Queueing enabled
> da7: 5723166MB (11721045168 512 byte sectors)
> da7: quirks=0x8<4K>
> ses0: da1,pass4: SAS Device Slot Element: 1 Phys at Slot 0
> ses0: da3,pass6: SAS Device Slot Element: 1 Phys at Slot 0
> ses0: da4,pass7: SAS Device Slot Element: 1 Phys at Slot 0
> ses0: da5,pass8: SAS Device Slot Element: 1 Phys at Slot 0
> ses0: da2,pass5: SAS Device Slot Element: 1 Phys at Slot 0
> ses0: da7,pass10: SAS Device Slot Element: 1 Phys at Slot 0
> ses0:  phy 0: SATA device
> ses0:  phy 0: parent 5001438022126226 addr 5001438022126206
> ses0: da6,pass9: SAS Device Slot Element: 1 Phys at Slot 0
> ses0:  phy 0: SATA device
> ses0:  phy 0: parent 5001438022126226 addr 5001438022126207
> GEOM_ELI: Device gptid/f874bba6-1bb6-11e7-80f8-3763eadf4da7.eli created.
> GEOM_ELI: Encryption: AES-XTS 128
> GEOM_ELI:	 Crypto: hardware
> GEOM_ELI: Device gptid/f94fcb68-1bb6-11e7-80f8-3763eadf4da7.eli created.
> nfsd: can't register svc name
> GEOM_ELI: Device da5p1.eli created.
> GEOM_ELI: Encryption: AES-XTS 128
> GEOM_ELI:	 Crypto: hardware
> GEOM_ELI: Device da6p1.eli created.
> GEOM_ELI: Encryption: AES-XTS 128
> GEOM_ELI:	 Crypto: hardware
> GEOM_ELI: Device da7p1.eli created.
> GEOM_ELI: Encryption: AES-XTS 128
> GEOM_ELI:	 Crypto: hardware
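
To compare boots, a log like the one above can be boiled down to two lists: the handles the mps driver failed to add and the da devices that did attach. A minimal sketch, assuming the message formats stay exactly as they appear in this log; the function name summarize and the short sample are made up for illustration.

```python
import re

# Patterns copied from the message formats seen in the log above.
FAILED = re.compile(r"failed to add device with handle (0x[0-9a-fA-F]+)")
ATTACHED = re.compile(r"\b(da\d+) at mps\d+ bus \d+ scbus\d+ target (\d+) lun \d+")


def summarize(log_text):
    """Return (failed_handles, attached) from mps kernel messages.

    failed_handles: handle values from 'failed to add device' lines, in order.
    attached: (device, target_id) tuples from 'daN at mpsN ...' lines, in order.
    """
    failed = [int(h, 16) for h in FAILED.findall(log_text)]
    attached = [(dev, int(target)) for dev, target in ATTACHED.findall(log_text)]
    return failed, attached


sample = """\
> mpssas_fw_work: failed to add device with handle 0xb
> mpssas_fw_work: failed to add device with handle 0xc
> da0 at mps0 bus 0 scbus2 target 14 lun 0
"""
failed, attached = summarize(sample)
print([hex(h) for h in failed])  # ['0xb', '0xc']
print(attached)                  # [('da0', 14)]
```

In the startup part of the log above, the same patterns would pick up seven failed handles (0xb through 0x11) and a single attached disk, da0.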
 

joeschmuck

Old Man
Moderator
Joined
May 28, 2011
Messages
10,994
While I think you would agree that this sounds like a hardware issue, not a FreeNAS one, if you provide the exact part names/model numbers of your system, someone might be able to help you, but to be honest, I don't know. If this had ever worked in the past then I'd say change out a cable. Honestly, your best bet is to find a forum for your hardware and see if they can help out.

Best of luck to you!
 
Joined
Apr 8, 2017
Messages
2
Hello.
I'm not sure I would call this a hardware issue. Fiddling with cables and switching things around? In my experience those kinds of problems usually manifest themselves as intermittent I/O errors, not just startup problems. It could of course be the external chassis, but the problem is so very consistent.

Then a thought struck me: I never blacklisted the SAS card in the host system. That means the base Linux system detects the disks and actually makes them available for use before the VM starts up and the card is reset. I remembered this earlier today because I also remember explicitly changing the SMART config on the host from all disks to specified disks, since I got a bunch of screaming emails every time FreeNAS started up and the disks disappeared.

Anyway, I just took a look at the host boot log and sure enough, it finds the expander and all the disks.
When FreeNAS starts, the SAS card is successfully reset and the disks are unregistered.

Looking at the logs from the host, the card is first removed and a PCI reset is issued; this is then picked up by the SAS driver, which unregisters all the disks and claims success before removing itself.
The problem is that the driver can't actually do this to the real hardware and is simply removing things that aren't there anymore.
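
One way to check on the Linux host which driver has claimed the card before the VM starts is to read the driver symlink in sysfs. A minimal sketch: the function name bound_driver and the PCI address used in the note below are hypothetical, and the sysfs_root parameter exists only so the function can be pointed at a test directory.

```python
import os


def bound_driver(pci_address, sysfs_root="/sys/bus/pci/devices"):
    """Return the name of the kernel driver bound to a PCI device, or None.

    On Linux, <sysfs_root>/<address>/driver is a symlink to the driver's
    directory; it is absent when no driver has claimed the device.
    """
    link = os.path.join(sysfs_root, pci_address, "driver")
    if not os.path.islink(link):
        return None
    return os.path.basename(os.readlink(link))
```

If something like bound_driver("0000:01:00.0") returns the host's SAS driver (for a SAS2008 that would typically be mpt2sas or mpt3sas, depending on the kernel) rather than vfio-pci, the host has claimed the card and enumerated the disks before handing it to the VM.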

My theory is that while the SAS card itself is perfectly fine with this, the expander is left in some sort of state between being properly reset and just continuing from before.

The system will be shut down for maintenance during the long weekend, so that theory should be simple enough to test.

This machine is a proper frankenmonster of a build: Proxmox (I got REALLY fed up with VMware), multiple graphics cards going to different VMs, and FreeNAS running as a poor man's SAN with directly virtualized network cards.
The expansion chassis is homebuilt, with the expander getting its power through an external PCIe-USB module (with Molex power) meant for GPU-mining nuts.

This annoying little startup problem aside, it seems to work perfectly. All the disks are mounted vertically and cooled by three 120 mm fans running at 5 V, which makes the disks themselves the only audible noise from the external setup.

I built this after getting fed up with normal external chassis, where the disks basically heat each other and powerful fans keep the temperature down but the noise level way up. This config looks more like the Storage Pods that Backblaze uses, except each chassis module contains 12 disks and is made from standard Euro racks. The modules are completely open to the room on top and were supposed to be the same on the bottom, as I was counting on the chimney effect to cool the disks. Now there are three fans there, which by some strange coincidence fit the bottom almost perfectly; even the screw holes matched up (sort of).

Still half the noise compared to when the disks were in normal chassis.

And the power usage: I've crammed what used to be three different computers into a single one, normally consuming less than 300 W.
That also includes the gaming computer (normally in standby, woken by Steam Link), high-performance routers and switches (I have gigabit internet, and the VPN normally runs at a couple hundred Mbps), and a bunch of other related things.

My power bill is waaay down...
 