Well -- I spent a few hours doing careful troubleshooting tonight, and I've made significant progress in understanding this. It really seems to be a timing issue -- it's like FreeNAS/FreeBSD can't handle too many SCSI timeouts concurrently.
This might be long, but I'll try to summarize as best I can.
First, I moved off the x3650 M3 and put the 9200-8e card into an x4 slot in my TS440 server. Initially, I tested using FreeNAS bare-metal. It booted fine with the card installed, but no connection to the exp3000. Then, when I attached the enclosure, I'd get the same behavior. The good thing is that I was able to capture the fatal trap in this environment, since the /var/log directory is actually logging to a ZFS pool (whereas on the x3650 I had no other pool defined). I decided to switch to ESXi, though, and I haven't rebooted back into bare-metal, so I can't post that file just yet.
Now. With ESXi booting on the TS440 and with a the 9200-8e in passthrough to FreeNAS 9.10.2, I can get it to boot fine with the exp3k powered down. If I turn on the exp3k, it takes three probes for FreeNAS to crash. But I was able to switch off the exp3k quickly and NOT cause the full trap/reboot scenario. Here's what I could capture in /var/log/messages with these "brief" powered connections:
Code:
Mar 1 21:57:00 freenas da1 at mps0 bus 0 scbus3 target 12 lun 0
Mar 1 21:57:00 freenas da1: <IBM-ESXS ST31000424SS BC2D> Fixed Direct Access SPC-3 SCSI device
Mar 1 21:57:00 freenas da1: Serial Number 9WK2GG6P0000C1176QT1
Mar 1 21:57:00 freenas da1: 300.000MB/s transfers
Mar 1 21:57:00 freenas da1: Command Queueing enabled
Mar 1 21:57:00 freenas da1: 953869MB (1953525168 512 byte sectors)
Mar 1 21:57:00 freenas da2 at mps0 bus 0 scbus3 target 11 lun 0
Mar 1 21:57:00 freenas da2: <IBM-ESXS ST31000424SS BC2D> Fixed Direct Access SPC-3 SCSI device
Mar 1 21:57:00 freenas da2: Serial Number 9WK2KXFS0000C118873A
Mar 1 21:57:00 freenas da2: 300.000MB/s transfers
Mar 1 21:57:00 freenas da2: Command Queueing enabled
Mar 1 21:57:00 freenas da2: 953869MB (1953525168 512 byte sectors)
Mar 1 21:57:00 freenas da3 at mps0 bus 0 scbus3 target 18 lun 0
Mar 1 21:57:00 freenas da3: <IBM-ESXS ST31000424SS BC2D> Fixed Direct Access SPC-3 SCSI device
Mar 1 21:57:00 freenas da3: Serial Number 9WK2KYZM0000C11888DJ
Mar 1 21:57:00 freenas da3: 300.000MB/s transfers
Mar 1 21:57:00 freenas da3: Command Queueing enabled
Mar 1 21:57:00 freenas da3: 953869MB (1953525168 512 byte sectors)
Mar 1 21:57:00 freenas da4 at mps0 bus 0 scbus3 target 9 lun 0
Mar 1 21:57:00 freenas da4: <IBM-ESXS ST31000424SS BC2D> Fixed Direct Access SPC-3 SCSI device
Mar 1 21:57:00 freenas da4: Serial Number 9WK2KZ780000C1185HL6
Mar 1 21:57:00 freenas da4: 300.000MB/s transfers
Mar 1 21:57:00 freenas da4: Command Queueing enabled
Mar 1 21:57:00 freenas da4: 953869MB (1953525168 512 byte sectors)
Mar 1 21:57:00 freenas da5 at mps0 bus 0 scbus3 target 14 lun 0
Mar 1 21:57:00 freenas da5: <IBM-ESXS ST31000424SS BC2D> Fixed Direct Access SPC-3 SCSI device
Mar 1 21:57:00 freenas da5: Serial Number 9WK2KZ870000C11888F4
Mar 1 21:57:00 freenas da5: 300.000MB/s transfers
Mar 1 21:57:00 freenas da5: Command Queueing enabled
Mar 1 21:57:00 freenas da5: 953869MB (1953525168 512 byte sectors)
Mar 1 21:57:00 freenas da6 at mps0 bus 0 scbus3 target 8 lun 0
Mar 1 21:57:00 freenas da6: <IBM-ESXS ST31000424SS BC2D> Fixed Direct Access SPC-3 SCSI device
Mar 1 21:57:00 freenas da6: Serial Number 9WK2K45A0000C1185FPN
Mar 1 21:57:00 freenas da6: 300.000MB/s transfers
Mar 1 21:57:00 freenas da6: Command Queueing enabled
Mar 1 21:57:00 freenas da6: 953869MB (1953525168 512 byte sectors)
Mar 1 21:57:00 freenas da7 at mps0 bus 0 scbus3 target 16 lun 0
Mar 1 21:57:00 freenas da7: <IBM-ESXS ST31000424SS BC2D> Fixed Direct Access SPC-3 SCSI device
Mar 1 21:57:00 freenas da7: Serial Number 9WK2K4MS0000C1185HCR
Mar 1 21:57:00 freenas da7: 300.000MB/s transfers
Mar 1 21:57:00 freenas da7: Command Queueing enabled
Mar 1 21:57:00 freenas da7: 953869MB (1953525168 512 byte sectors)
Mar 1 21:57:00 freenas da8 at mps0 bus 0 scbus3 target 10 lun 0
Mar 1 21:57:00 freenas da8: <IBM-ESXS ST31000424SS BC2D> Fixed Direct Access SPC-3 SCSI device
Mar 1 21:57:00 freenas da8: Serial Number 9WK2K4BY0000C1185GP1
Mar 1 21:57:00 freenas da8: 300.000MB/s transfers
Mar 1 21:57:00 freenas da8: Command Queueing enabled
Mar 1 21:57:00 freenas da8: 953869MB (1953525168 512 byte sectors)
Mar 1 21:57:00 freenas da9 at mps0 bus 0 scbus3 target 17 lun 0
Mar 1 21:57:00 freenas da9: <IBM-ESXS ST31000424SS BC2D> Fixed Direct Access SPC-3 SCSI device
Mar 1 21:57:00 freenas da9: Serial Number 9WK2K4W80000C118B1RH
Mar 1 21:57:00 freenas da9: 300.000MB/s transfers
Mar 1 21:57:00 freenas da9: Command Queueing enabled
Mar 1 21:57:00 freenas da9: 953869MB (1953525168 512 byte sectors)
Mar 1 21:57:00 freenas da10 at mps0 bus 0 scbus3 target 15 lun 0
Mar 1 21:57:00 freenas da10: <IBM-ESXS ST31000424SS BC2D> Fixed Direct Access SPC-3 SCSI device
Mar 1 21:57:00 freenas da10: Serial Number 9WK55STP0000920363LR
Mar 1 21:57:00 freenas da10: 300.000MB/s transfers
Mar 1 21:57:00 freenas da10: Command Queueing enabled
Mar 1 21:57:00 freenas da10: 953869MB (1953525168 512 byte sectors)
Mar 1 21:57:00 freenas da11 at mps0 bus 0 scbus3 target 19 lun 0
Mar 1 21:57:00 freenas da11: <IBM-ESXS ST31000424SS BC2D> Fixed Direct Access SPC-3 SCSI device
Mar 1 21:57:00 freenas da11: Serial Number 9WK55SLF0000C203EEDB
Mar 1 21:57:00 freenas da11: 300.000MB/s transfers
Mar 1 21:57:00 freenas da11: Command Queueing enabled
Mar 1 21:57:00 freenas da11: 953869MB (1953525168 512 byte sectors)
Mar 1 21:57:00 freenas da12 at mps0 bus 0 scbus3 target 13 lun 0
Mar 1 21:57:00 freenas da12: <IBM-ESXS ST31000424SS BC2D> Fixed Direct Access SPC-3 SCSI device
Mar 1 21:57:00 freenas da12: Serial Number 9WK2KXFK0000C1188660
Mar 1 21:57:00 freenas da12: 300.000MB/s transfers
Mar 1 21:57:00 freenas da12: Command Queueing enabled
Mar 1 21:57:00 freenas da12: 953869MB (1953525168 512 byte sectors)
Mar 1 21:57:01 freenas (probe12:mps0:0:20:0): INQUIRY. CDB: 12 01 83 00 fc 00
Mar 1 21:57:01 freenas (probe12:mps0:0:20:0): CAM status: SCSI Status Error
Mar 1 21:57:01 freenas (probe12:mps0:0:20:0): SCSI status: Busy
Mar 1 21:57:01 freenas (probe12:mps0:0:20:0): Retrying command
Then I switched it off, and we see some detach messages. note that it does actually recognize all 12 disks.
Next, I decided to try pulling all the drives out of the enclosure. This was really helpful.
If I tried to boot up with the enclosure powered on, attached, but empty, I would STILL get the 3 probes followed by the trap. But if I disconnected it and allowed the system to boot first, THEN attach the cable at the HBA, then I was able to get the enclsoure recognized:
Code:
Mar 1 22:48:58 freenas ses0 at mps0 bus 0 scbus3 target 20 lun 0
Mar 1 22:48:58 freenas ses0: <IBM-ESXS EXP3000 01C1> Fixed Enclosure Services SPC-2 SCSI device
Mar 1 22:48:58 freenas ses0: 300.000MB/s transfers
Mar 1 22:48:58 freenas ses0: Command Queueing enabled
Mar 1 22:48:58 freenas ses0: SCSI-3 ENC Device
Mar 1 22:48:58 freenas (probe0:mps0:0:20:1): INQUIRY. CDB: 12 00 00 00 24 00
Mar 1 22:48:58 freenas (probe0:mps0:0:20:1): CAM status: SCSI Status Error
Mar 1 22:48:58 freenas (probe0:mps0:0:20:1): SCSI status: Busy
Mar 1 22:48:58 freenas (probe0:mps0:0:20:1): Retrying command
Mar 1 22:48:59 freenas (probe0:mps0:0:20:2): INQUIRY. CDB: 12 00 00 00 24 00
Mar 1 22:48:59 freenas (probe0:mps0:0:20:2): CAM status: SCSI Status Error
Mar 1 22:48:59 freenas (probe0:mps0:0:20:2): SCSI status: Busy
Mar 1 22:48:59 freenas (probe0:mps0:0:20:2): Retrying command
Mar 1 22:49:01 freenas (probe0:mps0:0:20:2): INQUIRY. CDB: 12 00 00 00 24 00
Mar 1 22:49:01 freenas (probe0:mps0:0:20:2): CAM status: SCSI Status Error
Mar 1 22:49:01 freenas (probe0:mps0:0:20:2): SCSI status: Busy
Mar 1 22:49:01 freenas (probe0:mps0:0:20:2): Retrying command
note that this time, we see the ENCLOSURE itself recognized, whereas earlier it was just the drives.
awesome -- we now have an enclosure talking! let's try inserting the drives, one by one.
we start with disk 1
Code:
Mar 1 22:54:53 freenas da1 at mps0 bus 0 scbus3 target 12 lun 0
Mar 1 22:54:53 freenas da1: <IBM-ESXS ST31000424SS BC2D> Fixed Direct Access SPC-3 SCSI device
Mar 1 22:54:53 freenas da1: Serial Number 9WK2GG6P0000C1176QT1
Mar 1 22:54:53 freenas da1: 300.000MB/s transfers
Mar 1 22:54:53 freenas da1: Command Queueing enabled
Mar 1 22:54:53 freenas da1: 953869MB (1953525168 512 byte sectors)
Mar 1 22:54:54 freenas ses0: da1,pass3: Element descriptor: 'SLOT 01 '
Mar 1 22:54:54 freenas ses0: da1,pass3: SAS Device Slot Element: 1 Phys at Slot 1, Not All Phys
Mar 1 22:54:54 freenas ses0: phy 0: SAS device type 1 id 0
Mar 1 22:54:54 freenas ses0: phy 0: protocols: Initiator( None ) Target( SSP )
Mar 1 22:54:54 freenas ses0: phy 0: parent 500a0b8755a83000 addr 5000c50025b96325
excellent! now, let's try attaching the second SFF8088 cable (allowing multiparth to work)...
Code:
Mar 1 22:56:54 freenas da2 at mps0 bus 0 scbus3 target 25 lun 0
Mar 1 22:56:54 freenas da2: <IBM-ESXS ST31000424SS BC2D> Fixed Direct Access SPC-3 SCSI device
Mar 1 22:56:54 freenas da2: Serial Number 9WK2GG6P0000C1176QT1
Mar 1 22:56:54 freenas da2: 300.000MB/s transfers
Mar 1 22:56:54 freenas da2: Command Queueing enabled
Mar 1 22:56:54 freenas da2: 953869MB (1953525168 512 byte sectors)
Mar 1 22:56:57 freenas ses1 at mps0 bus 0 scbus3 target 33 lun 0
Mar 1 22:56:57 freenas ses1: <IBM-ESXS EXP3000 01C1> Fixed Enclosure Services SPC-2 SCSI device
Mar 1 22:56:57 freenas ses1: 300.000MB/s transfers
Mar 1 22:56:57 freenas ses1: Command Queueing enabled
Mar 1 22:56:57 freenas ses1: SCSI-3 ENC Device
Mar 1 22:56:57 freenas (probe0:mps0:0:33:1): INQUIRY. CDB: 12 00 00 00 24 00
Mar 1 22:56:57 freenas (probe0:mps0:0:33:1): CAM status: SCSI Status Error
Mar 1 22:56:57 freenas (probe0:mps0:0:33:1): SCSI status: Busy
Mar 1 22:56:57 freenas (probe0:mps0:0:33:1): Retrying command
Mar 1 22:56:57 freenas ses1: da2,pass4: Element descriptor: 'SLOT 01 '
Mar 1 22:56:57 freenas ses1: da2,pass4: SAS Device Slot Element: 1 Phys at Slot 1, Not All Phys
Mar 1 22:56:57 freenas ses1: phy 0: SAS device type 1 id 1
Mar 1 22:56:57 freenas ses1: phy 0: protocols: Initiator( None ) Target( SSP )
Mar 1 22:56:57 freenas ses1: phy 0: parent 500a0b8755be0000 addr 5000c50025b96326
ok.. this is good -- we see the second enclsoure instance and the second instance of disk 1 via the secondary path.
I kept inserting drives one-by-one until all 12 were inserted and recognized.
There was a slight problem here with multiparth -- it was recognized for some drives but not all. I ended up deleting the multipath configs and also removing the second cable later on.
with all drives attached I was about to use the "Volume Import" option on the Storage page. But because of the multiparth problems, it wasn't actually working right.
I ended up deleting the volume imported and trying to re-configure a zpool with 6 mirror vdevs. seemed to be working fine. of course, I can't reboot the system since it would never boot up.
I wonder if there is any way to extend the SCSI command timeout?
Have I shared enough info here? (Sorry -- I'm really tired, so it's hard to write a good precis of the investigation)
Regards