Sun ZFS Storage 7320

fr33n4s

Cadet
Joined
Sep 22, 2021
Messages
8
Hi everyone,
I'm running TrueNAS-12.0-U5.1 on a Sun ZFS Storage 7320 with a 24-disk shelf installed.
The appliance is powered by an Intel Xeon E5620 (2.40GHz) and has 24GB of RAM.
I'm running a raidz3 where I see no issues whatsoever:

Code:
# zpool status
  pool: DES-Z
 state: ONLINE
config:

    NAME                                            STATE     READ WRITE CKSUM
    DES-Z                                           ONLINE       0     0     0
      raidz3-0                                      ONLINE       0     0     0
        gptid/f1221ad0-1a32-11ec-b218-002128f03bd4  ONLINE       0     0     0
        gptid/f10a2d68-1a32-11ec-b218-002128f03bd4  ONLINE       0     0     0
        gptid/f176d2fb-1a32-11ec-b218-002128f03bd4  ONLINE       0     0     0
        gptid/f2078faa-1a32-11ec-b218-002128f03bd4  ONLINE       0     0     0
        gptid/f275d32e-1a32-11ec-b218-002128f03bd4  ONLINE       0     0     0
        gptid/f2b469bf-1a32-11ec-b218-002128f03bd4  ONLINE       0     0     0
        gptid/f253433c-1a32-11ec-b218-002128f03bd4  ONLINE       0     0     0
        gptid/f2ee3dd0-1a32-11ec-b218-002128f03bd4  ONLINE       0     0     0
        gptid/f2edeaab-1a32-11ec-b218-002128f03bd4  ONLINE       0     0     0
        gptid/f2cdd4c8-1a32-11ec-b218-002128f03bd4  ONLINE       0     0     0
        gptid/f33c6536-1a32-11ec-b218-002128f03bd4  ONLINE       0     0     0
        gptid/f3140207-1a32-11ec-b218-002128f03bd4  ONLINE       0     0     0
        gptid/f32a2a8a-1a32-11ec-b218-002128f03bd4  ONLINE       0     0     0
        gptid/f3d74425-1a32-11ec-b218-002128f03bd4  ONLINE       0     0     0
        gptid/f36fe747-1a32-11ec-b218-002128f03bd4  ONLINE       0     0     0
        gptid/f3cd4913-1a32-11ec-b218-002128f03bd4  ONLINE       0     0     0
        gptid/f5323603-1a32-11ec-b218-002128f03bd4  ONLINE       0     0     0
        gptid/f629098d-1a32-11ec-b218-002128f03bd4  ONLINE       0     0     0
        gptid/f5e962f8-1a32-11ec-b218-002128f03bd4  ONLINE       0     0     0
        gptid/f655d119-1a32-11ec-b218-002128f03bd4  ONLINE       0     0     0
        gptid/f6556e89-1a32-11ec-b218-002128f03bd4  ONLINE       0     0     0
        gptid/f66742d9-1a32-11ec-b218-002128f03bd4  ONLINE       0     0     0
        gptid/f6b9c9c1-1a32-11ec-b218-002128f03bd4  ONLINE       0     0     0
        gptid/f68c6b05-1a32-11ec-b218-002128f03bd4  ONLINE       0     0     0

errors: No known data errors

  pool: boot-pool
 state: ONLINE
config:

    NAME        STATE     READ WRITE CKSUM
    boot-pool   ONLINE       0     0     0
      da49p2    ONLINE       0     0     0

errors: No known data errors


Unfortunately, after some usage the whole thing becomes unresponsive. Commands run in the shell (against the pool itself) hang forever without returning any output. I've seen the following in the logs, and I'm not quite sure what the issue can be considering that the disks are OK; I have two identical appliances and have tried them both with the same result, and I've already tried disabling S.M.A.R.T. on every disk:

Code:
(0:4:0/1): SYNCHRONIZE CACHE(10). CDB: 35 00 00 00 00 00 00 00 00 00                                                                   
(0:4:0/1): Tag: 0x1000001a, type 1                                                                                                     
(0:4:0/1): ctl_process_done: 123 seconds                                                                                               
(0:4:0/1): MODE SENSE(6). CDB: 1a 00 08 00 04 00                                                                                       
(0:4:0/1): Tag: 0x10000052, type 1                                                                                                     
(0:4:0/1): ctl_process_done: 92 seconds                                                                                               
ctl_datamove: tag 0x1000003c on (0:4:0) aborted                                                                                       
ctl_datamove: tag 0x10000038 on (0:4:0) aborted                                                                                       
ctl_datamove: tag 0x1000003e on (0:4:0) aborted                                                                                       
ctl_datamove: tag 0x1000003f on (0:4:0) aborted                                                                                       
ctl_datamove: tag 0x10000011 on (0:4:0) aborted                                                                                       
ctl_datamove: tag 0x10000014 on (0:4:0) aborted                                                                                       
ctl_datamove: tag 0x10000013 on (0:4:0) aborted                                                                                       
ctl_datamove: tag 0x10000015 on (0:4:0) aborted                                                                                       
ctl_datamove: tag 0x10000016 on (0:4:0) aborted
ctl_datamove: tag 0x10000017 on (0:4:0) aborted
ctl_datamove: tag 0x10000018 on (0:4:0) aborted
ctl_datamove: tag 0x10000019 on (0:4:0) aborted
ctl_datamove: tag 0x1000006e on (0:4:0) aborted
ctl_datamove: tag 0x1000000c on (0:4:0) aborted
ctl_datamove: tag 0x1000001e on (0:4:0) aborted
ctl_datamove: tag 0x1000000b on (0:4:0) aborted
(0:4:0/1): WRITE(10). CDB: 2a 00 01 38 3c e0 00 00 98 00
ctl_datamove: tag 0x1000001f on (0:4:0) aborted
(0:4:0/1): Tag: 0x1000003c, type 1
(0:4:0/1): ctl_process_done: 215 seconds
(0:4:0/1): WRITE(10). CDB: 2a 00 02 e7 f8 00 00 00 28 00
(0:4:0/1): Tag: 0x10000038, type 1
(0:4:0/1): ctl_process_done: 215 seconds
(0:4:0/1): WRITE(10). CDB: 2a 00 02 12 23 d8 00 00 08 00
ctl_datamove: tag 0x10000042 on (2:4:0) aborted
(0:4:0/1): Tag: 0x1000003e, type 1
(0:4:0/1): ctl_process_done: 215 seconds
(0:4:0/1): WRITE(10). CDB: 2a 00 02 12 23 d0 00 00 08 00
ctl_datamove: tag 0x1000001b on (1:4:0) aborted
(0:4:0/1): Tag: 0x1000003f, type 1
(0:4:0/1): ctl_process_done: 215 seconds
(0:4:0/1): WRITE(10). CDB: 2a 00 02 12 23 c8 00 00 08 00
(0:4:0/1): Tag: 0x10000011, type 1
(0:4:0/1): ctl_process_done: 215 seconds
(0:4:0/1): WRITE(10). CDB: 2a 00 02 12 23 c0 00 00 08 00
(0:4:0/1): Tag: 0x10000014, type 1
(0:4:0/1): ctl_process_done: 215 seconds
(0:4:0/1): WRITE(10). CDB: 2a 00 01 38 3d a8 00 00 08 00
(0:4:0/1): Tag: 0x10000013, type 1
(0:4:0/1): ctl_process_done: 215 seconds
(0:4:0/1): WRITE(10). CDB: 2a 00 01 38 3d a0 00 00 08 00
(0:4:0/1): Tag: 0x10000015, type 1
(0:4:0/1): ctl_process_done: 215 seconds
(0:4:0/1): WRITE(10). CDB: 2a 00 01 38 3d 98 00 00 08 00
(0:4:0/1): Tag: 0x10000016, type 1
(0:4:0/1): ctl_process_done: 215 seconds
(0:4:0/1): WRITE(10). CDB: 2a 00 01 38 3d 90 00 00 08 00
(0:4:0/1): Tag: 0x10000017, type 1
(0:4:0/1): ctl_process_done: 215 seconds
(0:4:0/1): WRITE(10). CDB: 2a 00 01 38 3d 88 00 00 08 00
(0:4:0/1): Tag: 0x10000018, type 1
(0:4:0/1): ctl_process_done: 215 seconds
(0:4:0/1): WRITE(10). CDB: 2a 00 01 38 3d 80 00 00 08 00
(0:4:0/1): Tag: 0x10000019, type 1
(0:4:0/1): ctl_process_done: 215 seconds
(0:4:0/1): WRITE(10). CDB: 2a 00 01 38 3d 78 00 00 08 00
(0:4:0/1): Tag: 0x1000006e, type 1
(0:4:0/1): ctl_process_done: 215 seconds
(0:4:0/1): WRITE(10). CDB: 2a 00 00 60 a4 80 00 01 88 00
(0:4:0/1): Tag: 0x1000000c, type 1
(0:4:0/1): ctl_process_done: 215 seconds
(0:4:0/1): WRITE(10). CDB: 2a 00 01 38 3d b0 00 00 70 00
.......


Any hint would be much appreciated, considering that I've tested TrueNAS on a Sun ZFS Storage 7120 as well and got the same issue. Thinking it was the controller, I moved to the 7320(s), but apparently that's not it. I'm exporting datasets via NFS and a few zvols via iSCSI, and everything works... for a while.

Thanks.

P.S. The fastest way to reproduce the issue is launching a scrub.
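
For completeness, this is roughly all it takes on my side to trigger and watch it (pool name as above; gstat is only there to keep an eye on per-disk latency while the scrub runs):
Code:
# Start a scrub on the data pool and check its progress
zpool scrub DES-Z
zpool status -v DES-Z

# Watch per-disk latency while the scrub runs
gstat -p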
 

Arwen

MVP
Joined
May 17, 2014
Messages
3,611
The first hint is that your RAID-Z3 vDev is WAY TOO WIDE. Meaning, you should have broken the disks up into several RAID-Z2 (or -Z3) vDevs. Performance starts to suck A LOT when the pool starts to get full.

As a general rule, 8 to 12 disks in a vDev should be the maximum for RAID-Z2 or -Z3. (DRAID would be different, and does not apply here.) Some people think 10 to 12 disks is too much. But there are plenty of rack-mount servers with 12 disk slots, so it comes down to how much space is needed and what trade-offs you can make.

With 24 disks, you could have two 12-disk RAID-Z2 or -Z3 vDevs. Still a bit wide. Or you could do three 8-disk RAID-Z2 or -Z3 vDevs, which would be much better.
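
Just to illustrate the layout (da0 through da23 are placeholder device names; on TrueNAS you would normally build the pool from the web UI so it partitions the disks and uses gptids), a 3 x 8-disk RAID-Z2 pool at the command line would look roughly like this:
Code:
# Sketch only -- da0..da23 are placeholders for the 24 data disks
zpool create tank \
  raidz2 da0  da1  da2  da3  da4  da5  da6  da7  \
  raidz2 da8  da9  da10 da11 da12 da13 da14 da15 \
  raidz2 da16 da17 da18 da19 da20 da21 da22 da23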
 

fr33n4s

Cadet
Joined
Sep 22, 2021
Messages
8
Hi Arwen,
I wasn't aware of such a limitation. I'll do some tests.

Thanks a lot for your reply.
 

fr33n4s

Cadet
Joined
Sep 22, 2021
Messages
8
Hi again, I've split the total into 3 RAIDZ2 pools of 8 disks each as suggested, but it becomes unresponsive again with plenty of the following:
Code:
Device /dev/gptid/eee8dbb9-1c58-11ec-a7eb-002128f04858 is causing slow I/O on pool ZLAB-3.

and not only on ZLAB-3 but on ZLAB-1 too, which are the only pools in use right now.
The first is exporting a dataset via NFS, the second a zvol via iSCSI (which is a Proxmox VM disk, for the record).

 
Last edited by a moderator:

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680

Using RAIDZ for block storage is a super-bad idea, though it normally wouldn't result in a stall.

Please do refer to this article: https://www.truenas.com/community/threads/the-path-to-success-for-block-storage.81165/

Stalls or hangs tend to end up as hardware issues, which could be failing disks, but could also be an incompatible controller. I don't really know what Sun used for disk attachment in their ZFS appliances, so if you could provide a little more insight into what the system is seeing, such as the content of /var/run/dmesg.boot, there could be some useful hints there. Solaris supported a lot of "enterprise" I/O devices that FreeBSD may not be supporting as well, and I can easily see the use of some unusual HBA as being a component to your issue.
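
If you want a quick look yourself before posting it, standard FreeBSD tooling already shows what the disks sit behind (the driver names in the grep are just the usual suspects, not an exhaustive list):
Code:
# List PCI devices with vendor/device strings; the storage controller(s) stand out
pciconf -lv

# See which storage driver(s) attached at boot
grep -i -E 'mps|mpr|mfi|mrsas|ciss|ahci' /var/run/dmesg.boot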
 

fr33n4s

Cadet
Joined
Sep 22, 2021
Messages
8

Hi jgreco,
please find /var/run/dmesg.boot attached. And yes, I would agree about the performance issues, but it gets stuck badly and there is no way to even reboot, because one process always gets stuck (I haven't identified it yet, but it might be "init" itself) and I have to power it off.

Syslog actually says: "init 1 -- some processes would not die: ps axl advised"
Needless to say, at that point there is no way to execute the suggested command.
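
(If the console still responds at all the next time it wedges, my plan is to capture where things are stuck with the stock FreeBSD tools, roughly like this -- I haven't managed to grab it yet:)
Code:
# Dump kernel stack traces for every process, to see what they are blocked on
procstat -a -kk > /tmp/procstat-kk.txt

# Process states (D = waiting on I/O), if ps itself still completes
ps axl > /tmp/ps-axl.txt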

And it happens on 3 different appliances, which makes me think an incompatibility issue is more likely than a hardware-related one:
two Sun ZFS Storage 7320s connected to the same disk shelf (not at the same time)
one Sun ZFS Storage 7120, standalone.

Thanks for your time.
 

Attachments

  • dmesg.boot.txt
    42.7 KB · Views: 161

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
Yeah, you have a PMC Sierra card of some sort in there. These are not expected to work well, and the few other people who have shown up here in the forums with these had problems too.

Please replace it with an LSI HBA crossflashed to IT firmware 20.00.07.00 and it will stop stalling.

Refer to https://www.truenas.com/community/r...bas-and-why-cant-i-use-a-raid-controller.139/

For the purposes of this discussion, your HBA is covered under points #2, #3, and #4 of that article.
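
Once an LSI card is in, confirming the firmware level is straightforward (sas2flash is LSI/Broadcom's own flashing utility, and the mps(4) driver logs the firmware version at boot):
Code:
# Report the installed controller(s) and their firmware/BIOS versions
sas2flash -list

# The driver also logs the firmware version at boot
grep -i firmware /var/run/dmesg.boot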
 

Arwen

MVP
Joined
May 17, 2014
Messages
3,611
Hi again, I've split the total into 3 RAIDZ2 pools of 8 disks each as suggested, but it becomes unresponsive again with plenty of the following:
Code:
Device /dev/gptid/eee8dbb9-1c58-11ec-a7eb-002128f04858 is causing slow I/O on pool ZLAB-3.

and not only on ZLAB-3 but on ZLAB-1 too, which are the only pools in use right now.
The first is exporting a dataset via NFS, the second a zvol via iSCSI (which is a Proxmox VM disk, for the record).
It's possible that a single disk device is causing part of the problem. Normally the disk error recovery timeout is set to something reasonable for ZFS, like 7 seconds. Desktop drives, on the other hand, can take more than 1 minute PER BAD BLOCK RECOVERY. With ZFS redundancy, having the disk attempt extreme recovery of a bad block is not needed, and it can cause exactly the issue you have described in the quoted section above.

Most vendors have a way to limit error recovery time. Some call it TLER (Time Limited Error Recovery). I think Seagate uses some other wording, but the result is the same.

This time-limited error recovery is one reason why we (in the TrueNAS forums) tend to recommend NAS disks. They generally come with an appropriately low time setting by default, while desktop drives either have to be changed (once at installation or at each boot) or CAN'T be changed at all.

All that said about time-limited error recovery and a potentially bad disk, I don't know whether those are part of your problem. Just something else to investigate.
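
If you want to check, smartmontools can read (and, on drives that support it, set) the SCT error recovery timer; da0 below is just an example device, and the value is in tenths of a second, so 70 = 7 seconds:
Code:
# Show the current SCT Error Recovery Control (TLER/ERC) setting
smartctl -l scterc /dev/da0

# Set read and write recovery limits to 7 seconds (70 x 100 ms), if the drive allows it
smartctl -l scterc,70,70 /dev/da0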


Yes, I agree with @jgreco about using RAID-Zx with iSCSI & zVols. Mirrored vDevs are a better choice. The slight exception is lower-use VMs & iSCSI that can tolerate slower responses.


So, between my suggestions & @jgreco's, we have:
  • Reducing the width of RAID-Zx vDevs
  • Replacing the PMC Sierra card with an LSI HBA or a variant
  • Possibly using mirrored vDevs for iSCSI storage, and a separate pool of RAID-Zx for bulk storage
  • Checking / setting the TLER to a lower value, like 7 seconds
  • Potentially replacing the one disk reporting slow I/O
That should give you some things to try.
 

fr33n4s

Cadet
Joined
Sep 22, 2021
Messages
8
Hi again,
replying briefly to your suggestions:
  • I've already split the initial big vdev into three 8-disk ones, and that didn't fix it
  • I'm checking whether the hardware is somehow compatible with an LSI HBA and identifying exactly which card it is
  • I'm using iSCSI precisely for the VM disks, and it should work even if it doesn't perform great. I will go more in depth on the raidz/iSCSI topic based on your suggestions (thanks for that)
  • About TLER: these are already NAS disks, not very recent but high-end, so I'm not sure that setting would have inappropriate values (assumption)
  • When the issue starts presenting itself, the disks report slow I/O one after another as a result of some cascading effect, but there is no issue with the disks themselves
According to the Sun/Oracle documentation, the replaceable PCIe cards in the 7320 storage are the following:

Part Number      Description                                    FRU/CRU
F371-4325-01     8Gb FC HBA (PCIe)                              CRU
F375-3609-02     PCA, SAS 6GBS 8 Port (PCIe)                    CRU
F375-3606-03     Dual Port (x4) IB HCA (PCIe)                   CRU
F375-3696-01     Dual Port CX2 4XQDR (PCIe)                     CRU
F375-3617-01     2X10GbE SFP+, X8 (PCIe)                        CRU
F375-3481-01     NIC Card Quad Port 1GigE Cu (PCIe)             CRU
F511-1496-04     Sun Fishworks Cluster Controller 200 (PCIe)    FRU

I cannot find an exact correspondence for the PMC Sierra, but I think it should be the second from the top (F375-3609-02). Is it?
And referring to jgreco's info, "crossflashed to IT firmware 20.00.07.00": can you elaborate just a little bit more, please?
I'm really not a storage expert, so flashing a firmware is something I've done many times, but crossflashing a storage HBA sounds rather different to me. Moreover, I can find these cards on eBay, some already marked "for TrueNAS", but I cannot verify the firmware on the item; assuming we are talking about the same firmware here.

Thank you very much to you and @jgreco for your exhaustive replies! Really appreciated.
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
LSI actually made a "lowest tier" RAID controller (9211) that was only capable of running their IR (most basic RAID and HBA) and IT (pure HBA) firmwares, and allowed you to switch between them with a simple firmware reflash. These cards are actually rather hard to find.

LSI's better controllers use the same general overall architecture, but started to add features, so, for example, the LSI 9240 added in stuff like RAID5 using the MFI driver/firmware stack. These were sold in vast quantity, usually relabeled by a vendor (Dell PERC H200/H310, IBM ServeRAID M1015, etc). Because they use the MFI firmware, they are unsuitable for FreeNAS/TrueNAS, because the MFI stack is just a wee bit flaky, among other things. However, the 9240/H200/M1015/etc can have their firmware wiped, the card electronically fumigated to have the signature of a 9211, and then forcibly have early-gen 9211 firmware loaded on them, at which point they look ~98% like a true 9211, and can be upgraded to modern 9211 IT firmware. This is only about a 98% conversion, because you usually cannot flip back and forth from IR to IT like you could with a true 9211.

The trick is, Dell and IBM and friends sold these controllers in MILLIONS of servers, so they are cheaply available on the used market for maybe $30-$40. But you have to understand an arcane process and waste a bunch of time setting up the correct toolchain and going through several reboot cycles to do the crossflash, and it does not work the same on every PC due to differences in BIOS/EFI stuff. This may be fine for a hobbyist who doesn't mind the challenge and ambiguous possibility of success, but some enterprising eBayers came to recognize that profit was possible by charging people an extra $10-$15 for an already-crossflashed unit.
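
Just to give you a flavor of the tail end of the process: once the card has been wiped and re-signed, the actual IT flash is typically a one-liner like the one below (the file names are the ones shipped in the LSI 9211-8i IT firmware package; the wiping and re-signing that come before it are the arcane, card-specific part, so treat this purely as illustration):
Code:
# Flash the 9211 IT firmware and (optionally) the boot ROM, then verify
sas2flash -o -f 2118it.bin -b mptsas2.rom
sas2flash -list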

There are some recognized sellers on eBay who have apparently sold a LOT of these to the FreeNAS community, and who have provided tech support when things go sideways. In my opinion, if you are not interested in drama, it's worth the modest additional cost. Most of the people here are not me and do not have an extensive workshop with a bench PXE environment that includes "all the tools" to make this easier, and, heck, even with that advantage, I find crossflashing a sometimes-exasperating process. Just be sure to research your seller here on these forums before you buy.
 

fr33n4s

Cadet
Joined
Sep 22, 2021
Messages
8

I've got the necessary info to move forward now. And yes, it's much better to spend a little more than to go through an excruciating process with a certain margin of failure. Unfortunately I have no further time to waste/invest on this.

Thanks a lot.
 

fr33n4s

Cadet
Joined
Sep 22, 2021
Messages
8
Hello again, I was too enthusiastic. :(
I've just realized that finding such an HBA with "molex cables" (at least that's what we generally call them) isn't a trivial task. I'm attaching a few pictures of my HBA (just removed from the appliance) and the aforementioned cable.
 

Attachments

  • IMG_20210924_164644059.jpg
    IMG_20210924_164644059.jpg
    187.6 KB · Views: 148
  • IMG_20210924_164617308.jpg
    IMG_20210924_164617308.jpg
    307.9 KB · Views: 128
  • IMG_20210924_164921628.jpg
    IMG_20210924_164921628.jpg
    277.9 KB · Views: 134
  • IMG_20210924_164930204.jpg
    IMG_20210924_164930204.jpg
    120.3 KB · Views: 147

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
Hello again, I was too enthusiastic. :(
I've just realized that finding such an HBA with "molex cables" (at least that's what we generally call them)

Molex makes a crapton of connectors, so that's a horrible thing to call them... ;-)

it isn't a trivial task.

It actually is. Please head on over to

https://www.truenas.com/community/r...-be-sas-sy-a-primer-on-basic-sas-and-sata.48/

the SAS Primer. They're SFF8088 external SAS cables. You can get an LSI HBA such as an -8e model, or you can get an -8i model, which has SFF8087 connectors, and use a paddleboard.

https://www.ebay.com/itm/163534822734

I recognize this seller as one frequently mentioned here on the forums with good results. Not an endorsement, just sayin', that's what I expect you need.

I'm attaching a few pictures of my HBA (just removed from the appliance) and the aforementioned cable.
 

mikkii

Cadet
Joined
Dec 14, 2021
Messages
1

hello fr33n4s

I had the same problem on my Sun ZFS Storage 7320 controller (Sun Fire X4170 M2 server) with a 24-disk Sun shelf installed.
The appliance is powered by an Intel Xeon E5620 (2.40GHz) and has 96GB of RAM.
Sun developed it as a storage controller (with the ZFS Storage Appliance software), not as a server.

I tried to install many (over 20) OSes, including some Oracle Solaris releases.
Oracle Solaris warns about upgrading the server firmware, spams IRQ problems, and hangs after installation.
Oracle Linux, Debian, VMware ESXi and many others hang during installation.
Ubuntu 20.04 installs and works only if the SAS controller is removed.
TrueNAS and FreeNAS hang after 2-24 hours, even without connecting Sun/HP/Huawei shelves.

The only OS that works fine on my server is Ubuntu 18.04.4 LTS (the Sun shelf works with ZFS; two years have passed without any problems).
SATA on the 7320 must be configured as IDE&Compatible!
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
Yes, Sun had a tendency to use custom silicon for I/O devices. It is common to need to replace ethernet and/or HBA cards.
 

fr33n4s

Cadet
Joined
Sep 22, 2021
Messages
8

hello fr33n4s

I had the same problem on my Sun ZFS Storage 7320 controller (Sun Fire X4170 M2 server) with a 24-disk Sun shelf installed.
The appliance is powered by an Intel Xeon E5620 (2.40GHz) and has 96GB of RAM.
Sun developed it as a storage controller (with the ZFS Storage Appliance software), not as a server.

I tried to install many (over 20) OSes, including some Oracle Solaris releases.
Oracle Solaris warns about upgrading the server firmware, spams IRQ problems, and hangs after installation.
Oracle Linux, Debian, VMware ESXi and many others hang during installation.
Ubuntu 20.04 installs and works only if the SAS controller is removed.
TrueNAS and FreeNAS hang after 2-24 hours, even without connecting Sun/HP/Huawei shelves.

The only OS that works fine on my server is Ubuntu 18.04.4 LTS (the Sun shelf works with ZFS; two years have passed without any problems).
SATA on the 7320 must be configured as IDE&Compatible!
Fortunately you posted a reply, because I had forgotten to post an update on the result of my purchase!
Following @jgreco's suggestion, and after a few further checks, I purchased:


I installed it in the NAS, reinstalled and reconfigured TrueNAS from scratch, and that was the final solution. It has now been up and running for about two months with no problems whatsoever!

Hope this helps you too @mikkii .
Cheers and merry Xmas to you all.
 