Shared Drives on the same backplane

JJ Hardwarez

Cadet
Joined
Mar 15, 2023
Messages
1
I have a Dell SC 4020 that I converted into a TrueNAS server. The SC 4020 comes with LSI 2308 SAS RAID controllers. It has two modules (PCs) that share the same backplane and drives, so when one module fails the other is already up and nothing is lost.

If I run one module or the other, each comes up fine by itself: the storage pool is up and the files are accessible. If I run both at the same time, I don't get any error messages per se in the events or dmesg, but one controller has access to the pool and the other doesn't. I also get amber lights on the drives that should be "shared", indicating they are offline, yet the other module has access to them and they function correctly.

Below are some photos of what I have going on. I also changed the NICs from 2x 10G SFP to a 40G QSFP card.

Thing flies!
20230315_094609.jpg


20230315_094615.jpg
Drives showing error (amber) but totally accessible from the 1st booted up module (PC)

Capture1.PNG
Shows Store-A up and running

Capture2.PNG
Drive List




Capture4.PNG
This is Module 2 (PC): Store-A is not mounted and shows as offline


Capture3.PNG
The drive list also shows the disks as absent.

I did flash the newer 2008ET / IT firmware into both controllers.

Is this an option on the controllers, or am I missing something?
 

HoneyBadger

actually does care
Administrator
Moderator
iXsystems
Joined
Feb 6, 2014
Messages
5,112
Hello JJ,

Your ZFS pool is active on "Module 1" and as such isn't mountable on "Module 2."

Shared-storage HA requires significant platform customization to make it reliable, and is only available on iXsystems-supplied TrueNAS appliances (like the M-series) with TrueNAS Enterprise - it's not available in TrueNAS CORE.
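
For anyone who wants to confirm this from the second module, here is a minimal Python sketch, not an official TrueNAS tool. It runs `zpool import` with no arguments (which only lists importable pools and does not import anything) and checks whether the named pool's status mentions another system. The function names are hypothetical, and the "another system" wording is an assumption about the message text that may differ between OpenZFS versions. Do not force-import a pool that is live on the other module; having one pool imported on two hosts at once is a reliable way to corrupt it.

```python
import subprocess


def pool_in_use_elsewhere(import_listing: str, pool: str) -> bool:
    """Return True if `pool` is listed with a status mentioning another system.

    `import_listing` is the text printed by `zpool import` (no arguments).
    The "another system" substring is an assumed wording, not a stable API.
    """
    in_pool = False
    for raw in import_listing.splitlines():
        line = raw.strip()
        if line.startswith("pool:"):
            # A new pool section starts; track whether it is the one we want.
            in_pool = line.split(":", 1)[1].strip() == pool
        elif in_pool and "another system" in line:
            return True
    return False


def list_importable_pools() -> str:
    """Run `zpool import` in list-only mode and return everything it printed."""
    result = subprocess.run(["zpool", "import"], capture_output=True, text=True)
    return result.stdout + result.stderr


if __name__ == "__main__":
    # "Store-A" is the pool name from the screenshots in this thread.
    listing = list_importable_pools()
    print("Store-A in use elsewhere:", pool_in_use_elsewhere(listing, "Store-A"))
```

The parsing is split from the command call so the check can be tried on saved output without touching the disks.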
 

ChrisRJ

Wizard
Joined
Oct 23, 2020
Messages
1,919
Shared-storage HA requires significant platform customization to make it reliable, [..]
In other words: there is a high risk of actually reducing availability a lot if you don't know exactly what you are doing. After all, HA setups add a considerable amount of complexity. Complexity as such adds risk and therefore statistically reduces availability. So the measures that cause this additional complexity need to be pretty comprehensive for the overall solution to be better than something simpler.
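
To put rough numbers on that, here is a toy availability model in Python. It is only a sketch under stated assumptions: all figures are hypothetical, failures are treated as independent, and the whole failover mechanism (fencing, heartbeat, pool import logic) is lumped into a single component in series with the redundant pair. Components in series multiply their availabilities; redundant components fail only if all of them fail.

```python
def series(*avail: float) -> float:
    """Availability when every component must work (values multiply)."""
    total = 1.0
    for a in avail:
        total *= a
    return total


def parallel(*avail: float) -> float:
    """Availability when at least one redundant component must work."""
    all_fail = 1.0
    for a in avail:
        all_fail *= 1.0 - a
    return 1.0 - all_fail


def ha_pair(node: float, failover: float) -> float:
    """Two redundant nodes that depend on one shared failover mechanism."""
    return series(parallel(node, node), failover)


if __name__ == "__main__":
    single_node = 0.999  # hypothetical availability of one plain server
    for failover in (0.9999, 0.998):  # hypothetical failover-logic quality
        ha = ha_pair(single_node, failover)
        verdict = "better" if ha > single_node else "worse"
        print(f"failover={failover}: HA pair={ha:.6f} ({verdict} than single node)")
```

With these made-up inputs, the pair only beats the single server when the failover logic is itself more reliable than that server, which is the point about the measures needing to be comprehensive.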

My experience in the enterprise space (although more with transaction systems than with storage) has consistently been that people try to avoid clustering and complex HA setups if that is possible. Horizontal scaling of web servers with a load balancer is straightforward (at least until we have sticky sessions), but something like Oracle RAC is a different beast. But I digress ...
 