DIY all flash/SSD NAS (CrazyNAS issues)

Allan_M

Explorer
Joined
Jan 26, 2016
Messages
76
That doesn't sound right. All X10 boards I know use the same ASPEED AST2400 with the AMI MegaRAC plus Supermicro skin solution, and all X10s (including the Avoton/Rangeley A1-series boards) have the same BMC firmware.

Could it be that some of them are just not on the latest firmware version?

Could very well be.

X10SLL
X10SDV

With the latter, I can use HTML.
With the former, I have to use the iOS app on macOS or some ancient Java-applet thing in Windows.
 

Dice

Wizard
Joined
Dec 11, 2015
Messages
1,410
Fill / populate one backplane with SSDs, create a stripe, RAIDZ1 or RAIDZ2 vdev, and create a pool.
Fill another backplane, create another stripe/RAIDZ1/RAIDZ2 vdev, and add it to the pool.
And so on.

In that way, I'd be able to keep adding vdevs to the pool.
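
In plain zpool terms, that would look something like this (just a sketch; pool and device names are made up, and the GUI does the same thing under the hood):

  zpool create tank raidz2 da0 da1 da2 da3 da4 da5 da6 da7        # first backplane becomes the first vdev
  zpool add tank raidz2 da8 da9 da10 da11 da12 da13 da14 da15     # next backplane becomes a second vdev
  zpool add tank raidz2 da16 da17 da18 da19 da20 da21 da22 da23   # and so on, one vdev per backplane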

I'm not that concerned with redundancy for this project.
I'd aim to replicate the 'holy grail' of redundancy, that is to spread vdevs across backplanes. (Think bigger: per cable, per backplane, per shelf, per rack, etc. Super spread out.) It would also essentially mean you are "striping" the bandwidth of your controller channels. I think that would be a cool part of this project too.
 

Patrick M. Hausen

Hall of Famer
Joined
Nov 25, 2013
Messages
7,776
I'd aim to replicate the 'holy grail' of redundancy, that is to spread vdevs across backplanes
If you do that and you lose an entire backplane (think power supply in one particular shelf) ...
I'd rather lose a vdev, have the pool offlined/suspended instantly, then perform a scrub after everything is back online.

Of course if you calculate redundancy so you can afford losing an entire shelf, go for it.
 
Last edited:

NickF

Guru
Joined
Jun 12, 2014
Messages
763
There are certainly performance characteristic differences with differing mirror topologies based on variables like PCIe bandwidth, SAS expander bandwidth, and also things like memory bandwidth and CPU uncore performance. I think it makes more sense to mirror drives between backplanes: port 0 on backplane 0 paired with port 0 on backplane 1, and so on.
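
In zpool terms, that pairing would look something like this (just a sketch; device names are made up, with da0-da23 on one backplane and da24-da47 on the other):

  zpool create tank \
    mirror da0 da24 \
    mirror da1 da25 \
    mirror da2 da26
  # ...and so on, each mirror pairing a drive on backplane 0 with its counterpart on backplane 1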

Losing an entire shelf and still having your pool up is certainly a common design choice for SANs in general, if not TN specifically.
 

Dice

Wizard
Joined
Dec 11, 2015
Messages
1,410
I'd rather lose a vdev, have the pool offlined/suspended instantly, then perform a scrub after everything is back online.
This is a fair point.

Of course if you calculate redundancy so you can afford losing an entire shelf, go for it.
That was along the lines I was thinking. Worth pointing out that there is a calculation and a balance to keep in mind here. Without real caution, one may end up in a worse situation, for example by spreading vdevs across backplanes but not across enough points of failure, so that things actually get worse. In the case of a 6-wide Z2 spread over 4 backplanes, having one backplane go out would mean either zero redundancy left or the vdev going offline and suspending the pool, depending on how the drives land. Such scenarios need to be addressed.
One needs to be critically aware of what failure a certain layout is actually protecting against, or whether, as per the example above, it is just adding "yoloness".

At any rate, it is a super cool thought experiment.
 

Etorix

Wizard
Joined
Dec 30, 2020
Messages
2,134
We experience in sooo many other instances (I also do photo and video, so, yeah, a lot), and after being promised something for free (high end server performance and feature sets), but now suddenly having to pay for stability, ease of use and so on.
To put my point of view on your tangent: the possibly-hypothetical-but-so-true "need a $2k MB + 1 TB RAM" is akin to the self-proclaimed definitive expert explaining that you cannot possibly do ANY good photography without the latest top-of-the-line from (either!) Canon or Nikon—to be totally ignored, unless you're specifically in the business of shooting sports events at 40+ fps. Turning the excessively old Athlon into a NAS would be akin to picking an SLR from the 70s (or a Yashica Mat 124G, or a screwmount Leica…)—possible as a hobby, but unlikely to be dependable and liable to run into maintenance issues. And your X10SDV embedded board is the equivalent of a Fujifilm X100V (or, accounting for its age, a Konica Hexar RF)—not a universal solution, but a highly capable professional one in a small package.

Also - this quote goes to my little hall of fame of fantastic quotes (in signature) :))
I do not think my quote is anywhere near the giants in your hall of fame (with special mentions to @anodos and @Ericloewe…), but thanks.
 

Allan_M

Explorer
Joined
Jan 26, 2016
Messages
76
... so, almost a year later: Progress!

Skærmbillede 2024-02-24 kl. 22.17.35.png


Was doing some tidying up and decided to give it another try.

I have absolutely no idea what, but I must have done something different.

Followed the same guide.
Downloaded the files from the same links.

Decided to skip a step, after doing some googling around, and pushed on.

... this splash (attached screenshot) looks much more like the one from my TrueNAS-box. See the old splash below.

426672554_923052502642476_6138027305333182513_n.png


I never did understand why it said "ServeRAID" and presented me with a list of JBOD(s). It was supposed to be flashed to IT mode when I bought it.

After this minor success I'm hesitant to push my luck, but I want to see if the damn thing now recognizes more than 58 drives.

However. After my last attempt I packed everything down and put it on the literal shelf in my small office. Didn't bother to sort the drives.

Soo... I guess I now have to check every single drive individually to see which drives were in use (those I updated the firmware on) and which weren't.
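
One way to sort them later would be a quick smartctl loop once they're all connected, comparing firmware versions (rough sketch; the da* device names are just examples, assuming a TrueNAS/FreeBSD shell with smartmontools available):

  for d in /dev/da*; do
    echo "== $d =="
    # model, serial and firmware revision should be enough to tell the updated drives apart
    smartctl -i "$d" | grep -E 'Device Model|Serial Number|Firmware Version'
  done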

I'm not declaring CrazyNAS to be back on track, but this does reinvigorate my desire to follow through on the project.

Stay tuned! :cool:
 

Koop

Explorer
Joined
Jan 9, 2024
Messages
59
I believe even when flashed into IT mode it will still retain the BIOS, which may be what you're seeing. I recall when I got my Supermicro HBA (which came with a nice utility and step-by-step directions, so easy...) it flashed it to IT mode as well as removed the BIOS. So now whenever I boot I don't even see anything related to my HBA pop up. Before that, whenever I'd boot into whatever, I'd see it pop up and have the ability to log into it, yada yada... Something like that maybe, I'm just guessing. I'm a noob.
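
For reference, the generic LSI SAS2 flashing sequence I've seen goes roughly like this (just a sketch; the exact tool and file names depend on the card and firmware package) - the legacy boot ROM only ends up on the card if you flash it explicitly:

  sas2flash -o -e 6                      # erase the flash (don't reboot before reflashing!)
  sas2flash -o -f 2118it.bin             # IT-mode firmware; omitting "-b mptsas2.rom" leaves the legacy BIOS off
  sas2flash -o -sasadd 5000xxxxxxxxxxxx  # restore the SAS address you noted down beforehand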

And I'm enthralled by this story.

Please keep crazyNAS alive.
 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194
It is 2024. You should not be seeing anything tied to legacy BIOS, especially not Real Mode Option ROMs with their god-awful "press F to pay respects" prompts.

Disable all BIOS Option ROMs in system firmware, disable all CSM crap, boot exclusively from UEFI, etc. If you need to configure an HBA or a NIC or whatever, do so using the plug-in menus provided by the UEFI Option ROMs that are so neatly integrated with the rest of the system firmware setup application.
 

Koop

Explorer
Joined
Jan 9, 2024
Messages
59
with their god-awful "press F to pay respects" prompts.

Oh, that's right! I remember now: my X11 board defaulted to dual boot mode after I updated the BIOS, and that's probably why I saw it. Then I realized that if you follow the CORRECT procedure/script as provided by SM for their flavor of the LSI HBA, it says "respect this" and blows it away.

I'll miss that lil guy... making my boot time that little bit longer.

But yeah. You can get rid of that stuff for sure like @Ericloewe said. And then wipe it clean off the HBA too.

Anyway..back to staying tuned.
 

nabsltd

Contributor
Joined
Jul 1, 2022
Messages
133
If you need to configure an HBA or a NIC or whatever, do so using the plug-in menus provided by the UEFI Option ROMs that are so neatly integrated with the rest of the system firmware setup application.
Even on an HBA that has no real config compared to a RAID card, the UEFI interface is terrible compared to the BIOS one provided by the card.

For cards that have complex config (like RAID cards, or NICs with QSFP+ ports that can be split into multiple 10Gbit SFP+ links with a breakout cable), the BIOS interface is almost always better.

That said, this likely only applies to older cards. Something newer that was designed with UEFI in mind might have a reasonable UI.
 

nabsltd

Contributor
Joined
Jul 1, 2022
Messages
133
I'll miss that lil guy... making my boot time that little bit longer.
The UEFI initialization takes almost exactly the same amount of time, but doesn't provide you with any messages to let you know that things are going as planned.

There's a message that mentions AHCI when you are booting in UEFI mode... that stays visible a lot longer when you have an LSI card installed.
 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194
Even on an HBA that has no real config compared to a RAID card, the UEFI interface is terrible compared to the BIOS one provided by the card.
What? They're basically the same thing. Same options, different styling. This applies to LSI SAS2 and SAS3 HBAs, at least.
For cards that have complex config (like RAID cards, or NICs with QSFP+ ports that can be split into multiple 10Gbit SFP+ links with a breakout cable), the BIOS interface is almost always better.
I have not tested RAID controllers, but on every Intel and Broadcom NIC I've seen I have not had any reason to complain about missing functionality. I don't have any QSFP+/QSFP28 NICs to test with, but given that options exist for SR-IOV, PXE and other things, I cannot imagine a scenario where the breakout options are not available (assuming they have to be configured before the OS takes over).
That said, this likely only applies to older cards. Something newer that was designed with UEFI in mind might have a reasonable UI.
Even things as old as the Intel 82599 or LSI SAS 2008 provide the same functionality or better, but integrated with the UEFI setup menu. I don't think older devices than those are going to have UEFI OpROMs available.
 

Koop

Explorer
Joined
Jan 9, 2024
Messages
59
The UEFI initialization takes almost exactly the same amount of time, but doesn't provide you with any messages to let you know that things are going as planned.

There's a message that mentions AHCI when you are booting in UEFI mode... that stays visible a lot longer when you have an LSI card installed.

Party pooper. But thanks for the info.
 

nabsltd

Contributor
Joined
Jul 1, 2022
Messages
133
I have not tested RAID controllers
This is where UEFI really falls short, because the UI is often placed in a frame inside the main navigation.

Imagine taking the TrueNAS UI, squashing it into 1/2 to 2/3 of the screen, then navigating around to do everything necessary to create a pool from scratch.

For at least some HBAs, though, I have noticed that the UEFI interface doesn't always show everything, like max speed to negotiate on a link. And, help text is often far less complete in UEFI. You do have the current item short help display, but many LSI products have a "hit F1 for help" that calls up pages and pages of text when in BIOS mode.

Some of this could be a side effect of a crappy UEFI UI implementation by the motherboard firmware, or a less-than-quality port of the BIOS ROM to UEFI by the card manufacturer.
 

Allan_M

Explorer
Joined
Jan 26, 2016
Messages
76
Hi again

I did post an update, but must have accidentally scrolled back, so that the page refreshed and everything I wrote was gone. After spending an awful lot of time on testing and playing around I didn't have the will to write it all again, so here I am with another update.

Just imagine I wrote this the very same evening. After the successful flashing of the HBA I did decide to push my luck - and I'm glad I did.

So, from the top:

426672554_923052502642476_6138027305333182513_n.png


Up until recently I was under the impression that my HBA was indeed flashed into IT mode. I was wondering why it kept saying "MegaRAID" and presenting the drives as individual JBOD(s) - but, what do I know?! I'm a total n00b at this, just another guy with too much time, too much interest in fiddling around, and not much sense.

However. I decided, for some reason - I was tidying up, let's go with that - to give it another go. Gathered the files from the guide but also read the comments to find clues as to why I was unsuccessful in flashing the damn thing in the first place.

Long story short - I managed to get it to work, by following the guide, as I had done before, but I must have done something different.

Now the loading/HBA/BIOS-thingie looks like this instead:
iKVM_capture (1).jpg


Soooo much more in line with what I expected - only tried one drive to begin with. But when I confirmed it worked - drive detected - I pushed on and started filling up the backplanes:

IMG_1092.jpg

(This image is from later in the process)

Oh. And I should mention, at some point, you'll notice I switched power supply, so don't get confused. It's called foreshadowing *uuhh hhuuu uhh*.

Of course, booted into Windows and wanted to see each drive detected live - hotswap, and all that. 22 drives...

iKVM_capture (2).jpg

(In case someone was wondering: it's an iKVM/HTML-thingie screenshot of Windows running on the SSD in the m.2 slot on the motherboard)

iKVM_capture (3).jpg


Too many drives to fit into one screenshot in the current resolution. How many, you ask...?

iKVM_capture (8).jpg

(Ignore the very obvious "64" number, which is... obviously, erhm, wrong)

68 drives!

I ran the numbers, and 68 drives is indeed more than the previous 58 drives. Actually, it's a 10-drive difference (or 2, depending on how you count).

Trust me. I'm a math teacher. I should know how addition and subtraction works - but that's about it.

On the previous image (the one with all the drives on the backplanes) you can see 87 drives, I think.

Here is an image from DISKPART.

iKVM_capture (11).jpg

(Again, the drive counts don't match. Please ignore the numbers - not all of them, of course. Only the wrong ones.)

Glorious! There is no other word.

Decided that this was as good a time as any to do some tests. With +58 drives, and the BIOS/HBA-boot thingie looking right, I thought to myself that it must work.

+80 drives was more than enough to do some testing. So without any other prep, I used Windows' Storage Spaces to create 'something' (I honestly can't remember, so bear with me) and ran a CrystalDiskMark:

iKVM_capture (17).jpg


I think I ran into some weird device limit. I clearly remember marking all the drives, but it refused to use them all (and this was after a tedious reset of all drives that I had previously used. I get that you want the ability to repair a damaged array - I guess Windows thought I was a damaged array - from the drives I had previously used, which is fine. But I digress), and I think it capped out around 40 drives no matter how I went about it.

40 x 128 GB ~ 5 TB

So I must have used some sort of parity, because the "JBOD" option only gives "single-drive" performance.

I know 2 GB/s / 1.7 GB/s read/write isn't that impressive compared to newer drives. But... it's still kinda awesome!

Fast? Yes, but not the fastest. Far from it.
Practical? No, absolutely not!
Usable? To a degree
Faster than 10GbE? I would think so, yes?
Economical? Get out!

Windows Storage Spaces.jpg


Storage Spaces - can't remember the exact config.

Stripe 2-disk.jpg


2-disk stripe.

Stripe 4-disk.jpg


4-disk stripe.

Combined 2-disk.jpg


2-disk 'combined'.

Combined 4-disk.jpg


4-disk 'combined'.

Mirror 2-disk.jpg


2-disk mirror.

... soo. About that power supply...

IMG_1096.jpg


You'll notice that in this image there are two (or three) things going on:
  • Power supply was changed from a 750 W (older) to an 850 W (newer).
  • Fewer drives (72, to be exact)
  • Two SATA drives on the side (TrueNAS install)
I managed to run almost everything in Windows just fine. It was when I booted into TrueNAS that I started having issues (not TrueNAS' fault, I might add).

I decided to go 'gung ho' and create one large vdev (87 drives).

Ticked all drives, added them to the pool, and TrueNAS began to initialize... aaaand.... *click!* and then nothing.

The computer just tapped out. Nothing.

So. What I suppose happened was that I was running everything from the wall and sequentially adding drives. Didn't power off when 'soft rebooting'. I guess the drives must have been 'filled' and charged, but when I started initializing they went into overdrive.

I've read somewhere that SATA drives can pull up to 10-12 W when booting or in write-heavy scenarios. That's the reason I added the drives sequentially. But I didn't expect they'd all draw a full 10-12 W (or so I assume) when initializing.

Let's just imagine that that's what they did: 87 drives pulled at least 10 W each.

Well. The math is easy on that one (good for me): 87 drives x 10 W / drive = 870 Watts.

The power supply was pulled from a pile of discarded workstation/server gear some years back. I haven't had any problems with it, but I suspect that +800 W was a bit too much for an old 750W (total!) PSU.

Had to remove drives, so that I could boot.

IMG_1098.jpeg


I'll spare you the details, but I'll admit this isn't the way to do it.

All four backplanes are connected via one (1!) single molex connector. The PSU is modular and I have only one cable with four molex-connectors. So everything is going through these wires... oh. What wires?

Well. You see the top connector? That's a cable with 5 SATA power connectors.

The large one on the bottom is the ATX-cable.

The cable with molex connectors is the one in the middle. Not the one in the rear. The one with four little wires - that gets hot. Not dangerously - I should know, I also teach physics - but not in a good way.

The current protection (I assume) kicks in with only 24 drives connected.

Four backplanes with SAS expanders and drives are simply too much for that one connection.

On the 750 W PSU I had two cables to spread the load.

So. Where am I at now?

Well. Tbh, the first post (the one that got deleted) was me declaring the project - not dead - but on hiatus. I simply can't justify the cost of having the system running 24/7. Even at idle, we're talking 170 W (I'll explain this number later).

I ran the numbers through an electricity calculator and it would come to around $700/yr for idle alone.

I don't know how well that number translates to people on this forum, so I've come up with some comparisons (for what around $700 would get you in Denmark):
  • SteamDeck 256 GB, IPS or ROG Ally Z1 Extreme
  • Refurbed M2 MacMini, base model
  • 2x DJI Pocket 2
  • 7x Disney Plus yearly subscriptions.
And that's for idle power alone.

At full tilt we're talking +700 W (I'll explain this as well), which is around $2,900/yr - in 'DK prices':
  • Mac Studio, base model
  • DJI Mavic 3 Pro Fly More Combo + DJI RC
  • Blackmagic Pocket Cinema Camera 6K Pro (if you discount/subtract the added value of DaVinci Resolve Studio)
So. Practicality went out the window long ago. Just wanted to see if it was possible, what results I might get, and if it was "worth" it.

Well. Sorry to say. I have some more math.

Since I started this project some newer drives hit the market - not much of a surprise, I'll admit, but let me explain.

24x 128 GB = 3,072 GB, or around 3 TB. In a vdev with RAIDZ1 or RAIDZ2, that's about what you'll get. To make things easier, let's just assume around 3 TB per backplane.

4x 3TB = 12 TB total.

That's more or less the same as four 4 TB drives in RaidZ1.

I have an ASUS Hyper M.2 x16 card V2 - for those who might not know: it's a PCIe card that allows four individual m.2 form factor drives to be placed in a PCIe x16 slot. The motherboard must support bifurcation, which basically means that the x16 slot is split into four individual x4 links, denoted as x4x4x4x4.

A decent 4 TB drive costs around DKK 2,000 (some are cheaper of course, but let's go with middle of the road, MVP'ish).

Four of those are around DKK 8,000 or $1,200 (in DK that'll get you a SteamDeck 1 TB OLED with a nice case and screen protector).

Only four drives, waaaaay less power usage, better performance. And, if I wanted to, I could add SATA-drives via the motherboard headers and expand with three or four large hard drives and use those as long term storage and only use the SSDs for current projects.

Is CrazyNAS dead on arrival?

No. Absolutely not. I'll see this through and buy cables just to see what I can do with it and what kind of performance I can eke out (please do suggest some RAID/vdev/pool layouts). But, also, I have to admit it is kind of a 'Spruce Goose' thing.

Before I sign off for now - I'll tease you with the last screenshot, before the thing powered off (with the 850 W power supply).

426130861_822854439868622_8633157584487284691_n.png


... "45Drives"? That's cute.

How about 98 drives? :cool::tongue:

Oh. One last thing. About power usage. How did I get those numbers, when the thing powers off with all four backplanes filled?

Well. I did some tests with one and two backplanes filled (and a meter at the socket). I know it doesn't scale precisely linearly, but with one and two backplanes filled I had two data points to calculate the base power usage (checked against the system running with only the backplanes and no drives except the boot drives) and to estimate what four filled backplanes would consume. It's also how I got to the "10 W/drive" figure, which other numbers around the interwebs seem to confirm.
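
The back-of-the-envelope extrapolation itself is just this (the numbers below are placeholders to show the method, not my actual readings):

  P1=290                      # watts at the wall with 1 backplane populated (example value)
  P2=410                      # watts at the wall with 2 backplanes populated (example value)
  PER_BP=$((P2 - P1))         # extra draw per populated backplane
  BASE=$((P1 - PER_BP))       # system + empty backplanes + boot drives
  echo "base: ${BASE} W, estimated with 4 backplanes: $((BASE + 4 * PER_BP)) W"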

I'm pretty sure a powered-on and idle CrazyNAS consumes around 170 W (best case), while a fully loaded CrazyNAS consumes more than 700 W. Therefore I consider +700 W an absolute lowest "worst case". I do get that the earlier "10 W/drive" figure would equate to 96 x 10 W = 960 W for the drives alone. But I couldn't get a precise number per drive, since they are different models (different numbers of chips and controller models) and I'm pretty sure they're not all fully loaded except at power-on and while initializing. I have to buy some molex cables to be able to test the latter. I can get to 96 drives, but the 850 W PSU switches off after 10-15 seconds, even at idle.

I could try with dual power supplies, but I'm not sure what would happen if OCP trips on one of them. Though I am adventurous, I'm not prepared to sacrifice four backplanes, +90 SSDs, an HBA, a motherboard and 128 GB of RAM - sorry.
 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194
Someone whip up a "Jank of the year" badge, because you've earned it. For better and for worse.
 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194
Never mind that, how long did it take to extract all those SATA SSDs from their 2.5" enclosures?
 

Davvo

MVP
Joined
Jul 12, 2022
Messages
3,222
Assuming 4 screws per enclosure, that's 360+ screws.
Assuming 5 seconds per screw, that's 1800+ seconds, or over half an hour... and a very sore wrist.
 