Is SCALE the right fit for me?

MeatTreats

Dabbler
Joined
Oct 23, 2021
Messages
26
Copied from my Reddit post, corrected some typos and added a little bit here and there.

So I have 3 file servers, each with a different legacy version of FreeNAS installed. I also have empty servers that have never been used, and used servers and JBODs are still plentiful and cheap on eBay. My setup is.... not great, to say the least, and I need to do something about it. I like ZFS, but its restrictions and disk-based RAID make it harder to grow and less resilient than a node-based setup.

So I currently have a filing cabinet full of external hard drives that haven't been powered on in years. I have over a dozen 5TB and 6TB internal drives that were filled, removed from my PC, put back into their antistatic bags and retail boxes, and have been sitting, again without being powered up, for years. I have a 24-bay server that was only half populated with 4TB drives; it was filled and has been sitting powered down for years. I have a 36-bay Supermicro that is completely full of 5TB and 6TB hard drives; it too has been powered down for years.

My last server is a 48-drive top-loader with 6TB and 10TB drives, and it is horribly configured. I don't know why, but when I set it up I made each vdev 16 drives wide, which is bad for performance. While this server isn't anywhere near full, I haven't been using it because I had 2 drives drop out of a single vdev, and losing one more would put me in a really bad spot (an ongoing issue that has yet to be resolved, and hence my stupidity in making each vdev so big). I think the server might have a bad backplane, as the drives that dropped out of the array tested good. I've just been putting off dealing with it.

I also have 9 drives in my PC, mostly 6TB and 10TB, that are full, and the only reason for that is my servers are either full or unavailable and I have nowhere else to put the data. Lastly, I have a big plastic bin full of cheap writable CDs and DVDs that are at least 15 years old. I know that the data on those is very likely degraded and unrecoverable, but hey, you never know.

So that is my setup and as you can see, I need to get my hoard properly sorted out.

I only have experience with FreeNAS, Ubuntu, and OpenMediaVault as far as Linux is concerned, and only with GUIs.

My research into Ceph points to Proxmox, but that talks about VMs and containers, and all of that is beyond me. I just want a TrueNAS-like OS with an easy-to-set-up Ceph cluster managed by a GUI, one that obviously has SMB support so I can access everything from my Windows PC. From what I've read, Ceph is network intensive, but all my servers have Mellanox ConnectX-2 cards with two 10-gig ports, and I also have a 48-port 40GbE switch, so setting up the network side of this cluster shouldn't be a problem.

So I guess people are telling me that Ceph isn't ideal for file servers but Gluster is, and that is what TrueNAS SCALE uses on top of ZFS. But it also seems, from what people are saying, that SCALE is unstable and "not ready" for serious use, so... idk.

How would I even do this with my current servers?
 

morganL

Captain Morgan
Administrator
Moderator
iXsystems
Joined
Mar 10, 2018
Messages
2,694
I'd like you to be part of the community, but I don't recommend running SCALE (or any other software I know of) on this type of setup, unless it's purely for development and torture testing.

Better to invest in one stable second-hand system and drives.
 

MeatTreats

Dabbler
Joined
Oct 23, 2021
Messages
26
Better to invest in one stable second-hand system and drives.

Well, bummer.

So I have other servers that I am not using, but what I don't have an abundance of is drives, and even buying manufacturer-recertified drives from a reputable source is still looking mighty expensive.

That is why I want to set up a cluster rather than have my data stored on independent servers. A cluster offers much greater redundancy and resiliency, and I like the idea of having my data automatically balanced across all servers. I like the idea of being able to expand the cluster by several drives at a time, and when all servers are full, I can just start pulling the smallest drives and have Gluster automatically replicate and rebalance the data.

It sounds like this is the ultimate goal of TrueNAS SCALE, is it not? So when will it be "ready"?
 

Samuel Tai

Never underestimate your own stupidity
Moderator
Joined
Apr 24, 2020
Messages
5,399
@MeatTreats, GlusterFS, like fusion power, is always 10 years away. Don't hold your breath.
 

MeatTreats

Dabbler
Joined
Oct 23, 2021
Messages
26
GlusterFS, like fusion power, is always 10 years away.

Funny, I like that.

Basically "Soon™️"

Anyways, so you want people to use, test, and provide feedback on SCALE, but only so long as their data isn't important to them or they can afford a complete backup in case a bug causes their entire cluster to disappear?
 

morganL

Captain Morgan
Administrator
Moderator
iXsystems
Joined
Mar 10, 2018
Messages
2,694
Anyways, so you want people to use, test, and provide feedback on SCALE, but only so long as their data isn't important to them or they can afford a complete backup in case a bug causes their entire cluster to disappear?

Not quite.

We're OK with supporting customers and users with gear that is solid and consistent, with well-defined requirements. That is SCALE's primary goal.

However, we can't claim to be able to support random and unreliable hardware and provide any level of satisfaction. Our own test labs don't use that type of setup and we know that diagnosing problems can be difficult even with good equipment. Hardware matters. That is not currently a goal.

We'd prefer to warn you now rather than have you try, succeed for a while and then fail.

For just re-using old drives, standard ZFS has decades of field experience and many tools for handling difficult issues. Replicate or sync to a second system.

It's always safer to build systems that are similar to what other people have already done.
 

Arwen

MVP
Joined
May 17, 2014
Messages
3,611
...

Anyways, so you want people to use, test, and provide feedback on SCALE, but only so long as their data isn't important to them or they can afford a complete backup in case a bug causes their entire cluster to disappear?
I did not read their responses the way you did.

SCALE is under development, and it is always a good idea to have a backup. Further, SCALE as a normal NAS is basically data-loss resistant (which does not mean you can't lose data...). Many people who experience instability with SCALE are adding things (even unsupported things), or using Apps or VMs. While those should not make a server unstable, "SCALE is under development" is more or less the answer.


However, you asked about many things, like using old hard drives. Those old hard drives might end up working for years. But I have had IBM & Seagate drives fail to spin up because the lubricant dried up and, in essence, became glue. Of course, that was decades ago...

There was also the request for clustering, which is a feature that SCALE does not have yet. Its exact feature set is not yet defined (nor when it will be released).

Things like this:
My last server is a 48-drive top-loader with 6TB and 10TB drives, and it is horribly configured.
can be problematic without proper cooling. I have even seen disks pop out of their slots due to heat (but on a totally different type of server). Basically, the backplane warped slightly, not enough to damage it, but enough that eventually the disk lost contact with the backplane.


Ideally, you would redesign your setup by investigating these:
  • How much data do I need?
  • How much data do I desire, but not need?
  • How much growth in the next 3 to 5 years am I expecting?
  • Do I need the non-NAS features of SCALE, besides the not yet available clustering?

As for the people using SCALE, unfortunately a great many hopped on simply because they thought Linux=perfect.
 

MeatTreats

Dabbler
Joined
Oct 23, 2021
Messages
26
Not quite....

Okay, I replied to you in private.

SCALE is under development, and it is always a good idea to have a backup. Further, SCALE as a normal NAS is basically data-loss resistant (which does not mean you can't lose data...). Many people who experience instability with SCALE are adding things (even unsupported things), or using Apps or VMs. While those should not make a server unstable, "SCALE is under development" is more or less the answer.

With the amount of data I have, I cannot do a 1:1 backup. Drives are getting bigger all the time (50TB drives due out in 2025) and the price per TB is falling, so a full backup may become feasible in the years ahead, but in the meantime that is why I want to use SCALE or something like it to build a cluster: to add extra resilience to my data and harden it against loss in the most cost-effective way possible.

I also don't do VMs or docker or Plex or anything like that. This is just going to be pure file storage with SMB, nothing more.
 

morganL

Captain Morgan
Administrator
Moderator
iXsystems
Joined
Mar 10, 2018
Messages
2,694
There was also the request for clustering, which is a feature that SCALE does not have yet. Its exact feature set is not yet defined (nor when it will be released).

SMB clustering is available within TrueNAS SCALE. The WebUI is done through TrueCommand.

It's useful for both high-performance NVMe storage and high-capacity HDD-based storage.

I can recommend it for well-organized, similar nodes... but not for this use case with more random nodes and ad-hoc growth.
 

ChrisRJ

Wizard
Joined
Oct 23, 2020
Messages
1,919
With the amount of data I have, I cannot do a 1:1 backup. Drives are getting bigger all the time (50TB drives due out in 2025) and the price per TB is falling, so a full backup may become feasible in the years ahead, but in the meantime that is why I want to use SCALE or something like it to build a cluster: to add extra resilience to my data and harden it against loss in the most cost-effective way possible.

There are a bunch of wrong assumptions here.
  • A cluster (whatever the details) can never replace a backup. HA is not a backup.
  • A cluster setup will not protect your data against deletion by human error, which is still the highest risk by far.
  • Setting up a cluster is not a trivial exercise. It also typically requires well-defined hardware to be sure that the timings are correct between nodes. In addition testing the setup is time-consuming and not trivial.
  • Overall it is relatively easy, by putting your data into a clustered solution, to actually lower resiliency. Complexity and effort increase exponentially and the tolerance for errors is very slim. This is why companies almost always have a pre-production environment that is also clustered (in contrast to all other test environments, like SIT, UAT, etc.). Clustering changes the nature of many things, often not known upfront because of the myriad of factors that can affect it.
  • Side note: Clustering is a highly advanced topic and the implementation details matter. People often think that a cluster either improves performance or at least keeps it, relative to a non-clustered setup. If we are talking about synchronous replication that is wrong and performance will suffer. The other approach is eventual consistency, but that has its drawbacks as well (see CAP theorem for details).
  • Complexity is one of the biggest risks for system outage and data loss. The simpler a solution, the better: KISS rules.
Without knowing your use-case (or did I miss that?), I think a cluster is probably the worst possible thing you can do. Instead I would have several systems (ideally not in the same location) and replicate data between them with ZFS snapshots. The interval can be really short (like 10 minutes).

That protects against failure of a single system or large parts of it. You still need backup, though.
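
To make that concrete, here is a minimal sketch of what such snapshot replication does under the hood. The dataset names and target host are placeholders, and on TrueNAS you would normally configure this as a periodic snapshot task plus a replication task in the web UI rather than scripting it yourself:

```python
#!/usr/bin/env python3
# Minimal sketch of snapshot-based ZFS replication to a second box.
# Dataset names and the target host are placeholders, not real systems.
import datetime
import subprocess

SRC = "tank/data"           # source dataset (placeholder)
DST_HOST = "backup-nas"     # second system, reachable over SSH
DST = "tank/replica"        # destination dataset (placeholder)

def replicate(prev_snap=None):
    """Take a new snapshot and send it; incremental if prev_snap is given."""
    stamp = datetime.datetime.now().strftime("%Y%m%d-%H%M")
    snap = f"{SRC}@auto-{stamp}"
    subprocess.run(["zfs", "snapshot", snap], check=True)

    # Incremental send if we have a previous snapshot, full send otherwise.
    send_cmd = ["zfs", "send"] + (["-i", prev_snap] if prev_snap else []) + [snap]
    send = subprocess.Popen(send_cmd, stdout=subprocess.PIPE)
    # Stream straight into `zfs recv` on the second system.
    subprocess.run(["ssh", DST_HOST, "zfs", "recv", "-F", DST],
                   stdin=send.stdout, check=True)
    send.stdout.close()
    if send.wait() != 0:
        raise RuntimeError("zfs send failed")
    return snap  # pass this back in as prev_snap on the next run
```

Incremental sends only transfer the blocks that changed since the previous snapshot, which is why a 10-minute interval is usually cheap.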
 

awasb

Patron
Joined
Jan 11, 2021
Messages
415
@MeatTreats Maybe I got things wrong. But as far as I read/understand, you did not write much about the workload and the (technical) goals.
  1. What services are you running or do you need?
  2. For how many clients? (And if so: Do you need to scale things up / out?)

One usually defines the goals, then designs/chooses the appliance/software, and then buys the (supported) hardware that fits the needs.

So for a start: Are we talking file services for a single client or a hyperconverged cluster for a thousand customers with a bunch of server tasks?
 

MeatTreats

Dabbler
Joined
Oct 23, 2021
Messages
26
but not for this use case with more random nodes and ad-hoc growth.

Why not?

Without knowing your use-case (or did I miss that?), I think a cluster is probably the worst possible thing you can do.

I'll answer that below.

  1. What services are you running or do you need?
  2. For how many clients? (And if so: Do you need to scale things up / out?)

So for a start: Are we talking file services for a single client or a hyperconverged cluster for a thousand customers with a bunch of server tasks?

Services: just SMB. Number of clients: 3-5 PCs. Yes, I will be scaling up/out over time. This is just a simple file storage cluster, no VMs, no Plex or containers or other apps.
 

morganL

Captain Morgan
Administrator
Moderator
iXsystems
Joined
Mar 10, 2018
Messages
2,694
@ChrisRJ spelled it out well.

I will just add that with random nodes and random growth, there will be a lot of administrative actions. These actions provide an opportunity for human errors or software errors. Because no-one has a system like yours, both types of error will be more likely.

Given that the data is not backed up, the errors can be catastrophic and cause massive data loss.

The TrueNAS brand is built around minimizing data loss. We can't recommend you take this journey. (I didn't have to say anything, but I wanted to make sure others in a similar situation might also see this warning.)

If you needed a cluster and had good equipment, we'd be delighted to support your endeavour.
 

MeatTreats

Dabbler
Joined
Oct 23, 2021
Messages
26
If you needed a cluster and had good equipment, we'd be delighted to support your endeavour.

I do need a cluster and I do have "good" equipment. All of my servers may be older and second-hand, but they are all enterprise gear, all pulled from data centers. Most of them are Supermicros, and they all have Supermicro motherboards with Intel CPUs and LSI HBAs. It is all trusted, reliable stuff, not some server chassis put together with random salvaged and consumer parts.

The only reason I want to be able to grow my cluster is that I have unused Supermicro servers, a lot of loose but full internal drives, and a lot of full external drives. These are all good drives that can be ingested into my cluster and continue being used once the data has been copied over.

Having all my data on ONE or TWO machines increases the risk of data loss, but spreading it out across 5-10 not only gives me room to expand without having to replace drives, it also means I can keep growing cheaply, since used servers and JBODs are still plentiful and fairly cheap on eBay. And I also assume that SCALE has its own implementation of Reed-Solomon erasure coding or something like it to replicate data over the entire cluster, so if one node fails, data isn't lost? Which is another reason to simply keep expanding the cluster with used servers from eBay and cheaper recertified or white-label drives rather than replacing drives and resilvering all the time.
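
Back-of-the-envelope, the trade-off I'm picturing looks something like this (purely illustrative; I'm assuming a k-data + m-redundancy fragment layout like Gluster's dispersed volumes, not claiming this is how SCALE actually implements it):

```python
# Toy math for k+m erasure coding: k data fragments plus m redundancy
# fragments are spread over k+m nodes, and any m nodes can fail.
# Illustrative only -- not SCALE's actual on-disk layout.
def erasure_profile(k, m, usable_tb):
    return {
        "nodes_required": k + m,
        "node_failures_tolerated": m,
        "raw_tb_for_usable": usable_tb * (k + m) / k,
        "space_overhead_pct": round(100 * m / k, 1),
    }

for k, m in [(2, 1), (4, 2), (8, 3)]:
    print(f"{k}+{m}:", erasure_profile(k, m, usable_tb=100))
```

So a 4+2 layout would survive any two node failures at a 50% raw-space premium, while 8+3 is leaner on space (37.5% overhead) but needs eleven nodes.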

Is the problem that you just want ALL my servers to be the exact same type and exact same model from the exact same manufacturer, with all the exact same hardware, specs, and configuration? Maybe that's ideal and possible for big multi-billion-dollar corporations, but for home users like me and small businesses, that would be unreasonable, and if TrueNAS SCALE can't make a cluster work across different servers, then what even is the point? I mean, and I really don't mean to sound like an ass here, every server is nothing more than a computer with a bunch of drives in it. How hard can it be to just make it work?
 

Samuel Tai

Never underestimate your own stupidity
Moderator
Joined
Apr 24, 2020
Messages
5,399
Is the problem that you just want ALL my servers to be the exact same type and exact same model from the exact same manufacturer, with all the exact same hardware, specs, and configuration? Maybe that's ideal and possible for big multi-billion-dollar corporations, but for home users like me and small businesses, that would be unreasonable, and if TrueNAS SCALE can't make a cluster work across different servers, then what even is the point? I mean, and I really don't mean to sound like an ass here, every server is nothing more than a computer with a bunch of drives in it. How hard can it be to just make it work?

And this is why GlusterFS is always 10 years away. We don't require all servers to be the exact same type and model, but they should have nearly identical storage systems and topologies. Otherwise, clustering incongruent storage configurations leads to all sorts of irreconcilable problems trying to place blocks (especially redundant or checksum blocks) on systems that won't have a matching topology.

As for home user expectations, you seem to mistake the purpose of SCALE. This is for enterprise storage scale-out, where having banks of identical servers with identical storage systems is not an unreasonable expectation.
 
Joined
Jun 15, 2022
Messages
674
@MeatTreats: I think what they're getting at is that the more variables a problem contains, the harder it is to find solutions.

I think your starting point should be: Fire up an empty server in a Test environment with TrueNAS SCALE and get your feet wet. You're going to need drives, and to burn in the system. Then repeat the process on a second system. Once you're really solid in your knowledge of TrueNAS, set up a test cluster similar to what you expect you'll need. That should give you the experience in failing (I mean that kindly) that you need so you can build a roadmap of how to succeed.

At that point you should have a solid plan in place and maybe the hardware to start you out; you can rebuild the servers in the Test Environment, and when they prove out, move them to Production, fix stuff, then start migrating data from your existing servers to the new production servers. Once that is done you can take down your fully-migrated "old" production servers, build them back up in the Test Environment and burn them in, and perhaps later migrate them to the Production Environment if possible. You might also find you're better off without a cluster, or at least not a cluster as you're currently imagining it.

The advice you've been given by others is meant to be helpful: in theory this is easy, though reality is "somewhat different." If anything goes wrong, this can go badly without anyone knowing it's going badly until things are irretrievably broken, and by "things" I mean "things that used to work and should still be working don't, and somehow I can't even tell which things those are."

So, yes, your current system can be greatly improved and should be. By replacing it. With a well-designed system. Once you have the education and ability to do so competently. Which is the same process most (or all) of us would follow, educating ourselves to your current system, finding ways to carefully keep it running, learning your current and future needs, developing a plan on how to meet those needs, figuring out how to migrate your current system to a replacement system without losing data or abilities, and building a test system to see if the plan will work. That plan would then be massaged so it will work and the systems rebuilt, tested, and maybe rebuilt several more times in the Test Environment until everything works as expected, and the whole process is repeatable. THEN we'd migrate it to the Production Environment with a fair assurance random things are going to not work as expected and some "stuff" might (let's be honest: will) break.

I'm telling you this because that's what I walked into. The disaster didn't happen overnight, and it wasn't going to be resolved overnight either. So over a period of years I fixed things, slowly. Then, when things were stable and running reliably and well, I landed here, because I could do better. The members here were really quick about keeping me out of trouble, and that's after I'd read the forum (quietly) for six months and knew I didn't fully know what I was getting into. I'm still testing my first system and it's going great (that mainboard fire notwithstanding). The thing is, I listen, then research everything they tell me and figure out why they told me, because they're right; I just don't always know why they're right. Cumulatively, there is an insane amount of knowledge and experience here, and I just need to sponge it up.
 

morganL

Captain Morgan
Administrator
Moderator
iXsystems
Joined
Mar 10, 2018
Messages
2,694
With the amount of data I have, I cannot do a 1:1 backup. Drives are getting bigger all the time (50TB drives due out in 2025) and the price per TB is falling, so a full backup may become feasible in the years ahead, but in the meantime that is why I want to use SCALE or something like it to build a cluster: to add extra resilience to my data and harden it against loss in the most cost-effective way possible.

I wanted to clear up one misconception about single server TrueNAS. When a specific server fails, there is very rarely any loss of data when ZFS is being used. There is only a temporary loss of access to the data until the server is fixed or replaced.

Drives can be moved to a replacement server and the data re-imported. JBODs can be moved to the second server.
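
Roughly, that recovery path looks like this (pool name is a placeholder; on TrueNAS you would use the pool export/import functions in the web UI rather than the raw CLI):

```python
# Sketch of the "move the disks" recovery path. Pool name "tank" is a
# placeholder; the two functions run on two different machines.
import subprocess

POOL = "tank"  # placeholder pool name

def export_on_failed_server():
    # Cleanly release the pool so another host can import it.
    subprocess.run(["zpool", "export", POOL], check=True)

def import_on_replacement_server():
    # After physically moving the drives (or the whole JBOD):
    # "-f" forces the import if the old host never exported cleanly,
    # e.g. it died outright and the drives were simply pulled.
    subprocess.run(["zpool", "import", "-f", POOL], check=True)
```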

For enterprise users, we provide a dual-controller system to automatically re-import the data to a second controller/server. However, that requires Enterprise-grade hardware, at Enterprise-grade cost. The process is fast, and hence we call it High Availability.

For a home lab user, a second standby server can be used. The process is slower and more manual, but there is no loss of data.

That is the other reason for my recommendation.... I think it is the safer way to store a modest amount of data. Clusters are good at very high capacity (e.g. >10PB) and very high bandwidth (e.g. >10GB/s).

If the data is valuable, then offsite backup is required... neither ZFS nor clustering solves for fires and earthquakes.
 

awasb

Patron
Joined
Jan 11, 2021
Messages
415
[…]
Services: just SMB. Number of clients: 3-5 PCs. Yes, I will be scaling up/out over time. This is just a simple file storage cluster, no VMs, no Plex or containers or other apps.

Why exactly do you need a cluster? What for? What number of clients / storage size / applications / transfer speed do you need to scale up/out to? For 3-5 clients via SMB, a single microserver with a 1GbE NIC could (!) be „perfectly enough". (Even by far.) What data are we talking about? How important is the data? Any credit agreements, business-critical company documents, or work data for the loss of which you would be personally liable with your assets? How critical is downtime? How many minutes per year could you afford?

It seems (!) to me (!) the „need" is more or less due to technical playfulness … Not a bad thing. But as far as you described things, you neither _need_ Proxmox/Ceph nor SCALE. To me (!) it seems (!) the best thing to start with (testing) would be ESXi with OmniOS (and maybe napp-it „on top"). Not for getting the job done, but for learning.
 
Joined
Jun 15, 2022
Messages
674
Now that iXsystems has mass-mailed a status update with YouTube links in it, and even more YouTube viewers are heading here for help, I think it would help iXsystems greatly to set up a YouTuber Help/Getting Started page, because in six months to a year these built-on-gaming-hardware systems are going to crash badly (no UPS/no power filtering/non-ECC memory/Realtek NIC/on-board RAID controller/running in a VM/etc.) and iXsystems is once again going to get YouTubers' attention, but like with LTT it's going to be "We lost all our data!" and iX is needlessly going to get a bad reputation even though they have an excellent set of products.

The YouTubers have one thing in common: They don't use Search. Every day an increasing number of threads pop up with the same preconceived notions that there's a magic Easy Button to press so their gaming setup works better than everyone else's system and they're going to have an FPS advantage. And they don't listen. @morganL, iX needs to give them an "Easy Button" with a guide on how to set up a "power-efficient*" basic system to get their feet wet, then some sort of next step to easy virtualization--and even that's a stretch, they want the one-shot TrueNAS-under-Proxmox appliance--and maybe that's what you need to give them.

---
*recurring YouTube viewer wish: able to run 4-7 Virtual Machines fast but with minimal power usage so they can leave it on all day to stream music and movies and run a Minecraft server.
 

morganL

Captain Morgan
Administrator
Moderator
iXsystems
Joined
Mar 10, 2018
Messages
2,694
Now that iXsystems has mass-mailed a status update with YouTube links in it, and even more YouTube viewers are heading here for help, I think it would help iXsystems greatly to set up a YouTuber Help/Getting Started page, because in six months to a year these built-on-gaming-hardware systems are going to crash badly (no UPS/no power filtering/non-ECC memory/Realtek NIC/on-board RAID controller/running in a VM/etc.) and iXsystems is once again going to get YouTubers' attention, but like with LTT it's going to be "We lost all our data!" and iX is needlessly going to get a bad reputation even though they have an excellent set of products.

The YouTubers have one thing in common: They don't use Search. Every day an increasing number of threads pop up with the same preconceived notions that there's a magic Easy Button to press so their gaming setup works better than everyone else's system and they're going to have an FPS advantage. And they don't listen. @morganL, iX needs to give them an Easy Button with a guide on how to set up a "power-efficient" basic system to get their feet wet, then some sort of next step to easy virtualization--and even that's a stretch, they want the one-shot TrueNAS-under-Proxmox appliance--and maybe that's what you need to give them.
Glass looks half-full to me...
 