Scalable hardware setup

Status
Not open for further replies.

danb35

Hall of Famer
Joined
Aug 16, 2011
Messages
15,504
I guess I don't "need" all the space in a single pool, but that's the scenario ZFS is designed for, and it's certainly convenient. Otherwise, I have to create a new pool, then decide what I'm going to move there and what I'm going to keep on the original pool. Shares will probably need to be moved. Other things may need to be reconfigured. Making separate pools does mitigate the risk of a vdev failing and taking the pool with it; I'm figuring that a properly designed, properly maintained six-disk RAIDZ2 vdev isn't very likely to fail. But really, multi-vdev pools are pretty common. Four disks in striped mirrors? That's two vdevs--two, two-disk mirrors, striped together. That's pretty much what @NetSoerfer is suggesting. Advantage is that you can add a pair of drives at a time, rather than groups of four or six. Disadvantage is that the redundancy isn't quite as good (if the wrong two disks go at the same time, you're toast), and you lose half your space to redundancy.
 

ALFA

Explorer
Joined
Aug 23, 2014
Messages
53
Your chance of surviving the next disk failure in a pool of two-way mirror vdevs is 1-(f/(n-f)), where f is the number of disks already failed, and n is the number of disks in the full pool. Three-disk mirrors are even more resilient.
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
you lose half your space to redundancy.

I still don't understand why people get so uptight about "losing space". Space is cheap. I remember paying $500 for a 40MB HDD. Set up a 3 way mirror, lose 2/3rds your space, lose another 20% to maintaining less than 80% occupancy, and move on with life. :smile:
 

danb35

Hall of Famer
Joined
Aug 16, 2011
Messages
15,504
I don't know that it's a matter of being uptight, but it's definitely a factor worth considering. Yes, I remember the bad old days as well (I remember a 5 MB HDD selling for $1k, but never bought one of those--the worst I did was $459 for an 85 MB model), and certainly storage is cheaper than it's ever been, but cheap isn't the same as free. But really, more than the cost, it's going to affect how you design the system.
 

Jokener

Dabbler
Joined
May 1, 2016
Messages
17
If we were talking about a tool/appliance for work, I would agree with you, jgreco.
In that case it is a requirement to do the job and can be seen as an investment in your livelihood.
And in the grand scheme of things the cost is not that great compared to other tools etc one may need for work.

In my case, I just don't want to lose my data.
That precious stuff that I really, really can't lose is only a fraction of my total data footprint and is well protected.
But I am planning to use my FreeNAS server to hold a lot of stuff that I want (!) to store, but that I don't need to live.
I like to keep the videos of amateur boxing fights from all the tournaments my girlfriend and I participate in, in case the same opponent comes up again.
That gives me an edge in the competition, just as proper training and comfortable equipment does. I enjoy competing and winning, but I don't need it to live.
When I store all my iTunes TV shows and movies locally, it makes life a bit easier. But again, I don't need it to live.
I would never complain about not having enough money, because I can afford everything that's really important in life and feel very fortunate about that.
But I can't just drop a few hundred Euros on every 1st world problem I encounter. There are simply too many things one can desire.
So yes, space is cheap. But it's not free. And it is often needed - or wanted - in fairly large quantities.
And then it becomes a trade-off between this one thing you want and all the other things in life that you might want, some of which you really need.
In that regard I wouldn't see it as being uptight about "losing" space. The space can be "invested" in the data redundancy or in actual storage.
We just don't want to make shit investments.
Yes, I am willing to invest some 1000€ in my FreeNAS and the things I want it to do for me, but I am not willing to waste even 100€ if I can avoid it.
I will however congratulate you and all two other people that don't have to pay any attention to their finances. On all that success, I congratulate you :smile:
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
Yes, I am willing to invest some 1000€ in my FreeNAS and the things I want it to do for me, but I am not willing to waste even 100€ if I can avoid it.
I will however congratulate you and all two other people that don't have to pay any attention to their finances. On all that success, I congratulate you :)

Excuse me? Have you ever priced out a commercial NAS solution? Something like an EqualLogic PS6110XS with a bunch of 1.2TB 10K SAS drives and 800GB SSD's can easily go for $60-$80 THOUSAND. I can build something that kicks the crap out of that for about a tenth of that with FreeNAS. I think many of us here are here because we're tired of being reamed out for expensive NAS solutions when most of these things are actually just hardware plus software platforms. Getting rid of the fancy hardware and going with standard PC server hardware fixes half that.

If you want to protect your data, there's no magic fix. Somehow you have to pay for protection. The raw disk is very cheap. If you want fast, you go with mirrors. If you want to compromise away speed, RAIDZ2 or Z3 gives you better efficiency at lower speed. It's not really that complicated.
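To put rough numbers on that trade-off, here is a minimal sketch in Python. The function name `usable_fraction` is made up for illustration, the 80% ceiling is the occupancy guideline mentioned earlier in the thread, and the eight-disk RAIDZ3 width is just an assumed example. It ignores ZFS metadata, padding, and allocation overhead, so treat the results as ballpark figures only.

```python
from fractions import Fraction

def usable_fraction(data_disks, total_disks, max_fill=Fraction(4, 5)):
    """Fraction of raw capacity left for data in one vdev.

    data_disks / total_disks is the share not spent on redundancy;
    max_fill is the occupancy ceiling (80% assumed here).
    Simplification: ignores ZFS metadata and padding overhead.
    """
    return Fraction(data_disks, total_disks) * max_fill

layouts = {
    "2-way mirror": usable_fraction(1, 2),
    "3-way mirror": usable_fraction(1, 3),
    "6-disk RAIDZ2": usable_fraction(4, 6),
    "8-disk RAIDZ3": usable_fraction(5, 8),  # assumed width, for comparison
}

for name, frac in layouts.items():
    print(f"{name:14s} {float(frac):.1%} of raw capacity usable")
```

Under these assumptions a 3-way mirror leaves a bit over a quarter of the raw space for data, while a six-disk RAIDZ2 leaves a bit over half.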
 

Jokener

Dabbler
Joined
May 1, 2016
Messages
17
@jgreco:
Yes, I know professional systems can be far more expensive.
I even explicitly mentioned professional use cases in my post.
Your comparison is irrelevant, because the use cases are different.
A commercial server with nearline-SAS drives and SSD's (hopefully) delivers a completely different kind of performance than my sub-1500€ build.
If you can free up work time of your employees it may very well be worth it.
I know how a friend of mine regularly wants to kick his IT department, because a few hundred Euros were saved in purchasing his equipment, resulting in about 1 hour of wasted time per week.
If one wanted to make the ideal business decision, a decision would have to factor in the properties of the products as well as the costs and benefits it provides.
(Duh...)

But many of us are not looking to make the ideal business decision.
The wired network in my home is gigabit ethernet, so I have no use for a server that could saturate a 10 gigabit connection.
I would even be willing to scale down to a 500 megabit connection speed, if it would come with a 50% price cut.
And since we have to pay for this with our own money, we don't want to waste any of it.
Nobody is looking for unicorns and fairy dust.

So out of the few options that exist, we are looking to make the ideal choice.
And when a topic is fairly new for you, then you ask others for help to avoid mistakes.
Just like I overlooked the fact that my initial motherboard choice did not include IPMI (even though a price comparison website said it did).
I don't quite understand your somewhat hostile attitude here...
 

ALFA

Explorer
Joined
Aug 23, 2014
Messages
53
Does the formula need a term representing drives-per-vdev to account for this?

In an eight disk pool, you have 100% survival on the first disk failure, 85.7% survival on the second disk failure, 66.7% survival on a third disk failure, and you can keep going on and on.
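For anyone who wants to check these figures, a minimal sketch in Python follows. The function name `survival_chance` is made up for illustration, and it assumes a pool built only from two-way mirrors in which every disk that has failed so far sat in a different mirror (otherwise the pool would already be gone).

```python
from fractions import Fraction

def survival_chance(n, f):
    """Chance that a pool of two-way mirrors survives the next disk failure.

    n: total disks in the pool; f: disks already failed, each in a
    different mirror (otherwise the pool would already be lost).
    """
    if n % 2 or not 0 <= f <= n // 2:
        raise ValueError("need an even n and 0 <= f <= n/2")
    return 1 - Fraction(f, n - f)

for f in range(4):
    print(f"{f} failed: {float(survival_chance(8, f)):.1%}")
```

For eight disks this prints 100.0%, 85.7%, 66.7% and 40.0%, so the first three lines match the numbers above and the fourth failure is survived only 40% of the time.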
 

Robert Trevellyan

Pony Wrangler
Joined
May 16, 2014
Messages
3,778
I don't understand how survival probability can be independent of vdev configuration. With 2-way mirrors, one failure leaves the pool with no redundancy in that vdev, so there's a non-zero probability of the next failure destroying the pool. With 3-way mirrors there is still redundancy after one drive fails and the probability that the next drive failure will destroy the pool is zero.
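One way to see the dependence on vdev width is to enumerate failure orders directly instead of relying on a closed formula. The sketch below does that in Python; the function name, the disk numbering, and the nine-disk 3-way example are assumptions for illustration, and it treats every remaining disk as equally likely to fail next, which real hardware does not guarantee.

```python
from fractions import Fraction
from itertools import permutations

def next_failure_survival(n_disks, width, already_failed):
    """Exact chance of surviving one more failure, by enumeration.

    Disks 0..n_disks-1 are grouped into mirrors of `width` disks
    (disk d belongs to vdev d // width). A pool is lost once every
    disk in some vdev has failed. All failure orders are assumed
    equally likely, i.e. load and age are ignored.
    """
    def alive(failed):
        vdevs = n_disks // width
        return all(
            any(d not in failed for d in range(v * width, (v + 1) * width))
            for v in range(vdevs)
        )

    survived_before = survived_after = 0
    for order in permutations(range(n_disks), already_failed + 1):
        # Only count histories in which the pool was still alive
        # before the final failure in this sequence.
        if alive(set(order[:-1])):
            survived_before += 1
            if alive(set(order)):
                survived_after += 1
    return Fraction(survived_after, survived_before)

print(next_failure_survival(8, 2, 1))  # 2-way mirrors, one disk down -> 6/7
print(next_failure_survival(9, 3, 1))  # 3-way mirrors, one disk down -> 1
```

With 2-way mirrors the enumeration agrees with 1-(f/(n-f)); with 3-way mirrors the second failure can never destroy the pool, so that formula does not carry over unchanged.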
 

Jokener

Dabbler
Joined
May 1, 2016
Messages
17
This formula does not model pool configurations with different protection levels in different vdevs, mixed RaidZ types and/or mirror types and so on.
The formula applies only to pools made up of mirror vdevs. (And only two-way mirrors, not three-way mirrors.)
In that case, one disk is unprotected as soon as its mirror fails.
Then, the chance of the next failure killing your pool is equal to the fraction of un-mirrored drives in an otherwise mirrored configuration.
Imagine you are running 8 disks in a striped and mirrored configuration.
1-(f/(n-f)) is the formula for the probability of surviving the next failure (f/(n-f) being the data loss probability), with f being failed drives and n being total drives.
Before your first failure, the chance for pool survival is 100% (Because: 1-(0/(8-0)) = 1-(0/8) = 1-0 = 1).
Now, your first drive failed. Your data is fine, because your survival chance was 100%.
With f=1 and n=8, 1-(f/(n-f)) now comes out to this: 1-(1/(8-1)) = 1-(1/7) = 1-0.14 = 0.86
So the chance of surviving a second failure is 86%, because you have 7 live disks and six of them are still protected.
You will only lose data in the 14% of cases where the second failure hits the unprotected drive.

The problem with this formula is not mathematical. It is that the real world problem does not follow this formula.
Because restoring a mirror by replacing the drive may place additional stress on the unprotected drive, not the other working drives.
Therefore the mere restoration of the failed mirror influences the probabilities of failure.
But as a general guide and tool to inform your decisions, this formula can be very useful.
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
The problem with this formula is not mathematical. It is that the real world problem does not follow this formula.
Because restoring a mirror by replacing the drive may place additional stress on the unprotected drive, not the other working drives.

This is one of the traditional issues with simple mirroring, yes. It's possible to be a lot more intelligent about how you mirror data, but that always comes at a cost. In the USENET business, some of you may be aware that we do server level redundancy and do no RAID or mirroring at all on the servers. So there's two sets of servers each with all of the articles (possibly on different continents). Articles are distributed to a particular server based on a hashmod function of the Message-ID, and then on that server by a different hashmod function. Given a Message-ID, you can always point to the disk that holds it. So on the first set of servers, the hashmod functions are computed differently than on the second set of servers. When a disk fails, redundancy is lost, but also can be very quickly rebuilt, because all the messages that were on Server Set 1, Server 22, Disk 41 when that disk fails, are actually distributed evenly around all the servers in Server Set 2. Those machines quickly capture their little share of the lost articles and forward it back to the machine in Set 1 that lost its disk. Instant rebuild, and very graceful load spreading. Downside, somewhat difficult to expand without more complexity.
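For readers who want to see the placement idea in code, here is a rough Python sketch. It is not the actual implementation: SHA-256, the salt strings, the server and disk counts, and the example Message-IDs are all assumptions chosen for illustration. It only shows the principle that each server set hashes the Message-ID differently, so the articles from one failed disk in Set 1 end up scattered across the servers of Set 2.

```python
import hashlib
from collections import Counter

def hashmod(message_id, salt, modulus):
    """Map a Message-ID to a bucket: hash it with a salt, take the remainder."""
    digest = hashlib.sha256(f"{salt}:{message_id}".encode()).digest()
    return int.from_bytes(digest[:8], "big") % modulus

def locate(message_id, server_set, servers=32, disks=48):
    """Return (server, disk) for an article within one server set.

    Each set uses its own salts, so the two sets place articles
    differently. Server and disk use different salts as well.
    The counts of 32 servers and 48 disks are hypothetical.
    """
    server = hashmod(message_id, f"{server_set}-server", servers)
    disk = hashmod(message_id, f"{server_set}-disk", disks)
    return server, disk

# Hypothetical article IDs standing in for real Message-IDs.
articles = [f"<{i}@example.invalid>" for i in range(200_000)]

# Articles lost when Set 1, Server 22, Disk 41 fails.
lost = [m for m in articles if locate(m, "set1") == (22, 41)]

# Which Set 2 servers hold the surviving copies of those articles.
sources = Counter(locate(m, "set2")[0] for m in lost)
print(f"{len(lost)} lost articles, copies spread over {len(sources)} set-2 servers")
```

Because the lookup is a pure function of the Message-ID, no index is needed to find an article, and the rebuild load for one disk is shared by many machines rather than a single mirror partner.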
 

Jokener

Dabbler
Joined
May 1, 2016
Messages
17
The protection against single disk failure is indeed very graceful in your Usenet scenario.

I guess we are drifting off-topic from the initial question.
But since that has been answered to (much more than) my complete satisfaction...
Is there another online backup service, where cold storage is maybe cheap and unlimited/plentiful, but in a recovery-situation one can pay extra to accelerate the "download"?
Because true risk mitigation only works with two storage locations, which might be difficult for some.
And using a friend's place to put the server is also not always a viable option.
My father has a company and could put a second server of mine in his server room, but with a 6k/6k up/down internet connection on his side (nothing faster available) I'd be completely hammering his bandwidth with every large transfer.

My ideal scenario would be something like Backblaze or Crashplan with their 10€ per month.
In a recovery situation I would be willing to spend a couple hundred Euros extra, to ship the data on disks.
But as I hope not to need fast link speeds, I'd rather not spend too much money on it.
Last I checked even the cold storage tiers with Amazon S3 and Microsoft Azure were priced in the mid double-digits per TB.
That's not unreasonable for what they are offering, but what they are offering FAR exceeds most private needs...
 

ALFA

Explorer
Joined
Aug 23, 2014
Messages
53
The problem with this formula is not mathematical. It is that the real world problem does not follow this formula.
Because restoring a mirror by replacing the drive may place additional stress on the unprotected drive, not the other working drives.
Therefore the mere restoration of the failed mirror influences the probabilities of failure.
But as a general guide and tool to inform your decisions, this formula can be very useful.

When a disk fails in a mirror vdev, your pool is minimally impacted (nothing needs to be rebuilt from parity, unlike RAIDZ, where it gets worse the more disks the vdev has); you just have one less device to distribute reads from. When you replace and resilver a disk in a mirror vdev, your pool is again minimally impacted; you’re just doing simple reads and simple writes to the new member of the vdev.

In the real world, you’re usually going to look at rebuilds taking anywhere from 1.5x to 3x as long on a single RAIDZ vdev as they would take on the same disks in a pool of mirrors.

Edit: sorry for the off-topic, it's just an interesting option to take into consideration.
 

NetSoerfer

Explorer
Joined
May 8, 2016
Messages
57
Is there another online backup service, where cold storage is maybe cheap and unlimited/plentiful, but in a recovery-situation one can pay extra to accelerate the "download"?
[...]
In a recovery situation I would be willing to spend a couple hundred Euros extra, to ship the data on disks.
[...]
Last I checked even the cold storage tiers with Amazon S3 and Microsoft Azure were priced in the mid double-digits per TB.
@Jokener, have you had a look at Amazon Glacier? It's cheaper than S3 ($7/TB vs $30/TB for US servers), it takes a couple of hours to prepare your data for recovery, and you pay based on how much you download how quickly - if you just need a few things urgently and can slowly recover the rest of your data over a couple of days/weeks, it'll be cheaper than downloading everything as quickly as you can.

You cannot get Amazon to ship disks though.
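As a quick way to compare the two storage prices quoted above, here is a small Python sketch. It assumes the $7 and $30 figures are per TB per month, covers storage only (retrieval and transfer fees are not modelled), and uses made-up data sizes; actual prices change over time.

```python
# Per-TB prices quoted in the post (US servers), assumed to be monthly.
PRICE_PER_TB = {"Glacier": 7.0, "S3": 30.0}

def monthly_storage_cost(terabytes, service):
    """Storage-only monthly cost in USD; retrieval fees are not modelled."""
    return terabytes * PRICE_PER_TB[service]

for tb in (1, 4, 7):  # hypothetical data sizes
    glacier = monthly_storage_cost(tb, "Glacier")
    s3 = monthly_storage_cost(tb, "S3")
    print(f"{tb} TB: Glacier ${glacier:.0f}/month, S3 ${s3:.0f}/month")
```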
 

Jokener

Dabbler
Joined
May 1, 2016
Messages
17
@NetSoerfer: Yes, I saw it on a news site a few weeks or months ago.
Looking at my data, I'd be looking at 50$ per month, which is more than I can or rather want to spend.
I am sure it's a good service, but not really geared at private users. Thanks for the hint, though.
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
@Jokener, have you had a look at Amazon Glacier? It's cheaper than S3 ($7/TB vs $30/TB for US servers), it takes a couple of hours to prepare your data for recovery, and you pay based on how much you download how quickly - if you just need a few things urgently and can slowly recover the rest of your data over a couple of days/weeks, it'll be cheaper than downloading everything as quickly as you can.

One thing you might want to do is to make a deal with a friend "nearby" to stick a small little cheap NAS unit on his network. There are numerous NAS units which are in the ~$100-$300 range, where you could just throw some SATA disk in there and get several terabytes of offsite backup capacity, which you can temporarily retrieve if you need to do a recovery at high speed.
 

NetSoerfer

Explorer
Joined
May 8, 2016
Messages
57
Looking at my data, I'd be looking at 50$ per month, which is more than I can or rather want to spend.
Yes, it gets expensive if you back up multiple terabytes. I've been using it to back up only critical data, not the data I don't need but don't want to delete.

One thing you might want to do is to make a deal with a friend "nearby" to stick a small little cheap NAS unit on his network. There are numerous NAS units which are in the ~$100-$300 range, where you could just throw some SATA disk in there and get several terabytes of offsite backup capacity, which you can temporarily retrieve if you need to do a recovery at high speed.
I've thought about this idea a few times myself, and have actually put a small NAS at my parents' house, which I back up my critical data to (and vice versa).

The thing I'm uneasy about though is data consistency on that box - we go to great lengths to ensure we have consistent data, but that box, being a consumer NAS like my current one, runs on EXT4. How can I be sure I don't recover inconsistent data when that box is my last resort?
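One generic approach, independent of the filesystem on the remote box, is to keep a checksum manifest made on the trusted side and compare the backup against it. The Python sketch below is only an illustration of that idea: the function names are made up, SHA-256 is an assumed choice, and it can only detect a damaged or missing file, not repair it.

```python
import hashlib
import os

def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large videos do not need to fit in RAM."""
    h = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def build_manifest(root):
    """Map each file's path (relative to root) to its SHA-256 digest."""
    manifest = {}
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            full = os.path.join(dirpath, name)
            manifest[os.path.relpath(full, root)] = sha256_of(full)
    return manifest

def verify(root, manifest):
    """Return relative paths that are missing or whose contents changed."""
    bad = []
    for rel, digest in manifest.items():
        full = os.path.join(root, rel)
        if not os.path.isfile(full) or sha256_of(full) != digest:
            bad.append(rel)
    return sorted(bad)
```

The manifest would be built on the source pool, stored with (and separately from) the backup, and `verify` run against the remote copy from time to time and again after any restore.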
 

Dice

Wizard
Joined
Dec 11, 2015
Messages
1,410
How can I be sure I don't recover inconsistent data when that box is my last resort?
...get FreeNAS-compatible hardware, as in a cheap ECC-compatible CPU, some SM board, 16 GB of RAM, and repurpose the drives.

I don't trust EXT4 a bit more than I trust NTFS. That level of trust got me here.
 