The answer is no. What was the question???


sgbotsford

Cadet
Joined
Jan 21, 2012
Messages
7
From reading other posts, the functionality I want isn't available yet. So this is a request.

Background. Right now I have a Drobo S that fails under FireWire. The combination of the Drobo's slowness and USB makes for a system that is (barely) tolerable for Time Machine on a single Apple workstation.

One of the features I like about Drobo is the ability to use any combination of disks, and the ability to add additional disks subject to the number of bays. As far as I can tell this is not available on ZFS yet.

Is this correct?

Is there a robust open-source solution that does support this feature?

I'm not a file system guru. As a complete idiot, if I were trying to implement such a system I would divide each disk into chunks. The size is arbitrary. With a single disk, each chunk is written twice on that disk. If the system is clever and can work through the obfuscation of the drive hardware, it will try to locate these copies so that an event that takes out any given cylinder, head, or sector won't take out both copies. Add a disk: one copy of each chunk is moved to the second disk, and the space is freed up on the original disk.

Add a third disk: 1/3 of the copies on each existing disk move to the new disk.

That's how it works with equal disks.

Now consider unequal disks

New disk smaller than old: if there is room, half the data moves over. If not, the smaller disk is filled to a level compatible with reasonable performance. The remaining data is written twice to the larger disk, and the operator is notified that not all data is protected. If it is really clever, the FS tries to optimize by putting frequently accessed chunks on both disks.

At 3 unequal disks, it gets more interesting. If the sum of the two smaller disks is less than the large disk, then each chunk is written to the large disk and ONE of the smaller disks. Extra space on the larger disk is used in 'write twice' mode. If the sum is larger than the large disk, then all of the large disk can be used, with the smaller disks filled in such a way that the remaining space is equal on both. That remaining space can then be used by mirroring between the two smaller disks.

At 4 disks my mind starts to boggle at doing it off the top of my head.
So at any given time there are two blobs of storage: One in which data is mirrored on two disks, one in which data is written twice on one disk. The latter is used as 'overflow' as it is both more vulnerable and slow. In the case of equal sized disks, or a combination of disks that can be partitioned into two equal sets, the overflow blob is size zero.
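As a sanity check on the two-blob idea, here is a minimal Python sketch of the sizing (my own formulation of the scheme above, not anything any existing filesystem does). Sizes are in chunks.

Code:
def blob_sizes(disk_chunks):
    # Mirror-2: every chunk gets two copies, preferably on different disks.
    total = sum(disk_chunks)
    largest = max(disk_chunks)
    # A chunk can be mirrored across spindles only if both copies fit on
    # different disks, so the mirrored blob is limited either by half the
    # total space or by everything that is not on the single largest disk.
    mirrored = min(total // 2, total - largest)
    # Whatever is left over holds the overflow blob: two copies of a chunk
    # on the same disk.
    overflow = (total - 2 * mirrored) // 2
    return mirrored, overflow

print(blob_sizes([10, 16]))      # (10, 3): small disk fully mirrored, 3 chunks doubled on the big disk
print(blob_sizes([10, 16, 20]))  # (23, 0): everything can be paired across spindles
print(blob_sizes([10, 10]))      # (10, 0): equal disks, overflow blob is size zero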

And this is only mirror 2.

A 3-way mirror is also possible. I will assume that anyone concerned enough to mirror 3-way will not tolerate overflow blobs. So with 3 disks the maximum mirror size is the smallest disk. Add a disk. With 4 disks of size A >= B >= C >= D, assign chunks as follows:

Chunks are labeled C1, C2, C3... Considering mirror copies: C1a, C1b, C1c, etc.
In the initial layout on 3 disks, C1a goes on the first disk and C1b, C1c on the next two. C2a goes on the second disk, C3a on the third disk, and C4a is back on the first disk again.
When we add a disk, chunks get reshuffled. If the new disk is the new smallest disk (D), then it is used in the same manner as above, using the remnant space of A and B. If the new disk is the largest (A), then all the chunks from the smallest disk are migrated over to it. Additional chunks can now be mirrored up to the capacity of the third-largest disk, and then D is treated as a new smallest disk.
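For what it's worth, the round-robin placement for the 3-disk starting layout is trivial to express. A toy Python sketch (purely illustrative, my own notation):

Code:
def place_three_way(num_chunks, num_disks):
    # Chunk i gets its three copies (a, b, c) on three consecutive disks,
    # starting one disk further along for each new chunk.
    assert num_disks >= 3
    layout = {}
    for i in range(num_chunks):
        first = i % num_disks
        layout[i] = tuple((first + k) % num_disks for k in range(3))
    return layout

# With 3 disks: chunk 0 -> disks (0, 1, 2), chunk 1 -> (1, 2, 0),
# chunk 2 -> (2, 0, 1), chunk 3 -> (0, 1, 2) again, and so on.
print(place_three_way(4, 3))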

You could do raid5 this way too.

The chunk size is arbitrary. I suspect that chunks should be large enough that few file operations will span two chunks. They should be small enough that the leftovers after dividing up the disk aren't big enough to worry about. Smaller chunks mean more housekeeping to keep track of where each chunk's clone is.

In terms of drive interaction I suspect that a chunk should be large enough to include a complete track, and possibly a complete cylinder. As long as the head is there, slurp up the entire track. So at a guess, a track is the lower bound. Some percentage of the disk is the upper bound. Doing this intelligently would likely require that the system do a large number of reads to create an LBA->Cylinder map. Or some cooperation from drive makers. Probably there are optimizations relating to the cache size of the disk too. But remember that chunks are NOT basic units of IO. They are ways of assigning space. At first blush, I'd make chunks around 10 GB.
 

sgbotsford

Cadet
Joined
Jan 21, 2012
Messages
7
For the sake of argument consider the 3 disk set in mirror mode. We want to write each block twice.
Also we will chunk the disks into 100 GB partitions. Or perhaps the partition table should be skipped and ZFS handles the chunking internally.

Suppose we start off with an array of 3 disks A,B,C of 1.0 TB, 1.6 TB and 2.0 TB. This means 10, 16, and 20 chunks.
Three equations. Let
a = shared chunks between A and C
b = shared chunks between A and B
c = shared chunks between B and C
Each disk's chunk count is the sum of the two overlaps it takes part in:


a + b = 10 (disk A)
a + c = 20 (disk C)
b + c = 16 (disk B) => c = 16 - b

a + b = 10
a + 16 -b = 20

Adding the two equations:
2a + 16 = 30 => 2a = 14 => a = 7, b = 3, c = 13

so
7 chunks have copies on disk A and disk C
3 chunks have copies on disk A and disk B
13 chunks have copies on disk B and disk C

Mirroring raid on 3 disks.

Upgrade the 1.0 TB disk to a 3.0 TB disk.

Now

a + b = 30 (disk A)
a + c = 20 (disk C)
b + c = 16 (disk B) => c = 16 - b

substituting

a + b = 30
a + 16 - b = 20

Adding:

2a + 16 = 50
2a = 34
a = 17, b = 13, c = 3
(We increased the first disk by 20 chunks; a and b each go up by 10, and c goes down by 10.)

    Before  After
a      7      17
b      3      13
c     13       3

Two of the splits will go up. Can't help it. This gives some maneuvering room to shuffle. Remember that c is the number of chunks mirrored across drives B and C. For it to go down, one copy of each of those chunks moves to the new disk A.
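The little system above is easy to solve mechanically. A rough Python sketch of the same arithmetic (my notation, not anything from real ZFS code):

Code:
def pair_three(a_chunks, b_chunks, c_chunks):
    # Pairwise overlaps for mirror-2 across three disks:
    #   shared_ac + shared_ab = chunks on A
    #   shared_ac + shared_bc = chunks on C
    #   shared_ab + shared_bc = chunks on B
    total = a_chunks + b_chunks + c_chunks
    assert total % 2 == 0, "total chunk count must be even to pair everything"
    shared_ac = total // 2 - b_chunks
    shared_ab = total // 2 - c_chunks
    shared_bc = total // 2 - a_chunks
    # A full pairing only exists if no disk is bigger than the other two combined.
    assert min(shared_ac, shared_ab, shared_bc) >= 0
    return shared_ac, shared_ab, shared_bc

print(pair_three(10, 16, 20))  # (7, 3, 13)  -> the original 1.0/1.6/2.0 TB layout
print(pair_three(30, 16, 20))  # (17, 13, 3) -> after swapping in the 3.0 TB disk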

Now let's add a 4th disk to the mix. We'll add a second 3 TB drive as disk D. We could do this the same way, with 4 equations, but there are 6 ways to pair 4 disks, so the system is underdetermined.

So we cheat. A certain amount of space is common to all the drives; split that evenly.
Our drives are now 3.0, 1.6, 2.0 and 3.0 TB. 1.6 is the smallest, so 16 chunks of A are shared with B and 16 chunks of C are shared with D. This leaves us with 14, 0, 4, 14. Ignore the 0. 4 is the smallest, and the other two are equal: disk C shares 2 chunks with each of disks A and D, and disk A shares 12 chunks with disk D.



Could we have used this same method before?
A B C = 30 16 20

16 (the size of the smallest disk) is common to all three. Pairing that common portion up evenly, disk B shares 8 chunks with A and 8 with C, and A and C pair off 8 of their own.
That leaves 14 on A and 4 on C. They don't match.
So no: the subtract-the-common shortcut works with 4 or more disks, and may get messy with odd numbers of disks.

Let's try A B C D E = 1.0 1.6 2.0 3.0 3.0, or 10, 16, 20, 30, 30 chunks.
10 chunks are common to all. Use up the 10 by pairing 2, 2, 3, 3 with the others.
This leaves 14, 18, 27, 27.
Use up the 14 as 4, 5, 5.
This leaves 14, 22, 22.
Split the 14 between the two: 7 and 7.
This leaves 15, 15.
Match them together.

YOU CHEATED! The last two disks were the same size.

Fine. A B C D E = 1.0 1.2 2.0 3.0 4.0 = 10, 12, 20, 30, 40 chunks.

Split the 10 as 2, 2, 3, 3. Leaves 10, 18, 27, 37.
Split that 10 as 2, 3, 5. Leaves 16, 24, 32.
Now we can either do the 3-equation bit, or just ask: how do I split 16 into two parts whose difference is 8? 4 and 12.
So we split the 16 as 4, 12. This leaves 20 and 20, which we then match up.
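If anyone wants to play with this, here is a quick Python sketch of a greedy pairing that gets the same totals. It is not the exact split order above; it just keeps mirroring one chunk across the two disks with the most unassigned chunks, which pairs everything (up to one leftover chunk if the total is odd) whenever the biggest disk is no larger than the rest combined:

Code:
import heapq
from collections import Counter

def greedy_pairing(chunks_per_disk):
    # Max-heap of (remaining chunks, disk index); heapq is a min-heap,
    # so counts are stored negated.
    heap = [(-n, i) for i, n in enumerate(chunks_per_disk) if n > 0]
    heapq.heapify(heap)
    pairs = Counter()   # (disk_i, disk_j) -> chunks mirrored across that pair
    while len(heap) >= 2:
        n1, d1 = heapq.heappop(heap)   # disk with the most free chunks
        n2, d2 = heapq.heappop(heap)   # second most
        pairs[tuple(sorted((d1, d2)))] += 1
        if n1 + 1 < 0:
            heapq.heappush(heap, (n1 + 1, d1))
        if n2 + 1 < 0:
            heapq.heappush(heap, (n2 + 1, d2))
    unpaired = -heap[0][0] if heap else 0
    return pairs, unpaired

print(greedy_pairing([10, 16, 20]))          # 23 pairs in total, nothing left unpaired
print(greedy_pairing([10, 12, 20, 30, 40]))  # 56 pairs in total, nothing left unpaired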
 

Daisuke

Contributor
Joined
Jun 23, 2011
Messages
1,041
From reading other posts, the functionality I want isn't available yet. So this is a request.

Background. Right now I have a Drobo S that fails under FireWire. The combination of the Drobo's slowness and USB makes for a system that is (barely) tolerable for Time Machine on a single Apple workstation.

One of the features I like about Drobo is the ability to use any combination of disks, and the ability to add additional disks subject to the number of bays. As far as I can tell this is not available on ZFS yet.

Related to your request, have you ever wondered why nobody has implemented such a demand, or why nobody else has created a solution like the one you ask for? That is because nobody in their right mind would ever use such a setup. I'm going to simplify your decision:
1) Use Drobo if you don't care about your data; OR
2) Use any proven NAS with a proper array combined with a reliable backup solution if you do care about your data

There is no but, why or how come. Millions of users have some sort of array setup for a reason. They don't have the drive-mounting "flexibility" Drobo has because an admin would rather have the data secure. Since you are interested in that useless feature, which is helpful only to home users who don't care to plan their data safety in advance, I suggest you stick with Drobo. One day you will hit your head against the wall because you lost everything. I really hope you will not end up with a Drobo hardware failure, as the only way you can rescue your data is by buying another Drobo. Ridiculous, eh? Not to mention the low transfer speeds... Also, you do know you are playing Russian roulette with your data, right? If the FireWire connection fails during a data transfer, the file structure needed to access your data is irremediably corrupted. Call Data Robotics to confirm, if you don't believe me.

In conclusion, if you are not interested in safeguarding your data, use Drobo or other "cheap" non-RAID solutions. Personally, I paid $750 to build my box, just to give you an idea. If one of my disks fails, I can still rebuild my array without worrying that another disk might fail during the resilver process. If you are interested in a reliable storage box that allows you to do proper backups, use a NAS and build it properly, planning in advance your disk capacity and data growth as well as a real backup solution in case of disaster. A NAS is not a backup for your current data.
 

sgbotsford

Cadet
Joined
Jan 21, 2012
Messages
7
Hi TECK

Just hold on a minute. What are you so up in the air about? I asked for one feature: the ability to use arbitrary-sized disks, and the ability to add to the pool incrementally, either by adding an additional disk or by replacing a smaller disk with a larger one.

I then outlined a conceptual way that this could be achieved.

I expected replies on this forum to be explanations of why this idea is a bad one, or alternatively how it could be implemented.

FreeNAS is a good solution, but it is incomplete. Data needs grow. Right now the only ways to expand storage are to build another box, or to replace ALL the drives (one by one).


I explained that I was not pleased with Drobo -- that I was looking for a replacement system. FreeNAS with ZFS is one system I'm looking at.

I will be looking at your build.
Why did you choose version 8 instead of version 7?
 

Daisuke

Contributor
Joined
Jun 23, 2011
Messages
1,041
Hi TECK

Just hold on a minute. What are you so up in the air about? I asked for one feature: the ability to use arbitrary-sized disks, and the ability to add to the pool incrementally, either by adding an additional disk or by replacing a smaller disk with a larger one.

That is exactly my point... IMO those are ridiculous features. If you plan your setup in advance, you will not have to worry about space for a LONG time. Right now I have 7TB available. It will take years before this space is used up. By then, new technology will have arrived and I will definitely build a new box. Why would you want to constantly juggle disks, when you can build a solid box from the start and forget about it?
 

sgbotsford

Cadet
Joined
Jan 21, 2012
Messages
7
Mostly because storage is always cheaper next Tuesday. If storage is $80/TB today, in 3 years it will likely be $25/TB.

While the Drobo is currently 4 TB raw capacity (2.66 TB actual -- it *does* do something for redundancy), I like that if I need a temporary increase in storage I can either replace one of the single-terabyte drives with a 2 or a 3 TB drive, or add an additional drive in the vacant slot.

Right now in our household we have close to 15 TB of storage in one form or another. I don't know what we will need three years from now, so I want a system that allows flexible growth. Most of those files are archives; my better half has a penchant for collecting old British TV series. But our current policy is to install drives in pairs -- mirrored -- as individual drives are not sufficiently reliable.

I see two possible markets for FreeNAS storage. One is much like what you have built -- a moderately high end server that can hand out protected files as fast as any commercial server. (You did a nice job with that box, BTW.) The other is a lower performance, low power usage box suitable for a network backup solution in a very small network.

The tradeoffs between incremental additions and buying another whack of disks:

* Incremental is cheaper -- you don't buy storage until you need it, getting a better bang for your buck. Buy your box with two disks. Add disks as you need them, always buying whatever disk has the best bang for your buck that week. Once you get to a full cage, start replacing the smallest disks.
* A whack of disks is too likely to come from the same lot, unless you anticipate your needs far enough in advance to buy them spread out, or visit multiple shops, or deliberately buy multiple brands.
* A whack of disks is more likely to all get caught by the same production bug. (Remember that firmware bug Seagate had a few years back.)
* Incremental means you have to be aware of the storage status, and be ready to add/change a disk. Sysadmins have been writing scripts for years that email them when space gets tight (see the sketch after this list).
* My proposal adds a layer of complexity to the setup of pools. I suspect it would be workable using partitions with the present ZFS, but doing it with containers (a la vinum) would likely make more sense. This may not be the only way to add the feature set I want. Doing it manually would be a PITA, but it should lend itself to scripting.
* The added complexity may have an impact on performance, particularly if ZFS doesn't take into account multiple chunks landing on the same spindle. It is also more complex, hence possibly more buggy.
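(Something like the following trivial Python sketch is all I mean by that last point about alert scripts -- the mountpoint, threshold and addresses are made-up placeholders:)

Code:
#!/usr/bin/env python3
# Toy low-free-space alert, meant to be run from cron. Mails the admin
# when a watched filesystem drops below a free-space threshold.
import shutil
import smtplib
from email.message import EmailMessage

MOUNTPOINTS = ["/mnt/tank"]     # filesystems to watch (example path)
THRESHOLD = 0.10                # warn below 10% free
ADMIN = "admin@example.com"     # made-up address

def check():
    warnings = []
    for mp in MOUNTPOINTS:
        usage = shutil.disk_usage(mp)
        free_fraction = usage.free / usage.total
        if free_fraction < THRESHOLD:
            warnings.append("%s: only %.1f%% free (%d GiB of %d GiB)"
                            % (mp, 100 * free_fraction,
                               usage.free // 2**30, usage.total // 2**30))
    return warnings

def mail(lines):
    msg = EmailMessage()
    msg["Subject"] = "Storage getting tight"
    msg["From"] = ADMIN
    msg["To"] = ADMIN
    msg.set_content("\n".join(lines))
    with smtplib.SMTP("localhost") as s:    # assumes a local MTA is listening
        s.send_message(msg)

if __name__ == "__main__":
    problems = check()
    if problems:
        mail(problems)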
 

Brand

Moderator
Joined
May 27, 2011
Messages
142
I see two possible markets for FreeNAS storage. One is much like what you have built -- a moderately high end server that can hand out protected files as fast as any commercial server.

You obviously are not familiar with commercial/enterprise servers. They cost much more than the $750 that TECK spent to build his, and they do not use Atom processors. The server that TECK built is "a lower performance, low power usage box" (your words) but is suitable for much more than just the network backups you mentioned.
 

sgbotsford

Cadet
Joined
Jan 21, 2012
Messages
7
You're correct, enterprise servers are a bit more. I worked for Yotta Yotta (now part of EMC): racks full of disks, a high-availability fiber-optic fabric, about a million bucks a rack.

While not enterprise class, 120 watts and 8 GB of RAM is NOT a lower performance, low power box. The throughputs he mentions make it a strong contender for a small department's main file server.

For one thing he installed version 8. From the info on these forums this is overkill for most home users. I understand that 7 will run on a lot less hardware, and give suitable performance.

There is a need for a home NAS that has the reliability of TECK's box, and 1/4 the price, power use and performance.


There is also a market for TECK's box in the home. In that setup you have NO local disks (other than boot, swap, and the like); it acts more as your departmental server.

And I still say there is a demand for incremental storage, where you can throw disks at it, and it will Just Work. With ZFS.
 

Daisuke

Contributor
Joined
Jun 23, 2011
Messages
1,041
Mostly because storage is always cheaper next Tuesday. If storage is $80/TB today, in 3 years it will likely be $25/TB.
...
* Incremental is cheaper -- you don't buy storage until you need it, getting a better bang for your buck. Buy your box with two disks. Add disks as you need them, always buying whatever disk has the best bang for your buck that week. Once you get to a full cage, start replacing the smallest disks.

That is a myth, thinking you will save $3 or so per terabyte if you don't buy disks in bulk now. You will also not save on your computer hardware, TV, stereo, car, appliances, etc. They all break or get old and need updates. Sorry, telling me that you'll buy disks at a later time to save "50 cents", when you spend thousands on other things, won't cut it.

Again, stop pushing excuses that don't hold and use a proper array like everyone else who wants to secure their data properly.
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
TECK, you should be made aware that the functionality that the original poster suggested is actually available in ZFS, though I believe the variation available on the drobo is more sophisticated in some ways, and ZFS won't shuffle already-written data around.

You can add individual disks to a storage pool and they need not be the same size. You can request that ZFS write more than one copy of written data to the storage pool, and within reason, it will endeavour to write the copies to different disks. It is not guaranteed, however.

http://blogs.oracle.com/relling/entry/zfs_copies_and_data_protection

The problem is that this is rarely an efficient solution: sgbotsford is right in many particulars about the convenience it offers and the dangers of getting matched batches of drives, but this completely ignores a different issue. Once you move beyond a few drives, storing two copies of data is more expensive than RAID. Look at a four 1TB drive server, for example. With copies=2, you get 2TB peak storage. You can lose any one drive. If you lose two, you may lose data. With RAIDZ, you get 3TB peak storage and can lose any one drive. With RAIDZ2, you get 2TB peak storage and can lose any two drives. Worse, as the number of drives increases, the storage efficiency of the matched RAID set increases much more rapidly than the copies= solution.
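To put rough numbers on that as the drive count grows, a quick back-of-the-envelope calculation (equal-sized drives, ignoring metadata and slop, single vdev):

Code:
def usable(n_drives):
    # Usable capacity in units of one drive, for N equal drives:
    # copies=2 halves everything; RAIDZ gives up one drive to parity;
    # RAIDZ2 gives up two.
    return n_drives / 2, n_drives - 1, n_drives - 2

for n in (4, 6, 8, 12):
    c2, z1, z2 = usable(n)
    print("%2d drives: copies=2 -> %4.1f   RAIDZ -> %2d   RAIDZ2 -> %2d" % (n, c2, z1, z2))

# 4 drives: copies=2 -> 2.0, RAIDZ -> 3, RAIDZ2 -> 2 (the numbers above);
# by 12 drives it's 6.0 versus 11 and 10.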

This is basically not done because there are more benefits to RAID and a matched set; most people will prefer to wind up with more usable space in the long run, and will pop for the extra initial expense. It's not required, however.
 

Daisuke

Contributor
Joined
Jun 23, 2011
Messages
1,041
TECK, you should be made aware that the functionality that the original poster suggested is actually available in ZFS, though I believe the variation available on the drobo is more sophisticated in some ways, and ZFS won't shuffle already-written data around.

You can add individual disks to a storage pool and they need not be the same size. You can request that ZFS write more than one copy of written data to the storage pool, and within reason, it will endeavour to write the copies to different disks. It is not guaranteed, however.

Ya. I think he is looking for a solution that allows adding a new disk to an existing ZFS array to increase the overall array size, the same way Windows Home Server does. IMO, this is bad usage... but that does not mean I'm right. In the past I experienced the trauma of losing all my important data, and that is the reason I get hyper when I see someone not "caring" about it. :)
 

Daisuke

Contributor
Joined
Jun 23, 2011
Messages
1,041
Personally, I admire your passion and thirst for knowledge. However, I still think you are channeling your energy into a subject that was overlooked, and probably dismissed by many developers from the start. I might be wrong, but I would not spend all this time reinventing the wheel. Instead, I would choose a proven solution that just works, and works very well.

Cheers
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
It's fine as a mathematical game or puzzle, but a real-world implementation is often more difficult and more challenging than the idea it is based on. ZFS's design certainly allows you to add more disks to your pool as you go, and you can expand your filesystem size in this manner. The problem is that you end up needing to use weaker forms of data protection, and what happens when recovery is required can be ambiguous if conditions were not ideal when data was written. Say your old pool was 95% full and you added a single huge disk: both copies of most new data will tend to get written to the new drive. Then you lose the new drive. Ooooooops.

This could be solved by someone writing a tool that makes ZFS rebalance and assert the current policy onto an existing pool, but no one has bothered that I know of, because most people don't see a ton of value in this style of data protection. For better or for worse, most serious consumers of the technology are using RAIDZ in some form for data protection, because it's straightforward, easily supported, and trivial to understand.
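(To illustrate what such a rewrite amounts to in the simplest possible terms, here is a toy Python sketch. It is emphatically not a real rebalancer -- it breaks hardlinks, does nothing about snapshots still referencing the old blocks, and ignores sparse files, open files, and ACL corner cases -- it just shows that rewriting data is what forces fresh allocations under the pool's current layout and copies= setting:)

Code:
import os
import shutil

def rewrite_in_place(path):
    # Copy the file's contents into a new file on the same dataset (new
    # blocks are allocated under the current policy), then swap it in.
    tmp = path + ".rebalance.tmp"      # hypothetical temp-name convention
    shutil.copy2(path, tmp)            # copies data plus basic metadata
    os.replace(tmp, path)              # atomic rename over the original

def rewrite_tree(root):
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            rewrite_in_place(os.path.join(dirpath, name))

# rewrite_tree("/mnt/tank/archive")    # example mountpoint, deliberately commented out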
 