Lowest Frag contest [Phase I closed, tests underway]

Status
Not open for further replies.

tc60045

Dabbler
Joined
Nov 22, 2016
Messages
25
Fragmentation on ZFS doesn't matter -- until it does. And for me and my current pool / setup, it actually doesn't matter much.

I have an existing pool that is fragmented and a new pool that is empty. The ZFS mythos says that a send / copy will reduce fragmentation. In my case, though I am facile with the commands and process, that myth is just that, and the new "copy" ends up 2x as fragmented as the source.

Because fragmentation comes up so often, it seems to me to make sense to try a number of approaches that might lay bare what does and does not work in reducing it via the only tools we really have: new blank drives and the bash prompt. Let's be scientific. Pose hypotheses. I'm happy to supply the petri dish and report back on results.

I made this a contest just to have a little fun and contribute results back to the community for others to point to, a la: "[contest entrant] proved that ______ doesn't work when you are migrating to a new pool -- your fragmentation may go up and not down"


Specifics:

I will run up to ten (10) tests, executing user-suggested commands from the community. I will award a $50 Amazon Gift Card to the person who lowers fragmentation on my new pool by the largest amount. And, as a token of thanks to iXsystems for FreeNAS and for hosting great conversations on this forum, I will donate $50 to iXsystems in whatever manner they deem acceptable. Oh, and bragging rights to the winner, too!

Phase 1: indicate that you are interested, and ask whatever questions you have about details of the system I haven't given. This phase will end when we hit around 20 or so interested people; I will then list the top 10 players, scoring each as 2500 - (100 * queue position) + number of forum posts and sorting in descending order. [note: still open as of Nov 22 @ 1934 GMT]

Phase 2: Each player will describe the configuration settings and commands I am to run. I will run each player's test and report out the results.

Phase 3: If there is sufficient lobbying / groaning about a scenario missing, I will do a couple of last runs. Then I will award a prize. I reserve the right to award early if someone just nails it.

Setup: [last edited Nov 28 @9:42 GMT; space got a little tighter on source pool but not worried]
- SuperMicro 12-bay SAS enclosure
- SuperMicro X8DTN mobo
- Avago 9211-8i HBA, in IT mode; capable of 6Gbps
- 2x Intel Xeon X5670 processors
- 96 GB ECC RAM
- Ubuntu 16.04.1 with ZFS native, kernel-modded, not FUSE (why am I here? Because ZFS on Ubuntu is nascent. Someday I will convert to FreeNAS)
- SLOG available, but not in pool (HyperX® Predator PCIe 240GB SSD with speeds of up to 1400MB/s read and 1000MB/s write)
- Existing pool:
- 6x 4TB in Raidz2, unencrypted on 6Gbps SAS drives (6Gbps is limit of backplane)
- Lots of big media files
- 84% of capacity utilized (I will use a common snapshot for all tests)
- I had a slog in place for first 8TB or so, then removed it
- the pool is currently, and has *always* been, about 22% fragmented, ever since rsyncing files to it from an old NAS
- compression is on for all datasets, but the ratio is 1.00 for all but one, where it is 1.03
- I have not tweaked logbias nor anything else on the pool
- Drives are Seagate 7200rpm (ST4000NM0023)
- recordsize is 128K [thanks SweetAndLow]

Goal:
- Copy all files to new pool, which is 6x 4TB (identical size)
- New pool is encrypted using LUKS. Given AES-NI, there is no expected performance hit. It is analogous to GELI, but it's what one has to use on Ubuntu.
- [update 11/28: encryption process does not make fragmentation any worse than when using bare drives directly]
- Drives are slightly better models (ST4000NM0034) but ostensibly the same
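For reference, the encrypted pool was built along these lines (device names and exact options here are a reconstruction for illustration, not my literal commands):

```shell
# Reconstruction of the LUKS-backed pool creation; /dev/sdb../dev/sdg
# are placeholders for the six 4TB drives.
for dev in sdb sdc sdd sde sdf sdg; do
  cryptsetup luksFormat /dev/${dev}             # one passphrase prompt per drive
  cryptsetup luksOpen /dev/${dev} luks-${dev}   # exposes /dev/mapper/luks-<dev>
done
zpool create tank22 raidz2 \
  /dev/mapper/luks-sdb /dev/mapper/luks-sdc /dev/mapper/luks-sdd \
  /dev/mapper/luks-sde /dev/mapper/luks-sdf /dev/mapper/luks-sdg
```

ZFS then sees ordinary block devices in /dev/mapper, which is why the bare-drive vs. LUKS comparison below was a fair test.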

Baseline:
- command: zfs send -R tank1@recursive_shot_112116 | pv -s 12T | zfs receive tank22
- results: sudo zpool list
NAME     SIZE   ALLOC   FREE  EXPANDSZ  FRAG  CAP  DEDUP  HEALTH  ALTROOT
tank1   21.8T   18.0T  3.79T         -   22%  82%  1.00x  ONLINE  -
tank22  21.8T   17.9T  3.81T         -   49%  82%  1.00x  ONLINE  -

I'm a lurker to the community, so if this contest rubs people the wrong way and ends up being a horrible idea, I'll just politely tip my hat and slink away. But, to reiterate, I thought that my unique, time-isn't-an-issue, identical source / target situation might allow me to run tests to benefit us all.

Lastly: this is a private contest, not sponsored or endorsed in any way by iXsystems, FreeNAS, or any other commercial entity. iXsystems may at their sole discretion stop this contest for any reason [I certainly hope that they do not]. All test results published here become the property of iXsystems.

Best,

TC
 

Bidule0hm

Server Electronics Sorcerer
Joined
Aug 5, 2013
Messages
3,710
I'm interested and I have a question: Are we limited to replication or can we suggest cp/rsync/... commands ?
 

darkwarrior

Patron
Joined
Mar 29, 2015
Messages
336
Hi,
I actually would be interested in that too. :smile:
My freespace frag is currently at 21% and growing on my pool with 2 zVols ...
 

tc60045

Dabbler
Joined
Nov 22, 2016
Messages
25
I'm interested and I have a question: Are we limited to replication or can we suggest cp/rsync/... commands ?

Whatever you like -- this is a community experiment, and those benchmarks are essential -- I agree. I'd only ask that you specify all the parms / scriptify it, as everyone rsyncs a little differently ;) Fair enough?
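As an example of what I mean by "scriptify" -- every flag pinned down so the run is reproducible (flags and paths below are purely illustrative, not a recommendation):

```shell
#!/bin/sh
# Illustrative contest-entry script: all rsync behavior made explicit.
# Mountpoints assume the ZFS default layout for the pools in this thread.
rsync --archive --hard-links --sparse \
      --log-file=/tmp/entry1-rsync.log \
      /tank1/media/ /tank22/media/
```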
 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,175
I don't see a problem with this, but iX would probably appreciate a disclaimer in the OP along the lines of "I am not connected to iXsystems and they are not promoting this informal contest of mine".
 

Bidule0hm

Server Electronics Sorcerer
Joined
Aug 5, 2013
Messages
3,710
Ok ;)

What datasets do you want to copy? and where are they located on the pool?
 

tc60045

Dabbler
Joined
Nov 22, 2016
Messages
25
Hi,
I actually would be interested in that too. :)
My freespace frag is currently at 21% and growing on my pool with 2 zVols ...

Great, stay tuned!

I don't see a problem with this, but iX would probably appreciate a disclaimer in the OP along the lines of "I am not connected to iXsystems and they are not promoting this informal contest of mine".

Excellent call, Eric -- will do so!

Ok ;)

What datasets do you want to copy? and where are they located on the pool?

Biduleohm, I want to copy everything over (and have now done so several times -- with results that vary but never improve upon the underlying fragmentation %). I'm going to keep those results to myself for now, as I'm not allowed to enter this fracas, but I will post them later. I am happy to post the results of 'zfs get all.'
 

SweetAndLow

Sweet'NASty
Joined
Nov 6, 2013
Messages
6,421
What is your block size set to? Default is 128k I think. Copying all your data around probably won't affect fragmentation but making your block size smaller will help. You will lose some performance though.

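A sketch of how that test might look. Note that recordsize only applies to blocks written after the change, so it would have to be set on the destination dataset before the copy (the dataset name and the 16K value are just examples):

```shell
# recordsize affects newly written blocks only, so change it on the
# destination before copying. Valid values are powers of two, 512B-128K
# by default; 16K here is an arbitrary example.
zfs set recordsize=16K tank22/media
zfs get recordsize tank22/media   # verify before starting the copy
```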
 

Bidule0hm

Server Electronics Sorcerer
Joined
Aug 5, 2013
Messages
3,710
Biduleohm, I want to copy everything over (and have now done so several times -- with results that vary but never improve upon the underlying fragmentation %). I'm going to keep those results to myself for now, as I'm not allowed to enter this fracas, but I will post them later. I am happy to post the results of 'zfs get all.'

The thing is you can't copy a dataset itself, only its contents (you can't just cp /mnt/tank1/* and expect the child datasets to come along, for example), that's why I was asking.

What is your block size set to? Default is 128k I think. Copying all your data around probably won't affect fragmentation but making your block size smaller will help. You will lose some performance though.


But it'll likely add more overhead, and he's already at 82% full, so it can get messy...
 

tc60045

Dabbler
Joined
Nov 22, 2016
Messages
25
A few updates -- sorry but was away from Internet for a bit, with Thanksgiving and family.
- Prizes now upped to a total of $100: $50 for winner, and $50 contribution to iX
- I've been testing some options behind the scenes, and one critical one came back with the result I hoped: encryption of the drives is not driving the fragmentation. Creating a pool with straight SAS drives versus LUKS encrypted volumes made no difference.
- I've mapped out the fragmentation as the data copies over, for what it is worth, as depicted below

Answers back:
- SweetAndLow: recordsize is indeed 128K; turn your thoughts into an entry and you can test it in the contest ;)

- BiduleOhm: to be more clear, if your contest entry involves creating new datasets on tank22 and using cp to copy data over from one dataset on tank1 to new dataset on tank22, I am fine with that. It will take mere moments to create new datasets on the new pool.
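For example, a per-dataset entry could look like this (dataset names are illustrative, mirroring tank1's layout; paths assume default mountpoints):

```shell
# Hypothetical per-dataset copy: create the destination dataset, then
# copy its contents preserving permissions, timestamps, and links.
zfs create tank22/media
cp -a /tank1/media/. /tank22/media/
```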

Picture 2016-11-28 at 9.17.57 AM.jpg
 
Last edited:

darkwarrior

Patron
Joined
Mar 29, 2015
Messages
336
Oh dear, jumping up to 49% with a simple zfs send | receive !? Ouch :eek:
I was thinking it would be doing a simple sequential copy and therefore creating a big chunk of data ...
 

tc60045

Dabbler
Joined
Nov 22, 2016
Messages
25
Ok ;)

What datasets do you want to copy? and where are they located on the pool?

Bidule0hm, see picture below. LXD datasets are annoying and tiny; tank1/media is the big baddie, and it is listed last


Oh dear, jumping up to 49% with a simple zfs send | receive !? Ouch :eek:
I was thinking it would be doing a simple sequential copy and therefore creating a big chunk of data ...

There are ways to chunkify, darkwarrior, though contest decorum won't allow me to say more. I have spent some time pondering the burstiness of the read/write process on the same machine and wondering how a test or two might expose this... That's all I can say :)
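(For the curious: one widely used way to decouple a bursty reader from a bursty writer on the same machine is to put a large memory buffer between send and receive. Whether that's a winning entry, I can't say. Sizes below are purely illustrative, not tuned recommendations.)

```shell
# mbuffer decouples zfs send from zfs receive so the writer sees
# steadier, larger writes; -s is the block size, -m the buffer size.
zfs send -R tank1@recursive_shot_112116 \
  | mbuffer -s 128k -m 4G \
  | zfs receive tank22
```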
 

Attachments

  • Picture 2016-11-28 at 9.40.24 AM.jpg

Bidule0hm

Server Electronics Sorcerer
Joined
Aug 5, 2013
Messages
3,710
to be more clear, if your contest entry involves creating new datasets on tank22 and using cp to copy data over from one dataset on tank1 to new dataset on tank22, I am fine with that. It will take mere moments to create new datasets on the new pool.

Yep, that's what I was thinking.

Bidule0hm, see picture below. LXD datasets are annoying and tiny; tank1/media is the big baddie, and it is listed last

I think it's the wrong picture.
 

tc60045

Dabbler
Joined
Nov 22, 2016
Messages
25

Robert Trevellyan

Pony Wrangler
Joined
May 16, 2014
Messages
3,778
jumping up to 49% with a simple zfs send | receive !? Ouch :eek:
I was thinking it would be doing a simple sequential copy and therefore creating a big chunk of data ...
The FRAG column is not what you think it is. In particular, it is not a measure of how fragmented the data is, but rather, how fragmented the free space is.
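You can read that number directly as a pool-level property, which makes clear it belongs to the pool's free space rather than to any file:

```shell
# Both commands read the same pool property: free-space fragmentation.
zpool list -o name,frag,cap
zpool get fragmentation
```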
 

tc60045

Dabbler
Joined
Nov 22, 2016
Messages
25
The FRAG column is not what you think it is. In particular, it is not a measure of how fragmented the data is, but rather, how fragmented the free space is.

True, and that is why I am not worried about it.

It is strange to see free space fragmentation so much higher on a destination than on the source. I'd love to see if mitigation is possible, even while we agree that mitigation is likely not even necessary.
 

darkwarrior

Patron
Joined
Mar 29, 2015
Messages
336
The FRAG column is not what you think it is. In particular, it is not a measure of how fragmented the data is, but rather, how fragmented the free space is.

I'm aware that the frag counter is for the free space on the pool. ;)
But this does not make it less scary ... on the contrary ...
I would have expected that the zfs snapshot replication will be written in one "stream" / "chunk", but I guess I'm wrong :confused:
We are learning things every day. That's also what I'm here for :)

True, and that is why I am not worried about it.

It is strange to see free space fragmentation so much higher on a destination than on the source. I'd love to see if mitigation is possible, even whilst we agree that mitigation is not (likely even) necessary.

Personally, I don't like the idea of having a lot of fragmented free space, because it means that as the pool slowly fills, writing will involve a lot more seeks and searches for free space (obviously, you might say :cool:)
But it also means that the data is not stored in contiguous blocks, potentially implying a hell of a lot of seeks to read files ... o_O
Making it especially ugly for block storage ...
 

tc60045

Dabbler
Joined
Nov 22, 2016
Messages
25
OK, gang, we can't wait forever, so I'm going to end phase 1 at 10am GMT on Thursday, Dec 1st. If you haven't posted here with an "I'm in" then you're out of the running for the prize.

Currently competing:
- BiduleOhm
- SweetAndLow
- Darkwarrior

Maybe / maybe not -- can I get a confirmation?
- Robert Trevellyan
- Ericloewe

What is the difference between a sideline theoretician and a computer scientist? The former believes that never being proved wrong makes them right; the latter realizes that being wrong is merely the process of discovering what is right. Get in the game!
 
