SOLVED Fragmentation on new pool - DONE for the moment

Status
Not open for further replies.

AVB

Contributor
Joined
Apr 29, 2012
Messages
174
I had a ten-drive 3TB RAIDZ2 pool that consistently sat at about 12% fragmentation. I upgraded to a pool of two eight-drive RAIDZ2 vdevs to gain a little space, and because I had a few extra 3TB drives it was cheaper to buy three more than to buy ten new 5TB drives. The install went fine and everything is working as before, and now I am reloading the data back into the pool. I have 21.5TB of data, have reloaded about 18TB of it, and my fragmentation is already showing 30%.
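For reference, the number in question is the FRAG column that zpool reports for the pool as a whole. A quick way to watch it during a reload - just a sketch, with "tank" standing in for the actual pool name:

  # Print the pool's size, allocation, free space, fragmentation and
  # capacity figures, refreshing every 60 seconds.
  zpool list -o name,size,allocated,free,fragmentation,capacity tank 60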

I thought with a continuous reload of data that there would be little if any fragmentation. My questions are:

1. Should I be concerned? If the answer is no then the next 2 questions don't really matter.
2. Is there something I should be looking at that could be causing a problem?
3. If, at a later date, I decide to reduce the fragmentation what should I do differently?
 

zambanini

Patron
Joined
Sep 11, 2013
Messages
479
Without more information about your hardware and your workload, we'd need a magic crystal ball.
 

AVB

Contributor
Joined
Apr 29, 2012
Messages
174
I expect nothing less from you guys :)

Sixteen 3TB drives, a mixture of Seagate (10) and Toshiba (6); all the Toshibas are in one vdev. The drives are connected to two Dell H310s flashed to IT mode, one eight-drive RAIDZ2 vdev on each. Mirrored SSD boot drives are connected to the motherboard. AMD 6-core CPU, 16GB RAM, Intel gigabit network card.

The backup machine is a Windows 10 box with two RAID systems: six 4TB drives off another Dell H310 (not flashed) and five 3TB drives off the motherboard, both RAID 5. This is a straight copy (using RichCopy), not any sort of backup program in this particular case.

There is no other usage of the media server or the backup machine during this restore.

Lastly, restore speeds are running at 1TB every 3 hours (90-100 MB/s) pretty consistently, the limitation being the gigabit network.

If you need anything else just let me know what and I'll try to provide.

Update: the restore finished with 33% fragmentation.

 
Last edited:

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,175
33% seems excessive. Can you be a bit clearer about the workload? I understand you're backing up other computers to the server - what backup solution are you using?
 

AVB

Contributor
Joined
Apr 29, 2012
Messages
174
In this case I'm not using a "solution" per se; it was a straight copy with verification. The reason was that I could actually see, file by file, what was being copied, and that I could stop it and know exactly where I was. For reasons I won't get into here, I didn't let this run unattended.

As far as workload goes, this is a media server that streams everything from Blu-ray ISOs to FLAC audio to up to three HTPCs over a gigabyte wired network in the house. However, during this process it didn't do anything besides reload data from the backup.

Since the process is finished and I'm not going to reload everything again - at least not anytime soon - I'll monitor the fragmentation, and if it gets worse I might be forced into finding a solution. I was hoping there might be something obvious I missed, or a quick-and-dirty fix for next time. I'm going to re-assess the backup solutions available to me and see what might fit my needs, space, and budget.


Thank you for your help
 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,175
gigabyte wired network
Gigabit. There's an 8x difference. ;)

In any case, I am surprised by the fragmentation number, given the workload...
 

stilgars

Dabbler
Joined
Mar 29, 2014
Messages
12
I am also surprised by my fragmentation report following a pool upgrade (replacing all the 4TB drives with 6TB ones).

Before, the FRAG figure showed 13% on the old pool. I started a one-by-one online resilvering upgrade, but eventually decided to destroy and rebuild the pool from my backups to get rid of the fragmentation - and ended up at 27% fragmentation instead!

The process to move the data was the following: I did a zfs send -R oldtank@manualsnapshot to the backup pool, and then sequentially restored all six of my datasets:

  zfs send backup/dataset1@manualsnapshot | zfs receive newtank/dataset1 &&
  zfs send backup/dataset2@manualsnapshot | zfs receive newtank/dataset2 &&
  ...
  zfs send backup/dataset6@manualsnapshot | zfs receive newtank/dataset6
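The same sequential restore could also be written as a loop - just a sketch, assuming the datasets really are named dataset1 through dataset6 and all share the @manualsnapshot snapshot:

  #!/bin/sh
  # Send each backed-up dataset into the new pool, one at a time,
  # stopping at the first failure.
  for n in 1 2 3 4 5 6; do
      zfs send "backup/dataset${n}@manualsnapshot" | zfs receive "newtank/dataset${n}" || exit 1
  done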

I don't really understand how my fragmentation could explode that way.
 
Last edited:

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,681
The fragmentation number isn't what you think it is; it's more of an "algorithmically inferred" number describing how easy it will be to allocate free space. A CoW filesystem usually tends towards poor "traditional fragmentation" because of the block allocation strategy and the need to support things like snapshots; it would be impossible to calculate that kind of fragmentation without traversing the pool and actually mapping everything out.

Don't get too OCD over it. It's mostly a not-that-useful number.
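Concretely, that "algorithmically inferred" value is exposed as a pool property derived from the pool's free-space maps, so it can be read directly - with "tank" again standing in for the actual pool name:

  # Reports how fragmented the pool's remaining free space is, not how
  # fragmented the files themselves are.
  zpool get fragmentation tank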
 

stilgars

Dabbler
Joined
Mar 29, 2014
Messages
12
OK, thank you.
 

SirMaster

Patron
Joined
Mar 19, 2014
Messages
241
Part of the phenomenon you're experiencing is called the "Intent Log Checkerboard". What you should have done here is change the "logbias" property for the dataset in question to "throughput" instead of "latency". This aggregates writes much more efficiently.

Additionally, if you're copying a bunch of files at once, they will all aggregate into the same transaction group (TXG) and be synced to disk together. If the files are much larger than what fits in a single TXG, then yes, you're going to get data checkerboarding.

Suggestions:
  1. Use a 1M recordsize instead of the default 128K.
  2. Write files serially, not in parallel.
  3. Improve how the ZIL handles these kinds of writes by setting "logbias" on your target dataset to "throughput" (see the sketch after this list).
  4. Extend your transaction group interval based on how much data you're willing to hold in an in-RAM TXG and in the on-disk mirror of the TXG that we call the ZIL. On a typical home system with, say, 16GB of RAM and throughput of perhaps only a few hundred megabytes per second, you could make it 10 or 15 seconds instead of 5, so that syncs happen less often and more data is aggregated linearly.
  5. Realize that fragmentation is not a particularly big deal in ZFS. With very large files, your metaslabs are going to be fairly teeny and can load and unload dynamically, while with FAT, NTFS, and other legacy filesystems a big part of dealing with a fragmented filesystem was the gigantic growth of the space maps that had to be kept in memory; that had a profound impact not only on seek times but also on overall system resource utilization. The cost doesn't become significant on ZFS until your space maps accommodate billions of files...
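A minimal sketch of items 1, 3 and 4, assuming a FreeNAS/FreeBSD box and using "tank/media" as a placeholder for the actual target dataset:

  # Items 1 and 3: larger records and throughput-biased logging for the
  # dataset. recordsize=1M requires the large_blocks pool feature and only
  # affects files written after the change.
  zfs set recordsize=1M tank/media
  zfs set logbias=throughput tank/media

  # Item 4: TXG sync interval in seconds (FreeBSD sysctl; the default is 5).
  # On FreeNAS this would normally be added as a tunable so it survives reboots.
  sysctl vfs.zfs.txg.timeout=10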
 

stilgars

Dabbler
Joined
Mar 29, 2014
Messages
12
Very interesting. I just want to point out that I did copy my files serially, since all six datasets were copied one after another. But the rest of your suggestions were enlightening, thank you - I will put 3 and 4 to good use next time. And, as was also noted, it's not such a big deal anyway.
 
Joined
Jan 15, 2015
Messages
25
Sorry for my delayed question.
I use iSCSI with a zvol and I'm seeing fragmentation numbers around 65%.
What can I do in this case? What is the best way to handle it?
I'm using iSCSI because I run Windows 2012 R2 and our switch doesn't work with LAGG, so I use MPIO with iSCSI to get better performance and redundancy over 1Gbit/s Ethernet.
I really appreciate any suggestions. Our main use is IP surveillance software, with 200 IP cameras recording simultaneously.
I tried CIFS, but couldn't get it working across two Ethernet links the way SMB 3.0 can.
I haven't tried NFS, because since our switch doesn't work with LAGG I can't get redundancy and performance... but if you can give me some help I can change everything.
Our storage runs 14 x 3TB NAS drives in one pool with two seven-drive RAIDZ2 vdevs, and 32GB of ECC RAM.

Sorry for my poor English... I'm from Brazil.
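For reference, the 65% figure is the same pool-level FRAG value discussed above, and logbias applies to zvols as well; the zvol analogue of recordsize is volblocksize, which is fixed when the zvol is created. A quick check - just a sketch, with "tank" and "tank/vol1" as placeholders for the actual pool and zvol:

  # Pool-wide free-space fragmentation (the FRAG column/property):
  zpool get fragmentation tank

  # Properties of the zvol backing the iSCSI target; volblocksize cannot be
  # changed after creation, logbias can be changed at any time.
  zfs get volblocksize,logbias tank/vol1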
 