Forcing Record Size, Causing Space Issue

Status
Not open for further replies.

onigiri

Dabbler
Joined
Jul 17, 2018
Messages
32
So I noticed just recently that when I forced the record size from 'inherit' to 8k, to obtain a minor performance increase (nominal at best), all of a sudden the space being used was exactly double what it should be.

For instance, take 2 folders, one 13TB and the other 14TB, each with 20k files. On one of my FreeNAS boxes (with record size 'inherit') the size on disk matches the file size, but on the FreeNAS box with the forced record size, the images take up exactly double.

Even though I am sure the record size is causing this, can someone explain why? Keep in mind most of the folders on my two boxes hold 10k+ files, usually half 1.5GB image files and the other half 1KB files.

Here is a screenshot:

XzCZ2DC.jpg
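For anyone wanting to reproduce the comparison without a screenshot, here is a minimal Python sketch that sums the apparent size (what the files claim to be) and the allocated size (what the filesystem reports as consumed) for a folder. The function name `apparent_and_allocated` is made up for illustration, and it assumes a Unix-like system where `st_blocks` counts 512-byte units. Hard-linked files are counted once per link, so treat the totals as approximate.

```python
import os
import stat


def apparent_and_allocated(root):
    """Return (apparent_bytes, allocated_bytes) summed over regular files under root.

    apparent  = sum of st_size (the logical file length)
    allocated = sum of st_blocks * 512 (space the filesystem reports as used)
    """
    apparent = 0
    allocated = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.lstat(path)  # lstat so symlinks are not followed
            except OSError:
                continue  # file vanished or is unreadable; skip it
            if not stat.S_ISREG(st.st_mode):
                continue  # ignore symlinks, sockets, devices
            apparent += st.st_size
            allocated += st.st_blocks * 512
    return apparent, allocated
```

Running it against the same folder on both boxes and comparing `allocated / apparent` should show whether the doubling is visible at the file level as well as in the share view.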
 

HoneyBadger

actually does care
Administrator
Moderator
iXsystems
Joined
Feb 6, 2014
Messages
5,112
So I noticed just recently that when I forced the record size from 'inherit' to 8k, to obtain a minor performance increase (nominal at best)

Hey @onigiri! I remember that other thread where you needed to assess what you'd inherited.

How come you've changed recordsize to 8K on this dataset? This one should have a large recordsize because of the files being huge - the 8K recordsize should be for your GeoCue/SimActive database output, not the source files.
 

onigiri

Dabbler
Joined
Jul 17, 2018
Messages
32
Unfortunately, they are all in the same dataset. /mnt/Tank/Operations/....

The SimActive outputs are 'orthorectified' and 'Mosaic' images, which are the same size or larger.
The GeoCue outputs are .las (classification) files, which again are very large image files.
 

HoneyBadger

actually does care
Administrator
Moderator
iXsystems
Joined
Feb 6, 2014
Messages
5,112
Unfortunately, they are all in the same dataset. /mnt/Tank/Operations/....

The SimActive outputs are 'orthorectified' and 'Mosaic' images, which are the same size or larger.
The GeoCue outputs are .las (classification) files, which again are very large image files.

I thought there was a "file database" in question that was accessed with random patterns, that ArcGIS was using? If this is only used during the "processing" phase then that's the one that needs a separate recordsize=8K dataset.

The input and output can share the same dataset (as long as that isn't bottlenecking performance) and the recordsize should definitely be 128K or even larger given the 1GB+ files used.
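To put rough numbers on that, here is a small sketch counting how many records one of those files is split into at different recordsize values. It assumes "1.5G" means 1.5 GiB and uses simple ceiling division, so it ignores details such as files smaller than one record; `record_count` is an illustrative helper, not a ZFS API.

```python
KIB = 1024
GIB = 1024 ** 3


def record_count(file_bytes, recordsize_bytes):
    """Records needed to hold file_bytes at the given recordsize (ceiling division)."""
    return -(-file_bytes // recordsize_bytes)


file_bytes = int(1.5 * GIB)  # assumption: a 1.5 GiB image file

for label, recordsize in (("8K", 8 * KIB), ("128K", 128 * KIB), ("1M", 1024 * KIB)):
    print(f"recordsize={label}: {record_count(file_bytes, recordsize):,} records")

# recordsize=8K: 196,608 records
# recordsize=128K: 12,288 records
# recordsize=1M: 1,536 records
```

Each record is tracked and checksummed separately, so the 8K setting means sixteen times as many records per file as 128K for the same data.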
 

onigiri

Dabbler
Joined
Jul 17, 2018
Messages
32
Yea, all of the files are accessed randomly. A production staff member opens a project, and that project can consist of thousands of images. Only the metadata is referenced in the application, but when the staff member edits the images, that data is then read, altered, and written.

This happens in all 3 departments.
Photogrammetry: Mosaic Tile Sets (10-20k images split into blocks (tiles))
LiDAR: GeoCue (classifications)
GIS: Drawing of objects on top of the lidar classifications or mosaic tilesets, such as communication towers, power lines, rivers, roads, etc.
 

kdragon75

Wizard
Joined
Aug 7, 2016
Messages
2,457
As I recall, this is a quirk of SMB and how it calculates block size. If you check your dataset's consumed space, it should be closer to what you're expecting. In other words, this is a reporting "bug" and not actual used space.
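One way to check the dataset side is to compare the `used` and `logicalused` properties. Below is a rough sketch that shells out to `zfs get -H -p` and parses the tab-separated result. The dataset name `Tank/Operations` is only a guess based on the mount path mentioned earlier, and the helper names are made up for illustration. Note that `used` also includes snapshots and child datasets.

```python
import subprocess


def parse_zfs_get(text):
    """Parse 'property<TAB>value' lines (zfs get -H -p -o property,value) into a dict of ints."""
    values = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        prop, value = line.split("\t")[:2]
        values[prop] = int(value)
    return values


def dataset_space(dataset):
    """Return {'used': bytes, 'logicalused': bytes} for a dataset by calling zfs."""
    result = subprocess.run(
        ["zfs", "get", "-H", "-p", "-o", "property,value",
         "used,logicalused", dataset],
        check=True, capture_output=True, text=True,
    )
    return parse_zfs_get(result.stdout)


# Made-up sample output, only to show the parsing; not real measurements.
sample = "used\t2000\nlogicalused\t1000\n"
space = parse_zfs_get(sample)
print(space["used"] / space["logicalused"])  # ratio of allocated to logical bytes
```

On the actual box, `dataset_space("Tank/Operations")` (hypothetical name) would return the real figures, and a ratio near 2 there would mean the extra space is being consumed on the pool rather than just misreported over SMB.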
 

onigiri

Dabbler
Joined
Jul 17, 2018
Messages
32
Actually, I did check that. Here is the output directly from the shell, one box vs the other, with the same contents:

Tqwepfq.jpg
- 8k

vQ9SvTz.jpg
- Inherited (128k)
 

HoneyBadger

actually does care
Administrator
Moderator
iXsystems
Joined
Feb 6, 2014
Messages
5,112
Yea, all of the files are accessed randomly.

In this case, "random" refers to the data access at the ZFS record level. A user might be opening a "random" 1.5G file from their perspective, but from ZFS's level they are opening a sequential set of records that comprise that 1.5GB file.

If they were reading random chunks of that 1.5G file (e.g. read 8KB at offset 1234, write 8KB at offset 5678, etc.), this would be more akin to "random access" and might be more of what the ArcGIS "file database" is doing, since it's overlaying points of interest which are likely just stored as a small set of GPS coordinates and a field indicating the type of object (communication tower, power line, etc.).
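The difference can be sketched by mapping a byte range to the record indices it covers. This is a simplified model assuming fixed-size records laid out back to back; `records_touched` is an illustrative helper, not anything ZFS exposes.

```python
def records_touched(offset, length, recordsize):
    """Return the range of record indices that the byte range [offset, offset+length) covers."""
    if length <= 0:
        return range(0)
    first = offset // recordsize
    last = (offset + length - 1) // recordsize
    return range(first, last + 1)


KIB = 1024
GIB = 1024 ** 3

# Opening a whole 1.5 GiB file: one long sequential run of records.
whole_file = records_touched(0, int(1.5 * GIB), 128 * KIB)
print(len(whole_file))  # 12288 consecutive records

# The small I/O from the example above: 8KB at offset 1234.
print(list(records_touched(1234, 8 * KIB, 8 * KIB)))    # [0, 1]
print(list(records_touched(1234, 8 * KIB, 128 * KIB)))  # [0]
```

With small scattered I/O, a 128K record means a whole 128K record is read or rewritten for an 8KB change, which is where a small recordsize helps. For whole-file reads and writes, the larger record simply means fewer records to walk through.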
 

kdragon75

Wizard
Joined
Aug 7, 2016
Messages
2,457
Actually, I did check that. Here is the output directly from the shell, one box vs the other, with the same contents:

Tqwepfq.jpg
- 8k

vQ9SvTz.jpg
- Inherited (128k)
Odd. Does the dataset itself reflect the same?
 