I am designing a system capable of working with 15 million (and growing) image files ranging from 100KB to 10MB. After some preliminary research, our first choice is ZFS on FreeNAS. I was hoping I could get the community's take on our proposed hardware and ZFS configuration. We recognize we're dealing with a few non-optional/odd configurations and requirements.
Disclaimer: I'm a Software Engineer; I typically do not work at the filesystem level, so I apologize in advance if anything mentioned below is woefully inaccurate in relation to ZFS.
Environment:
- Concurrent Connections: This is an internal application that will only be utilized by a few folks. It's safe to say there will never be more than 5 concurrent connections accessing this data. In most (95%) cases, only one or two users may be accessing this data concurrently.
- Network: This data will only be accessed internally on a 1GbE network, though a 10GbE network is in the works for systems that will read this data.
- Hardware: The hardware we have allocated for this project is a mixture of older enterprise grade hardware and consumer grade storage (great combination, I know).
Dell R720xd w/ 24x 2.5” bays
RAM: 128GB (more can be allocated if needed)
CPU: 2x E5-2620 @ 2.20GHz
Storage:
8x 2TB SSDs for data storage (Crucial MX500)
1x 500GB SSD for OS and database (Crucial MX500)
Controller: H310 (flashed to IT mode, acting as an HBA)
- In almost all cases, the data will be permanently stored on the drive after the initial write.
- The data will not be modified (edited, compressed, resized, etc) after the initial write.
- The directory structure of the data is non-optimal [1]; due to the design of the application pulling this data, it is more or less immutable.
- The data should be read-optimized, which includes, but may not be limited to: random/sequential reads, directory listings, etc.
- There will be new images written to the file structure on a fairly regular basis, but the write performance is not much of a concern.
- Data will be processed (image hashing, facial-recognition vectors, etc.) and stored in a MySQL database via a few Docker containers on the same box to maximize processing performance.
- Data will be read/consumed by users via SMB (Windows) or NFS (Linux) mounts.
- Data will be written via SMB (Windows) or NFS (Linux) mounts.
- There is a significant number of identical files (a conservative estimate is 20%, but it could be quite a bit more); due to the design of the application pulling this data, we can't delete the duplicate filenames. (A quick way to measure the actual duplicate ratio is sketched just after this list.)
- We have about 10TB worth of data currently.
- We do not intend for this data to grow rapidly, maybe 1TB a year.
- We typically do a full directory backup of the data every month.
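Since the dedupe question below hinges on that ~20% duplicate estimate, here is a minimal sketch of how we'd measure the actual ratio before committing to anything. This is hypothetical: the mount point /mnt/tank/images and the choice of SHA-256 as the content hash are assumptions, not part of the current system.
Code:
#!/usr/bin/env python3
# Estimate the duplicate-file ratio by full-content hash.
# Hypothetical sketch: ROOT and the choice of SHA-256 are assumptions.
import hashlib
import os
from collections import Counter

ROOT = "/mnt/tank/images"  # assumed dataset mount point

def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

seen = Counter()   # content hash -> number of copies seen so far
total_files = 0
total_bytes = 0
dup_bytes = 0      # bytes belonging to second-and-later copies

for dirpath, _dirs, filenames in os.walk(ROOT):
    for name in filenames:
        path = os.path.join(dirpath, name)
        size = os.path.getsize(path)
        digest = file_digest(path)
        total_files += 1
        total_bytes += size
        if seen[digest] > 0:
            dup_bytes += size
        seen[digest] += 1

dup_files = total_files - len(seen)
print(f"{dup_files} of {total_files} files are duplicate copies")
print(f"~{dup_bytes / max(total_bytes, 1):.1%} of bytes are redundant")

Walking all 15 million files this way would take a while; hashing a random sample of the sub directories should give a good-enough estimate.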
Concerns:
- Since we're planning on using consumer-grade SSDs (unless an argument can be made for 2.5" spinners), we're concerned about the longevity of the drives (particularly with metadata thrashing).
- Is the hardware sufficient for this use case?
- We'd like to dedupe the data if it's feasible (more on this in the configuration section below) and if it will result in significant space savings.
High Level Configuration
Based on my introductory research I think this may be a good starting point. As I mentioned above, I don't typically work at the filesystem level so I apologize if I am woefully inaccurate on anything below:
- 1 pool
- 1 raidz1 vdev (8x 2TB SSD) - this may eventually expand to 3 raidz1 vdevs of the same size (24-bay enclosure). A single 8-wide raidz1 gives roughly 14TB of raw space before overhead, which should cover our ~10TB comfortably.
- compression=lz4 - if lz4 isn't already enabled by default
- ashift=12 (4K sector size)
- dedup=on may be a pipe dream, though. With 128GB RAM I think I can make this work (assuming ~5GB of dedup table per 1TB of storage, our ~10TB would need roughly 50GB of RAM). I may be better off writing a script that simply creates (potentially millions of) hard links for identical images instead; a rough sketch of that approach follows this list.
- sync=disabled - from what I understand, disabling sync means the SLOG/ZIL is never used and writes are only persisted with each transaction group, so I may lose up to ~5 seconds (the default txg timeout, which I believe is tunable) of newly written data in the event of a hardware failure. If this is the case, I am OK with a few seconds of new-data loss as long as existing data on the system is safe.
- no SLOG - see the sync=disabled explanation above
- no separate ZIL device - as I understand it, the ZIL lives in the pool anyway and is simply bypassed with sync=disabled
- no L2ARC - I don't see any reason why this would be needed with SSDs?
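As mentioned in the dedup bullet above, the fallback is a hard-link script. Here is a rough, untested sketch of what I have in mind (same assumed mount point and hash as the estimate script earlier; it also assumes everything lives on one filesystem, since hard links can't cross filesystems, and relies on our files never being modified after the initial write):
Code:
#!/usr/bin/env python3
# Replace duplicate files with hard links to one canonical copy.
# Rough, untested sketch: ROOT is an assumed mount point, all files
# must be on one filesystem, and files are immutable after creation
# (a write through any hard-linked path would change every path).
import hashlib
import os

ROOT = "/mnt/tank/images"  # assumed dataset mount point

def file_digest(path, chunk_size=1 << 20):
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

canonical = {}  # content hash -> path of the first copy we kept

for dirpath, _dirs, filenames in os.walk(ROOT):
    for name in filenames:
        path = os.path.join(dirpath, name)
        keeper = canonical.setdefault(file_digest(path), path)
        if keeper == path or os.path.samefile(keeper, path):
            continue  # first copy, or already linked on a prior run
        # Swap the duplicate for a hard link, atomically via rename.
        tmp = path + ".linktmp"
        os.link(keeper, tmp)
        os.replace(tmp, path)

The trade-off versus dedup=on: this is offline and file-granular, but it costs no DDT RAM. Like dedup, it keeps a single physical copy, so all linked paths share the same blocks (and the same inode, so permissions/timestamps are shared too).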
I would love to hear the community's feedback/suggestions on our proposal above. I am happy to provide any additional information that may be needed.
Thank you!
[1]
Example directory structure - none of the directory/filenames are normalized in any way.
Code:
+ root directory 1
    - sub directory 1
        - image 1
        - image 2
        - image 3
        - ...
        - image n (where n is between 1 and 1,000+)
    - sub directory 2
        - image 1
        - image 2
        - image 3
        - ...
        - image n
    - ...
    - sub directory n (where n is between 1,000 and 30,000)
        - image 1
        - image 2
        - image 3
        - ...
        - image n
+ root directory 2
+ ...
+ root directory 15