Hello there, this is my first build and my first contact with NAS. I've researched and RTFM'd as much as I could so far. (And so on :D)
I am working with a group of Machine Learning researchers. We exist because Data exists. There are tons of unstructured data for each project and a lot of people + tools to share terabytes with.
I am tired of sharing HUGE .zip files with my colleagues just to modify a dataset and forward TBs of .zips to another colleague, leading to an endless mess of diverging parallel copies.
So now I am researching a proper MLOps shared pipeline for our small group of scientists in our small office: Pachyderm, Determined AI, a Label Studio server, and most importantly a huge S3 store to push ~1999999999999 weird photos into.
I am not paying AWS just to test S3 buckets in our office, so I decided to repurpose my little old workstation for our lab.
This is my build right now:
TrueNAS SCALE
Intel i5-4590S
8 GB DDR3-1600 RAM, single channel (I know, I'm looking for at least 32 GB dual channel)
2x 5 TB Seagate Barracuda | mirrored in a single ZFS vdev
1x 32 GB SSD for the TrueNAS boot drive
1x 480 GB repurposed SSD as SLOG (ZIL)
ASRock B85M-ITX with integrated Gigabit NIC
Nginx Proxy Manager on a Raspberry Pi 4 (8 GB) as reverse proxy
Our preferred way to store and share data is S3-compatible storage.
So I created a pool with the two 5 TB Barracudas mirrored and added the SSD as a separate intent log (SLOG), since its lower latency should help synchronous write transactions: we write A LOT of tiny files, and those kinds of uploads are usually terrible. Then I set sync=always on the default dataset.
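For reference, the layout above looks roughly like this from the shell (pool and device names here are placeholders, not my actual disks):

```shell
# Sketch of the pool layout; "tank" and the /dev names are examples only.
zpool create tank mirror /dev/sda /dev/sdb   # 2x 5 TB mirror vdev
zpool add tank log /dev/sdc                  # 480 GB SSD as SLOG
zfs set sync=always tank                     # force every write through the ZIL
zpool status tank                            # confirm the log vdev is attached
```

Worth noting: sync=always pushes every write through the SLOG, so the consumer SSD's sync-write latency becomes the ceiling for all writes, not just the synchronous ones.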
Next I created a dataset for MinIO as the S3-compatible service and set up Nginx Proxy Manager on our Raspberry Pi so our custom domains point to the NAS and its services securely.
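In Nginx Proxy Manager this is configured through the UI, but the equivalent raw nginx config is roughly this (the domain, backend IP and port are placeholders for my real setup):

```nginx
# Rough equivalent of the NPM proxy host; s3.example.lab and 10.0.0.10:9000
# stand in for the real domain and the MinIO API port on the NAS.
server {
    listen 443 ssl;
    server_name s3.example.lab;

    location / {
        proxy_pass http://10.0.0.10:9000;   # MinIO API on the NAS
        proxy_set_header Host $http_host;   # MinIO checks the Host header
        client_max_body_size 0;             # don't cap large uploads
        proxy_buffering off;                # stream responses, don't spool on the Pi
        proxy_request_buffering off;        # same for request bodies (uploads)
    }
}
```

If request buffering is left on (nginx's default), the Pi spools each upload to its own storage before forwarding it, which by itself could explain single-digit MB/s.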
Everything seems to be working properly over HTTPS.
But performance is crap. I know my build is outdated and unbalanced, but even with its downsides in mind, I can't get past ~3 MB/s uploading to MinIO buckets. I can't believe it's that bad. Where is my bottleneck? Yes, RAM is the first suspect, but free RAM should at least let me upload at a decent speed until it fills up, and then throttle. Instead my top speed is about 5 MB/s, decreasing to a consistent 3 MB/s as RAM usage grows.
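A back-of-the-envelope model shows how a fixed per-object cost crushes small-file throughput even on an idle Gigabit link (~117 MB/s line rate). The 0.03 s per-object commit penalty below is an assumption for illustration, not a measurement from my box:

```shell
# Effective MB/s when each object pays wire time plus a fixed commit latency.
# 117 MB/s ~= Gigabit line rate; 0.03 s is an assumed per-object sync cost.
for size in 0.1 1 16; do
  awk -v s="$size" 'BEGIN { printf "%5.1f MB objects -> %5.1f MB/s\n", s, s / (s/117 + 0.03) }'
done
```

With those assumptions, 0.1 MB objects top out near 3 MB/s (suspiciously close to what I'm seeing), while 16 MB objects would reach ~96 MB/s on the same hardware.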
For transfers, I tried the MinIO mc client, Cyberduck and S3 Browser, all with the same results.
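In case anyone wants to reproduce it, my mc test is basically this (alias name, endpoint, credentials and bucket are placeholders):

```shell
# Hypothetical endpoint and credentials; mc is the official MinIO client.
mc alias set lab https://s3.example.lab ACCESS_KEY SECRET_KEY
mc mb lab/test-bucket                          # create a scratch bucket
mc cp --recursive ./dataset/ lab/test-bucket/  # the upload that crawls at ~3 MB/s
```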
Could anyone point me in the right direction? I know my build isn't right, but I wanted to test it as-is so I can now buy just enough hardware to remove the bottleneck. I want to make the proper hardware/software changes to reach the performance these disks should be able to deliver. I know it can run well enough on the cheap!
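If numbers would help, I can run diagnostics like these and post the results (the IP is a placeholder, and the fio job is just an example sync-write workload, not a calibrated benchmark):

```shell
iperf3 -c 10.0.0.10        # raw network throughput to the NAS (placeholder IP)
zpool iostat -v tank 1     # per-vdev and SLOG load while an upload is running
fio --name=syncwrite --rw=write --bs=128k --size=1g --fsync=1 \
    --directory=/mnt/tank/test               # sync-write latency on the pool
zfs set sync=standard tank # A/B test: does relaxing forced sync fix the speed?
```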
Thank you for your suggestions.