DIY TrueNAS beginner for Machine Learning datasets | mirrored 2x 5 TB disks

mrcbns

Dabbler
Joined
Mar 24, 2022
Messages
10
Hello there! This is my first build and my first contact with NAS. I researched and RTFM'd as much as I could up to now. (And so on :D)

I am working with a group of Machine Learning researchers. We exist because data exists. There are tons of unstructured data for each project and a lot of people + tools to share terabytes with.
I am tired of sharing HUGE .zip files with my colleagues just to modify a dataset and forward terabytes of .zips to another colleague, leading to an endless path of parallel data clashes.

So now I am researching a proper MLOps shared pipeline for our small group of scientists, in our small office. Pachyderm, Determined AI, a Label Studio server, and most importantly a huge S3 storage to push ~1999999999999 weird photos.

I am not paying AWS to test S3 buckets in our office, so I decided to repurpose my old little workstation for our lab.

This is my build right now:

TrueNAS SCALE

Intel i5-4590S
8 GB RAM DDR3 1600 MHz, single channel (I know, I am looking for at least 32 GB dual channel)
2x 5 TB Seagate Barracuda | mirrored, 1 vdev ZFS
1x 32 GB SSD for TrueNAS boot
1x 480 GB repurposed SSD for SLOG (ZIL)
ASRock B85M-ITX with integrated Gigabit

Nginx Proxy Manager on a Raspberry Pi 4 (8 GB) as reverse proxy


Our preferred way to store and share data is S3-compatible storage.
So I created a pool with the mirrored 2x 5 TB Barracudas and added the SSD as a separate intent log (SLOG) to keep sync writes safe, since it has lower latency for synchronous write transactions; we usually write A LOT of tiny files, and those kinds of uploads are usually terrible. Then I enabled sync=always on the default dataset.

Next I created a dataset for MinIO as the S3-compatible service and set up Nginx Proxy Manager on our Raspberry Pi so our custom domains point securely to our NAS and its services.
Everything seems to be working properly over HTTPS.
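For reference, this is roughly how we point the MinIO client at the proxied endpoint (the alias, domain, bucket name and credentials here are placeholders, not our real ones):

# Register the proxied endpoint with the MinIO client
mc alias set lab-s3 https://s3.example.lab ACCESSKEY SECRETKEY

# Sanity check, then a test upload
mc ls lab-s3
mc mb lab-s3/datasets
mc cp ./sample-dataset.tar lab-s3/datasets/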

But performance is crap. I mean, my build is bad, outdated and unbalanced, but even with those downsides in mind, I can't get past ~3 MB/s uploading to MinIO buckets. I can't believe the performance is that bad. Where is my bottleneck? Yes, RAM is my first suspect, but at least my free RAM should let me upload at a good speed until it fills up, and then throttle. Instead, my top speed is about 5 MB/s, decreasing to a consistent 3 MB/s as RAM usage increases.
For transfers, I tried the MinIO mc client, Cyberduck and S3 Browser, all with the same results.

Could anyone point me in the right direction? I know my build is not right, but I wanted to test it as-is, so I can now buy just the hardware needed to stop being this bottlenecked. I want to make the right hardware/software changes to reach the performance the disks I installed should be able to deliver. I know it could run well enough on the cheap!

Thank you for your suggestions.
 

sretalla

Powered by Neutrality
Moderator
Joined
Jan 1, 2016
Messages
9,703
1x 480 GB repurposed SSD for SLOG (ZIL)
So I created a pool with the mirrored 2x 5 TB Barracudas and added the SSD as a separate intent log (SLOG) to keep sync writes safe, since it has lower latency for synchronous write transactions; we usually write A LOT of tiny files, and those kinds of uploads are usually terrible. Then I enabled sync=always on the default dataset.
I can't believe the performance is that bad. Where is my bottleneck?
It looks to me like you completely misunderstood what SLOG is and why you would want one.

Taking writes that aren't synchronous and then forcing them to be synchronous in order to make them use the SLOG is only ever going to slow you down.

I bet that SSD you have isn't one that's suited to the job.


Without PLP, you're probably not winning anything by using a SLOG anyway, so just stop it.

If you want to benefit from your RAM, allow async writing. (and add more RAM)
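From the shell, that's a single property change (the pool/dataset name here is just an example, use your own):

# Revert to default behavior: honor application fsync() calls,
# let everything else be written asynchronously through RAM
zfs set sync=standard tank/minio
zfs get sync tank/minio    # confirm the change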
 

mrcbns

Dabbler
Joined
Mar 24, 2022
Messages
10

Attachments
  • PXL_20220407_160022772.jpg (180.4 KB)


NugentS

MVP
Joined
Apr 16, 2020
Messages
2,947
So - for company use and...........
Gamer gear, no ECC, SMR drives, not enough memory (though you do acknowledge that), a "cache" drive (SLOG) that's probably the wrong spec for the job, and the NIC is a "Qualcomm® Atheros® AR8171" - I am unsure about support, but suspect there isn't any.

So far everything is wrong - we don't know about the PSU or case (which is unlikely to be a big problem)

Those SMR disks are absolutely no use to you - throw them out / use them as doorstops / return them if you can
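If you want to double-check exactly what you have, pull the model numbers and look them up against the vendor SMR lists (device nodes below are examples):

# Print each drive's model string for lookup
smartctl -i /dev/sda | grep -i model
smartctl -i /dev/sdb | grep -i model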
 

HoneyBadger

actually does care
Administrator
Moderator
iXsystems
Joined
Feb 6, 2014
Messages
5,112
So - for company use and...........

So far everything is wrong - we don't know about the PSU or case (which is unlikely to be a big problem)

Those SMR disks are absolutely no use to you - throw them out / use them as doorstops / return them if you can
Correct on the last line, but I'd like to draw your attention to the words "testing" "lab" and "I know my build is not right but I tried to test it as is" - OP is definitely aware that the hardware right now isn't "business grade" but wants to ID if the problem is their hardware or a config/TrueNAS choice.

We've ID'd the problem pretty quick as the drives @mrcbns - are they by any chance repurposed/previously used drives? And do you have any other drives that aren't SMR?

If you don't, you might get some performance back by fully erasing the 5TB SMR drives - and giving them enough time to "clean their shingles"
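A rough sketch of a full erase from the shell, assuming the drives expose TRIM (DESTRUCTIVE - this wipes the disk, and the device node is an example, so triple-check with lsblk first):

# Confirm which node is the 5TB Barracuda before touching anything
lsblk -o NAME,MODEL,SIZE

# TRIM the entire device, then leave the drive idle and powered
# so its firmware can clean out the shingled zones
blkdiscard /dev/sdX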
 

mrcbns

Dabbler
Joined
Mar 24, 2022
Messages
10
Correct on the last line, but I'd like to draw your attention to the words "testing" "lab" and "I know my build is not right but I tried to test it as is" - OP is definitely aware that the hardware right now isn't "business grade" but wants to ID if the problem is their hardware or a config/TrueNAS choice.

We've ID'd the problem pretty quick as the drives @mrcbns - are they by any chance repurposed/previously used drives? And do you have any other drives that aren't SMR?

If you don't, you might get some performance back by fully erasing the 5TB SMR drives - and giving them enough time to "clean their shingles"
Absolutely, we prioritize space over any kind of data redundancy, because we can afford losing 500 GB of unlabeled data; this NAS is the centralized source of unlabeled and unprocessed data.
I thank all the experts replying in this thread with all kinds of advice. It is so funny how you catalog my DIY NAS as instant garbage full of SMR pieces of shit. It is!
But with all that in mind, I learned my speed issues come down to adding RAM and buying proper disks. I could also test the NIC, or find some Gigabit USB adapter that works right, to rule out networking.
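Something like a raw iperf3 run between a desktop and the NAS should rule the NIC in or out, assuming iperf3 is available on both ends (the NAS IP is a placeholder):

# On the NAS:
iperf3 -s

# On a desktop:
iperf3 -c 192.168.1.50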

I am testing to learn along the way, and to see if my garbage gaming rig from 5 years ago could let my 5-person team load datasets from the same source at decent speeds instead of sharing .zips.
So I am trying to debug my garbage speeds up from 3 MB/s to 10 MB/s. That would be enough for random writes and reads through our S3 API.

Data redundancy is not the question here, as we are investing in a proper NAS in the near future. This is just for testing and setting up a "proper" centralized dataset source of truth for a 5-person team for the coming months.
Thanks for your suggestions; roasts are also welcome and funny.
 

mrcbns

Dabbler
Joined
Mar 24, 2022
Messages
10
Correct on the last line, but I'd like to draw your attention to the words "testing" "lab" and "I know my build is not right but I tried to test it as is" - OP is definitely aware that the hardware right now isn't "business grade" but wants to ID if the problem is their hardware or a config/TrueNAS choice.

We've ID'd the problem pretty quick as the drives @mrcbns - are they by any chance repurposed/previously used drives? And do you have any other drives that aren't SMR?

If you don't, you might get some performance back by fully erasing the 5TB SMR drives - and giving them enough time to "clean their shingles"
I used them at almost full capacity for a short period of time. Then I formatted them for the NAS. Performance was about the same from the start.

I could check whether any of the other garbage disks lying around are CMR and do a test this weekend. I can afford a pair of 4 TB CMR disks if they do the job, and then move them to the future proper NAS.

Could a 480 GB SSD as a single pool do the trick to test whether the bottleneck is coming from somewhere else?
 

HoneyBadger

actually does care
Administrator
Moderator
iXsystems
Joined
Feb 6, 2014
Messages
5,112
I used them at almost full capacity for a short period of time. Then I formatted them for the NAS. Performance was about the same from the start.

I could check whether any of the other garbage disks lying around are CMR and do a test this weekend. I can afford a pair of 4 TB CMR disks if they do the job, and then move them to the future proper NAS.

Could a 480 GB SSD as a single pool do the trick to test whether the bottleneck is coming from somewhere else?
Single SSD will be fine, if you pull it out of SLOG duties and put it as a single device just to check.

I'm not roasting you or your setup at all, just pointing out the issue in the shortest amount of time possible. Swap them for a couple of CMR/PMR devices and you'll likely see a lot better results from your testing. The problem is almost entirely on the SMR disks here, and HDD vendors haven't exactly been forthcoming in admitting that the SMR technology 1) exists 2) is in use and 3) quite frankly sucks for most RAID/multi-drive use cases.
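If fio is available on your SCALE box, a local run directly against the pool also takes the network and the reverse proxy out of the picture entirely (the mountpoint below is an example):

# Small-file-ish write test straight onto the pool, no network involved
fio --name=smallfiles --directory=/mnt/testpool --rw=randwrite \
    --bs=128k --size=1g --ioengine=psync --end_fsync=1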
 

mrcbns

Dabbler
Joined
Mar 24, 2022
Messages
10
Single SSD will be fine, if you pull it out of SLOG duties and put it as a single device just to check.

I'm not roasting you or your setup at all, just pointing out the issue in the shortest amount of time possible. Swap them for a couple of CMR/PMR devices and you'll likely see a lot better results from your testing. The problem is almost entirely on the SMR disks here, and HDD vendors haven't exactly been forthcoming in admitting that the SMR technology 1) exists 2) is in use and 3) quite frankly sucks for most RAID/multi-drive use cases.
I removed the SSD ZIL and created a new pool with it to test performance, and I got the same results, if not worse, copying a dataset of small files.
It starts at ~1 MiB/s and decreases over time. I pasted several states of the upload so you get an idea.

...10_frame_101.jpg: 6.27 MiB / 102.26 MiB [===> ] 1.38 MiB/s


..._8_frame_355.jpg: 244.84 MiB / 300.69 MiB [===============================================> ] 572.64 KiB/s


..._9_frame_173.txt: 301.57 MiB / 302.17 MiB [==========================================================] 449.61 KiB/s

[Screenshot: 1649386463553.png]


RAM pressure isn't even as high as it was with the SMR disk pool.

I tested it with both the integrated Atheros AR8171 and a TP-Link UE300 USB 3.0 Gigabit interface.

Then I uploaded a large 1.5 GB ISO file to test sequential performance, getting pretty bad results as well:

...LE-22.02.0.1.iso: 77.22 MiB / 1.50 GiB [===> ] 6.92 MiB/s
...LE-22.02.0.1.iso: 512.00 MiB / 1.50 GiB [====================> ] 5.67 MiB/s


And I get the same results when running random and sequential tests at the same time.


So I checked my reverse proxy, which proxies to my NAS, and its CPU is weirdly high in every test I run while uploading random or sequential files, as you can see.

[Screenshot: 1649387387567.png]



TL;DR:
Although the SMR disks are a real problem, it seems the immediate bottleneck is my reverse proxy.
What is your opinion?
What can I do to improve this situation? Should I move the reverse proxy to the NAS directly? Thanks!
 

mrcbns

Dabbler
Joined
Mar 24, 2022
Messages
10
It looks like Raspberry Pi 4 HTTPS/TLS performance is crap, because there is no hardware support for encryption. I will try to run nginx on the NAS and check the results.
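To check that theory, raw cipher throughput can be measured on the Pi itself; if ChaCha20 comes out far ahead, preferring it in the proxy's cipher list might also help:

# The Pi 4's Cortex-A72 has no AES hardware instructions
openssl speed -evp aes-256-gcm
openssl speed -evp chacha20-poly1305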
 

mrcbns

Dabbler
Joined
Mar 24, 2022
Messages
10
Well, I just nailed it. I painfully managed to get docker-compose -> Portainer running with the help of the TrueCharts docs.

So after reproducing my reverse proxy with the MinIO containers, I got around 10x the upload speed we had before.
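The MinIO side of the stack boils down to something like this (paths, ports and credentials are placeholders for what I actually used):

# Run MinIO against the dataset pool; the console listens on 9001
docker run -d --name minio \
  -p 9000:9000 -p 9001:9001 \
  -v /mnt/tank/minio:/data \
  -e MINIO_ROOT_USER=labadmin \
  -e MINIO_ROOT_PASSWORD=change-this-long-secret \
  minio/minio server /data --console-address ":9001"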

The compared results speak for themselves:

...LE-22.02.0.1.iso: 899.41 MiB / 1.50 GiB [===================================> ] 34.92 MiB/s


Thank you for all your help and advice. I will still do a CMR disk test, but I can tolerate this poor man's build if we get a 35 MiB/s, 5 TB S3 server at zero cost.

I will also enable another S3 domain as an insecure but faster S3 endpoint tied to the single 480 GB SSD. It is bullshit, but it works for us because we just upload datasets from slow desktop disks, and then it becomes the source for our entire team!

We will move to a real NAS with CMR disks this year for sure, but I wanted to learn and experiment with suboptimal, freely available hardware to get the most juice out of it.

I am learning how to set up a NAS thanks to this test.
 