1. See above. No RAID; you need an LSI HBA flashed to IT mode.
I'd also use 4x32 GB RAM instead of 8x16 GB so you have free slots to expand later. The price should be about the same.
I'm not sure why you want both 10GbE copper (RJ45) and SFP+. I suggest you get a switch with a pair of 10GbE SFP+ slots as well as a bunch of 1GbE ports. Ubiquiti and MikroTik make plenty that are inexpensive and perform well. Then use the other SFP+ slot to daisy-chain additional, similar switches to cover the rest of the office. You probably won't saturate the 10GbE network's capacity until you have added your fourth HDD VDEV.
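To sanity-check the "fourth VDEV" estimate against your own pool, a back-of-envelope calculation is enough. This sketch assumes throughput adds up linearly across VDEVs (an idealization) and ignores protocol overhead; the 350 MB/s per VDEV is a made-up placeholder, not a measurement, so plug in what your VDEVs actually deliver for your workload.

```python
import math


def link_mb_per_s(gbit: float) -> float:
    """Raw line rate in MB/s (decimal units), ignoring protocol overhead."""
    return gbit * 1000 / 8


def vdevs_to_saturate(link_gbit: float, per_vdev_mb_s: float) -> int:
    """Smallest number of VDEVs whose combined throughput exceeds the link.

    Assumes each VDEV adds the same throughput (placeholder value).
    """
    link = link_mb_per_s(link_gbit)
    return math.floor(link / per_vdev_mb_s) + 1


# 10GbE is ~1250 MB/s raw; 350 MB/s per VDEV is a hypothetical figure.
print(link_mb_per_s(10))
print(vdevs_to_saturate(10, 350))
```

Small-file or random I/O will deliver far less per VDEV than sequential reads, which pushes the saturation point further out.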
2. I doubt dedup makes sense. You are only planning on reading in the data once, right? Dedup has other applications. That said, I think you should look into special VDEV (sVDEV) drives or consider a metadata-only L2ARC. That should speed up searches a lot. Of the two, I'd prefer an sVDEV, but an L2ARC can also be configured to be metadata-only and persistent. Over time, it will get "hot" and serve you well. Read up on the costs and benefits of both, then decide which would work better for you.
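For the metadata-only L2ARC route, the ZFS side comes down to two standard commands: `zpool add <pool> cache <device>` and `zfs set secondarycache=metadata <dataset>`. The sketch below only builds and prints them (dry run) so nothing is changed by accident; the pool, device and dataset names are hypothetical. On TrueNAS, pool changes are normally made through the UI, so treat this as an illustration of what happens underneath. L2ARC persistence across reboots is governed by the OpenZFS `l2arc_rebuild_enabled` tunable; check its value on your version.

```python
import shlex


def metadata_l2arc_commands(pool: str, cache_dev: str, dataset: str) -> list[str]:
    """Build (but do not run) the commands for a metadata-only L2ARC.

    pool, cache_dev and dataset are hypothetical names supplied by the caller.
    """
    return [
        # Attach the SSD to the pool as an L2ARC (cache) device.
        f"zpool add {shlex.quote(pool)} cache {shlex.quote(cache_dev)}",
        # Keep only metadata (not file contents) for this dataset in L2ARC.
        f"zfs set secondarycache=metadata {shlex.quote(dataset)}",
    ]


# Dry run: print the commands for review instead of executing them.
for cmd in metadata_l2arc_commands("tank", "/dev/nvme0n1", "tank/archive"):
    print(cmd)
```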
5. See 2): an L2ARC might make sense for your application. However, your client may also benefit from a mirrored SSD pool if you plan on hosting the search indices there for all to share (think Foxtrot search and the like). That way the index goes on the SSD pool and holds all the keywords, while the data itself resides on the HDDs.
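The split in 5) works because a keyword index is tiny compared with the documents: it stores only words and the paths where they occur, so it fits comfortably on SSDs while the PDFs stay on the HDDs. A minimal inverted-index sketch, assuming the OCR output is plain text; the file paths are hypothetical, and real search tools (Foxtrot, a database full-text index) add stemming, ranking and much more.

```python
import re
from collections import defaultdict


def build_index(docs: dict[str, str]) -> dict[str, set[str]]:
    """Map each lowercase word to the set of document paths containing it."""
    index: dict[str, set[str]] = defaultdict(set)
    for path, text in docs.items():
        for word in re.findall(r"\w+", text.lower()):
            index[word].add(path)
    return dict(index)


def search(index: dict[str, set[str]], query: str) -> set[str]:
    """Return paths containing every word of the query (AND search)."""
    words = re.findall(r"\w+", query.lower())
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result


# Hypothetical OCR text keyed by where the scanned PDF lives on the HDD pool.
docs = {
    "/mnt/tank/archive/0001.pdf": "Invoice from ACME Corp",
    "/mnt/tank/archive/0002.pdf": "ACME contract renewal",
}
index = build_index(docs)
print(sorted(search(index, "acme invoice")))
```

Only the small `index` needs fast storage; a search returns paths, and only the matching PDFs are then read from the HDDs.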
Think carefully about what you will use to OCR the data. I have been very happy with the ScanSnap iX500, but it's for home/office use, not industrial. It produces a scanned image plus OCR text (in a PDF), which usually works.
1. I don't have the option to purchase/select an HBA card, but I will surely buy an external one.
But why? What's wrong with a RAID controller?
1a. Why is IT mode so important?
2. Sure, you're right, the RAM can be changed; I don't mind paying a bit more for it.
3. Could you please explain what you meant here?
"You will never outstrip the 10GbE network re: capacity until you have added your fourth HDD VDEV, maybe"
4. Why add lots of 1GbE ports?
I don't plan to attach workstations to the system.
Here's how the process will go: the scanning workstations will send the scanned files to the OCR server. When OCR is done, it will put the original (scanned) file and the OCR file into the storage server (as a hot folder for the web app). The web app/big data server will fetch all the files from the hot folder, move them according to the criteria/rules that we set, and index them into the DB.
That's my solution.
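The hot-folder step described above could look roughly like this minimal sketch. The rule format, folder layout and file names are hypothetical stand-ins for whatever rules engine the web app actually uses, and the "index into the DB" step is left out.

```python
import shutil
from pathlib import Path
from typing import Callable

# A (hypothetical) rule pairs a test on the file with the archive subfolder.
Rule = tuple[Callable[[Path], bool], str]


def route_hot_folder(hot: Path, archive: Path, rules: list[Rule],
                     default: str = "unsorted") -> dict[str, str]:
    """Move each file in `hot` to archive/<folder of the first matching rule>.

    Files matching no rule go to archive/<default>. Returns name -> folder.
    """
    moved: dict[str, str] = {}
    for f in sorted(p for p in hot.iterdir() if p.is_file()):
        dest_name = next((dest for match, dest in rules if match(f)), default)
        dest_dir = archive / dest_name
        dest_dir.mkdir(parents=True, exist_ok=True)
        shutil.move(str(f), str(dest_dir / f.name))
        moved[f.name] = dest_name
    return moved
```

For example, `rules = [(lambda p: p.name.startswith("INV"), "invoices")]` would file `INV_0001.pdf` under `invoices/` and everything else under `unsorted/`. A real implementation should only pick up files the OCR server has finished writing (for instance, written under a temporary name and then renamed); this sketch does not handle that.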
So, to picture it:
Scanning:
workstation 1GbE > switch 10GbE > OCR station 10GbE > switch 10GbE > storage server 10GbE > switch 10GbE > web app 10GbE > storage server 10GbE
Users accessing via the web app:
user workstation 100Mb-1GbE > switch 10GbE > web app 10GbE > switch 10GbE > storage 10GbE
I don't see how 1GbE ports would help me there.
I picked those cards just in case something goes wrong, or in case I want to connect the storage and the web app server directly.
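Each transfer in the paths above runs at the speed of the slowest link it crosses, and the workstation hops are 1GbE or slower, so those links need ports to plug into even if the servers talk at 10GbE. A small sketch using the hop speeds from the paths above (in Gbit/s); it ignores protocol overhead and contention.

```python
import math


def bottleneck(path: list[tuple[str, float]]) -> tuple[str, float]:
    """Return the slowest hop (name, Gbit/s) on a chain of links."""
    return min(path, key=lambda hop: hop[1])


def clients_to_fill(uplink_gbit: float, client_gbit: float) -> int:
    """How many clients at full speed it takes to fill one uplink."""
    return math.ceil(uplink_gbit / client_gbit)


scanning = [("workstation", 1), ("switch", 10), ("OCR station", 10),
            ("switch", 10), ("storage server", 10)]
user_access = [("user workstation", 0.1), ("switch", 10), ("web app", 10),
               ("switch", 10), ("storage", 10)]

print(bottleneck(scanning))      # slowest hop when scanning
print(bottleneck(user_access))   # slowest hop for a 100 Mb/s user
print(clients_to_fill(10, 1))    # 1GbE clients needed to fill a 10GbE uplink
```

In other words, the 10GbE links between the servers only start to matter once roughly ten 1GbE clients are transferring at full speed at the same time.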
5. Well, in short: after scanning, the files will be archived on the storage once, and whether they are read again depends on whether users search for them.
On second thought, I don't think I will use dedup, since I don't plan to let the users access the storage directly, so they won't be able to copy/duplicate the files.
6. The web app/big data will have its own server with SSDs in RAID 1. The files themselves will be on the storage server, while the indexed text/data (which takes up almost no space) will be on the web app/big data server.
7. About the OCR app: I have already tested and chosen one ("industrial").
1: The controller is a no-go; go for an HBA instead of RAID. If you can afford it, I would go with the 4215R CPU for the higher clock, since SMB is still single-threaded per client connection, if I'm not mistaken. And get as much RAM as the customer will pay for: it will be used for ARC, and if you go with deduplication, it will also be used for the dedup table (DDT).
2: Deduplication performance on this hardware is questionable; there are some posts about how powerful the hardware needs to be for it to really shine. It's one of those features that sounds cool on paper but seldom works out as imagined without really powerful hardware.
3: Depends on where you're located, and how hard the local market has been hit by shortages.
4: Expanding capacity in TrueNAS is quite easy. You can go two ways: replace drives, or add drives. To add more drives you need an HBA with external connectivity (compared with the cost of a whole new server, it's cheap to put one in from the start), and then you just add a JBOD chassis.
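The two routes in 4: differ in when the extra space appears. Within a RAIDZ VDEV every drive counts as the size of the smallest member, so replacing one drive with a bigger one gains nothing until every drive in that VDEV has been replaced (one at a time, resilvering each; the pool's `autoexpand` property controls whether the space is picked up automatically). Adding a VDEV adds its space right away. A rough sketch, ignoring ZFS metadata and padding overhead; the 8-wide RAIDZ2 of 8 TB drives is a hypothetical layout.

```python
def raidz_usable_tb(drive_sizes_tb: list[float], parity: int) -> float:
    """Approximate usable space of one RAIDZ VDEV, ignoring ZFS overhead.

    Every member is treated as if it were the size of the smallest drive.
    """
    if len(drive_sizes_tb) <= parity:
        raise ValueError("need more drives than parity disks")
    return (len(drive_sizes_tb) - parity) * min(drive_sizes_tb)


def pool_usable_tb(vdevs: list[tuple[list[float], int]]) -> float:
    """A pool's space is the sum of its VDEVs' space."""
    return sum(raidz_usable_tb(sizes, parity) for sizes, parity in vdevs)


vdev = [8.0] * 8                                   # hypothetical 8x 8 TB RAIDZ2
print(raidz_usable_tb(vdev, 2))                    # starting point
print(raidz_usable_tb([16.0] + [8.0] * 7, 2))      # one drive upgraded
print(raidz_usable_tb([16.0] * 8, 2))              # all drives upgraded
print(pool_usable_tb([(vdev, 2), (vdev, 2)]))      # second VDEV added
```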
5: Please explain what you mean when you write the word "cache"; it's often misunderstood. Is it L2ARC or SLOG (which isn't a cache, although many people mistakenly think it is)?
1. Note that this is 2 CPUs: 4210R processors, 10 cores at 2.4 GHz each, so in total there are going to be 20 cores and 40 threads.
2. I changed my mind; I won't be using deduplication. It sounds unnecessary in my case, as I won't let the users access the drive.
3. I'm looking for an on-prem warranty, which is crucial. If you know of any worldwide stores, please share them with me.
4. Replace drives? (Do you mean all the drives or only one?)
Doesn't it work like RAID, where all drives have to be the same size?
5. "in the face of a whole new server it's cheap to put in at once"
What did you mean?
6. The SSDs are for L2ARC.