What is the best way to run an OCR tool such as OCRmyPDF?

FreeMinded

Cadet
Joined
Dec 4, 2023
Messages
4
I have been a happy user of TrueNAS SCALE for over a year now and still have a lot to learn...
Currently I'm looking for the best/recommended way to run an OCR tool such as OCRmyPDF on TrueNAS SCALE. As far as I understand, installing packages with apt is not supported. What is the way to go? Docker? Is there any OCR tool that is recommended on SCALE?

My use case is the following:
1. A scanner sends a PDF to an SMB share on the SCALE system
2. A bash script runs every 5 minutes and looks for new PDFs
3. OCRmyPDF (or similar) performs the OCR
4. The file is uploaded through WebDAV to an external Nextcloud and then deleted

I have everything working fine except for the OCR part; so far my googling has not turned up a workable solution.

Any help is appreciated!
 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194
You'll need to set up a container, which, on balance, is a substantially more convoluted topic on SCALE. I'll leave the details to someone else...
 
Joined
Oct 22, 2019
Messages
3,641
Is it possible to do this from the client, and simply use TrueNAS as a "holding area" via SMB? Or would this client PC not be up 24/7, which makes the automation portion unfeasible?
 

FreeMinded

Yes, the TrueNAS is the only system available 24/7 to do this job. So it's either there or it's not happening.
 

joeschmuck

Old Man
Moderator
Joined
May 28, 2011
Messages
10,994
While I'm not familiar with OCRmyPDF, if you can run this from the command line in SCALE, for example "ocrmypdf sourcefilename destinationfilename" and any parameters required, then you can write a script to do all that you are asking and it would be fairly simple. I would recommend that you also add a step to verify the file was in fact transferred to Nextcloud before deleting it on the SMB share.
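To illustrate the "verify before deleting" step, here is a rough Python sketch, not a tested Nextcloud integration. It assumes the uploaded file is reachable at a WebDAV URL (on Nextcloud this typically looks like https://<host>/remote.php/dav/files/<user>/<path>, but check your own instance) and that the server reports a Content-Length header on a HEAD request. The function names head_remote and upload_confirmed are made up for this example.

```python
import base64
import urllib.error
import urllib.request
from typing import Optional, Tuple


def head_remote(url: str, user: str, password: str) -> Tuple[int, Optional[str]]:
    """Send an HTTP HEAD request and return (status, Content-Length header)."""
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    req = urllib.request.Request(url, method="HEAD")
    req.add_header("Authorization", f"Basic {token}")
    try:
        with urllib.request.urlopen(req, timeout=30) as resp:
            return resp.status, resp.headers.get("Content-Length")
    except urllib.error.HTTPError as err:
        # 404 and similar arrive as exceptions; report them as a status code.
        return err.code, None


def upload_confirmed(local_size: int, status: int,
                     content_length: Optional[str]) -> bool:
    """True only if the server has the file and reports the same size."""
    if status != 200 or content_length is None:
        return False
    try:
        return int(content_length) == local_size
    except ValueError:
        # A non-numeric header is treated as "not confirmed".
        return False
```

The idea is to call head_remote() after the upload, pass its result plus the local file size to upload_confirmed(), and delete the local copy only when that returns True. Comparing sizes is a cheap sanity check, not proof that the content is identical.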

My recommendation: Find out if you can enter command line instructions to make every step of the process happen. Do not worry about checking for new files yet; just focus on the commands to make an OCR version of the file, transfer it to Nextcloud, and lastly verify the file made it to Nextcloud. If you can do that, the script is 90% done. All that is needed next is to check for any files in the SMB share and then push those to the commands you have verified to work. If the application you need cannot run directly on TrueNAS SCALE, then you need to create a container. That is not my area of expertise yet with SCALE, but that is probably the first thing you need to do. Keep it all simple and do not overthink the task.
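As a minimal sketch of that order of operations (OCR, transfer, verify, and only then delete), something like the following Python could work. It assumes the ocrmypdf command is on the PATH wherever the script runs. The upload and verify steps are passed in as plain functions because their details depend on your Nextcloud setup; process_pdf, run_ocrmypdf and work_dir are names invented for this example.

```python
import subprocess
from pathlib import Path
from typing import Callable


def run_ocrmypdf(src: Path, dst: Path) -> None:
    """Call the OCRmyPDF CLI; raises CalledProcessError if it fails."""
    subprocess.run(["ocrmypdf", str(src), str(dst)], check=True)


def process_pdf(src: Path, work_dir: Path,
                ocr: Callable[[Path, Path], None],
                upload: Callable[[Path], None],
                verify: Callable[[Path], bool]) -> bool:
    """OCR one file, upload it, and delete local copies only after verification."""
    # Write the OCR result outside the scan folder so it is not picked up
    # as a "new" scan on the next run.
    dst = work_dir / src.name
    ocr(src, dst)
    upload(dst)
    if not verify(dst):
        # Keep both files so the next run can retry.
        return False
    dst.unlink()
    src.unlink()
    return True
```

With real helpers in place, a call would look like process_pdf(pdf, work_dir, run_ocrmypdf, my_upload, my_verify), where my_upload and my_verify are whatever you have confirmed to work by hand first. If any step raises an exception, nothing is deleted.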

Well, time for me to go to work. I will check on this thread in about 10 hours when I return home.
 

FreeMinded

Thanks for your reply. ocrmypdf is not available on the TrueNAS command line. My question is how to make it available since installing it with apt is not recommended/supported.
 

somethingweird

Contributor
Joined
Jan 27, 2022
Messages
183
Create a VM on TrueNAS (Linux/FreeBSD), install OCRmyPDF, mount the NAS share, create a cron job... I think that's it?
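For the cron part, one thing to watch is picking up a PDF while the scanner is still writing it. A simple (not bulletproof) approach is to only process files that have not been modified for a while. The sketch below assumes the share is mounted inside the VM; find_ready_pdfs and the 60 second threshold are just illustrative choices.

```python
import time
from pathlib import Path
from typing import List, Optional


def find_ready_pdfs(folder: Path, min_age_seconds: float = 60.0,
                    now: Optional[float] = None) -> List[Path]:
    """Return PDFs in `folder` that were last modified at least min_age_seconds ago."""
    now = time.time() if now is None else now
    ready = []
    for path in sorted(folder.iterdir()):
        # Skip subfolders and anything that is not a PDF (case-insensitive).
        if not path.is_file() or path.suffix.lower() != ".pdf":
            continue
        # A recent modification time suggests the scanner may still be writing.
        if now - path.stat().st_mtime >= min_age_seconds:
            ready.append(path)
    return ready
```

A crontab line such as */5 * * * * followed by the command that runs your script would then call this every 5 minutes; the script path and mount point are up to you.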
 

joeschmuck

Exactly. @FreeMinded you first need to get the app on TrueNAS, and that means a VM/Container/Jail (No Jails in SCALE). Once it's installed then if you can run it via a CLI within the VM/Container/Jail, then you can create a script to run it. That is the way I know how to keep it self-contained in TrueNAS.

I assumed you know how to run OCRmyPDF from the CLI; if you do not, you could be facing a lot of work. Some programs do not run from the CLI and require a GUI. I am not familiar with the program you selected, but I did just look at it and it says it's a scriptable tool, so you are in luck.

Just create a Debian VM or Docker container and install the tool. Sounds simple. You may need to learn how to do that, but learning new things is part of the fun. The User Guide should be able to help out.
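If you go the Docker route, the call could look roughly like the sketch below. Treat it as an assumption-laden example: it presumes a machine (for example the Debian VM) where the docker CLI is available, and it uses jbarlow83/ocrmypdf as the image name, which is what I believe the OCRmyPDF project publishes, so please confirm against its documentation. The helper name docker_ocr_command is invented here.

```python
from pathlib import Path
from typing import List


def docker_ocr_command(scan_dir: Path, in_name: str, out_name: str,
                       image: str = "jbarlow83/ocrmypdf") -> List[str]:
    """Build a `docker run` command that OCRs scan_dir/in_name into scan_dir/out_name."""
    return [
        "docker", "run", "--rm",
        # Mount the scan folder into the container as /data.
        "-v", f"{scan_dir}:/data",
        # Run inside /data so plain file names resolve there.
        "--workdir", "/data",
        image,
        # Everything after the image name is passed to ocrmypdf itself.
        in_name, out_name,
    ]
```

The returned list can be handed to subprocess.run(..., check=True). Building the command as a list rather than a shell string avoids quoting problems with file names that contain spaces.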

Good luck and try to enjoy it.
 

joeschmuck

Here is another option for you: if using containers becomes too difficult, you could simplify things by using an RPi (Raspberry Pi). Load Linux and then OCRmyPDF, and have it run all the time to look for new files, convert them, then move them. This would keep this part of the project isolated from the TrueNAS server, and you could easily use it with any server you desired. It does mean another piece of hardware, I will admit, but it is a clean option and consumes very little power. But if you can make the containers work for you, that is of course the best route.

Just something to think about.
 

not_here_73

Cadet
Joined
Dec 5, 2021
Messages
7
I've been in a spot where I needed to figure out OCR on a system like TrueNAS SCALE. I'm not familiar with OCRmyPDF on SCALE specifically, but Docker seems like a solid route to explore: it lets you containerize the application and avoid messing with the underlying system packages.

For your OCR needs, especially with sensitive documents, it's crucial to use a tool that's reliable and secure. My own situation was more focused on identity document verification, which required high accuracy and security due to the sensitive nature of the data. There I found that solutions like ID Analyzer offer advanced OCR capabilities that could be relevant where high precision and data security matter. For detailed OCR needs and identity verification, checking out Identity Verification could provide the accuracy and security you're looking for.
 