Elasticsearch for Full Text Search in NextCloud

cfcaballero

Dabbler
Joined
Nov 26, 2017
Messages
45
Hi all:

I finally got this working after a couple of false starts in the past. This post is NOT a step-by-step guide. It's a fresh recollection from my admittedly skimpy notes, but I think it hits all the major points. YMMV, etc.

I did this in a fresh install of the most recent NextCloud ("NC") plugin (v19.0.3) running on FreeNAS-11.3-U4.1 (hardware specs in my sig).

First, of course, go ahead and install the plugin and test that it's working at the IP address you specified. Depending on your networking setup and the related options you picked for the jail, remember that you may not be able to access the webserver from outside the jail until you edit the config file. As always, Google search is your friend, as is testing from inside the jail using curl.

The default credentials for the install will be located at /root/PLUGIN_INFO

Next, you need to install the NC apps for full-text search. Note that these don't actually perform the full-text search; they just create the "scaffolding".

This article is dated in terms of version and is written for a different platform, but it still covers all the major steps. You just replace them with the equivalent steps for a FreeNAS/FreeBSD jail.
https://fribeiro.org/post/2018-02-07-nextcloud-full-text-elasticsearch/
This link was also useful, but you can skip Logstash.
http://blog.dushin.net/2019/08/installing-elk-on-freenas-jail/

Use "pkg search" to find the Elasticsearch package and "pkg install" to install it (I recommend installing Kibana as well, to manage Elasticsearch more easily). I wound up installing the following:
  • elasticsearch7-7.9.1
  • kibana7-7.9.1
  • openjdk8-8.265.01.1
Note: I assume you use the bash shell so you can easily search history, etc. You get into the plugin jail by using jls to get its ID number, then "jexec X bash". Of course, bash has to be installed in the jail first: open a shell with the iocage console command (or with "jexec X tcsh") and install bash with pkg.

Now we run into the first major deviation from the links I posted and other instructions I found while searching. The command to install the ingestion plugin doesn't work because the local Java JDK/JRE cannot be found. So I used the locate command to find my Java runtime's bin folder, then pointed JAVA_HOME at its parent directory:

export JAVA_HOME=/usr/local/openjdk8

Once that is settled, your command to add the ingestion plugin becomes:

/usr/local/lib/elasticsearch/bin/elasticsearch-plugin install ingest-attachment
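If you want to double-check the path before exporting it, something like the following rough sketch may help. has_java is just a helper name I made up, and /usr/local/openjdk8 is the path from my install; a newer JDK package will live somewhere else (for example /usr/local/openjdk11).

```shell
#!/bin/sh
# Sketch only: confirm a candidate JAVA_HOME really contains a java binary
# before pointing the Elasticsearch tools at it.
# has_java is a hypothetical helper, not part of any package.
has_java() {
    [ -x "$1/bin/java" ]
}

# Path from this post; adjust to whatever your JDK package installed.
candidate="/usr/local/openjdk8"

if has_java "$candidate"; then
    export JAVA_HOME="$candidate"
    echo "JAVA_HOME set to $JAVA_HOME"
else
    echo "no java binary under $candidate; try: find /usr/local -name java -type f" >&2
fi

# In the jail's default csh/tcsh there is no "export"; the equivalent is:
#   setenv JAVA_HOME /usr/local/openjdk8
```

Run it with ". ./script.sh" (or paste it into bash) so that the exported variable stays in your current shell.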

Now, the next step is to add the files you want indexed as external storage in NC. There is one CRITICAL step here. If you omit it, you will wait for an index to be generated and then get no search results. It took me some serious Google-fu to find the answer.

First, I recommend that you use the jail mount points mechanism to make the files visible to the jail. You will have to learn about file permissions and whatnot, but you should be learning that, IMO, if you are using jails/plugins anyway. Also, I think it's both safer and more performant than having the jail mount the files over the network, as you don't have to enter credentials into NC and don't incur network stack overhead.

The CRITICAL step is to enter the relevant (to your NC setup) user names and groups in the "Available for" field, even though the help text in the empty field states that leaving it blank makes the external files available to all users. That in itself is true, but it has the side effect of having the NC Full Text Search apps ignore the index results because the user and group fields are blank. This "Available for" field is present when you add the external storage under "Administration" Settings, not in the "Personal" settings. If you wish to make the mount point available to a single user only under Personal Settings, I imagine that will likely work given my experience, but I have not tested it.

Now, the final step is to start indexing the external files. Don't make the mistake I made at first and target your entire share! Find a subfolder of your mount point that contains a reasonable number of indexable files, and try to include some PDFs and other content-heavy files to make sure the ingestion plugin is properly indexing the contents.

The command to start indexing (again, I found the path using locate):

sudo -u www php /usr/local/www/nextcloud/occ fulltextsearch:index

When you index your full set of file shares, I recommend you run this command in a detachable shell using screen or tmux.
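For what it's worth, here is roughly how that looks with tmux. Treat it as a sketch: the session name "ncindex" and the start_index_session helper are my own placeholders, and tmux has to be installed in the jail with pkg first.

```shell
#!/bin/sh
# Sketch only: run the initial index inside a detached tmux session so it
# survives a dropped SSH connection. Names below are placeholders.
OCC="/usr/local/www/nextcloud/occ"

# start_index_session [session-name]
start_index_session() {
    tmux new-session -d -s "${1:-ncindex}" "sudo -u www php $OCC fulltextsearch:index"
}

# Typical use:
#   start_index_session        # start the indexer in the background
#   tmux attach -t ncindex     # reattach later; detach again with Ctrl-b d
```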

That's all folks. I will try and answer any questions, but am busy, so don't expect much, and make sure and use Google searches first. The answers are usually out there if you are persistent (and make sure you look for results from using the same apps on other platforms). Good luck!
 

cfcaballero

Some additional info:

The NC indexer is REALLY SLOW. Glacial. So, if you have a large number of files, be prepared to wait days or weeks for that first index to complete. Elasticsearch itself doesn't crawl the filesystem and index the files. It just indexes whatever you send it via its API. So it's that NC PHP command that is crawling the filesystem (at a snail's pace) and feeding the names and contents of the files to Elasticsearch.

You can periodically reconnect to the shell running the index command and check progress, but NC doesn't seem to return any results until that first index is complete, which is frustrating. Kibana seems (to me, at least) to have a pretty steep learning curve, so you may want to check the intermediate results by calling Elasticsearch directly. You can do that from the command line of the jail with curl, like this:

curl "http://localhost:9200/freenas/_search?q=test&pretty" | less
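Two other Elasticsearch endpoints are handy for watching progress: _cat/indices lists each index with its document count, and _count returns the number of documents in one index. A small sketch, assuming Elasticsearch listens on localhost:9200 and the index is named "freenas" as in my setup (es_get is a made-up helper name):

```shell
#!/bin/sh
# Sketch only: tiny wrapper for GET requests against a local Elasticsearch.
# ES_URL and es_get are placeholders; change the URL if yours differs.
ES_URL="${ES_URL:-http://localhost:9200}"

# es_get <path-and-query>
es_get() {
    curl -s "$ES_URL/$1"
}

# Typical use (index name "freenas" is from my setup, yours may differ):
#   es_get "_cat/indices?v"           # one line per index, with docs.count
#   es_get "freenas/_count?pretty"    # number of documents indexed so far
```

If the document count keeps climbing between checks, the indexer is still making progress.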
 

cfcaballero

Some more learnings:

Use the touch command to create an empty ".noindex" file in any folders you don't want indexed. The indexing time is proportional to the number of files, so this saves you lots of time if you have lots of files you don't care about (source code, for example).
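If you have several folders to exclude, a small loop saves typing. This is only a sketch; mark_noindex is a name I made up and the example paths are placeholders.

```shell
#!/bin/sh
# Sketch only: drop an empty .noindex marker into each directory given.
# mark_noindex is a hypothetical helper name.
mark_noindex() {
    for dir in "$@"; do
        if [ -d "$dir" ]; then
            touch "$dir/.noindex"
        else
            echo "skipping $dir: not a directory" >&2
        fi
    done
}

# Typical use (placeholder paths):
#   mark_noindex /mnt/share/source-code /mnt/share/old-backups
```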

If you want to see which folders have the largest number of files included in all subfolders, use this command (bash):

for i in *; do echo "$i:" $(find "$i" -type f | wc -l); done
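A variant that only looks at directories and sorts the biggest offenders to the top, in case the unsorted list is too long to eyeball. Again just a sketch; count_files is a made-up name.

```shell
#!/bin/sh
# Sketch only: count regular files under each immediate subdirectory of a
# folder and print "count<TAB>directory", largest first.
# count_files is a hypothetical helper name.
count_files() {
    for d in "${1:-.}"/*/; do
        [ -d "$d" ] || continue
        # tr strips the padding some wc implementations add
        n=$(find "$d" -type f | wc -l | tr -d ' ')
        printf '%s\t%s\n' "$n" "$d"
    done | sort -rn
}

# Typical use:
#   count_files /mnt/share | head -n 20
```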

You will likely want to increase the RAM allocation for the various components involved (if you have it to spare! be careful not to overallocate and take away from the cache - search these forums). Again, Google is your friend, but the pages you find may have file locations that don't apply (as they are for different distros/packages), so here are the file locations.

/usr/local/etc/mysql/my.cnf
/usr/local/etc/php.ini
/usr/local/etc/elasticsearch/jvm.options
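For Elasticsearch, the heap size is controlled by the -Xms and -Xmx lines in jvm.options. Before editing, it can be useful to see what is currently set. A minimal sketch (show_heap is a made-up helper name):

```shell
#!/bin/sh
# Sketch only: print the JVM heap settings (-Xms / -Xmx lines) from a
# jvm.options file. show_heap is a hypothetical helper name.
show_heap() {
    grep -E '^-Xm[sx]' "${1:-/usr/local/etc/elasticsearch/jvm.options}"
}

# Typical use:
#   show_heap                 # uses the jail's default location
#   show_heap ./jvm.options   # or point it at a copy
```

Restart the Elasticsearch service after changing these values so the new heap size takes effect.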

I have wound up resetting (clearing) the index and starting anew several times. The command for this is:

sudo -u www php /usr/local/www/nextcloud/occ --verbose fulltextsearch:reset

However, if you have a large number of files being indexed (I have more than 300k), the initial index can stop while the NC cron jobs are active. That has happened to me. So, whichever method you used for running Nextcloud cron jobs, disable it until the initial index is completed.

Here are links where you can find all the full-text search commands and other OCC commands:
https://github.com/nextcloud/fulltextsearch/wiki/Commands#fulltextsearchindex
https://docs.nextcloud.com/server/19/admin_manual/configuration_server/occ_command.html

HTH
 

cfcaballero

I just repeated the steps above for a new install of Nextcloud v22 plugin on TrueNAS-12.0-U4.1. Everything worked fine. Just be sure and pay attention to the output you get when installing the elasticsearch pkg as per the links above.
 

impestrator

Dabbler
Joined
Feb 10, 2022
Messages
26
Hi cfcaballero,

I've been struggling for weeks now with implementing Elasticsearch for my Nextcloud instance running in a jail on TrueNAS. Fortunately I found your article here and tried to follow it to finally get it working, but I failed and I don't know where to investigate.

When I typed in
Code:
occ fulltextsearch:test
I got the following output
Code:
.Testing your current setup:
Creating mocked content provider. ok
Testing mocked provider: get indexable documents. (2 items) ok
Loading search platform. (Elasticsearch) ok
Testing search platform. fail
In StaticNoPingConnectionPool.php line 64:

 No alive nodes found in your cluster


Then I checked the java version:
Code:
java -version
openjdk version "11.0.13" 2021-10-19
OpenJDK Runtime Environment (build 11.0.13+8-1)
OpenJDK 64-Bit Server VM (build 11.0.13+8-1, mixed mode)


Tried to locate java:
Code:
1-11.0.13+8.1:
/usr/local/openjdk11/bin/java


Then I tried:
Code:
export JAVA_HOME=/usr/local/openjdk11

But I got:

Code:
export: command not found


I really don't know how to proceed here. Is there a way to troubleshoot? Are there any logs that could be interesting? I can imagine something is wrong with my Java setup.

Regards
Martin
 

cfcaballero

Hi @impestrator.

I don't have time this weekend to poke around too much into my previously running instance. TBH, I have given up on this solution for indexing my NAS files and decided instead to run Recoll in a bhyve VM on the same system. Too much stuff wasn't getting indexed by the NC/ES solution, and I didn't feel like writing lots of scripts for logstash, etc.

My only thought besides the Java track you are on is to check that the Elasticsearch service is running and that you used the sysrc commands to make it start automatically when the jail starts, etc.
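Roughly what I mean, as a sketch to run inside the jail as root. I'm assuming the rc script is called "elasticsearch" (so the rc.conf variable would be elasticsearch_enable) and that Elasticsearch listens on localhost:9200; es_alive is a made-up helper name.

```shell
#!/bin/sh
# Sketch only: check whether anything answers on the Elasticsearch port and,
# if not, print the service commands worth trying.
# es_alive is a hypothetical helper; URL and rc variable name are assumptions.
es_alive() {
    # -f makes curl fail on HTTP errors, -s keeps it quiet
    curl -sf "${1:-http://localhost:9200}" >/dev/null
}

if es_alive; then
    echo "Elasticsearch is answering on port 9200"
else
    echo "No answer from Elasticsearch. Things to check:" >&2
    echo "  service elasticsearch status" >&2
    echo "  sysrc elasticsearch_enable=YES   # start with the jail" >&2
    echo "  service elasticsearch start" >&2
    echo "  log directory: see path.logs in /usr/local/etc/elasticsearch/elasticsearch.yml" >&2
fi
```

"No alive nodes found in your cluster" from occ usually just means NC could not reach Elasticsearch at the address configured in the full-text search settings, so this is the first thing I would rule out.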

HTH.
 

impestrator

Hi @cfcaballero

I've checked whether Elasticsearch is running:
Code:
root@nextcloud:~ # service -e

/usr/local/etc/rc.d/elasticsearch


To double-check, I restarted the service, but then I got a lot of errors. The problem is that I cannot find the logs of my Elasticsearch instance to post them.

I completely understand why you've decided to index your files with another solution. Unfortunately I'm forced to use NC (at the moment), although I tried to get rid of it. Ideally the files would get indexed by default at the TrueNAS filesystem layer and I would be able to search through PDFs provided via an SMB share...

I spent a lot of time investigating that, but I ended up at Nextcloud. So maybe you have time to assist me...

Regards
Martin
 