Faster Searching from Mac Spotlight or Finder of Freenas Files

kapitainsky

Dabbler
Joined
Sep 30, 2022
Messages
46
Long story short:
-There are no plans for native Spotlight support (only the Spotlight backend is enabled in the latest versions of TrueNAS Core or Scale)
-This outlines what you should do: https://wiki.samba.org/index.php/Spotlight_with_Elasticsearch_Backend

1) Prepare a Linux VM for fscrawler + ElasticSearch
2) Follow fscrawler docs and point it to your storage (via SSH for example)
3) Point fscrawler to your ElasticSearch install
4) Let it run and verify that it's working via the ElasticSearch web UI
5) Point your Samba Service under TrueNAS to your ElasticSearch instance
6) Verify with mdfind

Voilà!

Short story made even shorter: I think the only remaining issue now is your point 5; otherwise it is easy and works in either a jail or a VM.
 

phradr

Dabbler
Joined
Sep 27, 2022
Messages
49
OK, I could not wait for the weekend and did all the tests today.

Debian with Samba 4.15.9 compiled from source (the same version TrueNAS is using) works perfectly, so it is not a problem with this particular Samba version.

Maybe the problem is with FreeBSD? I realised that their pkg Samba (the latest available is 4.13.17) does not have Spotlight enabled, so I compiled it from ports with Spotlight ON. Everything works perfectly: I can search files by their content. So it is not a FreeBSD issue.

These results increasingly point to a problem with the TrueNAS Samba, as they use their own fork.

The error sequence we see in the TrueNAS smb.log when the ES response is received starts with:

Code:
[2022/09/29 15:56:50.745574,  0, pid=93402, effective(1000, 1000), real(0, 0)] ../../libcli/http/http.c:199(http_parse_response_line)
  http_parse_response_line: Error parsing header


And this is where it gets really interesting. The FreeBSD Samba port patches exactly this file (http.c), and exactly in the place where this error is thrown:


In vanilla Samba:

Code:
    n = sscanf(line, "%m[^:]: %m[^\r\n]\r\n", &key, &value);
    if (n != 2) {
        DEBUG(0, ("%s: Error parsing header '%s'\n", __func__, line));
        status = HTTP_DATA_CORRUPTED;
        goto error;
    }


But in FreeBSD:

Code:
#ifdef FREEBSD
    int s0, s1, s2, s3; s0 = s1 = s2 = s3 = 0;
    n = sscanf(line, "%n%*[^/]%n/%c.%c %d %n%*[^\r\n]%n\r\n",
           &s0, &s1, &major, &minor, &code, &s2, &s3);

    if(n == 3) {
        protocol = calloc(sizeof(char), s1-s0+1);
        msg = calloc(sizeof(char), s3-s2+1);

        n = sscanf(line, "%[^/]/%c.%c %d %[^\r\n]\r\n",
            protocol, &major, &minor, &code, msg);
    }
#else
     n = sscanf(line, "%m[^/]/%c.%c %d %m[^\r\n]\r\n",
            &protocol, &major, &minor, &code, &msg);
#endif
     if (n != 2) {
         DEBUG(0, ("%s: Error parsing header '%s'\n", __func__, line));
         status = HTTP_DATA_CORRUPTED;
         goto error;
    }
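To make the FreeBSD trick concrete, here is a minimal, self-contained sketch of the same two-pass idea outside Samba. The first `sscanf()` only records field offsets with `%n` (which is not counted in the return value, hence `n == 3` for a good line), buffers are then allocated from those offsets, and a second `sscanf()` copies the fields. `parse_status_line` is a hypothetical name, not Samba's API. It assumes the point of the patch is to avoid the `%m` allocation modifier, which (as far as I can tell) FreeBSD's libc did not support when the patch was written. Unlike the quoted patch, it also returns early when the reason phrase is missing, since the final `%n` is then never reached.

```c
#include <stdio.h>
#include <stdlib.h>

/*
 * Hypothetical helper illustrating the two-pass technique from the FreeBSD
 * port patch. Parses a status line such as "HTTP/1.1 200 OK\r\n" without
 * the %m modifier. Returns the number of converted fields (5 on success);
 * on success the caller owns *protocol and *msg and must free() them,
 * otherwise both are left NULL.
 */
static int parse_status_line(const char *line, char **protocol,
                             char *major, char *minor, int *code,
                             char **msg)
{
    int s0 = 0, s1 = 0, s2 = 0, s3 = 0;
    int n;

    *protocol = NULL;
    *msg = NULL;

    /* Pass 1: measure. %n stores offsets but is not counted, so a good
     * line yields n == 3 (%c, %c, %d); %*[...] matches without storing. */
    n = sscanf(line, "%n%*[^/]%n/%c.%c %d %n%*[^\r\n]%n",
               &s0, &s1, major, minor, code, &s2, &s3);
    if (n != 3 || s3 <= s2) {
        /* Malformed line, or no reason phrase (final %n never reached). */
        return n < 0 ? 0 : n;
    }

    /* Allocate exactly what pass 2 will copy, plus the terminating NUL. */
    *protocol = calloc((size_t)(s1 - s0) + 1, 1);
    *msg = calloc((size_t)(s3 - s2) + 1, 1);
    if (*protocol == NULL || *msg == NULL) {
        n = 0;
    } else {
        /* Pass 2: same pattern, this time storing the strings. */
        n = sscanf(line, "%[^/]/%c.%c %d %[^\r\n]",
                   *protocol, major, minor, code, *msg);
    }

    if (n != 5) {
        free(*protocol);
        free(*msg);
        *protocol = NULL;
        *msg = NULL;
    }
    return n;
}
```

Called on "HTTP/1.1 200 OK\r\n" it returns 5 with protocol "HTTP", major '1', minor '1', code 200 and msg "OK". The buffers are guaranteed to be large enough because both passes consume exactly the same characters.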


I could not find the TrueNAS Core Samba source code (I only started using TrueNAS and FreeBSD last week and am still learning) to see what is there, but I think we now have enough info to create a meaningful bug report. Your traces would be extremely helpful for this; unfortunately I am not sure how to get them.

Hopefully some TrueNAS dev can look into it and figure out what is wrong with their Samba 4.15.9.

ES search over Samba really makes a difference and works very well. Of course, how to optimise it will be yet another story. I can already see that it needs a good amount of RAM: I tried with 30k files and it used about 5 GB.

That, my friend, is the real issue! Great work, and thanks for your analysis! That strengthens my supposition :)

What I didn't think about, until you brought it to mind, is that TrueNAS uses its own Samba fork.
The sources are available on GitHub (TrueNAS: LINK, Samba fork: LINK; note: which branch is used is not quite clear, at least to me)

Though you pointed to
Code:
    n = sscanf(line, "%m[^:]: %m[^\r\n]\r\n", &key, &value);
    if (n != 2) {
        DEBUG(0, ("%s: Error parsing header '%s'\n", __func__, line));
        status = HTTP_DATA_CORRUPTED;
        goto error;
    }

this is not quite correct. The real issue is at line 199 of "our sources" (i.e. the Samba stack we are actually running). That code looks a bit different and sits only a few lines below yours (LINK, SCALE-v4-17-stable branch, the default):
Code:
165  /**
166   * Parses the first line of a HTTP response
167   */
168  static bool http_parse_response_line(struct http_read_response_state *state)
169  {
170      bool status = true;
...
198      if (n != 5) {
199          DEBUG(0, ("%s: Error parsing header\n", __func__));
200          status = false;
201          goto error;
202      }
...
233  }
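To see which response lines survive the `if (n != 5)` check at line 198, here is a hypothetical helper (`count_status_fields` is not a Samba function) that runs the same response-line pattern and returns the conversion count. It swaps `%m` for fixed-size buffers, an assumption made so it runs on any libc; very long protocol names or reason phrases would therefore behave differently from the real code. The sample lines are only illustrations: we still don't know which ES response actually triggers the error.

```c
#include <stdio.h>

/*
 * Hypothetical helper: applies the response-line pattern from
 * http_parse_response_line() and returns the number of converted fields,
 * i.e. the value compared against 5 before "Error parsing header" is
 * logged. Fixed-size buffers stand in for the %m allocations.
 */
static int count_status_fields(const char *line)
{
    char protocol[16];  /* e.g. "HTTP" */
    char msg[64];       /* reason phrase, e.g. "OK" */
    char major, minor;  /* version digits, e.g. '1' and '1' */
    int code;           /* status code, e.g. 200 */

    /* sscanf() stops at the first directive that fails and returns how
     * many conversions succeeded before that point. */
    return sscanf(line, "%15[^/]/%c.%c %d %63[^\r\n]",
                  protocol, &major, &minor, &code, msg);
}
```

For example, "HTTP/1.1 200 OK\r\n" gives 5, a status line without a reason phrase such as "HTTP/1.1 200\r\n" gives 4, and "HTTP/2 200 OK\r\n" stops at the "." after the major version and gives 2. Either of the last two would end up in the "Error parsing header" branch.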

Anyway... As I don't know what the ES response should look like for (TrueNAS') Samba, I could not patch it on my own (e.g. with a pull request).

But you have now pointed to the exact position where we stand, and we can conclude that this issue has to be resolved within TrueNAS and not within the "vanilla" Samba sources (though the latter could also be possible, if TrueNAS pulls the master branch from time to time...).

So, any suggestions on where to open a ticket? Maybe at TrueNAS AND Samba?
 

phradr

@kapitainsky: What do you mean by that?
Short story made even shorter: I think the only remaining issue now is your point 5; otherwise it is easy and works in either a jail or a VM.

Targeting the ES API from the current TrueNAS Samba fork settings is no big deal. Go to Services > SMB (pencil) > Auxiliary Parameters and add:
Code:
spotlight backend = elasticsearch
# NAT IP (e.g. 172.16.0.22) or DHCP IP (e.g. 10.0.0.112)
elasticsearch:address = 172.16.0.22
elasticsearch:port = 9200
# Needed, even though it is marked as optional in the Samba documentation!
elasticsearch:index = scans

And in your share go to: Sharing > Windows Shares (SMB) > (sharename's) menu "Edit" > Auxiliary Parameters and set:
Code:
spotlight = yes


That's it.


- or did I get you wrong?
 

kapitainsky

What I didn't think about, until you brought it to mind, is that TrueNAS uses its own Samba fork.
The sources are available on GitHub (TrueNAS: LINK, Samba fork: LINK; note: which branch is used is not quite clear, at least to me)

I got there but could not find the Core 13 (Samba 4.15.9) code.


this is not quite correct. The real issue is at line 199 of "our sources" (i.e. the Samba stack we are actually running). That code looks a bit different and sits only a few lines below yours (LINK, SCALE-v4-17-stable branch, the default):

Good catch: it was late when I was writing and I included the wrong part. Nevertheless, FreeBSD uses the same patched logic to calculate the n variable in the place you pointed to.


Code:
+#ifdef FREEBSD
+    int s0, s1, s2, s3; s0 = s1 = s2 = s3 = 0;
+    n = sscanf(line, "%n%*[^/]%n/%c.%c %d %n%*[^\r\n]%n\r\n",
+           &s0, &s1, &major, &minor, &code, &s2, &s3);
+
+    if(n == 3) {
+        protocol = calloc(sizeof(char), s1-s0+1);
+        msg = calloc(sizeof(char), s3-s2+1);
+
+        n = sscanf(line, "%[^/]/%c.%c %d %[^\r\n]\r\n",
+            protocol, &major, &minor, &code, msg);
+    }
+#else
     n = sscanf(line, "%m[^/]/%c.%c %d %m[^\r\n]\r\n",
            &protocol, &major, &minor, &code, &msg);
+#endif
 
-    DEBUG(11, ("%s: Header parsed(%i): protocol->%s, major->%c, minor->%c, "
-           "code->%d, message->%s\n", __func__, n, protocol, major, minor,
-           code, msg));
-
     if (n != 5) {
         DEBUG(0, ("%s: Error parsing header\n",    __func__));
         status = false;
         goto error;
     }
 

kapitainsky

@kapitainsky: What do you mean by that?


Targeting the ES API from the current TrueNAS Samba fork settings is no big deal. Go to Services > SMB (pencil) > Auxiliary Parameters and add:
Code:
spotlight backend = elasticsearch
# NAT IP (e.g. 172.16.0.22) or DHCP IP (e.g. 10.0.0.112)
elasticsearch:address = 172.16.0.22
elasticsearch:port = 9200
# Needed, even though it is marked as optional in the Samba documentation!
elasticsearch:index = scans

And in your share go to: Sharing > Windows Shares (SMB) > (sharename's) menu "Edit" > Auxiliary Parameters and set:
Code:
spotlight = yes


That's it.


- or did I get you wrong?

OK, maybe I was not clear. What I meant was that point 5 does not work. Of course pointing Samba at ES is easy, but we want it to actually work.
 

kapitainsky

spotlight backend = elasticsearch
elasticsearch:address = NAT IP (eg. 172.16.0.22) or DHCP IP (eg. 10.0.0.112)
elasticsearch:port = 9200
elasticsearch:index = scans (This is a need though it is marked as optional in Samba documentation!)

I always use:

Code:
spotlight backend = elasticsearch
elasticsearch:address = 172.16.0.2
elasticsearch:port = 9200
elasticsearch:ignore unknown attribute = yes
elasticsearch:ignore unknown type = yes
elasticsearch:use tls = no


Re elasticsearch:index = scans: it might indeed be required in some more complex ES setups. One day, when everything works, we can start another thread to share best practices and findings :)
 

kapitainsky

So, any suggestions on where to open a ticket? Maybe at TrueNAS AND Samba?

They have fixed some issues in the Spotlight implementation; it looks like it would fix @seanm's issue if TrueNAS ports it to their fork.


but otherwise I do not think there is any issue with Samba and ES from our perspective.

I would open a TrueNAS ticket only, and then see what a dev comes up with.
 

phradr

I got there but could not find the Core 13 (Samba 4.15.9) code.
Ah okay, well, I suspect they use the same source for CORE and SCALE.
Nevertheless, FreeBSD uses the same patched logic to calculate the n variable
Damn right! And you might already have found the solution :)
OK, maybe I was not clear. What I meant was that point 5 does not work. Of course pointing Samba at ES is easy, but we want it to actually work.
Oh sorry, I didn't mean to be impolite.
One day, when everything works, we can start another thread to share best practices and findings
Great idea! I would participate :D
it looks like it would fix @seanm's issue if TrueNAS ports it to their fork
Yes, could be. I don't have access to the ticket he mentioned (LINK); maybe @seanm could share it?
 

kapitainsky

FYI:


Some progress, but IMHO it won't fix the ES issues yet. They are rather going in the opposite direction by setting:

rpc_daemon:mdssd = disabled
rpc_server:mdssvc = disabled


and disabling Spotlight.

I hope it can be overridden with auxiliary parameters. [Confirmed: it can.]

PS. @phradr: somebody explained to me why some Jira tickets are private (which does not really help with community work):
"Issues are locked while users have private info uploaded to ticket."
 

seanm

Guru
Joined
Jun 11, 2018
Messages
570
yes, could be. I don't have access to his mentioned ticket (LINK) - maybe @seanm could refer that?
Oh, I didn't realize those links are not publicly visible without logging into jira. I didn't mark the ticket private when I created it. Anyway, it doesn't say much beyond what I wrote here and their fix is in the github link @kapitainsky linked above.

I went searching in jira for tickets with the word "spotlight" and there is this one:

 

kapitainsky

Oh, I didn't realize those links are not publicly visible without logging into jira. I didn't mark the ticket private when I created it. Anyway, it doesn't say much beyond what I wrote here and their fix is in the github link @kapitainsky linked above.

I went searching in jira for tickets with the word "spotlight" and there is this one:


as I was told "Issues are locked while users have private info uploaded to ticket"

hahaha - so it means they decide which ticket is secret and which one is not

Together with your comment, this tells me there are tons of tickets in Jira we can't see, so only iXsystems knows what they are working on while the community is kept in the dark.

Oh well, "open community" might have many meanings.
 

seanm

as I was told "Issues are locked while users have private info uploaded to ticket"
I didn't add any private info to the ticket. Hiding such tickets makes sense, but that shouldn't apply to my ticket.

hahaha - so it means they decide which ticket is secret and which one is not
It could just be a bug... Jira is well known to be a big steaming mess, from what I've heard. :) Can you not see any tickets without an account, or only some?

Together with your comment, this tells me there are tons of tickets in Jira we can't see, so only iXsystems knows what they are working on while the community is kept in the dark.
I'm not with iXsystems. You can create an account on their jira for free, just requires giving them an email address.
 

kapitainsky

I didn't add any private info to the ticket. Hiding such tickets makes sense, but that shouldn't apply to my ticket.


It could just be a bug... Jira is well known to be a big steaming mess, from what I've heard. :) Can you not see any tickets without an account, or only some?


I'm not with iXsystems. You can create an account on their jira for free, just requires giving them an email address.

I have an account and still cannot see your ticket. And not only yours: I have already experienced this a few times. Very annoying.
 

seanm

I have an account and still cannot see your ticket. And not only yours: I have already experienced this a few times. Very annoying.

I'm pretty sure that's a bug, and not some deliberate limitation. I'm able to see tickets created by many different people. I don't recall ever being unable to access a ticket. And I'm nobody, just a user. This thread can hopefully help you:

 

phradr

Oh, I didn't realize those links are not publicly visible without logging into jira. I didn't mark the ticket private when I created it. Anyway, it doesn't say much beyond what I wrote here and their fix is in the github link @kapitainsky linked above.

I went searching in jira for tickets with the word "spotlight" and there is this one:

Just today I installed SCALE on my ESXi host to play around with it.

It's different, but with at least the same functionality, PLUS it uses Docker instead of jails (which obviously makes sense, as SCALE is based on Debian and not BSD).

Maybe I'll take the easy way and just move to SCALE…

Furthermore, I know Debian way better than FreeBSD.
 

kapitainsky

Just today I installed SCALE on my ESXi host to play around with it.

It's different, but with at least the same functionality, PLUS it uses Docker instead of jails (which obviously makes sense, as SCALE is based on Debian and not BSD).

Maybe I'll take the easy way and just move to SCALE…

Furthermore, I know Debian way better than FreeBSD.

Looking forward to your ES/Spotlight integration results.
 

kapitainsky

@phradr: are you going to open a TrueNAS ticket re ES/Samba? I feel we should keep it rolling, otherwise nothing happens. If not, could I ask you to provide the traces from your investigation? Then I will open it.

I also noticed that Spotlight will be OFF in SCALE.

 

seanm

@phradr - are you going to open TrueNAS ticket re ES/Samba?

Maybe https://ixsystems.atlassian.net/browse/NAS-107345 is already the ticket we need, and info should just be added to it?

Since you can't access jira, the ticket is:

---
Title: Add diskover / elasticsearch plugin to scale

Description: Cluster-wide search / indexing will probably be a good feature to have. Samba in 4.12 and later has ability to use elasticsearch as a backend for spotlight queries. There has also been some upstream work to support MS-WSP through same.
---

(Not sure why it refers only to Scale though.)
 

phradr

are you going to open TrueNAS ticket re ES/Samba?
I'll take care of that tomorrow as I still need to regenerate the debug logs and had a busy day.
Maybe https://ixsystems.atlassian.net/browse/NAS-107345 is already the ticket we need, and info should just be added to it?

Since you can't access jira, the ticket is:

---
Title: Add diskover / elasticsearch plugin to scale

Description: Cluster-wide search / indexing will probably be a good feature to have. Samba in 4.12 and later has ability to use elasticsearch as a backend for spotlight queries. There has also been some upstream work to support MS-WSP through same.
---

(Not sure why it refers only to Scale though.)
Thanks @seanm. As far as I understand the ticket, it aims to add general ES functionality to SCALE. I doubt it will impact CORE, as the bases of the two differ greatly.

What you can't know: this morning I wrote to a mod asking him to have a look at this thread and maybe move it to the TrueNAS forum, as it is actually in the FreeNAS (legacy) section. I also asked him whom to address regarding the ticket. If I don't get an answer by tomorrow night, I'll create a ticket at iXsystems AND Samba, as it might still be a general issue.
 

kapitainsky

Maybe https://ixsystems.atlassian.net/browse/NAS-107345 is already the ticket we need, and info should just be added to it?

Since you can't access jira, the ticket is:

---
Title: Add diskover / elasticsearch plugin to scale

Description: Cluster-wide search / indexing will probably be a good feature to have. Samba in 4.12 and later has ability to use elasticsearch as a backend for spotlight queries. There has also been some upstream work to support MS-WSP through same.
---

(Not sure why it refers only to Scale though.)

Looks like a future story: an Elasticsearch plugin, so you just click, click and the Samba share can be searched. First the Samba/ES integration has to work, which at the moment it does not.
 