SOLVED Finding files with foreign filenames

Plato

Contributor
Joined
Mar 24, 2016
Messages
101
Hi,

I want to find files that have foreign-language filenames. I use the patterns below to find such files with another tool, but I don't know how to use them with find.

Here is what I use (all the patterns are in one file):

Code:
*\[DE\]*
*\[JP\]*
*\[RU\]*
*\[FR\]*
*\[KR\]*
*\[PL\]*
*\[AU\]*
*\[IT\]*
*[\p{Han}]*
*[\p{Hiragana}]*
*[\p{Katakana}]*
*[\p{Cyrillic}]*
*[\p{Hangul}]*
*German*
*Japanese*
*Korean*
*Spanish*
*Italian*
*Polish*
*French*


Is there a way to find all files including these?
 

anmnz

Patron
Joined
Feb 17, 2018
Messages
286
I am a bit puzzled by what you're trying to do. Those are not valid regular expressions. What is the tool you say you are using them with?

What do you consider to be a "foreign" filename?
 

Plato

Contributor
Joined
Mar 24, 2016
Messages
101
I didn't say those are valid regex strings. I meant that I'm looking for files whose names follow a similar pattern. Let me explain it with sample filenames:

Code:
*\[DE\]* #....[DE]...
*\[JP\]* #....[JP]...
*\[RU\]* #....[RU]...
*\[FR\]* #....[FR]...
*\[KR\]* #....[KR]...
*\[PL\]* #....[PL]...
*\[AU\]* #....[AU]...
*\[IT\]* #....[IT]...
*[\p{Han}]*      #....{Kanji}...
*[\p{Hiragana}]* #....{Hiragana}...
*[\p{Katakana}]* #....{Katakana}...
*[\p{Cyrillic}]* #....{Cyrillic ( Russian etc)}...
*[\p{Hangul}]*   #....{Hangul ( Korean )}...
*German*    #....German...
*Japanese*  #....Japanese...
*Korean*    #....Korean...
*Spanish*   #....Spanish...
*Italian*   #....Italian...
*Polish*    #....Polish...
*French*    #....French...


I couldn't write the characters themselves, so read each curly-brace placeholder as a filename containing a character from that script.
 

Fredda

Guru
Joined
Jul 9, 2019
Messages
608
This is not exactly a FreeNAS question; it is more of a shell or FreeBSD question. A quick Google search shows e.g. this.
For FreeNAS you should probably go with pcregrep instead of plain grep:
find . | pcregrep '[^\x00-\x7F]'
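
The filter above only covers the character-class entries in the list. For the bracketed tags and the language names, plain find globs may be enough. Here is a minimal sketch, assuming the patterns are stored one per line in a file. The function name find_by_patterns is made up for illustration, and the \p{...} entries are not valid -name globs, so they would have to be left out of that file:

```shell
#!/bin/sh
# find_by_patterns DIR FILE: list entries under DIR whose name matches
# any glob in FILE (one pattern per line, e.g. *\[DE\]* or *German*).
# Hypothetical helper for illustration; not part of find itself.
find_by_patterns() {
    dir=$1
    file=$2
    # Rebuild the positional parameters as: -name P1 -o -name P2 ...
    set --
    # read -r keeps the backslashes, so \[ stays a literal bracket.
    while IFS= read -r pat || [ -n "$pat" ]; do
        [ -n "$pat" ] || continue          # skip empty lines
        if [ "$#" -gt 0 ]; then
            set -- "$@" -o
        fi
        set -- "$@" -name "$pat"
    done < "$file"
    [ "$#" -gt 0 ] || return 0             # no patterns, nothing to search for
    find "$dir" \( "$@" \)
}
```

Usage would look like find_by_patterns . patterns.txt, where patterns.txt is an assumed file name. Keeping the backslashes matters: an unescaped *[DE]* would match any name containing a D or an E.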
 

Plato

Contributor
Joined
Mar 24, 2016
Messages
101
Well, it didn't work. Most probably that's because the find command here is different from GNU find. Installing fd-find and running this worked for the foreign characters:
fd '.*[\p{Han}\p{Katakana}\p{Hiragana}\p{Hangul}\p{Cyrillic}].*'
 

Fredda

Guru
Joined
Jul 9, 2019
Messages
608
Plato wrote: "Well, it didn't work.. That's most probably find command is different from gnu-find command"
I highly doubt that find . behaves differently on any Unix or Linux system.
Code:
# uname -r
11.2-STABLE

# find . | pcregrep '[^\x00-\x7F]'
./täst.txt
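
The same check can probably be done without pcregrep. This is a sketch under the assumption that find's -name matching works byte by byte in the C locale, which is worth verifying on your own system. The function name find_non_ascii is invented for this example. It matches any byte outside the printable ASCII range, so it also reports names containing control characters, not just accented or non-Latin letters:

```shell
#!/bin/sh
# find_non_ascii DIR: list entries under DIR whose name contains at least
# one byte outside the printable ASCII range (space through tilde).
# Hypothetical helper name, used only for this example.
find_non_ascii() {
    # LC_ALL=C makes the bracket expression compare single bytes, so every
    # byte of a multi-byte UTF-8 character falls outside the range " -~".
    LC_ALL=C find "$1" -name '*[! -~]*'
}
```

Calling find_non_ascii . in the directory from the example above should list ./täst.txt in the same way.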
 

Plato

Contributor
Joined
Mar 24, 2016
Messages
101
It listed more than I wanted. I think the real problem is the ASCII limit: some filenames contain characters above \x7F and still have English titles. As I said, I wanted to filter out only the files with those (foreign) characters in them. Executing it like this probably works:

find . | pcregrep '[^\x00-\xFF]'

Couldn't test them right now, because I removed the files with fd-find.
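
One caveat about the pattern above: without UTF-8 mode, pcregrep matches byte by byte, and every byte lies within \x00-\xFF, so the negated class can never match. The following sketch assumes that pcregrep was built with UTF-8 support and that the filenames are UTF-8 encoded. The function name find_beyond_latin1 is made up for this example:

```shell
#!/bin/sh
# find_beyond_latin1 DIR: list paths under DIR that contain at least one
# character above U+00FF. Hypothetical helper name; assumes pcregrep is
# installed with UTF-8 support and that paths are valid UTF-8.
find_beyond_latin1() {
    # -u switches pcregrep to UTF-8 mode, so \x00-\xFF is a range of
    # code points (U+0000 to U+00FF) instead of a range of bytes.
    find "$1" | pcregrep -u '[^\x00-\xFF]'
}
```

With -u, a name such as täst.txt is no longer listed, because ä is U+00E4, while Cyrillic, CJK and Hangul names still are. Latin letters above U+00FF, such as ł (U+0142), would still show up, so the script-based classes used with fd remain the more precise filter.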
 