Duplicate Files

Status
Not open for further replies.

alxxer

Dabbler
Joined
May 7, 2013
Messages
38
Is there a way to scan through my storage pool and find all duplicate files? I know there are a bunch of them; it's just that they could be on any dataset, in nested folders.

Thanks
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
Well, how do you want to find them? By file name? By file size? By date? The easiest way is to use some tool on a client machine to scan through the shares for duplicates. Otherwise you could script something up yourself to run on the server itself.
 

alxxer

Dabbler
Joined
May 7, 2013
Messages
38
I would like to find them by file name. Last time I ran a scan with a client machine it took forever, then crashed.
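
For matching by file name on the server itself rather than from a client, a minimal sketch could look like the following. It assumes plain `find` and `awk` are available in the shell, treats names as case-sensitive, and will misreport paths that contain newlines. `find_dup_names` is a hypothetical helper name used only for illustration.

```shell
#!/bin/sh
# Sketch: list file names that occur on more than one path under a directory tree.
# find_dup_names is a hypothetical helper name, not an existing FreeNAS command.
find_dup_names() {
  # Split each path on "/" so the last field ($NF) is the bare file name,
  # then remember every full path seen for that name.
  find "$1" -type f | awk -F/ '
    { name = $NF; count[name]++; paths[name] = paths[name] "\n  " $0 }
    END {
      # Report only the names that were seen more than once.
      for (n in count) if (count[n] > 1) print n ":" paths[n]
    }
  '
}

# Example call (the pool path is an assumption):
#   find_dup_names /mnt/tank > /tmp/dupnames.list
```

Each repeated name is printed once, followed by an indented list of the paths where it was found. Because only names are compared, files that share a name but differ in content are listed too.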
 

DrKK

FreeNAS Generalissimo
Joined
Oct 15, 2013
Messages
3,630
Joined
Aug 11, 2015
Messages
1
I would recommend Duplicate Files Deleter. It's a pretty versatile tool, ideal for this type of problem.
 
Joined
Jan 9, 2015
Messages
430
Here is a script I put together back in the FreeNAS 7 days. FreeBSD still has the "diff" command in it, so I believe this should still work. I used to run it against the wife's 300GB of pictures to find dups. The script makes a log file, and you have to go back in manually and delete the files. If you decide to use it and something doesn't work, just let me know and I'll take a look at it.
Code:
#!/usr/bin/env bash

function usage {
  echo -e "\014Usage:\r\n"
  echo -e "\tThis script will make a log of duplicate files located in the root and sub-directories of the given directory."
  echo -e "\tYou must specify the directory you want to check and also specify a temporary directory to store the needed temporary files."
  echo -e "\tPlease use \"\"'s around the directory names to ensure proper handling of directories with spaces in the name.\r\n\r\n"
  echo -e "\t\tProper syntax: ${0} \"Directory to Check\" \"Temp Directory\"\r\n"
  echo -e "\t\tExample: ${0} \"/mnt/Pictures\" \"/tmp\"\r\n\r\n"
  echo -e "\tYour log of duplicates will be located in the temporary directory under the filename \"dups.list\"\r\n\r\n"
}


if [ ! "$#" = "2" ] || [ "$1" = "--help" ] || [ "$1" = "-h" ]; then
  usage
  exit 1
fi

DIRLIST="${2}/dir.list"
DUPLOG="${2}/dups.list"
COUNT=1
WORKINGCOUNT=0
DUPCOUNT=0

# Quote the log and list paths so a temp directory with spaces in its name still works.
echo "Started scan at: `date +"%r on %D"`" >> "${DUPLOG}"

echo -e "\014Building directory database (${2}/dir.list)..."
find "${1}" -type f | sort > "${DIRLIST}"

echo "Storing database into memory..."

# IFS= and -r keep leading spaces and backslashes in file names intact.
while IFS= read -r LINE
  do
    FILEARRAY[$COUNT]=${LINE}
    COUNT=`expr ${COUNT} + 1`
  done < "${DIRLIST}"
TOTAL=${#FILEARRAY[@]}
COUNT=1


for FILE in "${FILEARRAY[@]}"
do
  CHECKING="${FILEARRAY[${COUNT}]}"
  WORKINGCOUNT=`expr ${COUNT} + 1`
    for WORKFILE in "${FILEARRAY[@]:${WORKINGCOUNT}}"
      do
    
        COMPARE="${FILEARRAY[${WORKINGCOUNT}]}"
        echo -en "\014Processing file ${COUNT} of ${TOTAL} - Currently on ${WORKINGCOUNT}\r\nNumber of duplicates found: ${DUPCOUNT}"
        WORKINGCOUNT=`expr ${WORKINGCOUNT} + 1`
    
        # diff exits 0 only when the files are identical; testing the exit status
        # keeps unreadable files from being logged as matches.
        if diff -q "${CHECKING}" "${COMPARE}" > /dev/null 2>&1; then
          #echo "!*! MATCH FOUND !*! - ${CHECKING} and ${COMPARE} are the same."
          echo "${CHECKING} matches ${COMPARE}" >> "${DUPLOG}"
          DUPCOUNT=`expr ${DUPCOUNT} + 1`
            #else
          #echo "No match between ${CHECKING} and ${COMPARE}."
        fi
      done
      COUNT=`expr ${COUNT} + 1`
done

unset FILEARRAY
rm "${DIRLIST}"
echo "Finished scan at: `date +"%r on %D"`" >> "${DUPLOG}"
echo
exit 0
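
The script above compares every file against every other file, so the number of comparisons grows with the square of the file count. A different technique, not part of the original script, is to compute a checksum per file once, sort, and only compare files whose checksum and size agree. The following is a minimal sketch under these assumptions: the POSIX `cksum` utility is available, paths contain no tabs or newlines, and `find_dup_content` is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: report files with identical content using one checksum pass plus a
# confirming byte comparison. find_dup_content is a hypothetical helper name.
find_dup_content() {
  # cksum prints "<crc> <size> <path>" per file. A bytewise sort puts lines
  # that share the same crc and size next to each other.
  find "$1" -type f -exec cksum {} + | LC_ALL=C sort | awk '
    {
      key = $1 " " $2                     # crc and size together form the key
      path = $0
      sub(/^[0-9]+ [0-9]+ /, "", path)    # strip the two numeric fields
      # Emit a tab-separated candidate pair when neighbours share a key.
      if (key == prevkey) print prevpath "\t" path
      prevkey = key; prevpath = path
    }' |
  while IFS="$(printf '\t')" read -r a b; do
    # A CRC can collide, so confirm each candidate pair byte for byte.
    if cmp -s "$a" "$b"; then
      echo "$a matches $b"
    fi
  done
}

# Example call (paths are assumptions):
#   find_dup_content /mnt/Pictures > /tmp/dups.list
```

Only neighbouring candidates are compared, so three identical files are reported as two chained pairs, and a rare checksum collision between unrelated files could hide a real pair. As with the script above, nothing is deleted; the output is a list to review by hand.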
 