Duplicate Files

alxxer · Jun 30, 2014

Is there a way to scan through my storage pool and find all duplicate files? I know there are a bunch; they could be on any dataset, in nested folders.

Thanks

cyberjock · Jun 30, 2014

Well, how do you want to find them? By file name? By file size? By date? The easiest way is to use some tool on a client machine to scan through the shares for duplicates. Otherwise you could script something up yourself to run on the server itself.

alxxer · Jun 30, 2014

I would like to find them by file name. Last time I ran a scan with a client machine it took forever, then crashed.
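A name-based scan can also be run on the server itself, which avoids pulling every directory listing across the network. The following is only a minimal sketch, not a tested tool: `dups_by_name` is a made-up helper name, it matches on the bare file name only (contents are never compared), and it assumes no path contains a newline.

```shell
#!/bin/sh
# Sketch: list file names that occur more than once under a directory.
# dups_by_name is a hypothetical helper; paths containing newlines are not handled.
dups_by_name() {
  # Split each path on "/" so that $NF is the bare file name.
  find "$1" -type f | awk -F/ '
    { count[$NF]++; paths[$NF] = paths[$NF] "\n\t" $0 }
    END {
      # Print each repeated name followed by every path that carries it.
      for (n in count) if (count[n] > 1) print n paths[n]
    }
  '
}
```

Something like `dups_by_name /mnt/tank > /tmp/names.list` (the pool path here is just an example) would write each repeated name followed by the indented paths where it appears.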

DrKK · Jul 1, 2014

https://forums.freebsd.org/viewtopic.php?t=10376

danielluke1984 · Aug 12, 2015

I would recommend Duplicate Files Deleter. It's a pretty versatile tool, ideal for this type of problem.

DifferentStrokes · Aug 12, 2015

Here is a script I put together back in the FreeNAS 7 days. FreeBSD still has the "diff" command, so I believe this should still work. I used to run it against the wife's 300GB of pictures to find dups. The script writes a log file, and you have to go back in manually and delete the files. If you decide to use it and something doesn't work, just let me know and I'll take a look at it.

Code:

#!/usr/bin/env bash

function usage {
  echo -e "\014Usage:\r\n"
  echo -e "\tThis script will make a log of duplicate files located in the root and sub-directories of the given directory."
  echo -e "\tYou must specify the directory you want to check and a temporary directory to store the needed temporary files."
  echo -e "\tPlease use \"\"'s around the directory names to ensure proper handling of directories with spaces in the name.\r\n\r\n"
  echo -e "\t\tProper syntax: ${0} \"Directory to Check\" \"Temp Directory\"\r\n"
  echo -e "\t\tExample: ${0} \"/mnt/Pictures\" \"/tmp\"\r\n\r\n"
  echo -e "\tYour log of duplicates will be located in the temporary directory under the filename \"dups.list\"\r\n\r\n"
}


if [ ! "$#" = "2" ] || [ "$1" = "--help" ] || [ "$1" = "-h" ]; then
  usage
  exit 1
fi

DIRLIST="${2}/dir.list"
DUPLOG="${2}/dups.list"
COUNT=1
WORKINGCOUNT=0
DUPCOUNT=0

# Quote the log and list paths so a temp directory with spaces in its name still works.
echo "Started scan at: `date +"%r on %D"`" >> "${DUPLOG}"

echo -e "\014Building directory database (${2}/dir.list)..."
find "${1}" -type f | sort > "${DIRLIST}"

echo "Storing database into memory..."

# IFS= and -r keep leading spaces and backslashes in file names intact.
while IFS= read -r LINE
  do
    FILEARRAY[$COUNT]=${LINE}
    COUNT=`expr ${COUNT} + 1`
  done < "${DIRLIST}"
TOTAL=${#FILEARRAY[@]}
COUNT=1


for FILE in "${FILEARRAY[@]}"
do
  CHECKING="${FILEARRAY[${COUNT}]}"
  WORKINGCOUNT=`expr ${COUNT} + 1`
    for WORKFILE in "${FILEARRAY[@]:${WORKINGCOUNT}}"
      do
    
        COMPARE="${FILEARRAY[${WORKINGCOUNT}]}"
        echo -en "\014Processing file ${COUNT} of ${TOTAL} - Currently on ${WORKINGCOUNT}\r\nNumber of duplicates found: ${DUPCOUNT}"
        WORKINGCOUNT=`expr ${WORKINGCOUNT} + 1`
    
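        # diff exits 0 only when the two files are identical; -q suppresses the diff output.
        # Testing the exit status also avoids a false match when diff fails with an error.
        if diff -q "${CHECKING}" "${COMPARE}" > /dev/null 2>&1; then
          #echo "!*! MATCH FOUND !*! - ${CHECKING} and ${COMPARE} are the same."
          echo "${CHECKING} matches ${COMPARE}" >> "${DUPLOG}"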
          DUPCOUNT=`expr ${DUPCOUNT} + 1`
            #else
          #echo "No match between ${CHECKING} and ${COMPARE}."
        fi
      done
      COUNT=`expr ${COUNT} + 1`
done

unset FILEARRAY
rm "${DIRLIST}"
echo "Finished scan at: `date +"%r on %D"`" >> "${DUPLOG}"
echo
exit 0
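The script above runs diff on every pair of files, so the work grows with the square of the file count. A different technique, and not what the script above does, is to checksum each file once and group files that share a checksum and size. Below is a rough sketch of that idea, assuming the POSIX `cksum` utility is available; `dups_by_cksum` is a made-up helper name, and because `cksum` is a weak CRC, each group should be treated as candidates and confirmed with diff or cmp before anything is deleted. Paths containing newlines are not handled.

```shell
#!/bin/sh
# Sketch: group files that share a cksum CRC and byte size.
# dups_by_cksum is a hypothetical helper; groups are candidates, not proof of identity.
dups_by_cksum() {
  # cksum prints "<crc> <size> <path>" for each file it is given.
  find "$1" -type f -exec cksum {} + | awk '
    {
      key = $1 " " $2                      # CRC plus size identifies a group
      path = $0
      sub(/^[0-9]+ [0-9]+ /, "", path)     # strip the two numeric fields, keep the full path
      count[key]++
      paths[key] = paths[key] "\n\t" path
    }
    END {
      for (k in count) if (count[k] > 1) print "candidate group (" k "):" paths[k]
    }
  '
}
```

Each file is read only once this way, so the cost grows roughly in line with the total amount of data rather than with the number of file pairs.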
