Is there a way to scan through my storage pool and find all duplicate files? I know there are a bunch, and they could be in any dataset, in nested folders.
Well, how do you want to find them? By file name? By file size? By date? The easiest way is to use some tool on a client machine to scan through the shares for duplicates. Otherwise you could script something up yourself to run on the server itself.
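If "script something up yourself" sounds appealing, here is a minimal sketch of one way to do it by content rather than by name or date. It assumes the POSIX `cksum` utility is available on the server, and `find_dup_candidates` is just a made-up helper name. It groups files whose CRC and size both match, so treat the output as candidates only and confirm with `diff` or `cmp` before deleting anything. File names containing newlines are not handled.

```shell
#!/bin/sh
# Sketch: list files under a directory whose cksum CRC and size both match.
# find_dup_candidates is a hypothetical helper name, not part of any tool.
find_dup_candidates() {
    # cksum prints "<crc> <size> <path>" for each file it is given.
    find "$1" -type f -exec cksum {} + | sort -k1,1n -k2,2n |
    awk '{
        key = $1 " " $2
        path = $0
        sub(/^[0-9]+ [0-9]+ /, "", path)   # keep the path, spaces included
        if (key == prevkey) {
            if (!printed) print prevpath   # first member of a new group
            print path
            printed = 1
        } else {
            if (printed) print ""          # blank line between groups
            printed = 0
        }
        prevkey = key
        prevpath = path
    }'
}
```

Something like `find_dup_candidates "/mnt/Pictures"` would then print each group of matching paths, with groups separated by a blank line.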
Here is a script I put together back in the FreeNAS 7 days. FreeBSD still has the "diff" command in it, so I believe this should still work. I used to run it against the wife's 300GB of pictures to find dups. The script makes a log file, and you have to go back in manually and delete the files. If you decide to use it and something doesn't work, just let me know and I'll take a look at it.
Code:
#!/usr/bin/env bash
# Log pairs of identical files found under a directory tree.
function usage {
    echo -e "\014Usage:\r\n"
    echo -e "\tThis script will make a log of duplicate files located in the root and sub-directories of the given directory."
    echo -e "\tYou must specify the directory you want to check and a temporary directory to store the needed temporary files in."
    echo -e "\tPlease use \"\"'s around the directory names to ensure proper handling of directories with spaces in the name.\r\n\r\n"
    echo -e "\t\tProper syntax: ${0} \"Directory to Check\" \"Temp Directory\"\r\n"
    echo -e "\t\tExample: ${0} \"/mnt/Pictures\" \"/tmp\"\r\n\r\n"
    echo -e "\tYour log of duplicates will be located in the temporary directory under the filename \"dups.list\"\r\n\r\n"
}
# Require exactly two arguments; show the usage text for -h/--help.
if [ "$#" -ne 2 ] || [ "$1" = "--help" ] || [ "$1" = "-h" ]; then
    usage
    exit 1
fi
DIRLIST="${2}/dir.list"
DUPLOG="${2}/dups.list"
COUNT=1
WORKINGCOUNT=0
DUPCOUNT=0
echo "Started scan at: $(date +"%r on %D")" >> "${DUPLOG}"
echo -e "\014Building directory database (${DIRLIST})..."
# List every regular file under the target directory, sorted by path.
find "${1}" -type f | sort > "${DIRLIST}"
echo "Storing database into memory..."
# IFS= and -r keep leading/trailing spaces and backslashes in file names intact.
while IFS= read -r LINE
do
    FILEARRAY[$COUNT]=${LINE}
    COUNT=$((COUNT + 1))
done < "${DIRLIST}"
TOTAL=${#FILEARRAY[@]}
COUNT=1
# Compare every file against each file that follows it in the sorted list.
for FILE in "${FILEARRAY[@]}"
do
    CHECKING="${FILEARRAY[${COUNT}]}"
    WORKINGCOUNT=$((COUNT + 1))
    for WORKFILE in "${FILEARRAY[@]:${WORKINGCOUNT}}"
    do
        COMPARE="${FILEARRAY[${WORKINGCOUNT}]}"
        echo -en "\014Processing file ${COUNT} of ${TOTAL} - Currently on ${WORKINGCOUNT}\r\nNumber of duplicates found: ${DUPCOUNT}"
        WORKINGCOUNT=$((WORKINGCOUNT + 1))
        # diff exits 0 only when the files are identical. Testing the exit status
        # (rather than empty output) keeps unreadable files from being logged as matches.
        if diff -q "${CHECKING}" "${COMPARE}" > /dev/null 2>&1; then
            #echo "!*! MATCH FOUND !*! - ${CHECKING} and ${COMPARE} are the same."
            echo "${CHECKING} matches ${COMPARE}" >> "${DUPLOG}"
            DUPCOUNT=$((DUPCOUNT + 1))
        #else
            #echo "No match between ${CHECKING} and ${COMPARE}."
        fi
    done
    COUNT=$((COUNT + 1))
done
unset FILEARRAY
rm "${DIRLIST}"
echo "Finished scan at: $(date +"%r on %D")" >> "${DUPLOG}"
echo
exit 0
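Since the script only writes a log and leaves the deleting to you, here is a rough sketch of one way to turn dups.list into a review list. It assumes the log lines look like "A matches B" as written by the script above, and `list_dup_copies` is a made-up helper name. It prints the second file of each pair once, so a file that matches several others is not listed repeatedly. A path that itself contains " matches " would be skipped, so check the log by hand if your file names are unusual.

```shell
#!/bin/sh
# Sketch: print the second path from every "A matches B" line in a dups.list log.
# list_dup_copies is a hypothetical helper name; it deletes nothing.
list_dup_copies() {
    # Lines without exactly one " matches " separator (such as the
    # "Started scan at" and "Finished scan at" lines) are ignored.
    awk -F ' matches ' 'NF == 2 { print $2 }' "$1" | sort -u
}
```

Running `list_dup_copies "/tmp/dups.list"` gives you the list of copies to look over before you remove anything yourself.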