Find huge files in Linux
Switching to an SSD for better performance means compromising on disk space, which can be a tough decision to make. Nowadays, though, with SSD prices going down, it is something a lot of people are consciously doing.
However, having been spoilt by ample disk space, when you switch from a 500GB (or 1TB) hard disk to an albeit super fast 128GB SSD, you will inevitably run into situations where you need to clean out large files taking up space on your disk. I ran into this myself about a year ago when I made the switch on my MacBook. The problem turned out to be significant enough that I had to find a quick way to list the biggest files on my file system, so I could delete them or move them to an external hard disk.
Most Linux systems ship with a nifty little utility, du, which displays disk
usage statistics. It is very good and fast, but by itself it didn't quite
meet my requirements: I really wanted to see just the few biggest files. So
I combined it with a few other common Linux utilities and came up with this
little shell script I call findhuge which, as the name suggests, helps me
find huge files:
#!/bin/sh
# findhuge: list the largest files (or directories) under a directory.
usage() {
  echo "$(basename "$0") [-h] [-d dir] [-t type] [-c count]
	 -h			Show this help text
	 -d dir			Directory to search in 			Default - '.'
	 -t type		[f/d] for file / directory		Default - 'f'
	 -c count		Number of results			Default - 10" 1>&2
}
_DIR='.'
_TYPE='f'
_COUNT=10
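# The leading ':' selects silent error handling. A ':' after a letter means
# that option takes an argument, so 'h' must not be followed by one.
while getopts ":hd:t:c:" opts; do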
  case "${opts}" in
    d) _DIR=${OPTARG};;
    t) _TYPE=${OPTARG};;
    c) _COUNT=${OPTARG};;
    h) usage; exit 0;;
    *) usage; exit 1;;
  esac
done
# Note: --files0-from (du) and -h (sort) are GNU extensions.
find "$_DIR" -type "$_TYPE" -print0 | du -h --files0-from=- | sort -hr | head -n "$_COUNT"
This script lists the top _COUNT results (defaults to 10) of _TYPE (defaults
to ‘f’, for files) in _DIR (defaults to the current directory), which is
exactly what I need.
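To make the defaults and overrides concrete, here is a minimal sketch of the same option handling wrapped in a function. The name parse_opts and its output format are hypothetical, made up for illustration and not part of findhuge. It only prints the resolved settings instead of searching anything.

```shell
#!/bin/sh
# Hypothetical helper (not part of findhuge): resolve findhuge-style options
# and print the result. The body runs in a subshell (note the parentheses),
# so OPTIND and the variables do not leak into the caller.
parse_opts() (
  dir='.' type='f' count=10   # same defaults as the script
  OPTIND=1
  while getopts ":hd:t:c:" opt; do
    case "$opt" in
      d) dir=$OPTARG ;;
      t) type=$OPTARG ;;
      c) count=$OPTARG ;;
      *) echo "usage: parse_opts [-d dir] [-t type] [-c count]" >&2; exit 1 ;;
    esac
  done
  echo "dir=$dir type=$type count=$count"
)

parse_opts                    # prints: dir=. type=f count=10
parse_opts -d /var/log -c 5   # prints: dir=/var/log type=f count=5
parse_opts -t d               # prints: dir=. type=d count=10
```

Any option left out keeps its default, so the options can be mixed freely.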
The shell script uses find to find files recursively in a directory, passes
them to du to get their sizes in a human-readable format, pipes that through
sort to order the results by size, largest first, and finally through head to
show only the top results from that list. Neat!
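If you want to watch the pipeline work before pointing it at a real disk, the sketch below runs it on a scratch directory with files of known sizes. The function name top_files and the file names are hypothetical, chosen for this example. It assumes GNU du and GNU sort, plus a head that supports -c.

```shell
#!/bin/sh
# Hypothetical wrapper around the same find | du | sort | head pipeline.
# Assumes GNU du (--files0-from) and GNU sort (-h).
top_files() {
  # $1 = directory to search, $2 = number of results to show
  find "$1" -type f -print0 | du -h --files0-from=- | sort -hr | head -n "$2"
}

# Build a scratch directory with three files of clearly different sizes.
demo_dir=$(mktemp -d)
head -c 4194304 /dev/urandom > "$demo_dir/big.bin"      # 4 MiB
head -c 1048576 /dev/urandom > "$demo_dir/medium.bin"   # 1 MiB
head -c 1024    /dev/urandom > "$demo_dir/small.bin"    # 1 KiB

# Show the two largest files; big.bin should be listed first.
top_files "$demo_dir" 2

rm -rf "$demo_dir"
```

Note that du reports the space a file occupies on disk, so the sizes shown can be slightly larger than the byte counts written.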