After you’ve run a Linux server for a long time, it’s worth cleaning house and optimizing your system. One of the easiest ways to get started is to search your system for large files and see where you can get rid of big stuff with the least impact. You might find a few files on your system that are taking up a lot of space and aren’t useful anymore. Once you’ve cleared out the low-hanging fruit, you can move on to examining specific files, or even entire categories like log files, en masse to save even more space.
But let’s get started with the basics.
Finding Large Files in Linux
You might think that it’s easier to find large files if you have a GUI, but the Linux command-line interface is well suited to this kind of task. For example, if you want to find files above a certain size, you can use the following command:
find /path/to/search -type f -size +[size]
The parameters are mostly self-explanatory. The “-type f” option tells the find command to restrict itself to regular files rather than directories, while “-size +[size]” matches only files larger than the given threshold.
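For example, to search the whole filesystem for files over 100 megabytes, you might run something like this (a concrete sketch; the 2>/dev/null part is an optional addition that hides the permission-denied errors you’ll see when searching system directories as a regular user):

find / -type f -size +100M 2>/dev/null

Here’s what the command looks like in action: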
In the above screenshot, I’ve isolated files larger than 100M. The first of these is the swap file, which is expected to be huge, so there’s not much I can do about it. But there are other files of interest that would be worthwhile to examine to see if they’re necessary.
Since the Linux system I’m using is a test system for debugging and writing tutorials, it doesn’t have much going on. Most of the files here are important and can’t be deleted, with the possible exception of “pandoc”.
Getting More Information about the Files
While it’s useful to know which files are above a certain size, it’s even more useful to see exactly how large each one is. Luckily, we can combine the find command with the “du” command, which estimates and displays the sizes of files and directories. Since the find command just gives us a list of files with no other information about them, we can add the du command like this:
find / -type f -size +100M -exec du -h {} \;
Here, we see the option called “-exec”. This parameter allows us to run a command on each item of output generated by find. You can see above that I’m using the “du” command for this. The “-h” parameter displays each file’s size in a human-readable format. The curly brackets “{}” are a placeholder for each item in the list, and the escaped semicolon (\;) terminates the -exec clause – the backslash stops the shell from interpreting the semicolon itself.
Here’s what it looks like in practice:
As you can see, each of the entries from the find command now has a file size attached to it. The swap file is by far the largest of the group, and the others aren’t much above 100MB in comparison. Looking at the list, you can see that there’s a long tail of file sizes, and we can free up considerable disk space if we find an efficient way to isolate and sort the files by size.
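Incidentally, if you also want a grand total of the space these files occupy, du’s “-c” flag appends one as the last line of output (a sketch that assumes the file list is short enough for find to pass it to a single du invocation):

find / -type f -size +100M -exec du -ch {} +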
Sorting the Results
Looking at the above screenshot, you can see that the files aren’t sorted in descending order of size. The second largest file, “pandoc”, is towards the end of the list, when we’d like it to appear in second place. With sorted output, even if you have a long list of files, you can focus your attention on the ones that matter most.
The command to sort the files by size can look like this:
find / -type f -size +100M -exec ls -lh {} \; | sort -k 5 -h -r
Here’s a screenshot of what the above command looks like on my test system:
As you can see, all the files are displayed in order of descending size. An unusual entry is the “kcore” file, which appears to be 128 terabytes! The reason is that this file represents the system’s addressable virtual memory; it isn’t an actual file taking up that much space on disk. It shows up here because we used “ls -lh”, which reports apparent size, instead of the “du” command we saw earlier, which reports actual disk usage.
The command consists of three segments, each of which does the following:
- The first segment (find) gives us the list of files
- The second (-exec ls -lh) lists each file’s details, including its size
- The third (sort) orders the output based on the file size column
The “ls -lh” command lists files in long format, with the “-h” parameter converting file sizes into the familiar units like megabytes and gigabytes. The sort command sorts the output based on the 5th field (the size column); its own “-h” flag makes it compare those human-readable sizes correctly, and the “-r” parameter reverses the result so that we get everything in descending order.
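If the “kcore” entry bothers you, you can also keep find out of /proc entirely. Here’s one way to do it (a sketch using find’s “-prune” action; the rest of the pipeline stays the same):

find / -path /proc -prune -o -type f -size +100M -exec ls -lh {} \; | sort -k 5 -h -r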
Another option is to sort the output using the “du” command instead of “ls -lh”. This has the advantage of ignoring virtual files and gives us only the actual files on disk. In my opinion, the output is also cleaner than with “ls -lh”. Note that this version ends the -exec clause with “+” instead of “\;”, which passes many files to a single du invocation rather than running du once per file. Here’s the command:
find / -type f -size +100M -exec du -h {} + | sort -k 1 -h -r
As you can see, no virtual files this time.
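On a busier server, the sorted list can get long. To focus on just the biggest offenders, you can tack “head” onto the end of the pipeline (an optional addition; adjust the count to taste):

find / -type f -size +100M -exec du -h {} + | sort -k 1 -h -r | head -n 10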
Finding Specific Types of Large Files
Once you’ve cleared out any glaring issues, further space savings will be harder to come by. One ripe target is log files that have grown continuously over the years. Generally, Linux administrators manage log files by periodically archiving and gzipping them to save space, but even these archives can add up over the years. To make matters worse, Linux maintains plenty of log files for different services – for example, the Apache web server has its own set of log files.
Some of these log files might even belong to applications that you no longer use, in which case it’s safe to delete them. Using the tools we’ve explored, we can restrict our search just to log files above a certain size using a command like this:
find /var/log -type f -name "*.log" -size +10M
The above command will isolate log files over 10 megabytes. You can examine them individually to see if they’re important or not. You can also sort them using the tools we discussed above like this:
sudo find / -type f -name "*.log" -size +10M -exec du -h {} + | sort -k 1 -h -r
Here’s how it looks:
Because I use this server only for testing purposes, there aren’t any large log files, so for the screenshot I lowered the threshold to files above 1 kilobyte. It’s not much!
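Since rotated logs are usually compressed, as mentioned earlier, you can also widen the net to include the archived copies (a sketch; adjust the name patterns to match your system’s log rotation scheme):

sudo find /var/log -type f \( -name "*.log" -o -name "*.gz" \) -size +10M -exec du -h {} + | sort -k 1 -h -r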
Conclusion
As you can see, finding large files in Linux is easy. With a bit of extra work, you can get their exact sizes, sort them in descending order, and specifically target the types of files you want. Having tried these operations on a Windows system with a GUI, I can attest that it’s much more difficult, and using the command line is much more efficient. Hopefully, you found my examples useful!
I’m a NameHero team member, and an expert on WordPress and web hosting. I’ve been in this industry since 2008. I’ve also developed apps on Android and have written extensive tutorials on managing Linux servers. You can contact me on my website WP-Tweaks.com!