Key Takeaways
- A file’s actual size (the number of bytes it contains) and its effective size on the hard disk (the number of file system blocks needed to store it) differ because disk space is allocated in whole blocks.
- The du command can be used to check the size of files, directories, and the total disk space used by the current directory and subdirectories.
- Run “du -h” to list directory sizes in human-readable units.
When you use the Linux du command, you can obtain both the actual disk usage and the true size of a file or directory. We’ll explain why these values aren’t the same.
Why are Actual Disk Usage and True Size Different?
The size of a file and the space it occupies on your hard drive are rarely the same. Disk space is allocated in blocks. If a file is smaller than a block, an entire block is still allocated to it because the file system doesn’t have a smaller unit of real estate to use.
Unless a file’s size is an exact multiple of blocks, the space it uses on the hard drive must always be rounded up to the next whole block. For example, if a file is larger than two blocks but smaller than three, it still takes three blocks of space to store it.
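The rounding described above can be sketched as a small shell function. The function name blocks_needed and the 4,096-byte block size are assumptions made for illustration; your file system may use a different block size.

```shell
#!/bin/sh
# Hypothetical helper: number of whole blocks needed to hold a file.
# $1 = file size in bytes, $2 = block size in bytes.
blocks_needed() {
    # Integer ceiling division: any partial block is rounded up to a whole one.
    echo $(( ($1 + $2 - 1) / $2 ))
}

blocks_needed 2 4096       # a two-byte file still needs 1 block
blocks_needed 8192 4096    # an exact multiple of the block size needs 2 blocks
blocks_needed 8193 4096    # one byte more than two blocks needs 3 blocks
```

Multiplying the result by the block size gives the space the file occupies on disk.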
Two measurements are used in relation to file size. The first is the actual size of the file, which is the number of bytes of content that make up the file. The second is the effective size of the file on the hard disk. This is the number of file system blocks necessary to store that file.
How to Check a File’s Size
Let’s look at a simple example. We’ll redirect a single character into a file to create a small file:
echo "1" > geek.txt
Now, we’ll use the long format listing, ls -l, to look at the file length:
ls -l geek.txt
The length is the numeric value that follows the owner and group entries (dave dave in this example), which is two bytes. Why is it two bytes when we only sent one character to the file? Let’s take a look at what’s happening inside the file.
We’ll use the hexdump command, which will give us an exact byte count and allow us to “see” non-printing characters as hexadecimal values. We’ll also use the -C (canonical) option to force the output to show hexadecimal values in the body of the output, as well as their alphanumeric character equivalents:
hexdump -C geek.txt
The output shows us that, beginning at offset 00000000 in the file, there’s one byte that contains a hexadecimal value of 31, and one that contains a hexadecimal value of 0A. The right-hand portion of the output depicts these values as alphanumeric characters, wherever possible.
The hexadecimal value of 31 is used to represent the digit one. The hexadecimal value of 0A is used to represent the Line Feed character, which cannot be shown as an alphanumeric character, so it’s shown as a period (.) instead. The Line Feed character is added by echo. By default, echo starts a new line after it displays the text it needs to write to the terminal window.
That tallies with the output from ls and agrees with the file length of two bytes.
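To confirm that the extra byte comes from echo, you can compare it with printf, which writes only the characters it’s given. This is a minimal sketch; the temporary directory and file names are made up for the example.

```shell
# Work in a throwaway directory so no real files are touched.
tmp=$(mktemp -d)

echo "1" > "$tmp/with_newline.txt"        # echo appends a Line Feed (0A)
printf '1' > "$tmp/without_newline.txt"   # printf adds nothing extra

# wc -c counts bytes; reading from stdin keeps the file name out of the output.
with_nl=$(wc -c < "$tmp/with_newline.txt")
without_nl=$(wc -c < "$tmp/without_newline.txt")

echo "echo:   $with_nl bytes"
echo "printf: $without_nl bytes"

rm -r "$tmp"
```

The echo version is two bytes long and the printf version is one, which matches what hexdump showed.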
Now, we’ll use the du command to look at the file size:
du geek.txt
It says the size is four, but four of what?
There Are Blocks, and Then There Are Blocks
When du reports file sizes in blocks, the size it uses depends on several factors. You can specify which block size it should use on the command line. If you don’t force du to use a particular block size, it follows a set of rules to decide which one to use.
First, it checks the following environment variables:
- DU_BLOCK_SIZE
- BLOCK_SIZE
- BLOCKSIZE
If any of these exist, the block size is set, and du stops checking. If none are set, du defaults to a block size of 1,024 bytes, unless an environment variable called POSIXLY_CORRECT is set. In that case, du defaults to a block size of 512 bytes.
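The lookup order above can be sketched as a shell function. This is an illustration of the rules as described, not du’s actual source code. The function name du_block_size_source is hypothetical, and the sketch assumes an empty variable counts as unset.

```shell
# Print which setting would decide du's default block size,
# following the order described in the text.
du_block_size_source() {
    if [ -n "${DU_BLOCK_SIZE:-}" ]; then
        echo "DU_BLOCK_SIZE=$DU_BLOCK_SIZE"
    elif [ -n "${BLOCK_SIZE:-}" ]; then
        echo "BLOCK_SIZE=$BLOCK_SIZE"
    elif [ -n "${BLOCKSIZE:-}" ]; then
        echo "BLOCKSIZE=$BLOCKSIZE"
    elif [ -n "${POSIXLY_CORRECT+set}" ]; then
        # POSIXLY_CORRECT only needs to exist; its value is not examined.
        echo "default=512"
    else
        echo "default=1024"
    fi
}

du_block_size_source
```

Running the function in your own shell shows which of the rules applies to your environment.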
So, how do we find out which one is in use? You can check each environment variable to work it out, but there’s a quicker way. Let’s compare the results to the block size the file system uses instead.
To discover the block size the file system uses, we’ll use the tune2fs program. We’ll then use the -l (list superblock) option, pipe the output through grep, and then print lines that contain the word “Block.”
In this example, we’ll look at the file system on the first partition of the first hard drive, sda1, and we’ll need to use sudo:
sudo tune2fs -l /dev/sda1 | grep Block
The file system block size is 4,096 bytes. If we divide that by the result we got from du (four), it shows the du default block size is 1,024 bytes. We now know several important things.
First, we know the smallest amount of file system real estate that can be devoted to storing a file is 4,096 bytes. This means even our tiny, two-byte file is taking up 4 KB of hard drive space.
The second thing to keep in mind is that applications dedicated to reporting on hard drive and file system statistics, such as du, ls, and tune2fs, can have different notions of what “block” means. The tune2fs application reports true file system block sizes, while ls and du can be configured or forced to use other block sizes. Those block sizes are not intended to relate to the file system block size; they’re just “chunks” those commands use in their output.
Finally, other than using different block sizes, the answers from du and tune2fs convey the same meaning. The tune2fs result was one block of 4,096 bytes, and the du result was four blocks of 1,024 bytes.
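As a cross-check that doesn’t need sudo, the GNU stat command (not used elsewhere in this article) can report a file’s length and its allocated blocks side by side. This is a sketch under the assumption that GNU coreutils is installed; the temporary directory is made up for the example, and the numbers you see depend on your file system.

```shell
tmp=$(mktemp -d)
echo "1" > "$tmp/geek.txt"

# GNU stat format sequences:
#   %s = file length in bytes
#   %b = number of blocks allocated to the file
#   %B = size in bytes of each block counted by %b
set -- $(stat -c '%s %b %B' "$tmp/geek.txt")
length=$1
allocated=$(( $2 * $3 ))

echo "length:    $length bytes"
echo "allocated: $allocated bytes"

# With -f, stat describes the file system; %S is its fundamental block size.
fs_block=$(stat -f -c '%S' "$tmp")
echo "file system block size: $fs_block bytes"

rm -r "$tmp"
```

Note that the block unit stat uses for %b is yet another notion of “block,” which is why the example multiplies %b by %B rather than assuming a size.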
Using du to Check File Size
With no command line parameters or options, du lists the total disk space the current directory and all subdirectories are using.
Let’s take a look at an example:
du
The size is reported in the default block size of 1,024 bytes per block. The entire subdirectory tree is traversed.
Using du on a Different Directory
If you want du to report on a different directory than the current one, you can pass the path to the directory on the command line:
du ~/.cache/evolution/
Using du on a Specific File
If you want du to report on a specific file, pass the path to that file on the command line. You can also pass a shell pattern to select a group of files, such as *.txt:
du ~/.bash_aliases
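The shell pattern case might look like the sketch below. The file names are invented for the example, and the -c option (which adds a grand total line) is an extra that the text above doesn’t cover.

```shell
tmp=$(mktemp -d)
echo "1"   > "$tmp/a.txt"
echo "22"  > "$tmp/b.txt"
echo "333" > "$tmp/c.log"     # not matched by *.txt

# The shell expands *.txt before du runs, so du receives two file paths.
# -c appends a "total" line after the individual entries.
report=$(cd "$tmp" && du -c -- *.txt)
echo "$report"

rm -r "$tmp"
```

The output has one line per matching file, followed by the total. The .log file doesn’t appear because the shell never passed it to du.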
Reporting on Files in Directories
To have du report on the files in the current directory and subdirectories, use the -a (all files) option:
du -a
For each directory, the size of each file is reported, as well as a total for each directory.
Limiting Directory Tree Depth
You can tell du to list the directory tree to a certain depth. To do so, use the -d (max depth) option and provide a depth value as a parameter. Note that all subdirectories are scanned and used to calculate the reported totals, but they’re not all listed. To set a maximum directory depth of one level, use this command:
du -d 1
The output lists each subdirectory of the current directory along with its total size, and ends with a total for the current directory itself.
To list directories one level deeper, use this command:
du -d 2
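To see how the depth limit changes what is listed, here is a small sketch that builds a throwaway directory tree (the names a, b, and c are made up) and counts the lines du prints at each depth.

```shell
tmp=$(mktemp -d)
mkdir -p "$tmp/a/b/c"     # three nested directories below the starting point

# Count how many directories du lists at each depth limit.
depth1=$(cd "$tmp" && du -d 1 | wc -l)   # lists ./a and .
depth2=$(cd "$tmp" && du -d 2 | wc -l)   # lists ./a/b, ./a and .

echo "depth 1: $depth1 lines"
echo "depth 2: $depth2 lines"

rm -r "$tmp"
```

The directory c never gets a line of its own at either depth, but anything stored in it would still count toward the totals of its parent directories.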
Setting the Block Size
You can use the --block-size option to set a block size for du for the current operation. To use a block size of one byte, use the following command to get the exact sizes of the directories and files:
du --block-size=1
If you want to use a block size of one megabyte, you can use the -m (megabyte) option, which is the same as --block-size=1M:
du -m
If you want the sizes reported in the most appropriate block size according to the disk space used by the directories and files, use the -h (human-readable) option:
du -h
To see the apparent size of the file rather than the amount of hard drive space used to store the file, use the --apparent-size option:
du --apparent-size
You can combine this with the -a (all) option to see the apparent size of each file:
du --apparent-size -a
Each file is listed, along with its apparent size.
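To make the difference concrete, this sketch measures the same two-byte file both ways, using a one-byte block size so both figures are in bytes. The temporary directory is made up for the example. The disk usage figure depends on your file system; on a file system with 4,096-byte blocks it would typically be 4096.

```shell
tmp=$(mktemp -d)
echo "1" > "$tmp/geek.txt"

# -b is shorthand for --apparent-size --block-size=1: the file length in bytes.
apparent=$(du -b "$tmp/geek.txt" | cut -f1)

# --block-size=1 on its own reports the allocated disk space in bytes.
on_disk=$(du --block-size=1 "$tmp/geek.txt" | cut -f1)

echo "apparent size: $apparent bytes"
echo "disk usage:    $on_disk bytes"

rm -r "$tmp"
```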
Displaying Only Totals
If you want du to report only the total for the directory, use the -s (summarize) option. You can also combine this with other options, such as the -h (human-readable) option:
du -h -s
Here, we’ll use it with the --apparent-size option:
du --apparent-size -s
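A common use of -s is to get one total per subdirectory and then rank them. The sketch below assumes GNU du and GNU sort (for sort -h, which understands human-readable suffixes); the directory names small and large are made up for the example.

```shell
tmp=$(mktemp -d)
mkdir "$tmp/small" "$tmp/large"
head -c 1024    /dev/zero > "$tmp/small/data"   # 1 KiB of content
head -c 1048576 /dev/zero > "$tmp/large/data"   # 1 MiB of content

# -s prints a single total for each argument; the */ pattern passes every
# subdirectory as a separate argument. sort -h puts the biggest entry last.
ranked=$(cd "$tmp" && du -s -h --apparent-size -- */ | sort -h)
echo "$ranked"

rm -r "$tmp"
```

The largest directory ends up on the last line, which is handy when you’re hunting for whatever is filling a disk.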
Displaying Modification Times
To see the time and date of the last modification of any file in a directory or its subdirectories, use the --time option:
du --time -d 2
Strange Results?
If you see strange results from du, especially when you cross-reference sizes to the output from other commands, it’s usually due to the different block sizes to which different commands can be set or those to which they default. It could also be due to the differences between real file sizes and the disk space required to store them.
If you need to match the output of other commands, experiment with the --block-size option in du.
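As one example of matching another command, the sketch below forces du and ls -s (which lists allocated size) to use the same 1,024-byte unit. It assumes the GNU versions of both tools; the temporary directory is made up for the example.

```shell
tmp=$(mktemp -d)
echo "1" > "$tmp/geek.txt"

# Ask both tools to count allocated space in identical 1,024-byte units.
du_blocks=$(du --block-size=1K "$tmp/geek.txt" | cut -f1)
ls_blocks=$(ls -s --block-size=1K "$tmp/geek.txt" | awk '{print $1}')

echo "du: $du_blocks  ls -s: $ls_blocks"

rm -r "$tmp"
```

With the unit pinned on both sides, the two numbers agree, whatever the environment variables or defaults would otherwise have chosen.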