1. sort
-
Let's try and compare these commands:
du -s /usr/share/* | less
du -s /usr/share/* | sort | less
du -s /usr/share/* | sort -r | less
du -s /usr/share/* | sort -nr | less
du -s /usr/share/* | sort -nr | head
The command du gets the size (disk usage) of the files and directories in /usr/share, and head keeps only the first 10 lines of its input.
Then we try to sort them with sort and sort -r (reverse), but the results are not ordered by size as we might expect. This is because sort by default compares lines as text, character by character, so 2 counts as bigger than 10 (because 2 comes after 1 in the character set).
With the option -n we tell sort to do a numerical sort. So, the last command returns the 10 biggest files and directories in /usr/share.
-
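To see the difference between the default ordering and sort -n without depending on what is installed under /usr/share, here is a small sketch on made-up numbers. The variable names and the values are only for illustration.

```shell
# Made-up sizes, one per line (not real du output)
sizes=$(printf '%s\n' 2 10 4 33)

# Default: lines are compared as text, character by character
text_order=$(printf '%s\n' "$sizes" | sort)

# -n: lines are compared by their numeric value
numeric_order=$(printf '%s\n' "$sizes" | sort -n)

# Unquoted echo joins the lines with spaces for compact display
echo "text order:    $(echo $text_order)"
echo "numeric order: $(echo $numeric_order)"
```

The text ordering puts 10 before 2 and 33 before 4, while the numeric ordering gives 2, 4, 10, 33.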
The du example above works because the numerical values happen to be in the first column of the output. What if we want to sort a list based on another column? For example the result of ls -l:
ls -l /usr/bin | head
Ignoring for the moment that ls can sort its results by size, we could use sort to sort them like this:
ls -l /usr/bin | sort -nr -k 5 | head
The option -k 5 tells sort to use the fifth field as the key for sorting. By the way, ls -l pads its columns with spaces rather than tabs. By default sort treats any run of blanks (spaces or tabs) as the separator between fields, so both kinds of output work.
-
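The effect of -k 5 on space-padded columns can be checked on a few hand-written lines that imitate the layout of ls -l. This is only a sketch: the file names and sizes are invented, and awk is used here just to pull out the name column for display.

```shell
# Made-up lines imitating ls -l output (size is field 5, name is field 9)
listing='-rwxr-xr-x 1 root root   4096 Jan  1 10:00 alpha
-rwxr-xr-x 1 root root 120000 Jan  2 11:00 beta
-rwxr-xr-x 1 root root    512 Jan  3 12:00 gamma'

# Numeric, reverse sort on the fifth field; the padding spaces are skipped
largest_first=$(printf '%s\n' "$listing" | sort -nr -k 5)

# Show only the names, biggest file first
names=$(printf '%s\n' "$largest_first" | awk '{print $9}')
printf '%s\n' "$names"
```

The names come out as beta, alpha, gamma, which matches the sizes 120000, 4096 and 512.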
For testing we are going to use the file distros.txt, which is like a history of some Linux distributions (containing their versions and release dates).
wget https://linux-cli.fs.al/examples/lesson07/distros.txt
cat distros.txt
cat -A distros.txt
The option -A makes cat show any special characters. The tab character is represented by ^I, and the $ shows the end of each line.
-
Let's try to sort the file distros.txt:
sort distros.txt
The result is almost correct, but Fedora version numbers are not in the correct order (since 1 comes before 5 in the character set).
To fix this we are going to sort on multiple keys. We want an alphabetic sort on the first field and a numeric sort on the second field:
sort --key=1,1 --key=2n distros.txt
sort -k 1,1 -k 2n distros.txt
sort -k1,1 -k2n distros.txt
Notice that if we don't use a range of fields (like 1,1, which means start at field 1 and end at field 1), it is not going to work as expected:
sort -k 1 -k 2n distros.txt
This is because in this case the first key starts at field 1 and extends to the end of the line. The whole line is compared as text, so the second key is consulted only when two lines are already identical, which makes it effectively ignored.
The modifier n stands for numerical sort. Other modifiers are r for reverse, b for ignoring leading blanks, etc.
-
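The difference between -k 1,1 and -k 1 is easier to see on a handful of lines. The sketch below uses made-up, tab-separated name and version pairs rather than the real distros.txt. LC_ALL=C is an addition that is not part of the commands above; it only makes the text comparison behave the same on every system.

```shell
# Made-up name<TAB>version pairs
sample=$(printf '%s\t%s\n' Fedora 10 Ubuntu 8 Fedora 5 Fedora 9)

# Key 1 limited to field 1, then a numeric key on field 2
with_range=$(printf '%s\n' "$sample" | LC_ALL=C sort -k 1,1 -k 2n)

# Key 1 runs to the end of the line, so the numeric key has no effect
without_range=$(printf '%s\n' "$sample" | LC_ALL=C sort -k 1 -k 2n)

printf 'with 1,1:\n%s\n\n' "$with_range"
printf 'with 1:\n%s\n' "$without_range"
```

With the range, the Fedora lines are ordered 5, 9, 10. Without it they are ordered 10, 5, 9, because the versions are compared as text.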
Suppose that we want to sort the list in reverse chronological order (by release date). We can do it like this:
sort -k 3.7nbr -k 3.1nbr -k 3.4nbr distros.txt
The --key option allows specification of offsets within fields. So 3.7 means start sorting from the 7th character of the 3rd field, which is the year. The modifier n makes it a numerical sort, r does reverse sorting, and with b we are suppressing any leading spaces of the third field.
In a similar way, the second sort key 3.1 sorts by the month, and the third key 3.4 sorts by day.
-
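The three offset keys 3.7, 3.1 and 3.4 can be tried on a few made-up lines, assuming the third field holds a date written as MM/DD/YYYY (the layout that these offsets imply). The one-letter names and the dates below are invented, and cut is used only to show the resulting order compactly.

```shell
# Made-up name<TAB>version<TAB>date lines, date as MM/DD/YYYY
releases=$(printf '%s\t%s\t%s\n' \
  A 1 03/20/2006 \
  B 2 11/25/2008 \
  C 3 10/24/2006 \
  D 4 11/08/2008)

# Year (from character 7), then month (character 1), then day (character 4),
# each compared numerically and in reverse
newest_first=$(printf '%s\n' "$releases" | sort -k 3.7nbr -k 3.1nbr -k 3.4nbr)

# Keep only the names to show the resulting order
order=$(printf '%s\n' "$newest_first" | cut -f 1)
printf '%s\n' "$order"
```

The order is B, D, C, A: both 2008 releases come first (25 November before 8 November), followed by October 2006 and March 2006.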
Some files don't use tabs and spaces as delimiters, for example the file /etc/passwd:
head /etc/passwd
In this case we can use the option -t to define the field separator character. For example to sort /etc/passwd on the seventh field (the account's default shell), we could do this:
sort -t ':' -k 7 /etc/passwd | head
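Since the content of /etc/passwd differs from system to system, the same idea can be sketched on three invented lines in the same colon-separated format. The accounts are hypothetical. LC_ALL=C is added only to make the text comparison predictable, and the numeric sort on the third field (the user ID) is an extra illustration that combines -t with a key range.

```shell
# Made-up lines in the /etc/passwd format
accounts='root:x:0:0:root:/root:/bin/bash
daemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin
alice:x:1000:1000:Alice:/home/alice:/bin/zsh'

# Sort on field 7 (the shell), using ":" as the field separator
by_shell=$(printf '%s\n' "$accounts" | LC_ALL=C sort -t ':' -k 7 | cut -d ':' -f 1)

# Numeric sort on field 3 only (the user ID)
by_uid=$(printf '%s\n' "$accounts" | sort -t ':' -k 3,3n | cut -d ':' -f 1)

printf 'by shell:\n%s\n\n' "$by_shell"
printf 'by uid:\n%s\n' "$by_uid"
```

Sorting by shell gives root, alice, daemon (/bin/bash, /bin/zsh, /usr/sbin/nologin), while sorting by user ID gives root, daemon, alice.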