30 April 2021
Unix shells use certain characters as wildcards. These so-called globbing characters are similar to, but different from, regular expressions. Globbing is used purely for file name expansion, while regular expressions have a much broader scope. Because the shell expands a glob into the list of matching file names before the command runs, you can use globbing with any utility that works with files, such as ls, find, mv and rm.
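To see that the shell, not the command, does the expansion, you can compare an unquoted pattern with a quoted one. This is a minimal sketch that assumes a POSIX sh and the widely available `mktemp -d`. It works in a throwaway directory, and the file names are made up for the demonstration.

```shell
#!/bin/sh
# Work in a throwaway directory so no real files are touched.
dir=$(mktemp -d)
cd "$dir" || exit 1
touch config.php index.php notes.txt

# Unquoted: the shell replaces *.php with the matching names,
# so echo only ever receives the expanded argument list.
expanded=$(echo *.php)
echo "unquoted: $expanded"

# Quoted: expansion is switched off and echo receives the literal text.
literal=$(echo '*.php')
echo "quoted:   $literal"

# Clean up the temporary directory.
cd / && rm -rf "$dir"
```

The first line prints `config.php index.php` and the second prints `*.php`, which is why a pattern you want a tool such as find to interpret itself has to be quoted.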
An asterisk matches zero or more characters. For instance, if you want to list all .php files in a directory, you can run ls *.php:

```
$ ls -1 *.php
config.php
feed.php
index.php
install.php
```
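Including the dot in the pattern matters. The sketch below, again assuming a POSIX sh and a scratch directory, uses a hypothetical extensionless file `myphp` to show that `*php` is looser than `*.php`. It also shows that neither pattern matches a hidden file such as `.hidden.php`, because a leading dot has to be matched explicitly.

```shell
#!/bin/sh
dir=$(mktemp -d)
cd "$dir" || exit 1
# myphp and .hidden.php are hypothetical names chosen to show the edge cases.
touch config.php index.php myphp .hidden.php

loose=$(echo *php)     # any visible name ending in "php"
strict=$(echo *.php)   # only visible names ending in ".php"
echo "*php  -> $loose"
echo "*.php -> $strict"

# .hidden.php appears in neither list: * never matches a leading dot.
cd / && rm -rf "$dir"
```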
The question mark matches exactly one character. It is less common than the asterisk, but it can come in handy:

```
$ ls -1 backup_?.zip
backup_1.zip
backup_2.zip
backup_3.zip
backup_4.zip
```
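Because each question mark stands for exactly one character, `backup_?.zip` silently skips a two-digit name. The following sketch, assuming a POSIX sh and a hypothetical `backup_10.zip`, makes that visible.

```shell
#!/bin/sh
dir=$(mktemp -d)
cd "$dir" || exit 1
touch backup_1.zip backup_2.zip backup_10.zip

one=$(echo backup_?.zip)    # exactly one character after the underscore
two=$(echo backup_??.zip)   # exactly two characters after the underscore
echo "?  -> $one"
echo "?? -> $two"

cd / && rm -rf "$dir"
```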
You can also match one character from a set defined in square brackets, and the set can be written as a range. For instance, let’s imagine that you have a directory with four backup files and that you want to delete all but the most recent one:

```
$ ls -1 2020111*
20201110_2145_backup.zip
20201111_2145_backup.zip
20201112_2145_backup.zip
20201113_2145_backup.zip
```
You can use the range 2020111[0-2] to match the first three files:

```
$ rm -f 2020111[0-2]*
$ ls -1 2020111*
20201113_2145_backup.zip
```

The command rm -f 2020111[0-2]* matches the first three files, but not the file that starts with 20201113. We are therefore left with just the most recent backup.
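Since rm cannot be undone, it is worth previewing what a pattern expands to before deleting anything. This sketch recreates the four backup files in a scratch directory (assuming a POSIX sh), prints the matches as a dry run, and only then removes them.

```shell
#!/bin/sh
dir=$(mktemp -d)
cd "$dir" || exit 1
touch 20201110_2145_backup.zip 20201111_2145_backup.zip \
      20201112_2145_backup.zip 20201113_2145_backup.zip

# Dry run: see exactly which names the pattern expands to.
to_remove=$(echo 2020111[0-2]*)
printf 'would remove: %s\n' 2020111[0-2]*

# Same pattern, now for real.
rm -f 2020111[0-2]*
remaining=$(echo 2020111*)
echo "left: $remaining"

cd / && rm -rf "$dir"
```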
A range can be negated by putting an exclamation mark right after the opening square bracket. For instance, rm -f 2020111[!0-2]* removes only the files whose eighth character is not 0, 1 or 2 (in this case the file 20201113_2145_backup.zip). A range doesn’t have to be numeric either; you can match a range of letters as well. For instance, the range [a-c] matches a single a, b or c.
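The sketch below, assuming a POSIX sh and hypothetical file names, exercises a negated numeric range and a letter range side by side. The `!` form is the portable one; bash also accepts `^` for negation.

```shell
#!/bin/sh
dir=$(mktemp -d)
cd "$dir" || exit 1
touch 20201110_2145_backup.zip 20201113_2145_backup.zip a.txt b.txt d.txt

negated=$(echo 2020111[!0-2]*)   # eighth character is NOT 0, 1 or 2
letters=$(echo [a-c].txt)        # a single a, b or c before .txt
others=$(echo [!a-c].txt)        # a single character outside a-c
echo "[!0-2] -> $negated"
echo "[a-c]  -> $letters"
echo "[!a-c] -> $others"

cd / && rm -rf "$dir"
```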
There are also a few special character classes. They are mainly useful to avoid “ugly” ranges. For instance, you can use [a-zA-Z] to match any alphabetical character in either lower or upper case. A more elegant way to achieve the same result is to use the [[:alpha:]] class instead.
A class is a keyword wrapped in [: and :], such as [:alpha:], and it only works inside a bracket expression. That is why it always appears within double square brackets. The table below shows the most common ones.
Class | Matches | Equivalent in the C locale |
---|---|---|
[[:alpha:]] | Alphabetical characters | [a-zA-Z] |
[[:alnum:]] | Alphabetical characters and digits | [a-zA-Z0-9] |
[[:blank:]] | Space or tab characters | [ \t] |
[[:digit:]] | Digits | [0-9] |
[[:lower:]] | Lowercase alphabetical characters | [a-z] |
[[:upper:]] | Uppercase alphabetical characters | [A-Z] |
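Here is a small sketch, assuming a POSIX sh and made-up file names, that applies several classes from the table to the same directory. It also checks that [a-zA-Z]* and [[:alpha:]]* agree for these plain ASCII names. In non-C locales, [[:alpha:]] can additionally match letters such as é.

```shell
#!/bin/sh
dir=$(mktemp -d)
cd "$dir" || exit 1
# Hypothetical names: one starts with an uppercase letter, one lowercase,
# one with a digit and one with an underscore.
touch Makefile notes.txt 2021.log _tmp

upper=$(echo [[:upper:]]*)
lower=$(echo [[:lower:]]*)
digit=$(echo [[:digit:]]*)
alpha=$(echo [[:alpha:]]*)
ranges=$(echo [a-zA-Z]*)
alnum=$(echo [[:alnum:]]*)   # _tmp is not matched: _ is not alphanumeric

echo "upper:  $upper"
echo "lower:  $lower"
echo "digit:  $digit"
echo "alpha:  $alpha"
echo "ranges: $ranges"
echo "alnum:  $alnum"

cd / && rm -rf "$dir"
```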
It is worth noting that the asterisk and question mark have a different meaning in regular expressions. In a “regex” the asterisk matches zero or more instances of the preceding character, and, in extended regular expressions, the question mark matches zero or one instance of the preceding character. The dot character (.) matches exactly one instance of any character, so it can be combined with an asterisk (.*) to match any number of any characters (including zero characters).
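To compare the two notations directly, the sketch below tests the same names against the glob `*.php` (via a `case` statement, which uses glob syntax) and against two regular expressions passed to `grep -E`. The function names are hypothetical helpers for this illustration only, and a POSIX sh is assumed. Note how an unescaped dot in the regex matches any character, so `index_php` slips through.

```shell
#!/bin/sh
# Hypothetical helper: case patterns use the shell's glob syntax.
glob_match() {
    case $1 in
        *.php) echo yes ;;
        *)     echo no ;;
    esac
}

# Hypothetical helper: .* means "any characters" and \. is a literal dot.
regex_match() {
    if printf '%s\n' "$1" | grep -Eq '^.*\.php$'; then echo yes; else echo no; fi
}

# Hypothetical helper: the unescaped dot matches ANY single character.
loose_regex_match() {
    if printf '%s\n' "$1" | grep -Eq '^.*.php$'; then echo yes; else echo no; fi
}

for name in index.php index_php notes.txt; do
    printf '%-10s glob=%s regex=%s loose=%s\n' "$name" \
        "$(glob_match "$name")" "$(regex_match "$name")" "$(loose_regex_match "$name")"
done
```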