4. More regex examples
-
Suppose that we are solving a crossword puzzle and we need a five-letter word whose third letter is "j" and whose last letter is "r". Let's try to use grep and a regex to solve this.

First of all, make sure that we have a dictionary of words installed:

    sudo apt install wbritish
    ls /usr/share/dict/
    less /usr/share/dict/words
    cat /usr/share/dict/words | wc -l

Now try this:

    grep -i '^..j.r$' /usr/share/dict/words

The option -i is used to ignore the case (uppercase, lowercase).

The regex pattern '^..j.r$' will match lines that contain exactly 5 characters, where the third one is j and the last one is r. Each dot matches any single character, not only a letter.
-
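For the crossword search above, here is a minimal sketch that runs without the dictionary package. The word list in the variable words is a made-up sample (an assumption, not the contents of /usr/share/dict/words), and the stricter pattern is only one possible way to restrict the wildcard positions to letters.

```shell
# Hypothetical sample list standing in for /usr/share/dict/words.
words='major
MAJOR
mayor
majors
12j4r'

# The dot matches any character, so "12j4r" is accepted as well.
loose=$(printf '%s\n' "$words" | grep -i '^..j.r$')

# Restricting the wildcard positions to letters rejects "12j4r".
strict=$(printf '%s\n' "$words" | grep -Ei '^[[:alpha:]]{2}j[[:alpha:]]r$')

printf 'loose:\n%s\n\nstrict:\n%s\n' "$loose" "$strict"
```

Both searches keep major and MAJOR (thanks to -i), but only the loose one keeps 12j4r.
-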
Let's say that we want to check a phone number for validity, and we consider a phone number to be valid if it is in the form (nnn) nnn-nnnn or in the form nnn nnn-nnnn, where n is a digit. We can do it like this:

    echo "(555) 123-4567" | \
        grep -E '^\(?[0-9][0-9][0-9]\)? [0-9][0-9][0-9]-[0-9][0-9][0-9][0-9]$'

    echo "555 123-4567" | \
        grep -E '^\(?[0-9][0-9][0-9]\)? [0-9][0-9][0-9]-[0-9][0-9][0-9][0-9]$'

    echo "AAA 123-4567" | \
        grep -E '^\(?[0-9][0-9][0-9]\)? [0-9][0-9][0-9]-[0-9][0-9][0-9][0-9]$'

Since we are using the option -E (for extended), we have to escape the parentheses, \( and \), so that they are not interpreted as metacharacters.

If we use basic regular expressions (without -E), then we don't need to escape the parentheses, but in this case we have to escape the question marks (\?, supported by GNU grep) so that they are interpreted as metacharacters:

    echo "(555) 123-4567" | \
        grep '^(\?[0-9][0-9][0-9])\? [0-9][0-9][0-9]-[0-9][0-9][0-9][0-9]$'

The question mark as a metacharacter means that the element before it (here, a parenthesis) may occur zero times or once.
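One consequence of making each parenthesis optional on its own is that a number with only one parenthesis also passes. The sketch below compares the pattern from the text with a stricter variant that uses alternation (the | metacharacter of extended regular expressions) to require either both parentheses or none. The names loose, strict and check are illustrative assumptions, not part of the original examples.

```shell
# Pattern from the text: each parenthesis is optional independently.
loose='^\(?[0-9][0-9][0-9]\)? [0-9][0-9][0-9]-[0-9][0-9][0-9][0-9]$'

# Sketch of a stricter variant: "(nnn)" or "nnn", nothing in between.
strict='^(\([0-9][0-9][0-9]\)|[0-9][0-9][0-9]) [0-9][0-9][0-9]-[0-9][0-9][0-9][0-9]$'

# check PATTERN NUMBER: prints "valid" or "invalid".
check() {
    if printf '%s\n' "$2" | grep -Eq "$1"; then
        echo valid
    else
        echo invalid
    fi
}

for n in '(555) 123-4567' '555 123-4567' '(555 123-4567' '555) 123-4567'; do
    printf '%-16s loose=%-8s strict=%s\n' \
        "$n" "$(check "$loose" "$n")" "$(check "$strict" "$n")"
done
```

The last two numbers are reported as valid by the loose pattern and as invalid by the strict one.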
-
Using the metacharacters {} we can express the number of required matches. For example:

    echo "(555) 123-4567" | \
        grep -E '^\(?[0-9]{3}\)? [0-9]{3}-[0-9]{4}$'

The expression {3} matches if the preceding element occurs exactly 3 times.

We could also replace ? by {0,1} or {,1}:

    echo "(555) 123-4567" | \
        grep -E '^\({0,1}[0-9]{3}\){,1} [0-9]{3}-[0-9]{4}$'

    echo "555 123-4567" | \
        grep -E '^\({0,1}[0-9]{3}\){,1} [0-9]{3}-[0-9]{4}$'

In general, {n,m} matches if the preceding element occurs at least n times, but no more than m times. These are also valid: {n,} (at least n times) and {,m} (at most m times; this form is a GNU extension).
-
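To see the interval expressions side by side, here is a small sketch that applies several of them to runs of the letter a. The helper show and the sample lines are assumptions made for illustration. The pattern is anchored with ^ and $ so that the whole line has to fit the interval.

```shell
# Five test lines: "a", "aa", ..., "aaaaa".
lines='a
aa
aaa
aaaa
aaaaa'

# show INTERVAL: print, on one line, the test lines that consist only of
# "a" repeated as many times as INTERVAL allows.
show() {
    printf '%s\n' "$lines" | grep -E "^a$1\$" | paste -s -d ' ' -
}

printf '{3}   -> %s\n' "$(show '{3}')"
printf '{2,4} -> %s\n' "$(show '{2,4}')"
printf '{2,}  -> %s\n' "$(show '{2,}')"
printf '{0,1} -> %s\n' "$(show '{0,1}')"
```

Without the anchors every line containing at least the minimum number of repetitions would match, because grep looks for the pattern anywhere in the line.
-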
Similar to ?, which is equivalent to {0,1}, there is also *, which is equivalent to {0,} (zero or more occurrences), and +, which is equivalent to {1,} (one or more occurrences).

Let's say that we want to check if a string is a sentence. This means that it starts with an uppercase letter, then contains any number of uppercase and lowercase letters and spaces, and finally ends with a period. We could do it like this:

    echo "This works." | grep -E '[A-Z][A-Za-z ]*\.'
    echo "This Works." | grep -E '[A-Z][A-Za-z ]*\.'
    echo "this does not" | grep -E '[A-Z][A-Za-z ]*\.'

Or like this:

    echo "This works." | grep -E '[[:upper:]][[:upper:][:lower:] ]*\.'

Note: In all these cases we have to escape the period (\.) so that it matches itself instead of any character. Also note that these patterns are not anchored with ^ and $, so they match any line that contains such a sequence, not only lines that consist entirely of one sentence.
-
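The sentence patterns above have no ^ and $ anchors, so a line passes as soon as it contains an uppercase letter followed by letters or spaces and a period. The following sketch contrasts that behaviour with an anchored variant. The variable names, the helper matches and the sample strings are assumptions chosen for illustration.

```shell
# Pattern as in the text, and a variant anchored to the whole line.
unanchored='[[:upper:]][[:upper:][:lower:] ]*\.'
anchored='^[[:upper:]][[:upper:][:lower:] ]*\.$'

# matches PATTERN STRING: exit status 0 if STRING matches PATTERN.
matches() {
    printf '%s\n' "$2" | grep -Eq "$1"
}

for s in 'This works.' 'see Fig. two' 'This works. Not'; do
    if matches "$unanchored" "$s"; then u=yes; else u=no; fi
    if matches "$anchored" "$s"; then a=yes; else a=no; fi
    printf '%-18s unanchored=%-4s anchored=%s\n' "$s" "$u" "$a"
done
```

Only the first string passes the anchored check. The other two pass the unanchored one because they contain "Fig." and "This works." somewhere in the line.
-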
Here is a regular expression that will only match lines consisting of groups of one or more alphabetic characters separated by single spaces:

    echo "This that" | grep -E '^([[:alpha:]]+ ?)+$'
    echo "a b c" | grep -E '^([[:alpha:]]+ ?)+$'

    echo "a b  c" | grep -E '^([[:alpha:]]+ ?)+$'

Does not match because there are two consecutive spaces.

    echo "a b 9" | grep -E '^([[:alpha:]]+ ?)+$'

Does not match because there is a non-alphabetic character.
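Because the space inside the group is optional, the pattern above also accepts a line that ends with a single space. If that matters, one possible alternative (a sketch, not the only way to write it) is to require a first word and then zero or more groups made of a space followed by a word. The names original, strict and matches are illustrative assumptions.

```shell
original='^([[:alpha:]]+ ?)+$'
# Sketch of a stricter variant: word, then any number of " word" groups.
strict='^[[:alpha:]]+( [[:alpha:]]+)*$'

# matches PATTERN STRING: exit status 0 if STRING matches PATTERN.
matches() {
    printf '%s\n' "$2" | grep -Eq "$1"
}

for s in 'a b c' 'a b c ' 'a b  c' 'a b 9'; do
    if matches "$original" "$s"; then o=yes; else o=no; fi
    if matches "$strict" "$s"; then t=yes; else t=no; fi
    printf '[%s] original=%-4s strict=%s\n' "$s" "$o" "$t"
done
```

The two patterns agree on every sample except "a b c " with a trailing space, which only the original accepts.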
-
Let's create a list of random phone numbers for testing:

    echo $RANDOM
    echo $RANDOM
    echo ${RANDOM:0:3}

    for i in {1..10}; do \
        echo "${RANDOM:0:3} ${RANDOM:0:3}-${RANDOM:0:4}" >> phonelist.txt; \
    done
    cat phonelist.txt

    for i in {1..100}; do \
        echo "${RANDOM:0:3} ${RANDOM:0:3}-${RANDOM:0:4}" >> phonelist.txt; \
    done
    less phonelist.txt
    cat phonelist.txt | wc -l

You can see that some of the phone numbers are malformed: in bash, $RANDOM is an integer between 0 and 32767, so it sometimes has fewer digits than the substring expansion asks for. We can display those that are malformed like this:

    grep -Ev '^[0-9]{3} [0-9]{3}-[0-9]{4}$' phonelist.txt

The option -v inverts the match, which means that grep displays only the lines that do not match the given pattern.
-
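Since the generated list is different on every run, a fixed sample makes the effect of -v easier to follow. The sketch below writes four hand-picked numbers (an assumption, not output of the loop above) to a temporary file, counts the matching and non-matching lines with -c, and prints the malformed ones together with their line numbers using -n.

```shell
# Temporary file with a small, fixed sample instead of phonelist.txt.
list=$(mktemp)
cat > "$list" <<'EOF'
555 123-4567
12 345-6789
987 654-321
321 987-6543
EOF

pattern='^[0-9]{3} [0-9]{3}-[0-9]{4}$'

good=$(grep -Ec "$pattern" "$list")    # lines that match
bad=$(grep -Evc "$pattern" "$list")    # lines that do not match
echo "well-formed: $good, malformed: $bad"

# Show the malformed lines, prefixed with their line numbers.
malformed=$(grep -Evn "$pattern" "$list")
printf '%s\n' "$malformed"

rm -f "$list"
```

The second and third lines are reported as malformed: one has a two-digit first group and the other a three-digit last group.
-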
Regular expressions can be used with many commands, not just with grep.

For example, let's use them with find to find the files that contain bad characters in their name (like spaces, punctuation marks, etc.):

    touch "bad file name!"
    ls -l
    find . -regex '.*[^-_./0-9a-zA-Z].*'

Unlike grep, find expects the pattern to match the whole path (here ./bad file name!), which is why we prepend and append .* to the pattern.

We can use regular expressions with locate like this:

    locate --regex 'bin/(bz|gz|zip)'

We can also use them with less:

    less phonelist.txt

We can press / and write a regular expression, and less will find and highlight the matching lines. For example:

    /^[0-9]{3} [0-9]{3}-[0-9]{4}$

The invalid lines will not be highlighted and will be easy to spot.

Regular expressions can also be used with zgrep like this:

    cd /usr/share/man/man1
    zgrep -El 'regex|regular expression' *.gz

It will find man pages that contain either "regex" or "regular expression". As we can see, regular expressions show up in a lot of programs.
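To illustrate how find applies its pattern differently from grep, the sketch below runs the bad-character search from the find example in a temporary directory, once without and once with the surrounding .* parts. The directory, the file names and the variable names are assumptions made for the example, and it relies on a find that supports -regex, as GNU find does.

```shell
# Scratch directory with one clean and one problematic file name.
dir=$(mktemp -d)
touch "$dir/good_name.txt" "$dir/bad file name!"

# Without .* the pattern would have to match the entire path,
# so nothing is found.
partial=$(cd "$dir" && find . -regex '[^-_./0-9a-zA-Z]')

# With .* on both sides, any path containing a bad character matches.
whole=$(cd "$dir" && find . -regex '.*[^-_./0-9a-zA-Z].*')

printf 'partial: [%s]\nwhole:   [%s]\n' "$partial" "$whole"

rm -rf "$dir"
```

The first search prints nothing between the brackets, while the second one reports ./bad file name!.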