Examples 1
- Here is a simple script that takes the URL of a web page as its argument and prints all the URLs found in that page:
get_urls.sh
#!/bin/bash
# Get all the URLs inside a given HTML page.
PAGE=$1
if [[ -z $PAGE ]]; then
    echo "Usage: $0 <html-page>" >&2
    exit 1
fi
wget -qO- "$PAGE" \
| grep -Eoi '<a [^>]+>' \
| grep -Eo 'href="?([^\"]+)"?' \
| grep -v 'mailto:' \
| sed -e 's/"//g' -e 's/href=//'
vim get_urls.sh
chmod +x get_urls.sh
./get_urls.sh
url=http://linuxcommand.org/
./get_urls.sh $url
Let's see how it works:
wget -qO- $url
wget -qO- $url | grep -Eoi '<a [^>]+>'
The option -E enables extended regular expression syntax, -o prints only the matching part of each line, and -i makes the match case-insensitive. Here we extract all the anchor (<a ...>) tags.
wget -qO- $url | grep -Eoi '<a [^>]+>' | grep -Eo 'href="?([^\"]+)"?'
Now we extract the value of the href attribute.
wget -qO- $url \
| grep -Eoi '<a [^>]+>' \
| grep -Eo 'href="?([^\"]+)"?' \
| grep -v 'mailto:' \
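| sed -e 's/"//g' -e 's/href=//'
Finally, grep -v 'mailto:' drops the e-mail links, and sed deletes the double quotes and the href= prefix, leaving only the URLs.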
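You can see how the filter stages work without downloading anything. The sketch below applies the same grep and sed commands to a small HTML fragment supplied through a here-document. The sample markup and the helper names sample_html and extract_urls are hypothetical and exist only for this illustration.

```bash
#!/bin/bash
# Hypothetical helper: prints a small, made-up HTML fragment.
sample_html() {
    cat <<'EOF'
<p>See the <a href="http://example.com/docs">docs</a>,
an <A href="http://example.com/upper">upper-case tag</A>,
and <a href="mailto:someone@example.com">mail me</a>.</p>
<a class="nav" href="/index.html">home</a>
EOF
}

# Hypothetical helper: the same filter stages as get_urls.sh,
# reading HTML from standard input instead of from wget.
extract_urls() {
    grep -Eoi '<a [^>]+>' \
        | grep -Eo 'href="?([^\"]+)"?' \
        | grep -v 'mailto:' \
        | sed -e 's/"//g' -e 's/href=//'
}

sample_html | extract_urls
```

This prints http://example.com/docs, http://example.com/upper and /index.html, one per line. The upper-case <A> tag passes the first grep because of -i, and the mailto: link is filtered out. A relative link such as /index.html is printed as it appears, because the pipeline does not resolve it against the page URL.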
- Here is a simple script that takes the URL of a web page as its argument and prints the 100 most frequently used words in it:
get_words.sh
#!/bin/bash
# Return a list of the 100 most frequently used words inside a given page.
PAGE=$1
if [[ -z $PAGE ]]; then
    echo "Usage: $0 <html-page>" >&2
    exit 1
fi
wget -q -O- "$PAGE" \
| tr "\n" ' ' \
| sed -e 's/<[^>]*>/ /g' \
| sed -e 's/&[^;]*;/ /g' \
| tr -cs A-Za-z\' '\n' \
| tr A-Z a-z \
| sort \
| uniq -c \
| sort -k1,1nr -k2 \
| sed 100q \
| sed -E 's/^ +//' \
| cut -d' ' -f2
vim get_words.sh
chmod +x get_words.sh
./get_words.sh
url=https://en.wikipedia.org/wiki/Linux
./get_words.sh $url
./get_words.sh $url | less
./get_words.sh $url | wc -l
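The counting stages of get_words.sh do not depend on wget, so you can check them on a short snippet. The sketch below feeds a hypothetical HTML sample through the same tr, sed, sort and uniq stages. It stops before sed 100q and cut, so the counts stay visible. The helper names sample_page and count_words are made up for this example.

```bash
#!/bin/bash
# Hypothetical sample page; the text is invented for illustration.
sample_page() {
    cat <<'EOF'
<html><body>
<h1>The Shell</h1>
<p>The shell reads commands; the shell runs them.
It's &quot;just&quot; a program.</p>
</body></html>
EOF
}

# Hypothetical helper: the stages of get_words.sh up to the final sort,
# keeping the counts produced by uniq -c.
count_words() {
    tr "\n" ' ' \
        | sed -e 's/<[^>]*>/ /g' \
        | sed -e 's/&[^;]*;/ /g' \
        | tr -cs A-Za-z\' '\n' \
        | tr A-Z a-z \
        | sort \
        | uniq -c \
        | sort -k1,1nr -k2
}

sample_page | count_words
```

In this sample, "shell" and "the" both occur 3 times. They come first, and the tie is broken alphabetically. The apostrophe in "it's" is kept because it is part of the tr character set. The page starts with markup, so tr -cs also emits one empty line at the beginning. uniq -c counts that line once, and it appears as a count with no word.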
wget -qO- $url \
| tr "\n" ' ' \
| sed -e 's/<[^>]*>/ /g' \
| sed -e 's/&[^;]*;/ /g' \
| tr -cs A-Za-z\' '\n' \
| tr A-Z a-z \
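| less
The first tr joins all the lines into one, so that tags spanning several lines can be removed. The two sed commands replace every HTML tag and every entity (such as &amp;) with a space. tr -cs turns each run of characters other than letters and apostrophes into a single newline, and the last tr converts everything to lowercase. The result is one word per line.
wget -qO- $url \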
| tr "\n" ' ' \
| sed -e 's/<[^>]*>/ /g' \
| sed -e 's/&[^;]*;/ /g' \
| tr -cs A-Za-z\' '\n' \
| tr A-Z a-z \
| sort \
| uniq -c \
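| less
sort puts identical words next to each other, and uniq -c collapses each group of repeated lines into one line prefixed with its number of occurrences.
wget -qO- $url \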
| tr "\n" ' ' \
| sed -e 's/<[^>]*>/ /g' \
| sed -e 's/&[^;]*;/ /g' \
| tr -cs A-Za-z\' '\n' \
| tr A-Z a-z \
| sort \
| uniq -c \
| sort -k1,1nr -k2 \
| sed 100q \
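| less
sort -k1,1nr -k2 orders the lines by count, largest first (the n and r modifiers apply only to the first field), and breaks ties alphabetically by word. sed 100q prints the first 100 lines and then quits. The script adds two more stages: sed -E 's/^ +//' removes the leading spaces that uniq -c puts before each count, and cut -d' ' -f2 keeps only the word.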
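You can try the ordering and trimming stages on a few hand-made lines in the format that uniq -c produces (a right-aligned count, a space, then the word). In this sketch the counts and words are invented, and the helper names are hypothetical. The limit is lowered from 100 to 3 so that its effect is visible.

```bash
#!/bin/bash
# Hypothetical input in uniq -c style: count padded to 7 columns, then the word.
sample_counts() {
    printf '%7d %s\n' 2 linux 5 the 2 kernel 1 gnu 5 and
}

# Hypothetical helper: the last stages of get_words.sh,
# with "sed 100q" lowered to "sed 3q" for this small sample.
top_words() {
    sort -k1,1nr -k2 \
        | sed 3q \
        | sed -E 's/^ +//' \
        | cut -d' ' -f2
}

sample_counts | top_words
```

This prints and, the and kernel. The two words with count 5 come first, in alphabetical order. They are followed by the alphabetically first word with count 2, and sed 3q discards the rest.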