103.7 Using Regular Expressions


Candidates should be able to manipulate files and text data using regular expressions. This objective includes creating simple regular expressions containing several notational elements. It also includes using regular expression tools to perform searches through a filesystem or file content.


Key Knowledge Areas

  • Create simple regular expressions containing several notational elements.
  • Use regular expression tools to perform searches through a filesystem or file content.


Overview


Words or patterns in a text can be found with grep, egrep or fgrep. The search patterns used by grep and egrep are built from ordinary and special characters and are called regular expressions. Regular expressions are also recognised by many other applications, such as sed and vi.

Regular Expressions

Traditional Regular Expressions (regex)

A regular expression is a sequence of characters (or atoms) that describes a pattern to be matched. Each character is either a literal (matched as itself) or a metacharacter (a character with a special meaning).

Table 1: Main metacharacters
Pattern     Matches
\<KEY       Words beginning with 'KEY'
WORD\>      Words ending with 'WORD'
^           The beginning of a line
$           The end of a line
[range]     Any one of the enclosed characters, e.g. [a-z] or [abc]
[^c]        Any single character except 'c'
\[          The character '[' taken literally
ca*t        'c' followed by zero or more 'a' characters followed by 't' (ct, cat, caat, ...)
.           Any single character
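A quick way to check the metacharacters in the table above is to run grep against a few sample lines. The sketch below assumes GNU grep, because \< and \> are GNU extensions. It also creates a hypothetical file, words.txt, in a temporary directory. The comments note which lines each pattern should print.

```shell
# Work in a throwaway directory; words.txt is a hypothetical sample file.
cd "$(mktemp -d)" || exit 1
printf '%s\n' 'cat' 'ct' 'caaat' 'coat' 'keyboard' 'the KEY word' '[note]' > words.txt

grep 'ca*t' words.txt      # c, zero or more a, then t: cat, ct, caaat
grep '^c.t$' words.txt     # whole line is c, any one character, t: cat
grep '^[ck]' words.txt     # lines starting with c or k
grep '^[^c]' words.txt     # lines whose first character is not c
grep '\[' words.txt        # a literal [ : [note]
grep '\<KEY' words.txt     # GNU only: a word beginning with KEY
grep 'word\>' words.txt    # GNU only: a word ending with word
```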


Extended Regular Expressions (eregex)

Extended regular expressions add further metacharacters. The main ones are +, ?, ( ) and |.

Table 2: Main extended metacharacters
Pattern     Matches
A1|A2|A3    Strings containing 'A1', 'A2' or 'A3'
ca+t        'c' followed by one or more 'a' characters followed by 't' (cat, caat, ...)
ca?t        'c' followed by zero or one 'a' followed by 't' (only ct or cat)
(ab)+       One or more repetitions of the group 'ab' (ab, abab, ...)

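The extended metacharacters can be tried the same way. This sketch uses grep -E, which is equivalent to egrep, on a hypothetical file named ext.txt. The comments list the lines each pattern should match.

```shell
# ext.txt is a hypothetical sample file in a throwaway directory.
cd "$(mktemp -d)" || exit 1
printf '%s\n' 'A1' 'B2' 'A3 and B1' 'ct' 'cat' 'caat' 'abab' > ext.txt

grep -E 'A1|A2|A3' ext.txt   # alternation: A1, A3 and B1
grep -E 'ca+t' ext.txt       # one or more a: cat, caat (not ct)
grep -E 'ca?t' ext.txt       # zero or one a: ct, cat (not caat)
grep -E '^(ab)+$' ext.txt    # a repeated group filling the line: abab
```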
The grep family


The grep utility supports basic regular expressions (regex), such as those listed in Table 1.

Working with basic grep

Syntax for grep:

grep [OPTIONS] PATTERN [FILE...]
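The objective also covers searching through a filesystem, not just a single file. The following is a minimal sketch. It assumes GNU or BSD grep for -r, since recursive search is not required by POSIX. The directory tree it searches is hypothetical and is built in a temporary directory. The portable alternative lets find choose the files.

```shell
# Build a tiny hypothetical tree: etc/app/a.conf and etc/app/b.conf.
cd "$(mktemp -d)" || exit 1
mkdir -p etc/app
printf 'root=/dev/hda1\n' > etc/app/a.conf
printf 'root=/dev/sdb3\n' > etc/app/b.conf

# With several files, grep prefixes each matching line with the file name.
grep 'hda' etc/app/a.conf etc/app/b.conf
# -l prints only the names of the files that contain a match.
grep -l 'hda' etc/app/*.conf
# -r descends into directories (GNU/BSD extension).
grep -r 'hda' etc
# Portable alternative: find selects the files and runs grep on them.
find etc -name '*.conf' -exec grep -l 'hda' {} +
```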

Main options for grep:

-c          Print only a count of the lines matching PATTERN
-f FILE     Read the patterns from FILE, one per line
-i          Ignore case distinctions
-n          Prefix each matching line with its line number
-v          Invert the match: print only the lines that do not match PATTERN
-w          Select lines only if the pattern matches a whole word
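These options can be combined. The following is a minimal sketch that runs them on a hypothetical file named os.txt. It also uses a hypothetical pattern file named patterns.txt. It assumes GNU grep; -w is widely supported but is not required by POSIX.

```shell
# os.txt and patterns.txt are hypothetical sample files.
cd "$(mktemp -d)" || exit 1
printf '%s\n' 'Linux kernel' 'linux-image' 'Windows' 'linuxes' > os.txt
printf '%s\n' 'Windows' 'kernel' > patterns.txt

grep -c 'linux' os.txt         # case-sensitive count: 2
grep -ic 'linux' os.txt        # -i also counts "Linux kernel": 3
grep -n 'Windows' os.txt       # line number prefix: 3:Windows
grep -v 'linux' os.txt         # lines without lowercase "linux"
grep -iw 'linux' os.txt        # whole word only: "linuxes" is skipped
grep -f patterns.txt os.txt    # lines matching any pattern in the file
```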


For example, to list all non-blank lines in /etc/lilo.conf:

$ grep -v '^$' /etc/lilo.conf
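/etc/lilo.conf may not exist on a modern system, so this sketch uses a hypothetical stand-in called sample.conf. It also shows two common refinements. The first treats whitespace-only lines as blank. The second drops comment lines by adding a second pattern with -e.

```shell
# sample.conf is a hypothetical stand-in for /etc/lilo.conf.
cd "$(mktemp -d)" || exit 1
printf '%s\n' '# boot config' 'boot=/dev/hda' '' 'image=/boot/vmlinuz' '   ' > sample.conf

grep -v '^$' sample.conf                            # 4 lines: the "   " line survives
grep -v '^[[:space:]]*$' sample.conf                # 3 lines: whitespace-only removed too
grep -v -e '^[[:space:]]*$' -e '^#' sample.conf     # 2 lines: comments removed as well
```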


egrep

The egrep tool supports extended regular expressions (eregex), such as those listed in Table 2. It is equivalent to grep -E.

Because egrep understands the | operator, it can search for several keywords at once when they are separated by the vertical bar character.

For example, to list the lines that contain 'linux' or that begin with 'image':

$ egrep 'linux|^image' /etc/lilo.conf
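The following sketch uses a hypothetical lilo.conf-style file so the example can run anywhere. Recent GNU grep releases warn that egrep and fgrep are obsolescent, so the equivalent grep -E is used here. The last command shows that plain grep treats | literally in a basic regular expression.

```shell
# lilo.conf here is a hypothetical sample, not a real boot configuration.
cd "$(mktemp -d)" || exit 1
printf '%s\n' 'boot=/dev/hda' 'image=/boot/vmlinuz' '  label=linux' '  root=/dev/hda1' > lilo.conf

grep -E 'linux|^image' lilo.conf   # image=/boot/vmlinuz and "  label=linux"
grep -c 'linux|^image' lilo.conf   # basic regex: | is literal, so the count is 0
```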


fgrep

fgrep (fixed-string grep, traditionally described as "fast grep") interprets its patterns literally, with no regex or eregex support. Metacharacters such as *, . and [ are matched as ordinary characters. It is equivalent to grep -F.

For example:

$ fgrep 'cat*' FILE

matches only lines containing the literal string 'cat*'. A common use of fgrep is to search for a list of fixed keywords, entered one per line in a file, say LIST. The syntax is:

$ fgrep -f LIST FILE
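The contrast between literal and regular-expression matching is clearest on the same input. In this sketch, FILE and LIST are hypothetical sample files, and grep -F stands in for fgrep. The last command feeds the same LIST to plain grep. There, 'dog.txt' and 'cat*' become regular expressions, so more lines match.

```shell
# FILE and LIST are hypothetical sample files.
cd "$(mktemp -d)" || exit 1
printf '%s\n' 'cat' 'the cat* wildcard' 'ca' 'dog.txt' 'dogXtxt' > FILE
printf '%s\n' 'dog.txt' 'cat*' > LIST

grep 'cat*' FILE        # regex: "ca" then zero or more t, 3 lines
grep -F 'cat*' FILE     # literal: only "the cat* wildcard"
grep -F -f LIST FILE    # literal keyword list: "the cat* wildcard", "dog.txt"
grep -f LIST FILE       # same list as regexes: . and * are special, all 5 lines
```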


The Stream Editor - sed

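sed (the stream editor) performs automatic, non-interactive editing of text. It is often used in scripts to search for and replace patterns. It uses basic regular expressions, such as those listed in Table 1. By default, sed writes the edited text to standard output and leaves the input file unchanged.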

Syntax for sed:

sed [options] 'command' [INPUTFILE]

The input file is optional. Without one, sed reads standard input, so it also works with redirections and pipes. The following examples assume we are working on a file called MODIF.

Delete all comment lines (lines beginning with #):

$ sed '/^#/ d ' MODIF

Notice that the search pattern is enclosed between two slashes and is followed by the d (delete) command.
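The effect of the ^ anchor is easy to see on a small sample. In this sketch, MODIF is a hypothetical three-line file. An indented comment is removed only when the pattern allows leading whitespace. MODIF itself is not modified.

```shell
# MODIF is a hypothetical sample file.
cd "$(mktemp -d)" || exit 1
printf '%s\n' '# comment' 'keep me' '  # indented comment' > MODIF

sed '/^#/d' MODIF               # keeps "keep me" and the indented comment
sed '/^[[:space:]]*#/d' MODIF   # keeps only "keep me"
grep -c '' MODIF                # still 3 lines: sed wrote to standard output only
```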

Replace /dev/hda1 with /dev/sdb3:

$ sed 's/\/dev\/hda1/\/dev\/sdb3/g' MODIF

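The s command stands for 'substitute'. Each / inside the search and replacement strings is escaped as \/ so that it is not taken for the delimiter. The g flag stands for 'global' and makes the substitution apply to every match on each line. Without it, only the first match on each line is replaced. You can also restrict a command to certain lines by placing an address in front of it: a line number, a range of line numbers, or a regular expression.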
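The following sketch runs on a hypothetical two-line MODIF. It compares replacing the first match with replacing every match (the g flag). It also shows an alternative delimiter and a line-number address. sed accepts characters other than / as the s delimiter (any character except backslash or newline). The sketch uses |, so the slashes in device names need no escaping.

```shell
# MODIF is a hypothetical sample file.
cd "$(mktemp -d)" || exit 1
printf '%s\n' '/dev/hda1 /dev/hda1' 'swap=/dev/hda1' > MODIF

sed 's/\/dev\/hda1/\/dev\/sdb3/' MODIF    # first match per line: "/dev/sdb3 /dev/hda1"
sed 's/\/dev\/hda1/\/dev\/sdb3/g' MODIF   # every match: "/dev/sdb3 /dev/sdb3"
sed 's|/dev/hda1|/dev/sdb3|g' MODIF       # same result with | as the delimiter
sed '2s|/dev/hda1|/dev/sdb3|' MODIF       # address 2: only line 2 is changed
```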

On lines containing the keyword KEY, replace every ':' with ';':

$ sed '/KEY/ s/:/;/g' MODIF
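An address can also be negated with !, so the command runs on every line that does not match. Here is a short sketch on a hypothetical MODIF.

```shell
# MODIF is a hypothetical sample file.
cd "$(mktemp -d)" || exit 1
printf '%s\n' 'KEY:a:b' 'other:c:d' > MODIF

sed '/KEY/ s/:/;/g' MODIF    # KEY;a;b  and  other:c:d
sed '/KEY/!s/:/;/g' MODIF    # KEY:a:b  and  other;c;d
```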


More Advanced sed

Several commands can be given on the command line, each preceded by -e. For example, to (1) delete all blank lines and then (2) replace 'OLD' with 'NEW' in the file MODIF:

$ sed -e '/^$/ d' -e 's/OLD/NEW/g' MODIF
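The two -e commands can also be joined with a semicolon inside a single script argument. The following sketch shows that both forms give the same output on a hypothetical MODIF.

```shell
# MODIF is a hypothetical sample file containing one blank line.
cd "$(mktemp -d)" || exit 1
printf '%s\n' 'OLD value' '' 'OLD and OLD' > MODIF

sed -e '/^$/d' -e 's/OLD/NEW/g' MODIF   # NEW value / NEW and NEW
sed '/^$/d; s/OLD/NEW/g' MODIF          # same two lines
```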

These commands can also be written to a file, say COMMANDS, with one command per line. No shell quotes are needed inside the file.

An example COMMANDS file:
1 s/old/new/
/keyword/ s/old/new/g
23,25 d

The syntax to use this COMMANDS file is:

$ sed -f COMMANDS MODIF

This is much easier to read and maintain than a very long command line.
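The example COMMANDS file can be tried as follows. The sketch builds a hypothetical 31-line MODIF: lines 1 to 30 read "old N old", and line 31 contains keyword. This gives each of the three commands something to act on. It assumes seq is available, as it is on typical Linux systems.

```shell
cd "$(mktemp -d)" || exit 1
# The COMMANDS file from the example above (quoted heredoc: no expansion).
cat > COMMANDS <<'EOF'
1 s/old/new/
/keyword/ s/old/new/g
23,25 d
EOF
# Hypothetical MODIF: "old 1 old" ... "old 30 old", then a keyword line.
seq 30 | sed 's/.*/old & old/' > MODIF
printf 'keyword old old\n' >> MODIF

# Line 1 -> "new 1 old", lines 23-25 deleted, last line -> "keyword new new".
sed -f COMMANDS MODIF
```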

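Summary of options for sed

Command-line flags:
-e SCRIPT   Add the commands in SCRIPT to those to be executed
-f FILE     Read commands from FILE
-n          Do not print lines automatically; only lines printed explicitly (for example with p) are output

sed commands:
d           Delete the current line
p           Print the current line
r FILE      Append the contents of FILE after the current line
s           Substitute (s/REGEX/REPLACEMENT/FLAGS)
w FILE      Write the current line to FILE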
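The -n flag is normally combined with the p command, which prints the current line explicitly. This sketch also exercises the r and w commands. MODIF, EXTRA and matches.txt are hypothetical file names.

```shell
cd "$(mktemp -d)" || exit 1
printf '%s\n' 'alpha' 'beta' 'gamma' > MODIF
printf '%s\n' '--- inserted ---' > EXTRA

sed -n '/^b/p' MODIF              # only "beta" is printed
sed -n '2p' MODIF                 # print line 2 only: "beta"
sed '/beta/w matches.txt' MODIF   # prints all lines; also writes "beta" to matches.txt
sed '/alpha/r EXTRA' MODIF        # EXTRA's contents appear after the "alpha" line
```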



Used files, terms and utilities:

  • grep
  • egrep
  • fgrep
  • sed
  • regex(7)

