LPI Linux Certification/Search Text Files Using Regular Expressions

Detailed Objectives


(LPIC-1 Version 5.0)

Weight: 2

Description:
Candidates should be able to manipulate files and text data using regular expressions. This objective includes creating simple regular expressions containing several notational elements as well as understanding the differences between basic and extended regular expressions. It also includes using regular expression tools to perform searches through a filesystem or file content.

Key Knowledge Areas:

  • Create simple regular expressions containing several notational elements.
  • Understand the differences between basic and extended regular expressions.
  • Understand the concepts of special characters, character classes, quantifiers and anchors.
  • Use regular expression tools to perform searches through a filesystem or file content.
  • Use regular expressions to delete, change and substitute text.

The following is a partial list of the used files, terms and utilities:

  • grep
  • egrep
  • fgrep
  • sed
  • regex(7)

Pattern matching


There are two kinds of pattern matching:

  • Wildcards (File Name Generation)
  • Regexp (Regular Expression)

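Wildcards are expanded by the shell into the names of matching files, usually in the current directory or a given subdirectory. When the wildcard characters *, ?, [ - ], ~ and ! appear inside a regexp, they no longer generate file names, and some of them mean something different: in a regexp, * means "zero or more of the preceding character" rather than "any string".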
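The difference can be seen by testing the same string against a shell wildcard and against a grep regexp. This is a minimal sketch assuming a POSIX sh and GNU grep; the helper names glob_match and regex_match are hypothetical, and a case statement stands in for file name generation because it applies the same wildcard rules.

```shell
#!/bin/sh
# glob_match STRING PATTERN (hypothetical helper): test STRING against a
# shell wildcard pattern; case uses the same rules as file name generation.
glob_match() {
    case $1 in
        $2) echo yes ;;
        *) echo no ;;
    esac
}

# regex_match STRING REGEXP (hypothetical helper): test STRING against a
# basic regular expression with grep.
regex_match() {
    if printf '%s\n' "$1" | grep -q -- "$2"; then
        echo yes
    else
        echo no
    fi
}

glob_match abc 'a*'       # yes: in a wildcard, * means "any string"
regex_match abc '^a*$'    # no: in a regexp, * only repeats the preceding "a"
regex_match aaa '^a*$'    # yes
regex_match abc '^a.*$'   # yes: .* is the regexp way to say "any string"
glob_match abc 'a?c'      # yes: in a wildcard, ? is any one character
regex_match abc 'a?c'     # no: in a basic regexp, ? is an ordinary character
```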

Some of the utilities that use regexp are:

  • grep, egrep
  • vi
  • more
  • sed
  • Perl

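Basic regexp search patterns, understood by all utilities that support regexps:

  • Any one character: .  (Ab.a matches Abla or Abca)
  • One character from a set: [ ]  (Ab[sd]a matches Absa or Abda only)
  • One character from a range: [ - ]  (Ab[a-z]a matches Abaa or Abba or ...)
  • One character not in a set: [^ ]  (Ab[^0-9]a matches Abaa or Abba or ...)
  • Zero or more of the preceding character: *  (Ab*a matches Aa or Aba or Abba or ...)
  • Beginning of line: ^  (^Aba matches lines that start with Aba)
  • End of line: $  (Aba$ matches lines that end with Aba)
  • Literal, removes the special meaning of the next character: \  (Aba\$ matches the text Aba$)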
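The patterns above can be tried directly with grep, which uses basic regexps by default. A minimal sketch assuming a POSIX sh and grep; bre_filter is a hypothetical helper and the sample words are made up.

```shell
#!/bin/sh
# bre_filter REGEXP WORD... (hypothetical helper): print the WORDs that
# match a basic regexp, comma-separated on one line.
bre_filter() {
    pattern=$1
    shift
    printf '%s\n' "$@" | grep -- "$pattern" | paste -s -d ',' -
}

# Made-up sample words, one per line of grep input.
set -- Abla Abca Absa Abda Ab1a Aa Aba Abba 'Aba$'

bre_filter 'Ab.a'      "$@"   # Abla,Abca,Absa,Abda,Ab1a,Abba
bre_filter 'Ab[sd]a'   "$@"   # Absa,Abda
bre_filter 'Ab[^0-9]a' "$@"   # Abla,Abca,Absa,Abda,Abba
bre_filter 'Ab*a'      "$@"   # Aa,Aba,Abba,Aba$ (grep matches anywhere in the line)
bre_filter '^Aba'      "$@"   # Aba,Aba$
bre_filter 'Aba$'      "$@"   # Aba
bre_filter 'Aba\$'     "$@"   # Aba$
```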

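Examples:

Ab[0-3]s      # Ab, one digit from 0 to 3, then s: Ab0s, Ab1s, Ab2s or Ab3s
^Ab\^bA       # Lines starting with Ab^bA (\^ is a literal caret)
[01]bin$      # Lines ending with 0bin or 1bin
^..\\         # Lines whose third character is a backslash
[^zZ]oro      # oro preceded by any character except z or Z: Toro, but not Zoro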
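Each example can be checked against a line that matches and one that does not. A minimal sketch assuming a POSIX sh and grep; matches is a hypothetical helper and the test lines are made up.

```shell
#!/bin/sh
# matches REGEXP LINE (hypothetical helper): print yes if grep finds
# REGEXP in LINE, otherwise no.
matches() {
    if printf '%s\n' "$2" | grep -q -- "$1"; then echo yes; else echo no; fi
}

matches 'Ab[0-3]s' 'Ab2s'       # yes
matches 'Ab[0-3]s' 'Ab5s'       # no: 5 is outside 0-3
matches '^Ab\^bA' 'Ab^bA ok'    # yes
matches '^Ab\^bA' 'x Ab^bA'     # no: not at the start of the line
matches '[01]bin$' '/opt/1bin'  # yes
matches '[01]bin$' '0bin/'      # no: not at the end of the line
matches '^..\\' 'C:\dir'        # yes: the third character is a backslash
matches '^..\\' 'C\dir'         # no
matches '[^zZ]oro' 'Toro'       # yes
matches '[^zZ]oro' 'Zoro'       # no
```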

Combinations of these basic patterns, also understood by all regexp-capable utilities:

  • Any string, including an empty one: .*  (Ab.*a matches Abrahma or Abaa or ...)
  • A string made of characters from a set: [ ]*  (th[aersti]* matches th, there, this or ...)
  • Consecutive ranges: [ - ][ - ]  (Ab[0-2][a-c]a matches Ab0aa, Ab1ba, Ab2ca or ...)
  • Literal backslash: \\  (\\[a-zA-Z]* matches \Beethoven)
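grep -o prints only the part of each line that a pattern matched, which makes these combinations visible. A minimal sketch assuming GNU grep (for -o); show_matches is a hypothetical helper and the text is made up. Note that .* is greedy: it extends the match as far as possible along the line.

```shell
#!/bin/sh
# show_matches REGEXP TEXT (hypothetical helper): print every part of TEXT
# matched by REGEXP, comma-separated on one line.
show_matches() {
    printf '%s\n' "$2" | grep -o -- "$1" | paste -s -d ',' -
}

show_matches 'Ab.*a' 'Abrahma and Abaa'           # Abrahma and Abaa (greedy: runs to the last a)
show_matches 'th[aersti]*' 'there this then'      # there,this,the
show_matches 'Ab[0-2][a-c]a' 'Ab0aa Ab1ba Ab3ca'  # Ab0aa,Ab1ba
show_matches '\\[a-zA-Z]*' 'see \Beethoven'       # \Beethoven
```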

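Examples:

Ab[0-3][a-z]s      # Ab, a digit from 0 to 3, a lowercase letter, then s: Ab0as, Ab3zs, ...
...$               # Any line at least three characters long
^[01]\^2           # Lines starting with 0^2 or 1^2
[0-9][a-z] \$      # A digit, a lowercase letter, a space and a dollar sign: 5b $
[a-zA-Z]*          # Zero or more letters: matches every line, even an empty one
^[^c-zC-Z]*        # Zero or more characters outside c-z and C-Z at line start: also matches every line
^[a-zA-Z0-9]$      # Lines made of exactly one letter or digit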
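The same kind of check shows a common surprise: a pattern that can match zero characters, such as [a-zA-Z]*, matches every line. A minimal sketch assuming a POSIX sh and grep; matches is a hypothetical helper and the test lines are made up.

```shell
#!/bin/sh
# matches REGEXP LINE (hypothetical helper): print yes if grep finds
# REGEXP in LINE, otherwise no.
matches() {
    if printf '%s\n' "$2" | grep -q -- "$1"; then echo yes; else echo no; fi
}

matches 'Ab[0-3][a-z]s' 'Ab1xs'      # yes
matches '...$' 'ab'                  # no: only two characters
matches '...$' 'abc'                 # yes
matches '^[01]\^2' '1^2 = 1'         # yes
matches '[0-9][a-z] \$' 'costs 5b $' # yes
matches '[a-zA-Z]*' '12345'          # yes: zero letters still counts as a match
matches '[a-zA-Z]*' ''               # yes: even an empty line matches
matches '^[a-zA-Z0-9]$' '7'          # yes
matches '^[a-zA-Z0-9]$' '77'         # no
```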

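Modifier patterns set how many times the preceding item must occur, or where in a word a match must fall. They are shown here in basic regexp syntax; with grep -E or egrep the braces are written without backslashes, e.g. {3}. The word anchors \< and \> are GNU extensions.

  • Exactly m times: \{m\}  (b[0-9]\{3\} matches b911)
  • m or more times: \{m,\}  (b[0-9]\{2,\} matches b52)
  • Between m and n times: \{m,n\}  (b[0-9]\{2,4\} matches b1234)
  • Beginning of word: \<  (\<wh matches where)
  • End of word: \>  ([0-9]\> matches bin01)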
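The modifiers can be compared in basic and extended syntax on the same words. A minimal sketch assuming GNU grep (needed for \< and \>); pick and pick_ere are hypothetical helpers and the words are made up. Anchoring with ^ and $ makes the count apply to the whole word, since otherwise b12345 would also match through its substring b1234.

```shell
#!/bin/sh
# pick REGEXP WORD... (hypothetical helper): print the WORDs matching a
# basic regexp, comma-separated.
pick() {
    pattern=$1
    shift
    printf '%s\n' "$@" | grep -- "$pattern" | paste -s -d ',' -
}

# pick_ere REGEXP WORD... (hypothetical helper): the same with an
# extended regexp (grep -E, like egrep).
pick_ere() {
    pattern=$1
    shift
    printf '%s\n' "$@" | grep -E -- "$pattern" | paste -s -d ',' -
}

pick '^b[0-9]\{3\}$'     b9 b52 b911 b1234 b12345  # b911
pick '^b[0-9]\{2,\}$'    b9 b52 b911 b1234 b12345  # b52,b911,b1234,b12345
pick '^b[0-9]\{2,4\}$'   b9 b52 b911 b1234 b12345  # b52,b911,b1234
pick_ere '^b[0-9]{2,4}$' b9 b52 b911 b1234 b12345  # b52,b911,b1234 (ERE: no backslashes)
pick '^b[0-9]{2,4}$'     b52 'b5{2,4}'             # b5{2,4} (BRE: plain braces are literal)
pick '\<wh'              where anywhere what       # where,what
pick '[0-9]\>'           bin01 bin0x               # bin01
```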

grep


To find text in a file, use grep.

grep [options] pattern [files...]

Quote the pattern, preferably with single quotes, so that the shell does not interpret characters such as *, ?, [ ], $ or \ before grep receives them.
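The effect of a missing quote can be reproduced in a scratch directory. A minimal sketch assuming a POSIX sh with mktemp; the file names apple.txt and notes and their contents are made up for the illustration.

```shell
#!/bin/sh
# Throwaway directory, removed automatically when the script exits.
dir=$(mktemp -d) || exit 1
trap 'rm -rf "$dir"' EXIT
cd "$dir" || exit 1
touch apple.txt                          # a file whose name starts with "a"
printf '%s\n' aaa banana cherry > notes

# Unquoted: the shell first expands a* to apple.txt, so grep actually
# searches notes for the text "apple.txt" and finds nothing
# (|| true ignores grep's "no match" exit status).
unquoted=$(grep -c a* notes || true)
# Quoted: grep receives the regexp a* ("zero or more a"), which matches every line.
quoted=$(grep -c 'a*' notes)

echo "unquoted: $unquoted, quoted: $quoted"   # unquoted: 0, quoted: 3
```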

Common options:

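  • -i: Ignore case
  • -E: Use extended regular expressions instead of basic ones (same as egrep)
  • -F: Treat the pattern as a fixed string, not a regexp (same as fgrep)
  • -l: List only the names of files that contain at least one matching line
  • -c: Display only the count of matching lines
  • -n: Prefix each matching line with its line number
  • -v: Invert the match: select the lines that do not match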

Examples:

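grep host /etc/*.conf               # Lines containing 'host' in every .conf file of /etc
grep -l '\<mai' /usr/include/*.h    # Names of the header files containing a word starting with 'mai'
grep -n toto /etc/group             # Lines containing 'toto', with their line numbers
grep -vc root /etc/passwd           # Number of lines that do not contain 'root'
grep '^user' /etc/passwd            # Lines starting with 'user'
grep '[rR].*' /etc/passwd           # Lines containing r or R anywhere (the .* adds nothing)
grep '\<[rR].*' /etc/passwd         # Lines containing a word that starts with r or R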
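The options can be tried on made-up data without touching /etc. A minimal sketch assuming a POSIX sh with mktemp; the three passwd-style lines and the notes file are invented sample data.

```shell
#!/bin/sh
# Throwaway directory, removed automatically when the script exits.
dir=$(mktemp -d) || exit 1
trap 'rm -rf "$dir"' EXIT

# Made-up sample data in /etc/passwd format, plus a file that never matches.
printf '%s\n' \
    'root:x:0:0:root:/root:/bin/bash' \
    'user1:x:1000:1000::/home/user1:/bin/sh' \
    'Rob:x:1001:1001::/home/bob:/bin/sh' > "$dir/passwd"
printf '%s\n' 'nothing to see' > "$dir/notes"

grep -n '^user' "$dir/passwd"   # 2:user1:x:1000:1000::/home/user1:/bin/sh
grep -c '^r' "$dir/passwd"      # 1 (only root)
grep -ic '^r' "$dir/passwd"     # 2 (root and Rob, case ignored)
grep -vc root "$dir/passwd"     # 2 (the lines without "root")
grep -l root "$dir"/*           # only the path of the passwd file
```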

To apply editing commands to a stream of text, use sed (the stream editor).

sed [address1][,address2][!]command[options] [files...]

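sed applies the command to every input line from address1 through address2. An address is a line number, $ (the last line), or a regular expression between slashes such as /here/. With no address the command applies to every line, a single address selects only the matching lines, and ! applies the command to the lines that are not selected. By default sed writes the result to standard output and leaves the input file unchanged.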

The sed program is a noninteractive editing tool.

Examples:

sed '1,3s/aa/bb/g' file               # On lines 1 to 3, replace every 'aa' with 'bb'.
sed '/here/,$d' file                  # Delete from the first line containing 'here' to the last line.
sed '/here/d' file                    # Delete every line containing 'here'.
sed -n '1,/xxx/p' file                # Print from line 1 to the first line containing 'xxx' (-n turns off the automatic printing of every line).
sed '/ll/,/ff/!s/maison/house/g' file # Replace 'maison' with 'house' on every line outside the ranges that run from a line containing 'll' to the next line containing 'ff'.
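The sed examples can be run on a few made-up lines instead of a real file. A minimal sketch assuming a POSIX sh and sed; run_sed is a hypothetical helper, and the last run_sed call shows why -n is needed with p.

```shell
#!/bin/sh
# run_sed SED_ARGS... (hypothetical helper): run sed on five made-up lines
# and print the result comma-separated on one line.
run_sed() {
    printf '%s\n' line1 here line3 xxx line5 | sed "$@" | paste -s -d ',' -
}

run_sed '/here/,$d'     # line1
run_sed '/here/d'       # line1,line3,xxx,line5
run_sed -n '1,/xxx/p'   # line1,here,line3,xxx
run_sed '1,/xxx/p'      # line1,line1,here,here,line3,line3,xxx,xxx,line5

# Substitution limited to lines 1 to 3.
printf '%s\n' 'aa aa' aa xaa aa | sed '1,3s/aa/bb/g' | paste -s -d ',' -
# bb bb,bb,xbb,aa

# Substitution everywhere except inside the ll ... ff range.
printf '%s\n' maison ll maison ff maison | sed '/ll/,/ff/!s/maison/house/g' | paste -s -d ',' -
# house,ll,maison,ff,house
```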

Exercises

  1. Process your bookmarks.html file to produce a list containing just the web sites' titles in a file called mywebsites.txt.
  2. Copy all the files from /etc into a directory called etc/ in your home directory. Display the contents of all the *.conf files, replacing the word 'host' with 'machine'.
  3. Display the contents of all the *.conf files that don't contain the word 'root'. Give one command using grep and one using sed.
  4. Print out all the group names that root belongs to.
  5. List all the group names that are 4 or 5 characters long.
  6. List all the files that contain blank lines (lines without any characters).
  7. List all the files in the etc/ directory that contain digits.
  8. Using ls, print only the names of the directories in /.
  9. Run 'ps aux', replace the user r_polto with root, and write the result to a file called new_process.txt.
  10. List all processes called 'apache' that are owned by usernames starting with “p” or “P”.