Grep

Grep is a Unix utility that searches for a pattern either in text piped to it or in files named on its command line. An example should help clarify things.

Let's say we want to search through a directory and find all the files that have the string "crontab" in their name. You could issue the 'ls' command in a shell to list the directory's contents:

$ ls
DumpSite.sh  crontab.txt  nagios-3.0.6  xmpppy  xymon-4.3.0-beta2

and look through everything manually, or you could use the 'ls' command and pipe the output of ls to grep:

$ ls |grep crontab
crontab.txt

Conversely, to list every entry except those that match, use the -v option:

$ ls |grep -v crontab
DumpSite.sh
nagios-3.0.6
xmpppy
xymon-4.3.0-beta2

The '|' character is the pipe: it directs the output of the 'ls' command to grep as input. The first command gives you a (perhaps empty) list of all the files that have "crontab" in their names.
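To try the pipeline without depending on a particular directory, the listing can be simulated with printf. This is a minimal sketch; the helper name listing is an assumption made for illustration, and the file names are those from the listing above.

```shell
# Simulate the directory listing from the example with printf,
# so the pipeline can be tried in any directory.
# "listing" is a hypothetical helper, not a standard command.
listing() {
    printf '%s\n' DumpSite.sh crontab.txt nagios-3.0.6 xmpppy xymon-4.3.0-beta2
}

# Keep only the lines containing "crontab".
matches=$(listing | grep crontab)

# Keep only the lines that do NOT contain "crontab".
others=$(listing | grep -v crontab)

echo "$matches"
echo "$others"
```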

As its search term, grep accepts regular expressions rather than just plain strings. A simple example is looking for all .txt or .jpg files in a directory:

$ ls | grep '.*\(txt\|jpg\)'

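The regex here is made up of .*, which can stand for anything in a file's name, and \(txt\|jpg\), which matches either txt or jpg. Because the pattern is not anchored with $, it matches these strings anywhere in the name, not only at the end.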
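To match only names that really end in .txt or .jpg, the pattern can be anchored. The sketch below assumes GNU grep, since \| in basic regular expressions is a GNU extension; the helper names and the file names are made up for illustration.

```shell
# Hypothetical file names for illustration only.
names() {
    printf '%s\n' notes.txt photo.jpg txt2pdf.sh readme.md
}

# Unanchored: matches "txt" or "jpg" anywhere in the name.
loose=$(names | grep '.*\(txt\|jpg\)')

# Anchored: a literal dot, then txt or jpg, then the end of the line.
strict=$(names | grep '\.\(txt\|jpg\)$')

echo "$loose"
echo "$strict"
```

The unanchored pattern also reports txt2pdf.sh, while the anchored one reports only notes.txt and photo.jpg.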

Options

Command-line options aka switches of grep:

-e pattern: Use pattern as the search pattern; can be given more than once.
-i: Ignore uppercase vs. lowercase.
-v: Invert match.
-c: Output count of matching lines only.
-l: Output matching files only.
-n: Precede each matching line with a line number.
-b: A historical curiosity: precede each matching line with a block number.
-h: Output matching lines without preceding them by file names.
-s: Suppress error messages about nonexistent or unreadable files.
-x: Match whole lines only.
-f file: Take regexes from a file.
-o: Output the matched parts of a matching line.
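
A few of these switches can be tried on a small sample file. This is a sketch only: the file name sample.txt and its contents are made up, and -o is assumed to be supported by the installed grep (it is in GNU grep).

```shell
# Create a small sample file in a temporary directory (contents are made up).
dir=$(mktemp -d)
printf '%s\n' 'Hello world' 'hello again' 'goodbye' > "$dir/sample.txt"

count=$(grep -c -i hello "$dir/sample.txt")      # number of matching lines
numbered=$(grep -n goodbye "$dir/sample.txt")    # line number, colon, line
parts=$(grep -o -i 'hello' "$dir/sample.txt")    # matched parts only
whole=$(grep -x 'goodbye' "$dir/sample.txt")     # whole-line matches only
files=$(cd "$dir" && grep -l hello sample.txt)   # names of matching files

echo "$count"; echo "$numbered"; echo "$parts"; echo "$whole"; echo "$files"

# Clean up the temporary directory.
rm -r "$dir"
```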

Command-line options aka switches of GNU grep, beyond the bare-bones grep:

--help
-V, --version
--regexp=pattern, in addition to -e pattern
--invert-match, in addition to -v
--word-regexp, in addition to -w
--line-regexp, in addition to -x
-A num, --after-context=num
-B num, --before-context=num
-C num, -num, --context=num
and more ...
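
The context switches can be illustrated as follows. This sketch assumes GNU grep, or another grep that supports -A, -B and -C; the helper name sample and the input lines are made up.

```shell
# Five sample lines (made up for illustration).
sample() {
    printf '%s\n' 'line 1' 'line 2' 'match here' 'line 4' 'line 5'
}

after=$(sample | grep -A 1 'match')    # the match plus 1 line after it
before=$(sample | grep -B 1 'match')   # 1 line before, then the match
around=$(sample | grep -C 1 'match')   # 1 line on each side of the match

echo "$after"
echo "$before"
echo "$around"
```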

Links:

2.1 Command-line Options at grep manual, gnu.org
Unix grep(1) manual page at man.cat-v.org, DESCRIPTION section

Regular expressions

By default, grep uses POSIX basic regular expressions (see also Regular Expressions/Posix Basic Regular Expressions), whose syntax differs from that of Perl regular expressions.

Regular expression features available in grep include *, ., ^, $, [ ], [^ ], \( \), \n, \{i\}, \{i,j\}, \{i,\}.

Regular expression features available in GNU grep as a GNU extension include \?, \+, \b, \B, \<, \>, \w, \W, \s, \S.

Regular expression features available in grep with -E switch include ?, +, |, ( ), {i}, {i,j}, {i,}.

Predefined character classes supported by grep include [:alpha:], [:blank:], [:cntrl:], [:digit:], [:graph:], [:lower:], [:print:], [:punct:], [:space:], [:upper:], and [:xdigit:].

Regular expression features unavailable in grep include Perl's \d, \D, \A and \Z.
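
The difference between the basic and the extended syntax, and the use of predefined character classes, can be sketched as follows. The helper name values and the input lines are assumptions made for illustration.

```shell
# Sample input (made up): one value per line.
values() {
    printf '%s\n' 'a1' 'ab12' 'abc123' '2024-01-15' 'x'
}

# Basic regex: a date-like shape using intervals \{i\} and a POSIX class.
dates=$(values | grep '^[[:digit:]]\{4\}-[[:digit:]]\{2\}-[[:digit:]]\{2\}$')

# Extended regex (-E): the same pattern, without backslashes before the braces.
dates_e=$(values | grep -E '^[[:digit:]]{4}-[[:digit:]]{2}-[[:digit:]]{2}$')

# Two to three lowercase letters followed by one or more digits.
mixed=$(values | grep -E '^[[:lower:]]{2,3}[[:digit:]]+$')

echo "$dates"
echo "$dates_e"
echo "$mixed"
```

Note that a character class such as [:digit:] has to appear inside a bracket expression, hence the doubled brackets.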

Links:

Regular Expressions in GNU grep manual, gnu.org

Tricks

When grep is invoked from find, it can be very useful to have grep print each file's full path before the matched content. When grep is given only one file, it omits the file name, so trick grep into thinking it is invoked with multiple files by adding /dev/null after {} like this:

 find /var/www -exec grep php {} /dev/null \;
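
The effect of the extra /dev/null argument can be checked on a throwaway directory. This is a sketch under stated assumptions: the directory layout and the file index.php are made up for illustration.

```shell
# Build a throwaway directory tree (names are hypothetical).
root=$(mktemp -d)
mkdir "$root/site"
printf '%s\n' '<?php echo 1; ?>' > "$root/site/index.php"

# With a single file argument, grep prints only the matching line.
plain=$(find "$root/site" -type f -exec grep php {} \;)

# With /dev/null as a second file, grep prefixes each line with the path.
prefixed=$(find "$root/site" -type f -exec grep php {} /dev/null \;)

echo "$plain"
echo "$prefixed"

# Clean up the temporary tree.
rm -r "$root"
```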

Examples

Examples of grep use:

echo file.txt | grep ".*\(txt\|doc\)"
- Matches. "\(" and "\)" create a group, while "\|" separates items in the group. The group matches if at least one of its items matches.
echo a456 | grep "[a-zA-Z][0-9][0-9]*"
- Matches. "[" and "]" are delimiters for character groups. "*" stands for zero, one, or any other number of the previous.
echo a456 | grep -i "[A-Z][0-9]\+"
- Matches. "\+" stands for one or more occurrences of the previous. Unlike "*", "+" has to be preceded by "\". "-i" makes the search case-insensitive.
echo file.txt | grep -E ".*(txt|doc)"
- Matches. "-E" stands for extended regular expressions. In extended regex, "(" and "|" do not need "\" to act as special characters; they need "\" to act as literals, that is, stand for themselves.
echo abbc | grep -E "abb?c"
- In extended regular expressions enabled by -E switch, the question mark matches zero or one occurrences of the previous.
echo abbc | grep "abb\?c"
- In GNU Grep, \? (question mark preceded by a backslash) matches zero or one occurrences of the previous.
echo a4c | grep -P "a\dc"
- In GNU Grep of some versions, matches. "-P" stands for Perl regular expressions; "\d" in the regex stands for a digit.
grep -P "\x22hello\x22" file.txt
- In GNU Grep of some versions, searches for the string starting with a quotation mark, followed with "hello", followed with another quotation mark. Makes use of "-P", which turns on Perl regex. In Perl regex, "\x22" stands for a quotation mark, via standing for the character with the hexadecimal ASCII value of 22.
grep -P "a\t+b" file.txt
- In GNU Grep of some versions, refers to the tab character (tabulator) by "\t". Enabled by -P.
grep -r "soughtPattern" . --include="*.java"
- In GNU Grep of some versions, searches files recursively. Notice the period standing for the current directory.
grep -Fxv -f file2.txt file1.txt
- Outputs set difference: file1.txt - file2.txt. Uses -F to interpret search term literally aka non-regex, -x to match whole lines only, -v to invert match, and -f to take the search terms from a file.
grep -Fx -f file1.txt file2.txt
- Outputs set intersection: those lines of file1.txt that are also in file2.txt.
grep -P "Sch\xc3\xb6nheit" *
- Searches UTF-8 encoded files for the German word "Schönheit". Takes advantage of Perl regex via -P; uses \x followed by hexadecimal digits to search for the UTF-8 encoding of ö, which is C3B6. To find the hexadecimal UTF-8 encoding of a text, use a UTF-8 enabled plain text editor to create a file containing the text, and then use a hex dump program (hexdump on multiple operating systems) to find the hex code of the text. The UTF-8 encoding is not to be confused with the code point; the code point of ö is U+00F6, while its UTF-8 encoding is C3B6.
grep -a -i -o "[-_a-z0-9 ]\{4,\}" mybinary.o
- Emulates the strings command to an extent, outputting sequences of at least 4 characters drawn from a chosen set of allowable characters. Uses -a to treat binary files as text files and -o to output only the sequences matching the pattern rather than the lines containing the matches.
nice -19 find /etc /var/www -type f -name "*.php" -exec grep -e foreach -e str_replace -e "return.*base64_decode" {} /dev/null \; >php-possible-malware.txt
- Searches PHP files at low priority for several patterns at once, one per -e, and saves the matching lines to a file. The result can then be analyzed with the following, which shows the longest lines first: cat php-possible-malware.txt | awk '{ print length, $0 }' |sort -n -s -r |head -50 |less
perl -ne "print if /\x22hello\x22/" file.txt
- Not really a grep example but a Perl oneliner that you can use if Perl is available and grep is not.
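
The set difference and set intersection examples above can be tried on two small files. This is a minimal sketch; the file contents are made up for illustration.

```shell
# Two small sample files in a temporary directory (contents are made up).
dir=$(mktemp -d)
printf '%s\n' apple banana cherry > "$dir/file1.txt"
printf '%s\n' banana cherry date > "$dir/file2.txt"

# Lines of file1.txt that do not occur in file2.txt.
difference=$(grep -Fxv -f "$dir/file2.txt" "$dir/file1.txt")

# Lines that occur in both files.
intersection=$(grep -Fx -f "$dir/file1.txt" "$dir/file2.txt")

echo "$difference"
echo "$intersection"

# Clean up the temporary directory.
rm -r "$dir"
```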

Versions

Old versions of GNU grep can be obtained from the GNU FTP server.

Release announcements of GNU grep are posted in a Savannah group.

A changelog of GNU grep is available from git.savannah.gnu.org.

A version of GNU grep for MS Windows is available from the GnuWin32 project, as well as from Cygwin.

External links

GNU grep user's manual as one page at gnu.org
grep(1) OS X Manual Page at developer.apple.com
Unix grep(1) manual page at man.cat-v.org
Wikipedia article on grep