sed ("stream editor") is a Unix utility for parsing and transforming text files, with ports available on a variety of operating systems. For many purposes it has been superseded by Perl (or the earlier AWK), but sed remains in use for simple transforms in shell scripts.
Sed is line-oriented – it operates one line at a time – and allows regular expression matching and substitution.
The most commonly used feature of sed is the s command (“substitution”, or “the s/// construction”), which replaces text matching a pattern with a replacement; this syntax originates in the earlier ed editor and survives in Perl.
sed s/cat/dog/g in > out
will replace every “cat” with “dog” in the file in and write the result to the file out; the “g” means “replace all matches on a line, not just the first”.
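The effect of the g flag is easiest to see on a line with two matches. This sketch feeds sed a made-up line through echo, so no files are needed:

```shell
# Without g: only the first "cat" on the line is replaced.
echo "cat and cat" | sed 's/cat/dog/'
# Prints: dog and cat

# With g: every "cat" on the line is replaced.
echo "cat and cat" | sed 's/cat/dog/g'
# Prints: dog and dog
```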
It is usually wise to surround the expression with single quotes (' ') so that the shell does not interpret any of its characters:
sed 's/cat/dog/g' in > out
Some implementations require the expression to be preceded by -e, and -e is needed in any case when giving several expressions:
sed -e 's/cat/dog/g' in > out
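With several -e options, the expressions are applied in order to each line. A minimal sketch on a made-up sentence, where the order matters because the second expression produces a word the first one would otherwise have changed:

```shell
# Each -e adds one expression; they run in the order given.
echo "the cat chased the mouse" | sed -e 's/cat/dog/g' -e 's/mouse/cat/g'
# Prints: the dog chased the cat
```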
Sed can also act as a filter in a pipeline: with no file argument, it reads from standard input and writes to standard output.
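Because it reads standard input when given no file, sed can sit in the middle of a pipeline. In this sketch the input lines are invented and sort is used only to show that sed's output feeds the next command:

```shell
# printf supplies the input, sed edits it, sort receives sed's output.
printf 'one cat\ntwo cats\n' | sed 's/cat/dog/g' | sort -r
# Prints:
# two dogs
# one dog
```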
For complex patterns, one will likely wish to use the -r switch (in GNU sed; BSD and macOS sed use -E) to enable “extended regular expressions”, since sed’s default basic regular expressions require a backslash before grouping characters such as “(” and quickly become awkward to read.
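The difference is mostly one of escaping. The sketch below writes the same pattern (a group containing "a", repeated exactly twice) first as a basic and then as an extended regular expression; it assumes GNU sed for the -r form:

```shell
# Basic regular expressions (the default): parentheses and braces
# need backslashes to act as grouping and repetition operators.
echo "aaa" | sed 's/\(a\)\{2\}/X/'

# Extended regular expressions (-r in GNU sed): no backslashes needed.
echo "aaa" | sed -r 's/(a){2}/X/'

# Both print: Xa
```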
Especially useful is grouping: (…) marks a group in the pattern, and \1, \2, …, \9 refer to the numbered groups in the replacement. For example,
sed -r 's/<(.*)>/<\1><\/\1>/g'
replaces “<a>” with “<a></a>”. This allows simple field parsing and processing.
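As a small field-processing example, groups can reorder parts of a line. This sketch assumes a made-up "Last, First" name format and swaps the two fields:

```shell
# Group 1 captures everything before the comma; group 2 everything after ", ".
echo "Doe, John" | sed -r 's/([^,]*), (.*)/\2 \1/'
# Prints: John Doe
```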
Beyond use of the s command, one can develop complex programs in sed.
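Even without s, sed programs combine addresses (which lines to act on) with commands. A short sketch on invented input, using the standard d (delete) and p (print) commands and the -n option:

```shell
# /^#/d deletes comment lines; /^$/d deletes empty lines.
# Every remaining line is printed automatically.
printf '# a comment\nkeep me\n\nkeep me too\n' | sed -e '/^#/d' -e '/^$/d'
# Prints:
# keep me
# keep me too

# -n turns off automatic printing; 2,3p prints only lines 2 to 3.
printf 'a\nb\nc\nd\n' | sed -n '2,3p'
# Prints:
# b
# c
```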
Sed is line-oriented: it reads one line at a time and strips the trailing newline. To operate on multiple lines, one must use more complicated constructions, namely the N command (append the next line to the pattern space), or H (append the line to the hold space) followed by g (copy the hold space back to the pattern space). See Sed FAQ, Section 5.10.
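Both approaches can be sketched for the task of joining all lines with spaces. The first uses N together with a label (:a) and a branch (b), which are standard sed commands not otherwise covered here; the second uses H and g. Both are illustrations, tested with GNU sed:

```shell
# N approach:
#   :a       define a label named "a"
#   N        append a newline and the next input line to the pattern space
#   $!ba     unless on the last line, branch back to label a
#   s/\n/ /g replace the embedded newlines with spaces
printf 'one\ntwo\nthree\n' | sed -e ':a' -e 'N' -e '$!ba' -e 's/\n/ /g'
# Prints: one two three

# H/g approach (-n suppresses automatic printing):
#   H        append a newline and the current line to the hold space
#   ${ ... } on the last line only: g copies the hold space to the
#            pattern space, the leading newline is removed, the other
#            newlines become spaces, and p prints the result
printf 'one\ntwo\nthree\n' | sed -n -e 'H' -e '${' -e 'g' -e 's/^\n//' -e 's/\n/ /g' -e 'p' -e '}'
# Prints: one two three
```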
For the simple task of concatenating all lines in a file, the easiest tool is tr:
tr '\n' ' '
meaning “replace newlines with spaces”. Any sed method that joins an entire file into one line must hold the whole file in the pattern or hold space, and sed implementations other than GNU sed may limit the size of those buffers; tr instead processes its input as a stream from start to finish, and hence has no such memory problems.
Because tr reads only standard input and writes only standard output, a file is processed with redirection:
tr '\n' ' ' < in > out
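One detail worth knowing: tr also converts the final newline, so the output ends with a space and has no trailing newline. A minimal sketch on invented input:

```shell
# tr only reads standard input, so the text arrives through a pipe.
printf 'one\ntwo\nthree\n' | tr '\n' ' '
# Prints "one two three " (trailing space, no final newline)
```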
grep, tr, and sed complement each other: grep selects lines, tr applies single-character translation, and sed performs pattern-based edits. For instance, you might use grep to select certain lines, then pipe them through sed to parse those lines.
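A sketch of the grep-then-sed pattern, assuming a hypothetical log format in which error lines contain "ERROR: " followed by a message:

```shell
# grep keeps only the error lines; sed strips everything up to and
# including "ERROR: ", leaving just the message text.
printf 'INFO: started\nERROR: disk full\nINFO: retry\nERROR: timeout\n' \
  | grep 'ERROR:' \
  | sed 's/^.*ERROR: //'
# Prints:
# disk full
# timeout
```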
One-liner examples of substitution:
- sed "s/concieve/conceive/g" myfile.txt
  - Corrects the misspelling "concieve" throughout myfile.txt, writing the result to standard output.
- echo "abccbd" | sed "s/a\([bc]*\)d/\1/g"
  - Outputs "bccb". Uses \( and \) to mark a group and \1 to refer to the group in the replacement part.
  - This is POSIX basic regular expression syntax, so it works in any POSIX-conforming sed, not only GNU sed.
- echo "abccbd" | sed -r "s/a([bc]*)d/\1/g"
  - In GNU sed, this does the same thing as the previous example, except that -r switches on extended regular expressions, so "(" needs no backslash to indicate grouping.
  - The -r switch is available in GNU sed but not in the original Unix sed; BSD and macOS sed use -E instead, which recent versions of GNU sed also accept.
- echo "a b" | sed -r "s/a\s*b/ab/g"
  - In GNU sed, outputs "ab". Uses "\s" to match a whitespace character, and "*" to let the preceding item repeat any number of times, including zero.
  - The -r is not actually needed here: "\s" is a GNU extension that works in both basic and extended mode, and "*" is part of basic regular expressions.
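Since "\s" is specific to GNU sed, a more portable sketch of the same substitution uses the POSIX character class [[:space:]] and no -r:

```shell
# [[:space:]] is a POSIX bracket expression matching any whitespace character.
echo "a b" | sed 's/a[[:space:]]*b/ab/g'
# Prints: ab
```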
sed does not support non-greedy matches such as the ".*?" expression. In Unix shell scripting, you can use a Perl one-liner to emulate sed with non-greedy matching:
- echo "abcbbc" | perl -pe "s/a.*?c/ac/"
  - Outputs "acbbc". With 'perl -pe "s/a.*c/ac/"', that is, without the non-greedy "?", it outputs "ac".
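When the match should end at a single character, plain sed can get the same effect with a negated bracket expression. This is a workaround rather than true non-greedy matching: it only works when the terminator is one character (here "c"), not a longer string.

```shell
# [^c]* cannot run past a "c", so the match stops at the first "c",
# just as the non-greedy a.*?c does in Perl.
echo "abcbbc" | sed 's/a[^c]*c/ac/'
# Prints: acbbc
```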