
POSIX modern (extended) regular expressions


The more modern "extended" regular expressions can be used with many modern Unix utilities, such as grep and sed, by including the command-line flag "-E".

POSIX extended regular expressions are similar in syntax to the traditional Unix regular expressions, with some exceptions. The following metacharacters are added:

  • + — Match the last "block" one or more times - "ba+" matches "ba", "baa", "baaa" and so on
  • ? — Match the last "block" zero or one times - "ba?" matches "b" or "ba"
  • | — The choice (or set union) operator: match either the expression before or the expression after the operator - "abc|def" matches "abc" or "def".
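The three added operators can be tried out on the command line. The following is a minimal sketch, assuming a grep that accepts "-E" as described above; the helper name ere_match is made up for this illustration, and "-x" is used so that the whole string has to match rather than just a part of it.

```shell
# ere_match PATTERN STRING (hypothetical helper, not a standard utility):
# succeeds if STRING as a whole matches the extended regular expression PATTERN.
ere_match() {
    printf '%s\n' "$2" | grep -E -q -x -e "$1"
}

ere_match 'ba+' 'baaa'     && echo '"ba+" matches "baaa"'
ere_match 'ba+' 'b'        || echo '"ba+" does not match "b"'
ere_match 'ba?' 'b'        && echo '"ba?" matches "b"'
ere_match 'ba?' 'baa'      || echo '"ba?" does not match "baa"'
ere_match 'abc|def' 'def'  && echo '"abc|def" matches "def"'
```

Without "-x", grep reports a line as soon as any part of it matches, so "ba?" would also select a line containing "baa".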

Also, the backslashes are dropped from the interval and grouping operators: \{...\} becomes {...} and \(...\) becomes (...). Examples:

  • "[hc]+at" matches "hat", "cat", "hhat", "chat", "hcat", "ccchat" etc.
  • "[hc]?at" matches "hat", "cat" and "at"
  • "([cC]at)|([dD]og)" matches "cat", "Cat", "dog" and "Dog"
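These examples can be checked by filtering a small word list, a sketch assuming grep with "-E" and "-x" (whole-line match). The function name words and the word list itself are invented for the illustration; the last two commands show the same group and interval written in the traditional and in the extended syntax.

```shell
# Candidate words, one per line (hypothetical test data).
words() {
    printf '%s\n' at hat cat hhat chat hcat ccchat bat Cat dog Dog
}

# -x keeps only the lines that match the pattern as a whole.
words | grep -E -x '[hc]+at'             # prints hat, cat, hhat, chat, hcat, ccchat
words | grep -E -x '[hc]?at'             # prints at, hat, cat
words | grep -E -x '([cC]at)|([dD]og)'   # prints cat, Cat, dog, Dog

# The same group and interval in both syntaxes; each command prints abab.
printf '%s\n' abab | grep -x '\(ab\)\{2\}'   # traditional syntax, with backslashes
printf '%s\n' abab | grep -E -x '(ab){2}'    # extended syntax, without
```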

Since the characters '(', ')', '[', ']', '{', '}', '.', '*', '?', '+', '|', '^' and '$' are used as special symbols, they have to be escaped if they are meant literally. This is done by preceding them with '\', which therefore also has to be escaped in the same way if it is meant literally. Examples:

  • "a\.(\(|\))" matches the strings "a.)" and "a.("
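Escaping can be tried out the same way. This is a small sketch assuming grep with "-E"; the variable name pattern and the sample strings are chosen only for the illustration. The pattern is written in single quotes so that the shell passes the backslashes through to grep unchanged.

```shell
# Single quotes stop the shell from interpreting the backslashes itself.
pattern='a\.(\(|\))'

# Prints a.( and a.) only: "ab(" fails because \. matches nothing but a
# literal dot, and "a." fails because a parenthesis must follow.
printf '%s\n' 'a.(' 'a.)' 'ab(' 'a.' | grep -E -x -e "$pattern"

# A literal backslash is written as \\ in the pattern; this prints C:\temp.
printf '%s\n' 'C:\temp' | grep -E -e 'C:\\temp'
```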