PHP Programming/Regular expressions

Syntax

Usual regular expressions
Character  Type                     Explanation
.          Dot                      any character (except a newline, by default)
[...]      Brackets                 character class: any one of the enumerated characters
[^...]     Brackets and circumflex  complemented class: any character except the enumerated ones
^          Circumflex               start of the string or line
$          Dollar                   end of the string or line
|          Pipe                     alternative
(...)      Parentheses              capture group; also used to limit the scope of an alternative
*          Asterisk                 0 or more occurrences
+          Plus                     1 or more occurrences
?          Question mark            0 or 1 occurrence
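
As a rough illustration of the table above, the following sketch runs one small pattern per metacharacter through preg_match(). The sample patterns and subject strings are made up for this example; every pattern is expected to match its subject.

```php
<?php
// Each pattern exercises one or more of the basic metacharacters.
$checks = [
    '`^PHP`'    => 'PHP regex', // ^ anchors the match at the start
    '`regex$`'  => 'PHP regex', // $ anchors the match at the end
    '`gr[ae]y`' => 'grey',      // character class: "a" or "e"
    '`[^0-9]`'  => '2024a',     // complemented class: any non-digit
    '`cat|dog`' => 'hotdog',    // alternative
    '`colou?r`' => 'color',     // ? makes the "u" optional
    '`^(ab)+$`' => 'ababab',    // group repeated 1 or more times
    '`^a.*z$`'  => 'a to z',    // . with * : any run of characters
];

foreach ($checks as $pattern => $subject) {
    // preg_match() returns 1 when the pattern matches, 0 otherwise
    printf("%-12s %-10s %d\n", $pattern, $subject, preg_match($pattern, $subject));
}
```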
POSIX character classes[1]
Class         Meaning
[[:alpha:]]   any letter
[[:digit:]]   any digit
[[:xdigit:]]  any hexadecimal digit
[[:alnum:]]   any letter or digit
[[:space:]]   any whitespace character
[[:punct:]]   any punctuation character
[[:lower:]]   any lowercase letter
[[:upper:]]   any uppercase letter
[[:blank:]]   space or tab
[[:graph:]]   visible characters (printable, excluding the space)
[[:cntrl:]]   control characters
[[:print:]]   printable characters (no control characters), including the space
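
A short sketch of how these classes can be used with the preg functions. The subject string is invented for the example. Note that a POSIX class is only valid inside a bracket expression, hence the doubled brackets.

```php
<?php
$subject = 'Room 42, Floor B';

// One or more digits
preg_match('`[[:digit:]]+`', $subject, $m);
echo $m[0], "\n"; // 42

// Remove every punctuation character
echo preg_replace('`[[:punct:]]`', '', $subject), "\n"; // Room 42 Floor B

// Count the uppercase letters (R, F and B)
echo preg_match_all('`[[:upper:]]`', $subject), "\n"; // 3
```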
Unicode regex[2]
Expression  Meaning
\A          start of the string
\b          word boundary (start or end of a word)
\d          digit
\D          non-digit
\s          whitespace character
\S          non-whitespace character
\w          letter, digit or underscore
\W          any character other than a letter, digit or underscore
\X          Unicode extended grapheme cluster
\z          end of the string
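
The sketch below shows a few of these escape sequences in action. The subject string is made up for the illustration.

```php
<?php
$subject = 'Order 66 shipped to user_01 on 2024-05-17';

// \d+ : every run of digits
preg_match_all('`\d+`', $subject, $digits);
print_r($digits[0]); // 66, 01, 2024, 05, 17

// \b\w+\b : whole words; "user_01" stays in one piece because _ counts as a word character
preg_match_all('`\b\w+\b`', $subject, $words);
print_r($words[0]);

// \A and \z anchor the pattern to the whole string
var_dump(preg_match('`\A\w+\s\d+\z`', 'Order 66')); // int(1)
```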

Debugger: https://regex101.com/

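  • (?:...): non-capturing group, skipped when the capture groups are numbered. Ex: ((?:ignored_substring|other).)
  • (?!...): negative lookahead, which matches only if the given substring does not follow. Ex: ((?!excluded_substring).)
  • $1: content of the first capture group, usable in a replacement string.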
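
A minimal sketch of these three constructs, using invented sample strings:

```php
<?php
// (?:...) groups the titles without capturing them, so the name is group 1.
preg_match('`(?:Mr|Mrs|Dr)\. (\w+)`', 'Dr. Watson', $m);
echo $m[1], "\n"; // Watson

// (?!...) negative lookahead: "foo" only when it is not followed by "bar"
var_dump(preg_match('`foo(?!bar)`', 'foobar')); // int(0)
var_dump(preg_match('`foo(?!bar)`', 'foobaz')); // int(1)

// $1 and $2 in the replacement refer to the first and second capture groups
echo preg_replace('`(\w+) (\w+)`', '$2 $1', 'hello world'), "\n"; // world hello
```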

Attention: to search for a literal dollar sign, "\$" does not work in double quotes, because PHP itself interprets \$ there and hands a bare $ to the regex engine, which reads it as the end-of-string anchor. Use single quotes instead: '\$'.
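
The difference between the two quoting styles can be checked with a small sketch like this one (the price string is an invented example; preg_quote() is the standard PHP helper for escaping arbitrary text):

```php
<?php
$price = 'Total: $15';

// Single quotes: PHP leaves \$ untouched, so the regex sees an escaped dollar.
var_dump(preg_match('`\$\d+`', $price)); // int(1)

// Double quotes: "\$" becomes a bare $, i.e. "end of string" followed by digits.
var_dump(preg_match("`\$\d+`", $price)); // int(0)

// preg_quote() escapes regex metacharacters when a pattern is built from plain text.
$pattern = '`' . preg_quote('$15', '`') . '`';
var_dump(preg_match($pattern, $price)); // int(1)
```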

In PHP, a regex pattern must always be enclosed in a pair of delimiters. This page generally uses the backtick (`), but / and # are also common.

In addition, we can add some options after these delimiters:

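i case-insensitive matching
m multiline mode: ^ and $ also match at the start and end of each line
s the "." also matches newlines
x ignore whitespace (and # comments) in the pattern
u treat the pattern and the subject as UTF-8

The Perl modifier o is not supported by PHP's preg functions.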
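
The following sketch compares a few patterns with and without a modifier. The sample strings are invented. The s modifier, which makes "." match newlines, is included alongside m to show how the two differ.

```php
<?php
// i: case-insensitive
var_dump(preg_match('`wikibooks`i', 'WikiBooks')); // int(1)

$text = "first line\nsecond line";

// m: ^ and $ also match at line boundaries
var_dump(preg_match('`^second`', $text));  // int(0)
var_dump(preg_match('`^second`m', $text)); // int(1)

// s: "." also matches a newline
var_dump(preg_match('`first.*second`', $text));  // int(0)
var_dump(preg_match('`first.*second`s', $text)); // int(1)

// x: whitespace inside the pattern is ignored
var_dump(preg_match('`\d{4} - \d{2}`x', '2024-05')); // int(1)

// u: "." matches one UTF-8 character instead of one byte
$e = "\u{e9}"; // "é", two bytes in UTF-8
var_dump(preg_match('`^.$`', $e));  // int(0)
var_dump(preg_match('`^.$`u', $e)); // int(1)
```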

Search


The function ereg(), which performed regex searches, was deprecated in PHP 5.3 and removed in PHP 7.0; use preg_match() instead.

preg_match()


The function preg_match[3] is the main regex search function[4]. It returns 1 if the pattern matches, 0 if it does not, and false on error. It requires two parameters: the regex pattern and the string to scan.

The optional third parameter is a variable that receives the array of results: index 0 holds the whole match and the following indexes hold the capture groups.

Finally, the optional fourth parameter accepts a PHP flag that modifies the function's default behavior.

  • Minimal example:
<?php
$string = 'PHP regex test for the English Wikibooks.';

if (preg_match('`.*Wikibooks.*`', $string)) {
    print('This text talks about Wikibooks');
} else {
    print('This text doesn\'t talk about Wikibooks');
}
?>
  • Advanced example:
<?php
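$string = 'PHP regex test for the English Wikibooks.';
$flag = PREG_OFFSET_CAPTURE; // also record the byte offset of each match

// $results receives the matches; $flag changes how they are reported
if (preg_match('`.*Wikibooks.*`', $string, $results, $flag)) {
    var_dump($results);
} else {
    print('This text doesn\'t talk about Wikibooks');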
}
?>

Flag examples:[5]

  • PREG_OFFSET_CAPTURE: also returns the position (byte offset) of each matched substring in the string.
  • PREG_GREP_INVERT: in preg_grep(), returns the elements that do not match the pattern.
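
As a sketch of what PREG_OFFSET_CAPTURE changes, the pattern below (chosen for this illustration) captures the end of the word "Wikibooks"; each entry of the results array then becomes a pair made of the matched text and its byte offset.

```php
<?php
$string = 'PHP regex test for the English Wikibooks.';

preg_match('`Wiki(\w+)`', $string, $m, PREG_OFFSET_CAPTURE);

// $m[0] is the whole match, $m[1] the first capture group;
// each one is an array [matched text, byte offset]
print_r($m[0]); // ['Wikibooks', 31]
print_r($m[1]); // ['books', 35]
```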

preg_grep()


This function searches an array and returns the elements that match the pattern[6].
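
A minimal sketch with an invented list of file names. The keys of the input array are preserved in the result.

```php
<?php
$files = ['index.php', 'style.css', 'README.md', 'config.php'];

// Keep only the entries that end in ".php"
$php = preg_grep('`\.php$`', $files);
print_r($php);   // [0 => 'index.php', 3 => 'config.php']

// PREG_GREP_INVERT keeps the entries that do NOT match
$other = preg_grep('`\.php$`', $files, PREG_GREP_INVERT);
print_r($other); // [1 => 'style.css', 2 => 'README.md']
```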

preg_match_all()


To collect every match in one array, replace preg_match with preg_match_all[7], and print with print_r.

Example that extracts every parenthesized substring from a file ($filename holds the path of the file to read):

$regex = "/\(([^)]*)\)/";
preg_match_all($regex, file_get_contents($filename), $matches);
print_r($matches);
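
To make the shape of the results array easier to see, here is a sketch that applies the same pattern to an invented literal string instead of a file:

```php
<?php
$text = 'f(x) and g(y, z) but not h()';

$count = preg_match_all('/\(([^)]*)\)/', $text, $matches);
echo $count, "\n"; // 3

// By default, $matches[0] holds the full matches and $matches[1] the first capture group
print_r($matches[0]); // ['(x)', '(y, z)', '()']
print_r($matches[1]); // ['x', 'y, z', '']

// PREG_SET_ORDER groups the results per match instead of per capture group
preg_match_all('/\(([^)]*)\)/', $text, $sets, PREG_SET_ORDER);
print_r($sets[1]);    // ['(y, z)', 'y, z']
```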

Replacement


preg_replace()


The function preg_replace requires three parameters: the pattern to search for, the replacement, and the string to process.

<?php
// Replace spaces by underscores
$string = "PHP regex test for the English Wikibooks.";
$sortedString = preg_replace('`( )`', '_', $string);
echo $sortedString;
?>

preg_filter()


Same as preg_replace(), but it returns only the subjects in which a replacement was made.
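
The difference is easiest to see with an array of subjects, as in this sketch (the sample values are invented):

```php
<?php
$subjects = ['a1', 'b', 'c22'];

// preg_replace() returns every subject, changed or not
print_r(preg_replace('`\d+`', '#', $subjects)); // ['a#', 'b', 'c#']

// preg_filter() returns only the subjects that were changed; keys are preserved
print_r(preg_filter('`\d+`', '#', $subjects));  // [0 => 'a#', 2 => 'c#']
```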

preg_split()


Splits a string into an array, using a regex as the separator.
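
A short sketch with invented input strings. PREG_SPLIT_NO_EMPTY is a standard flag that discards empty pieces.

```php
<?php
// Split on commas or semicolons, with optional whitespace around them
$parts = preg_split('`\s*[,;]\s*`', 'red, green;blue ,  yellow');
print_r($parts); // ['red', 'green', 'blue', 'yellow']

// -1 means "no limit"; PREG_SPLIT_NO_EMPTY drops the empty pieces
// produced by the leading and trailing spaces
$words = preg_split('`\s+`', '  hello   world ', -1, PREG_SPLIT_NO_EMPTY);
print_r($words); // ['hello', 'world']
```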

References
