Pattern Syntax

Describes PCRE regex syntax

Code Examples / Notes

chris

When enclosing your regular expression in double quotes, back references require two backslashes. In a double-quoted PHP string, \1 is an octal escape that produces the ASCII character with code 1, not a backslash followed by 1. You need to write \\1 to get the back reference.
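
A minimal check of the point above, assuming PHP 7 or later. The pattern /(\w)\1/ (a word character followed by the same character again) is written once in single quotes and once in double quotes with doubled backslashes. The last line shows what happens when the backslashes are not doubled.

```php
<?php
// In a double-quoted string, "\1" is an octal escape for chr(1),
// not a backslash followed by "1".
var_dump("\1" === chr(1));            // bool(true)

// Two spellings of the pattern /(\w)\1/: a word character, repeated.
$double = "/(\\w)\\1/";               // double quotes: backslashes doubled
$single = '/(\w)\1/';                 // single quotes: one backslash is enough
var_dump($double === $single);        // bool(true)

$found = preg_match($single, "book", $m);
echo $found, " ", $m[0], "\n";        // prints "1 oo"

// Undoubled in double quotes, the pattern looks for a word character
// followed by chr(1), so it finds nothing in "book".
var_dump(preg_match("/(\w)\1/", "book"));  // int(0)
```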

brett_hegr

This is a gem of a website. It has a big library of useful expressions, a search engine, and a tool for testing your regexp. http://www.regexlib.com/

napalm

Note that some PCRE features, such as once-only subpatterns and recursive patterns, are not implemented in PHP versions prior to 5.0.
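
For readers unfamiliar with the two features, a short sketch of a once-only (atomic) subpattern and a recursive pattern, assuming any current PHP release. The balanced-parentheses pattern is an illustrative choice, not something from the note.

```php
<?php
// Once-only (atomic) group: (?>b*) grabs every "b" and never gives any back,
// so the trailing "b" in the pattern has nothing left to match.
$atomic = preg_match('/a(?>b*)b/', 'abbb');   // 0
$normal = preg_match('/ab*b/', 'abbb');       // 1: b* backtracks by one

// Recursive pattern: (?R) re-enters the whole pattern, which lets it
// match balanced parentheses of any depth.
$balanced = '/\((?:(?>[^()]+)|(?R))*\)/';
preg_match($balanced, 'x(a(b)c)y', $m);
echo $m[0], "\n";   // prints "(a(b)c)"
```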

info

I'd never used regular expressions until now and had a lot of difficulty trying to convert [url]link here[/url] into an href for posting messages on a forum. Here's what I managed to come up with:

$patterns = array(
    "/\[link\](.*?)\[\/link\]/",
    "/\[url\](.*?)\[\/url\]/",
    "/\[img\](.*?)\[\/img\]/",
    "/\[b\](.*?)\[\/b\]/",
    "/\[u\](.*?)\[\/u\]/",
    "/\[i\](.*?)\[\/i\]/"
);
$replacements = array(
    "<a href=\"\\1\">\\1</a>",
    "<a href=\"\\1\">\\1</a>",
    "<img src=\"\\1\">",
    "<b>\\1</b>",
    "<u>\\1</u>",
    "<i>\\1</i>"
);
$newText = preg_replace($patterns, $replacements, $text);

At first it would collect ALL the tags into one link/bold/whatever, until I added the "?". I still don't fully understand it, but it works :)
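
The "?" is what makes the difference: it turns the greedy .* into the lazy .*?, which stops at the first closing tag it can reach instead of the last one. A minimal sketch on a made-up two-tag string, using '$1' in single quotes as the back reference (equivalent to "\\1" above):

```php
<?php
$text = '[b]one[/b] and [b]two[/b]';

// Greedy: .* runs to the LAST "[/b]", so both pairs collapse into one match.
$greedy = preg_replace('/\[b\](.*)\[\/b\]/', '<b>$1</b>', $text);

// Lazy: .*? stops at the FIRST "[/b]" it can, so each pair is handled separately.
$lazy = preg_replace('/\[b\](.*?)\[\/b\]/', '<b>$1</b>', $text);

echo $greedy, "\n";  // <b>one[/b] and [b]two</b>
echo $lazy, "\n";    // <b>one</b> and <b>two</b>
```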

j daugherty

In the character class meta-character documentation above, the circumflex (^) is described as: "^ negate the class, but only if the first character". It should be a little more verbose to fully express the meaning of ^: "^ Negate the character class. If used, this must be the first character of the class (e.g. "[^012]")."
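
A quick illustration of both halves of that description; the subject strings are made up for the example.

```php
<?php
// ^ as the first character negates the class: anything except 0, 1 or 2.
preg_match_all('/[^012]/', '0a1b2', $m);
echo implode(',', $m[0]), "\n";            // prints "a,b"

// Anywhere else in the class, ^ is an ordinary literal character.
$hasCaret = preg_match('/[12^]/', 'x^y');  // 1: matches the "^"
$noCaret  = preg_match('/[12^]/', 'xyz');  // 0
```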

29-may-2004 10:15

In addition to the meta-characters mentioned above, there is another special character in a regular expression: the delimiter you use to start and end your expression. Often people use the / character for this. For example, if you wanted to search for text surrounded by opening and closing tags, like '<TD>SELL</TD>', and replace it with nothing (erase it), you might be tempted to use a regex like this:

<?php $myNewText = preg_replace('/<TD>SELL</TD>/', "", $myText); ?>

This does not work: the / in </TD> ends the pattern early, and PHP warns about an unknown modifier 'T'. As mentioned in the Introduction at the top of http://www.php.net/manual/en/ref.pcre.php, if the delimiter appears in the middle of your regular expression, you must put a \ character before it. So this DOES work:

<?php $myNewText = preg_replace('/<TD>SELL<\/TD>/', "", $myText); ?>

That same Introduction also mentions that you can start and end your expression with characters other than the usual /. Because there are no % characters in the middle of my expression above, I might prefer to use the following:

<?php $myNewText = preg_replace('%<TD>SELL</TD>%', "", $myText); ?>

That also works correctly, and I did not need a \ before the /.
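
The working approaches side by side, as a sketch on a made-up subject. The third uses preg_quote() with the delimiter as its second argument, which the note does not mention; it escapes every regex metacharacter in the literal text as well as the delimiter.

```php
<?php
$myText = 'a<TD>SELL</TD>b';

// 1) Escape the delimiter inside the pattern.
$escaped = preg_replace('/<TD>SELL<\/TD>/', '', $myText);

// 2) Use a delimiter that does not occur in the pattern.
$percent = preg_replace('%<TD>SELL</TD>%', '', $myText);

// 3) Let preg_quote() escape the literal text, passing the delimiter
//    as its second argument so "/" is escaped too.
$quoted = preg_replace('/' . preg_quote('<TD>SELL</TD>', '/') . '/', '', $myText);

echo $escaped, ' ', $percent, ' ', $quoted, "\n";  // prints "ab ab ab"
```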

onerob

If, like me, you tend to use the /U pattern modifier, remember that using ? or * to test for optional characters will match zero characters whenever the rest of the pattern can still match, even if the optional characters are present. For instance, given this string:

a___bcde

and this pattern:

'/a(_*).*e/U'

the whole pattern matches, but none of the _ characters are captured by the subpattern. The way around this (if you still wish to use /U) is to add the ? greediness inverter, e.g. '/a(_*?).*e/U'.
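
A sketch that runs the note's own example and shows what the capture group holds in each case:

```php
<?php
$subject = 'a___bcde';

// Under /U, _* is lazy, so it matches nothing: .* can still reach the "e".
preg_match('/a(_*).*e/U', $subject, $m1);

// Under /U, the extra ? inverts the inversion, so _*? is greedy again.
preg_match('/a(_*?).*e/U', $subject, $m2);

var_dump($m1[1]);  // string(0) ""
var_dump($m2[1]);  // string(3) "___"
```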

theppg_001

Hi there. This was originally written by someone else, but it didn't work correctly, so I rewrote it; as far as I know, it works now.

<?php
/**
 * strip_selected_tags ( string str [, string strip_tags[, strip_content flag]] )
 * ---------------------------------------------------------------------
 * Like strip_tags() but inverse; the strip_tags tags will be stripped, not kept.
 * strip_tags: string with tags to strip, ex: "<a> <quote>" etc.
 * strip_content flag: TRUE will also strip everything between open and closed tag
 */
function strip_selected_tags($str, $tags = "", $stripContent = false)
{
    preg_match_all("/<([^>]+)>/i", $tags, $allTags, PREG_PATTERN_ORDER);
    foreach ($allTags[1] as $tag) {
        // Build the pattern inside the loop, where $tag is actually defined.
        $replace = "%(<$tag.*?>)(.*?)(<\/$tag.*?>)%is";
        if ($stripContent) {
            $str = preg_replace($replace, '', $str);
        }
        $str = preg_replace($replace, '${2}', $str);
    }
    return $str;
}
?>

Before I 'fixed' it, running

strip_selected_tags("this is <p align=\"center\">a test and <b>this is bold</b>"," <b>")

would give you back "this is <p align=\"center\">a test and this is bold". Why? Because it did not take into account that there could be options etc. in the HTML tag. Mine works perfectly when stripping just the tags or the tags and their contents too! So now when you run

strip_selected_tags("this is <p align=\"center\">a test and <b>this is bold</b>"," <b>")

you get back "this is a test and this is bold". Or when running

strip_selected_tags("this is <p align=\"center\">a test and <b>this is bold</b>"," <b>",true)

you get back "this is and ". Hope it helps someone :)
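
A hypothetical variant (strip_selected_tags_sketch is an illustrative name, not from the note) that escapes the tag name with preg_quote() and requires a word boundary after it, so that stripping <b> leaves <br> alone. Like the original, it only removes the tags named in the second argument, so on the note's first example it keeps <p align="center">; it also does not handle nested tags of the same name.

```php
<?php
// Hypothetical variant of strip_selected_tags(), not the note's code.
// $tags uses the same "<a> <b>" format as the original.
function strip_selected_tags_sketch(string $str, string $tags = '', bool $stripContent = false): string
{
    preg_match_all('/<([^>]+)>/', $tags, $allTags);
    foreach ($allTags[1] as $tag) {
        // Escape the tag name so characters such as "." are taken literally.
        $t = preg_quote(trim($tag), '%');
        // \b after the name keeps "b" from also matching <br> or <body>;
        // [^>]* allows attributes such as align="center".
        $pattern = '%<' . $t . '\b[^>]*>(.*?)</' . $t . '\s*>%is';
        $str = preg_replace($pattern, $stripContent ? '' : '$1', $str);
    }
    return $str;
}

$html = 'a <b class="x">bold</b> <br> c';
echo strip_selected_tags_sketch($html, '<b>'), "\n";        // a bold <br> c
echo strip_selected_tags_sketch($html, '<b>', true), "\n";  // "a  <br> c" (two spaces)
```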

datacompboy

For example, suppose you want to cut out a <div> element accurately, from the <div> to its corresponding </div>. Here is proof-of-concept code to do this:

<?php
$str = "<dqiv1>1+<div2>2+<div3><b><c>3</c></b></div3>2-</div2>1-</div1>";
preg_match("#<div.> ( ".
    " ( (?>[^<]*) ( < ( ([^/d]|d([^i]|i[^v])) | /([^d]|d([^i]|i[^v])) ) )? )* ".
    " | (?R) )* </div.>#xi", $str, $m);
var_dump($m[0]);
?>

It matches accurately from <div2> to </div2>. And if you change <dqiv1> to <div1>, it will match from <div1> to </div1>.
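
A less cryptic sketch of the same idea for ordinary <div> tags, using (?R) to step over nested <div> elements. This is an alternative formulation, not the author's pattern, and like any regex approach it assumes reasonably well-formed markup.

```php
<?php
// Match one <div> element together with everything up to its matching
// </div>: text, any tag that is not a <div>/</div>, or a nested element (?R).
$pattern = '~<div\b[^>]*>(?:[^<]++|<(?!/?div\b)|(?R))*</div>~i';

$html = 'x<div a="1">one<div>two</div>three</div>y<div>z</div>';
preg_match($pattern, $html, $m);
echo $m[0], "\n";   // <div a="1">one<div>two</div>three</div>

// Only top-level elements are reported, so this finds two matches.
$count = preg_match_all($pattern, $html, $all);
echo $count, "\n";  // 2
```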

sam marshall

For anyone who sees this error:

Warning: preg_match() [function.preg-match]: Compilation failed: PCRE does not support \L, \l, \N, \P, \p, \U, \u, or \X at ...

As this manual page says, you need PHP 5.1.0 and the /u modifier to enable these features, but that isn't the only requirement! It is possible to install later versions of PHP (we have 5.1.4) while linking against an older PCRE install. A quick look at the PCRE changelog suggests that you probably need at least PCRE 5; we're running 4.5, while the latest is 7.1. You can find out your PCRE version by checking phpinfo(). I suspect this ancient PCRE version is included in some officially supported Red Hat Enterprise package, which is probably why we are running it, so it might affect other people as well.
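
Two quick checks, as a sketch: the PCRE_VERSION constant (not mentioned in the note) holds the version of the PCRE library PHP was built with, and a \p{...} property escape with /u should compile without the error above on a sufficiently new build. The "\u{...}" string escape assumes PHP 7 or later.

```php
<?php
// Version of the PCRE library this PHP build uses.
echo 'PCRE ', PCRE_VERSION, "\n";

// \p{Lu} means "uppercase letter"; it needs Unicode property support and the
// /u modifier so the subject is read as UTF-8.
$word = "\u{C9}lan";   // "Élan", built from a code point to avoid source-encoding issues
$isUpper = preg_match('/^\p{Lu}/u', $word);
var_dump($isUpper);    // int(1)
```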

daniel vandersluis

Concerning note #6 in "Differences From Perl", the \G token *is* supported as the last-match-position anchor. This has been confirmed to work at least in preg_replace(), and I'd assume it also works in preg_match_all() and other functions that can make more than one match.
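
A small sketch of \G in both preg_match_all() and preg_replace() on a recent PHP. Because \G anchors each attempt to the end of the previous match, matching stops at the first gap.

```php
<?php
// \G only matches where the current attempt starts, i.e. right after the
// previous match, so the matches must be contiguous from the beginning.
$count = preg_match_all('/\G\d/', '123a45', $m);
echo $count, ': ', implode(',', $m[0]), "\n";   // 3: 1,2,3

// The same anchor in preg_replace(): strip leading zeros only.
$trimmed = preg_replace('/\G0/', '', '000120');
echo $trimmed, "\n";                            // 120
```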

w w w

Back references are a great way to achieve exact matching when it would otherwise be impossible. Take these three strings:

1) "www.www.com"
2) 'www.www.com'
3) "www.www.com'

The regex /^("|').+?("|')$/ would match all three strings, but what if you needed the third string to be illegal because the quotes are not the same? You could write four different regexes to check for every possible case, OR you could use back references: /^("|').+?\1$/ will match strings 1 and 2 but not string 3. Try this code for further proof:

$str_test = "'www.www.com\"";
$int_count = preg_match("/^(\"|').+?\\1$/", $str_test, $matches, PREG_OFFSET_CAPTURE);

The preg_match function will not match $str_test because the quotes are mismatched. If you change $str_test to

$str_test = "'www.www.com'";

the preg_match will work.
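
The same idea, sketched with a character class ["'] in place of ("|') and a single-quoted pattern so \1 needs only one backslash. The three subjects are the note's own strings.

```php
<?php
// ["'] accepts either quote; \1 then demands the same quote at the end.
$pattern = '/^(["\']).+?\1$/';

$tests = ['"www.www.com"', "'www.www.com'", '"www.www.com\''];
$results = [];
foreach ($tests as $s) {
    $results[] = preg_match($pattern, $s);
    echo $s, ' => ', end($results), "\n";
}
// "www.www.com" => 1
// 'www.www.com' => 1
// "www.www.com' => 0
```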

gphemsley

Another good website for Regular Expression reference materials is: http://www.regularexpressions.info/

ned baldessin

Although \w and \W do include locale-specific characters as "word characters" (like "é" if you are using the "fr" locale), \b and \B do not work the same way. For example:

"foo était bar" => /\W(était)\W/ => This correctly captures "était".
"foo était bar" => /\b(était)\b/ => This fails to capture it.

This is confusing, because the manual talks about "word characters" in both cases but fails to mention the difference in behaviour.
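
A sketch of one way around this on current PHP, which is not part of the original note: with the /u modifier PHP treats the subject as UTF-8 and also enables Unicode properties, so "é" becomes a word character for both \w and \b. The word is built with "\u{E9}" (PHP 7+) to avoid source-encoding problems.

```php
<?php
$word = "\u{E9}tait";            // "était" as UTF-8
$text = "foo $word bar";

// \W around the word works even without /u: the spaces are non-word characters.
$withW = preg_match('/\W(' . $word . ')\W/', $text);
var_dump($withW);   // int(1)

// With /u, "é" counts as a word character, so \b sees the boundaries too.
$withB = preg_match('/\b(' . $word . ')\b/u', $text, $m);
var_dump($withB);   // int(1)
```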

pstradomski

About the strip_selected_tags function from two posts below: it does not work if somebody writes tags without the closing ">" character, like this:

<p <b> bold text </b</p

This is even valid HTML (but not valid XHTML).

spook

A useful note for beginners: note the difference between mathematical and PHP regular expressions. The _mathematical_ regex (a+b+c)*, which written in PHP syntax looks like [abc]*, will match any string built from the letters a, b and c, but will not match a string such as abcd. However, the _PHP_ regular expression will match that string, because preg_match() succeeds if any part of the string matches, and the regex means "0 or more occurrences of the letters a, b or c", which even an empty substring satisfies. To convert the regexp from the mathematical to the PHP convention, use the ^ and $ characters, which indicate the start and end of the tested string. So the regexp ^[abc]*$ means "match all strings which, between their beginning and end, contain 0 or more occurrences of the letters a, b or c", which is what we were looking for. Nasty habit, especially after two tests on "theoretical basics of computer science" :)
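
A sketch of the difference using preg_match(). The last two lines add a caveat not in the note: $ also matches just before a trailing newline, while \z only matches at the very end.

```php
<?php
// [abc]* can match the empty string, and preg_match() looks for a match
// anywhere in the subject, so this succeeds on any string at all.
var_dump(preg_match('/[abc]*/', 'abcd'));     // int(1)
var_dump(preg_match('/[abc]*/', 'xyz'));      // int(1)

// Anchored at both ends, the whole string must consist of a, b and c.
var_dump(preg_match('/^[abc]*$/', 'abcd'));   // int(0)
var_dump(preg_match('/^[abc]*$/', 'cab'));    // int(1)

// Caveat: $ also matches just before a final newline; \z does not.
var_dump(preg_match('/^[abc]*$/', "cab\n"));  // int(1)
var_dump(preg_match('/^[abc]*\z/', "cab\n")); // int(0)
```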

brewthatistrue

http://zvon.org/other/PerlTutorial/Output/index.html

roland dot illig

<quote> 9. Another as yet unresolved discrepancy is that in Perl 5.005_02 the pattern /^(a)?(?(1)a|b)+$/ matches the string "a", whereas in PCRE it does not. However, in both Perl and PCRE /^(a)?a/ matched against "a" leaves $1 unset. </quote>

The last sentence does not indicate a bug. For the string "a" to match the regular expression /^(a)?a/, the final "a" in the regex must be matched by the only "a" in the string. That leaves "" for the optional group, which obviously cannot match (a), so $1 stays unset.
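
A sketch confirming both halves of the quoted text on a current PHP/PCRE build. Because group 1 does not take part in the match, preg_match() leaves it out of $m entirely rather than storing an empty string.

```php
<?php
// The final "a" in the pattern must consume the only "a" in the subject,
// so (a)? matches the empty string and group 1 never takes part.
preg_match('/^(a)?a/', 'a', $m);
$count = count($m);
var_dump($count, isset($m[1]));   // int(1) bool(false)

// The discrepancy quoted above: PCRE rejects this pattern against "a".
$conditional = preg_match('/^(a)?(?(1)a|b)+$/', 'a');
var_dump($conditional);           // int(0)
```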
