
preg_match_all

Perform a global regular expression match (PHP 4, PHP 5)
int preg_match_all ( string pattern, string subject, array &matches [, int flags [, int offset]] )

Example 1716. Getting all phone numbers out of some text.

<?php
preg_match_all("/\(?  (\d{3})?  \)?  (?(1)  [\-\s] ) \d{3}-\d{4}/x",
               "Call 555-1212 or 1-800-555-1212", $phones);
?>
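The sketch below (assuming PHP 7 or later) runs the same call and inspects `$phones` to show what the pattern captures. With the default PREG_PATTERN_ORDER, `$phones[0]` holds the full matches and `$phones[1]` holds the optional area code. The leading "1-" of the toll-free number is not part of the match, because the pattern only allows an optional three-digit prefix.

```php
<?php
// Same pattern and subject as the example above
$n = preg_match_all(
    "/\(?  (\d{3})?  \)?  (?(1)  [\-\s] ) \d{3}-\d{4}/x",
    "Call 555-1212 or 1-800-555-1212",
    $phones
);

echo $n, "\n";        // number of full matches
print_r($phones[0]);  // the full matches
print_r($phones[1]);  // area code group ("" when it did not participate)
```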

Example 1717. Find matching HTML tags (greedy)

<?php
// The \\2 is an example of backreferencing. This tells pcre that
// it must match the second set of parentheses in the regular expression
// itself, which would be the ([\w]+) in this case. The extra backslash is
// required because the string is in double quotes.
$html = "<b>bold text</b><a href=howdy.php>click me</a>";

preg_match_all("/(<([\w]+)[^>]*>)(.*)(<\/\\2>)/", $html, $matches, PREG_SET_ORDER);

foreach ($matches as $val) {
    echo "matched: " . $val[0] . "\n";
    echo "part 1: " . $val[1] . "\n";
    echo "part 2: " . $val[3] . "\n";
    echo "part 3: " . $val[4] . "\n\n";
}
?>

The above example will output:

matched: <b>bold text</b>
part 1: <b>
part 2: bold text
part 3: </b>

matched: <a href=howdy.php>click me</a>
part 1: <a href=howdy.php>
part 2: click me
part 3: </a>
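The "(greedy)" in the title matters once the same tag appears twice. In that case `(.*)` runs to the last matching closing tag. This minimal sketch compares it with the lazy `(.*?)` form. Single quotes are used so the backreference can be written as `\2` without the extra backslash.

```php
<?php
$html = "<b>a</b> and <b>c</b>";

// Greedy: (.*) extends to the LAST </b>, so both elements collapse into one match
$greedyCount = preg_match_all('/(<([\w]+)[^>]*>)(.*)(<\/\2>)/', $html, $greedy);

// Lazy: (.*?) stops at the FIRST </b>, giving one match per element
$lazyCount = preg_match_all('/(<([\w]+)[^>]*>)(.*?)(<\/\2>)/', $html, $lazy);

print_r($greedy[3]); // inner text of each greedy match
print_r($lazy[3]);   // inner text of each lazy match
```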




Code Examples / Notes » preg_match_all

egingell

Try this version of preg_match_all() that takes an array of regular expressions:
<?php
// Emulates preg_match_all() but accepts either one pattern or an array of patterns.
// Returns an array with one entry per pattern; each entry is the $matches
// array that preg_match_all() fills in through its third parameter.
function preg_search($ary, $subj) {
    $matched = array();
    if (is_array($ary)) {
        foreach ($ary as $v) {
            // $matched[] appends a new element and passes it by reference
            preg_match_all($v, $subj, $matched[]);
        }
    } else {
        preg_match_all($ary, $subj, $matched[]);
    }
    return $matched;
}
?>
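A variation on the same idea, sketched for PHP 7+, keys each result by its pattern. That makes it easier to tell which match set came from which expression. The function name preg_search_keyed() is hypothetical and not part of PHP.

```php
<?php
// Hypothetical helper: runs preg_match_all() once per pattern and keys each
// $matches array by the pattern that produced it.
function preg_search_keyed(array $patterns, string $subject): array {
    $results = [];
    foreach ($patterns as $pattern) {
        preg_match_all($pattern, $subject, $matches);
        $results[$pattern] = $matches;
    }
    return $results;
}

$found = preg_search_keyed(['/\d+/', '/[a-z]+/'], 'abc 12 de 3');
print_r($found['/\d+/'][0]);    // all numbers
print_r($found['/[a-z]+/'][0]); // all lowercase words
```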


chuckie

This is a function to convert byte offsets into (UTF-8) character offsets (this applies regardless of whether you use the /u modifier):
<?php
function mb_preg_match_all($ps_pattern, $ps_subject, &$pa_matches, $pn_flags = PREG_PATTERN_ORDER, $pn_offset = 0, $ps_encoding = NULL) {
 // WARNING! - All this function does is correct offsets, nothing else.
 if (is_null($ps_encoding))
   $ps_encoding = mb_internal_encoding();
 // The caller's offset is in characters; preg_match_all() expects bytes
 $pn_offset = strlen(mb_substr($ps_subject, 0, $pn_offset, $ps_encoding));
 $ret = preg_match_all($ps_pattern, $ps_subject, $pa_matches, $pn_flags, $pn_offset);
 if ($ret && ($pn_flags & PREG_OFFSET_CAPTURE)) {
   // Works for both PREG_PATTERN_ORDER and PREG_SET_ORDER: either way the
   // leaves of the two-level array are array(text, byte offset).
   foreach ($pa_matches as &$ha_group) {
     foreach ($ha_group as &$ha_match) {
       // Unmatched groups report offset -1; leave those untouched
       if ($ha_match[1] >= 0)
         $ha_match[1] = mb_strlen(substr($ps_subject, 0, $ha_match[1]), $ps_encoding);
     }
     unset($ha_match);
   }
   unset($ha_group);
 }
 return $ret;
}
?>
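The same byte-to-character conversion can be checked without the mbstring extension. This sketch uses a different technique from chuckie's code. It counts the UTF-8 characters in the byte prefix with preg_match_all('/./su', ...), which returns the number of characters matched. char_offsets() is a hypothetical helper name, and PHP 7.1+ is assumed.

```php
<?php
// Hypothetical helper: returns the character (not byte) offset of every
// full match of $pattern in the UTF-8 string $subject.
function char_offsets(string $pattern, string $subject): array {
    preg_match_all($pattern, $subject, $m, PREG_OFFSET_CAPTURE);
    $offsets = [];
    foreach ($m[0] as [$text, $byteOffset]) {
        // Count the characters in the bytes that precede the match;
        // /s lets "." match newlines, /u makes it match whole UTF-8 characters.
        $offsets[] = preg_match_all('/./su', substr($subject, 0, $byteOffset));
    }
    return $offsets;
}

$s = "h\u{e9}llo w\u{f6}rld";   // "héllo wörld": é and ö take 2 bytes each
preg_match_all('/l/u', $s, $raw, PREG_OFFSET_CAPTURE);
print_r(array_column($raw[0], 1));   // byte offsets
print_r(char_offsets('/l/u', $s));   // character offsets
```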


b2sing4u

This function converts all HTML decimal character references to hexadecimal ones.
Example: Hi &#959; &#9674; Dec  ->  Hi &#x03BF; &#x25CA; Dec
<?php
function d2h($word) {
 $n = preg_match_all("/&#(\d+?);/", $word, $match, PREG_PATTERN_ORDER);
 for ($j = 0; $j < $n; $j++) {
   $word = str_replace($match[0][$j], sprintf("&#x%04X;", $match[1][$j]), $word);
 }
 return($word);
}
?>
And this function converts all HTML hexadecimal character references to decimal ones.
Example: Hello &#x03BF; &#x25CA; Hex  ->  Hello &#959; &#9674; Hex
<?php
function h2d($word) {
 $n = preg_match_all("/&#x([0-9a-fA-F]+?);/", $word, $match, PREG_PATTERN_ORDER);
 for ($j = 0; $j < $n; $j++) {
   $word = str_replace($match[0][$j], sprintf("&#%u;", hexdec($match[1][$j])), $word);
 }
 return($word);
}
?>
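Both conversions can also be done in a single pass with preg_replace_callback(). This avoids re-scanning the string with str_replace() for every match. The sketch assumes PHP 7+, and the names d2h_cb() and h2d_cb() are hypothetical.

```php
<?php
// Decimal &#NNN; -> hexadecimal &#xHHHH; (same output format as d2h)
function d2h_cb(string $word): string {
    return preg_replace_callback('/&#(\d+);/', function ($m) {
        return sprintf('&#x%04X;', $m[1]);
    }, $word);
}

// Hexadecimal &#xHHHH; -> decimal &#NNN; (same output format as h2d)
function h2d_cb(string $word): string {
    return preg_replace_callback('/&#x([0-9a-fA-F]+);/', function ($m) {
        return sprintf('&#%u;', hexdec($m[1]));
    }, $word);
}

echo d2h_cb('Hi &#959; &#9674; Dec'), "\n";
echo h2d_cb('Hello &#x03BF; &#x25CA; Hex'), "\n";
```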


mnc

PREG_OFFSET_CAPTURE always provides byte offsets rather than character offsets, even when you use the Unicode /u modifier.

phektus

If you'd like to include DOUBLE QUOTES in a regular expression for use with preg_match_all, try ESCAPING THRICE, as in: \\\"
For example, the pattern:
'/<table>[\s\w\/<>=\\\"]*<\/table>/'
Should be able to match:
<table>
<row>
<col align="left" valign="top">a</col>
<col align="right" valign="bottom">b</col>
</row>
</table>
...with everything inside those table tags.
I'm not really sure why this is so, but I tried just the double quote and one or even two escape characters, and it wouldn't work. In my frustration I added another one, and then it worked.
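Here is why the triple escape is harmless. In a single-quoted PHP string only \\ and \' are escape sequences, so '\\\"' reaches PCRE as \\" (an escaped backslash plus a double quote). That simply adds a backslash to the character class. It also suggests that a plain " should work too. The sketch below checks both forms against the sample markup, assuming PHP 7+ and LF line endings.

```php
<?php
$xml = '<table>
<row>
<col align="left" valign="top">a</col>
<col align="right" valign="bottom">b</col>
</row>
</table>';

// Unescaped double quote inside the character class of a single-quoted pattern
$plain = preg_match_all('/<table>[\s\w\/<>="]*<\/table>/', $xml, $m1);

// The triple-escaped form from the note; its class also admits a literal backslash
$triple = preg_match_all('/<table>[\s\w\/<>=\\\"]*<\/table>/', $xml, $m2);

echo $plain, ' ', $triple, "\n";
```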


php

If you want to get a string with all the 'normal' characters, this may be better:
$clean = preg_replace('/\W+/', '', $dirty);
\W is the opposite of \w: it matches any character that is not a letter, digit, or underscore, and it respects the current locale. Use [^0-9a-zA-Z_]+ instead of \W if you need ASCII-only matching.
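This is a quick check that both patterns mentioned give the same result on ASCII input, and that both keep the underscore. Whether \W treats accented letters as word characters depends on the locale and the /u modifier, so the sketch sticks to ASCII.

```php
<?php
$dirty = "a-b c_d!9";

// \W+ removes every run of non-word characters (word = letter, digit, underscore)
$clean = preg_replace('/\W+/', '', $dirty);

// ASCII-only equivalent
$ascii = preg_replace('/[^0-9a-zA-Z_]+/', '', $dirty);

echo $clean, "\n", $ascii, "\n";
```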


fabriceb

If you just want to find out how many times a string contains another simple string, don't use preg_match_all like I did before I found the substr_count function.
Use
<?php
$nrMatches = substr_count ('foobarbar', 'bar');
?>
instead. Hope this helps some other people like me who like to think too complicated :-)
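For comparison, preg_match_all() returns the match count directly, so both calls agree here. Neither one counts overlapping occurrences. If the needle might contain regex metacharacters, it needs preg_quote() before going into a pattern.

```php
<?php
$haystack = 'foobarbar';

$viaSubstr = substr_count($haystack, 'bar');
// preg_match_all() returns the number of full matches; preg_quote() escapes the needle
$viaRegex  = preg_match_all('/' . preg_quote('bar', '/') . '/', $haystack);

// Neither function counts overlapping occurrences: "aa" fits into "aaa" only once
$overlapSubstr = substr_count('aaa', 'aa');
$overlapRegex  = preg_match_all('/aa/', 'aaa');

echo "$viaSubstr $viaRegex $overlapSubstr $overlapRegex\n";
```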


master

I use the following to strip unwanted characters and keep only the allowed ones in the string:
<?php
// Collect every run of allowed characters (+ avoids the empty matches that * would add)
preg_match_all("/[a-zA-Z0-9]+/", $string_in, $string_out_array);
$string_out = "";
for ($i = 0; $i < count($string_out_array[0]); $i++) {
       $string_out .= $string_out_array[0][$i];
}
?>
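Collecting the allowed runs and concatenating them gives the same string as implode(). It also matches deleting the disallowed characters with preg_replace() and a negated class. Here is a sketch of all three, assuming PHP 7+.

```php
<?php
$string_in = "Hello, World! 123";

// 1. Loop over the matches and concatenate them
preg_match_all("/[a-zA-Z0-9]+/", $string_in, $string_out_array);
$loop = "";
foreach ($string_out_array[0] as $piece) {
    $loop .= $piece;
}

// 2. Same matches, joined with implode()
$joined = implode("", $string_out_array[0]);

// 3. Delete everything that is NOT allowed instead of collecting what is
$replaced = preg_replace("/[^a-zA-Z0-9]+/", "", $string_in);

echo "$loop\n$joined\n$replaced\n";
```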


mail

I reworked the autoCloseTags function by connum at DONOTSPAMME dot googlemail dot com:
<?php
/**
* close all open xhtml tags at the end of the string
*
* @author Milian Wolff <http://milianw.de>
* @param string $html
* @return string
*/
function closetags($html){
 #put all opened tags into an array (the lookbehind skips self-closing tags like <br />)
 preg_match_all("#<([a-z]+)(?: [^>]*)?(?<!/)>#i",$html,$result);
 $openedtags=$result[1];
 #put all closed tags into an array
 preg_match_all("#</([a-z]+)>#iU",$html,$result);
 $closedtags=$result[1];
 $len_opened = count($openedtags);
 # all tags are closed
 if(count($closedtags) == $len_opened){
   return $html;
 }
 $openedtags = array_reverse($openedtags);
 # close tags
 for($i=0;$i<$len_opened;$i++) {
   if (!in_array($openedtags[$i],$closedtags)){
     $html .= '</'.$openedtags[$i].'>';
   } else {
     unset($closedtags[array_search($openedtags[$i],$closedtags)]);
   }
 }
 return $html;
}
?>
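The step in closetags() that is easiest to get wrong is skipping self-closing tags such as <br />. A (?!/) lookahead placed right before > always succeeds, because the next character is the > itself. A (?<!/) lookbehind checks the character before it instead. This sketch shows just that matching step with a lookbehind-based pattern, which is my variant rather than the original author's.

```php
<?php
$html = '<p>Line one<br /><b>bold <img src="x.png"/>';

// Opening tags only: (?<!/) rejects a ">" preceded by "/",
// and [^>]* keeps the attribute part inside a single tag.
preg_match_all('#<([a-z]+)(?: [^>]*)?(?<!/)>#i', $html, $opened);

// Closing tags, as in closetags()
preg_match_all('#</([a-z]+)>#i', $html, $closed);

print_r($opened[1]); // tags that still need closing, in document order
print_r($closed[1]);
```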


sam

Here's something I made a while ago to colorize long regular expressions. I can't guarantee it'll work for everything/everyone, but it helps me a lot and might help someone else.
Usage:
<?php echo highlight_regexp("/^[0-9]{2}:[0-9]{2}[apAP]$/"); ?>
<?php
function highlight_regexp($pattern) {
   $colors = array(
       "/" => "red",
       "(" => "green",
       ")" => "green",
       "[" => "blue",
       "]" => "blue",
       "{" => "orange",
       "}" => "orange"
   );
   $specialchars = array("?","+","*",".","|");
   $space = "&nbsp; &nbsp; ";
   // Initialise state up front instead of relying on undefined variables
   $return = $spaceover = "";
   $tier = $skip = $inbrackets = $skipspaceover = 0;
   for ($i = 0; $i < strlen($pattern); $i++) {
       // $pattern[$i] replaces the old $pattern{$i} syntax, which PHP 8 no longer accepts
       $char = $pattern[$i];
       $spacing = "";
       if ($skip) {
           $show = 1;
           $skip = 0;
       } else
           switch ($char) {
               case "/":
               case "(":
               case "[":
               case "{":
                   if ($skip) {
                       $show = 1;
                       $skip = 0;
                   } else {
                       $tier++;
                       if ($char == "/")
                           $tier = 0;
                       for ($j = 0; $j < $tier; $j++)
                           $spacing .= $space;
                       $char == "{" or $return .= "\n$spacing";
                       $return .= "<font color=".$colors[$char]."><b>".$char."</b></font>";
                       if ($char == "(")
                           $spaceover = "\n$spacing$space";
                       else {
                           if ($char == "[")
                               $inbrackets = 1;
                           $spaceover = "";
                       }
                   }
                   $show = 0;
                   break;
               case ")":
               case "]":
               case "}":
                   if ($skip) {
                       $show = 1;
                       $skip = 0;
                   } else {
                       for ($j = 0; $j < $tier; $j++)
                           $spacing .= $space;
                       if ($char == ")")
                           $return .= "\n$spacing";
                       elseif ($char == "]")
                           $inbrackets = 0;
                       $return .= "<font color=".$colors[$char]."><b>".$char."</b></font>\n";
                       $spaceover = "\n$spacing";
                       $tier--;
                   }
                   $show = 0;
                   break;
               default:
                   $show = 1;
                   break;
           }
           if ($show) {
               if (!$inbrackets && in_array($char,$specialchars)) {
                   $skipspaceover = 1;
                   $preextra = "<font style='font-weight:bold;color:red'>";
                   $postextra = "</font>";
                   $replace = "";
               } elseif ($char == " ") {
                   $preextra = "<i style='font-size:10px'>";
                   $replace = "(space)";
                   $postextra = "</i>";
               } else
                   $preextra = $postextra = $replace = $skipspaceover = "";
               if ($spaceover && !$skipspaceover) {
                   $return .= $spaceover;
                   $spaceover = "";
               }
               $return .= $preextra.($replace ? $replace : $char).$postextra;
           }
   }
   return $return;
}
?>


phpnet

Here's some code to 1. validate RFC 2822 conformity of address lists and 2. extract the address specification (the part commonly known as the 'email'). I wouldn't suggest using it for checking email input in forms, but it might be just what you want for other email applications. I know it can be optimized further, but I'll leave that part to you nutcrackers. The resulting regex is about 30000 bytes long. That's because it accepts comments; you can drop them by setting $cfws to $fws, and it shrinks to about 6000 bytes. Conformity checking strictly follows RFC 2822. Have fun, and email me if you have any enhancements!
<?php
function mime_extract_rfc2822_address($string)
{
       //rfc2822 token setup
       $crlf           = "(?:\r\n)";
       $wsp            = "[\t ]";
       $text           = "[\\x01-\\x09\\x0B\\x0C\\x0E-\\x7F]";
       $quoted_pair    = "(?:\\\\$text)";
       $fws            = "(?:(?:$wsp*$crlf)?$wsp+)";
       $ctext          = "[\\x01-\\x08\\x0B\\x0C\\x0E-\\x1F" .
                         "!-'*-[\\]-\\x7F]";
       $comment        = "(\\((?:$fws?(?:$ctext|$quoted_pair|(?1)))*" .
                         "$fws?\\))";
       $cfws           = "(?:(?:$fws?$comment)*(?:(?:$fws?$comment)|$fws))";
       //$cfws           = $fws; //an alternative to comments
       $atext          = "[!#-'*+\\-\\/0-9=?A-Z\\^-~]";
       $atom           = "(?:$cfws?$atext+$cfws?)";
       $dot_atom_text  = "(?:$atext+(?:\\.$atext+)*)";
       $dot_atom       = "(?:$cfws?$dot_atom_text$cfws?)";
       $qtext          = "[\\x01-\\x08\\x0B\\x0C\\x0E-\\x1F!#-[\\]-\\x7F]";
       $qcontent       = "(?:$qtext|$quoted_pair)";
       $quoted_string  = "(?:$cfws?\"(?:$fws?$qcontent)*$fws?\"$cfws?)";
       $dtext          = "[\\x01-\\x08\\x0B\\x0C\\x0E-\\x1F!-Z\\^-\\x7F]";
       $dcontent       = "(?:$dtext|$quoted_pair)";
       $domain_literal = "(?:$cfws?\\[(?:$fws?$dcontent)*$fws?]$cfws?)";
       $domain         = "(?:$dot_atom|$domain_literal)";
       $local_part     = "(?:$dot_atom|$quoted_string)";
       $addr_spec      = "($local_part@$domain)";
       $display_name   = "(?:(?:$atom|$quoted_string)+)";
       $angle_addr     = "(?:$cfws?<$addr_spec>$cfws?)";
       $name_addr      = "(?:$display_name?$angle_addr)";
       $mailbox        = "(?:$name_addr|$addr_spec)";
       $mailbox_list   = "(?:(?:(?:(?<=:)|,)$mailbox)+)";
       $group          = "(?:$display_name:(?:$mailbox_list|$cfws)?;$cfws?)";
       $address        = "(?:$mailbox|$group)";
       $address_list   = "(?:(?:^|,)$address)+";
       //output length of string (just so you see how f**king long it is)
       echo(strlen($address_list) . " ");
       //apply expression
       preg_match_all("/^$address_list$/", $string, $array, PREG_SET_ORDER);
       return $array;
}
?>


han jun kwang me -at- hjk.ikueb.com

Here's a simple function to retrieve attribute values of HTML tags:
<?php
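function getAttribs($t, $a, $s) {
  // preg_quote() escapes any regex metacharacters in the tag and attribute names
  preg_match_all("/(<" . preg_quote($t, '/') . " .*?" . preg_quote($a, '/') . ".*?=.*?\")(.*?)(\".*?>)/", $s, $m);
  return $m[2];
}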
?>
Where $t is the tag name (e.g. img), $a is the attribute you are looking for (e.g. src) and $s is the HTML string.
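Because .*? can run past the end of a tag, the function above may return an attribute from a later tag. For example, it can return the src of an <a> that follows an <img> without one. Below is a tighter variant, sketched under the assumption that attribute values are double-quoted. getAttribsStrict() is a hypothetical name.

```php
<?php
// Hypothetical variant: [^>]* keeps the search inside one tag, and
// preg_quote() escapes any regex metacharacters in the names.
function getAttribsStrict(string $tag, string $attr, string $html): array {
    $pattern = '/<' . preg_quote($tag, '/') . '\b[^>]*\b' . preg_quote($attr, '/')
             . '\s*=\s*"([^"]*)"/i';
    preg_match_all($pattern, $html, $m);
    return $m[1];
}

$html = '<img alt="x"><a src="link.html"> <img alt="y" src="b.png">';
print_r(getAttribsStrict('img', 'src', $html)); // only the src of the second <img>
```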


ino

Extract all email addresses from a text file:
<?php
preg_match_all("/[-a-z0-9\._]+@[-a-z0-9\._]+\.[a-z]{2,4}/i", file_get_contents('1.txt'), $email);
print_r($email);
?>


mr davin

<?php
// Returns an array of the strings found between $start and $end
function findinside($start, $end, $string) {
preg_match_all('/' . preg_quote($start, '/') . '([^\.)]+)'. preg_quote($end, '/').'/i', $string, $m);
return $m[1];
}

$start = "mary has";
$end = "lambs.";
$string = "mary has 6 lambs. phil has 13 lambs. mary stole phil's lambs. now mary has all the lambs.";
$out = findinside($start, $end, $string);
print_r ($out);
/* Results in
Array
(
    [0] =>  6 
    [1] =>  all the 
)
*/
?>


aaron

<?php
/* Finds email addresses and URLs in a body of text, and adds <a> tags
* around them. It also obfuscates the email addresses with some
* JavaScript so that bots don't recognize them.
*
* Include this JavaScript code. It tells the browser how to decode
* the email address:
*
  function swapPairs(s){
    var res = "";
    for (var i=0; i<s.length; i++){
      var ch = s.charCodeAt(i);
      res += String.fromCharCode(
        ( ch & 0xF0 ) +
        ((ch & 0x0C)>>2) +
        ((ch & 0x03)<<2)
      );
    }
    return res;
  }
*
*/
function ActivateLinks($text) {
  $matches = array();
  // Find all email addresses in the text
  // regex based on http://www.regular-expressions.info/email.html
  $regex = '/\b([a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,4})\b/';
  preg_match_all($regex, $text, $matches);
  if( count($matches[0]) > 0 ) {
    foreach( $matches[0] as $email ) {
      $obfuscate = '<a href="mailto:'.$email.'">'.$email.'</a>';
      $encrypted = "";
      // Swap bits 0-1 with bits 2-3 of each byte (undone by swapPairs() in the browser)
      for( $i=0; $i<strlen($obfuscate); $i++ ) {
        $ch = ord(substr($obfuscate,$i,1));
        $encrypted .= chr(
          ($ch & 0xF0) +
          (($ch & 0x0C) >> 2) +
          (($ch & 0x03) << 2)
        );
      }
      $replace = '<script>document.write(swapPairs("'.$encrypted.'"))</script>';
      $text = str_replace($email, $replace, $text);
    }
  }
  // Find all http or ftp links in the text
  // regex from http://fundisom.com/phparadise/php/string_handling/autolink
  $text = preg_replace( '/(http|ftp)+(s)?:(\/\/)((\w|\.)+)(\/)?(\S+)?/i',
    '<a href="\0">\4</a>', $text );
  return $text;
}
?>
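The same swapPairs() routine can decode what the PHP loop encoded. Swapping the two 2-bit halves of the low nibble twice restores the original byte, so the transform is its own inverse. Below is a PHP sketch of that property. swap_pairs() is a hypothetical name mirroring the encoding loop in ActivateLinks().

```php
<?php
// Swap bits 0-1 with bits 2-3 of every byte; the high nibble is unchanged.
function swap_pairs(string $s): string {
    $out = '';
    for ($i = 0, $n = strlen($s); $i < $n; $i++) {
        $ch = ord($s[$i]);
        $out .= chr(($ch & 0xF0) + (($ch & 0x0C) >> 2) + (($ch & 0x03) << 2));
    }
    return $out;
}

$link = '<a href="mailto:me@example.com">me@example.com</a>';
$encoded = swap_pairs($link);
echo $encoded, "\n";
echo swap_pairs($encoded), "\n"; // applying it twice gives back $link
```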

