preg_match_all
Perform a global regular expression match
(PHP 4, PHP 5)
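This is a minimal sketch of the two tasks named in Examples 1716 and 1717: pulling phone numbers out of text, and matching HTML tag pairs through a backreference. The sample strings and the phone-number format are assumptions for illustration, not the manual's own code. The assumed format is an optional "(NNN) " area code followed by NNN-NNNN. The tag pattern's (.*) is greedy, so two <b> elements in one string would be swallowed into a single match.

```php
<?php
// Hypothetical sample text; the pattern assumes an optional "(NNN) " area code
// followed by NNN-NNNN.
$text = "Call 555-1212 or (800) 555-1212 today.";
$phoneCount = preg_match_all('/(?:\(\d{3}\)\s*)?\d{3}-\d{4}/', $text, $phones);
// Default PREG_PATTERN_ORDER: $phones[0] lists every full match.
print_r($phones[0]);

// \2 refers back to the tag name captured by (\w+), so <b> pairs with </b>.
$html = "<b>bold text</b><a href=howdy.html>click me</a>";
preg_match_all('/(<(\w+)[^>]*>)(.*)(<\/\2>)/', $html, $tags, PREG_SET_ORDER);
// PREG_SET_ORDER: one array per match; [0] is the full match, [3] the inner text.
foreach ($tags as $val) {
    echo "matched: " . $val[0] . "\n";
    echo "inner text: " . $val[3] . "\n";
}
```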
Example 1716. Getting all phone numbers out of some text.
Example 1717. Find matching HTML tags (greedy)
The above example will output:
matched: <b>bold text</b>
Code Examples / Notes » preg_match_all
egingell
Try this for preg_match_all that takes an array of regular expressions.
<?php
// Emulates preg_match_all() but takes an array instead of a string.
// Returns an array containing all of the matches.
// The return array is an array containing the arrays normally returned by
// preg_match_all() with the optional third parameter supplied.
function preg_search($ary, $subj) {
    $matched = array();
    if (is_array($ary)) {
        foreach ($ary as $v) {
            preg_match_all($v, $subj, $matched[]);
        }
    } else {
        preg_match_all($ary, $subj, $matched[]);
    }
    return $matched;
}
?>
chuckie
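This sketch follows up on the preg_search() note above. It applies the same idea but keys the results by pattern instead of by position, so each result set can be looked up by the expression that produced it. preg_search_keyed() is a hypothetical name, not part of PHP.

```php
<?php
// Hypothetical variant of preg_search(): results keyed by pattern.
function preg_search_keyed(array $patterns, $subject) {
    $matched = array();
    foreach ($patterns as $pattern) {
        // preg_match_all() fills $matched[$pattern] by reference (PREG_PATTERN_ORDER).
        preg_match_all($pattern, $subject, $matched[$pattern]);
    }
    return $matched;
}

$result = preg_search_keyed(array('/\d+/', '/[a-z]+/'), 'abc 123 de 45');
print_r($result['/\d+/'][0]);
```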
This is a function to convert byte offsets into (UTF-8) character offsets (regardless of whether you use the /u modifier):
<?php
function mb_preg_match_all($ps_pattern, $ps_subject, &$pa_matches, $pn_flags = PREG_PATTERN_ORDER, $pn_offset = 0, $ps_encoding = NULL) {
    // WARNING! - All this function does is to correct offsets, nothing else:
    //
    if (is_null($ps_encoding))
        $ps_encoding = mb_internal_encoding();

    // Convert the starting character offset into a byte offset for preg_match_all().
    $pn_offset = strlen(mb_substr($ps_subject, 0, $pn_offset, $ps_encoding));
    $ret = preg_match_all($ps_pattern, $ps_subject, $pa_matches, $pn_flags, $pn_offset);

    if ($ret && ($pn_flags & PREG_OFFSET_CAPTURE))
        foreach ($pa_matches as &$ha_group)
            foreach ($ha_group as &$ha_match)
                // Unmatched groups report offset -1; leave those untouched.
                if ($ha_match[1] >= 0)
                    $ha_match[1] = mb_strlen(substr($ps_subject, 0, $ha_match[1]), $ps_encoding);
    //
    // (code is independent of PREG_PATTERN_ORDER / PREG_SET_ORDER)
    return $ret;
}
?>
b2sing4u
This function converts all HTML-style decimal character references to hexadecimal ones.
ex) Hi &#959; &#9674; Dec -> Hi &#x03BF; &#x25CA; Dec
function d2h($word) {
    $n = preg_match_all("/&#(\d+?);/", $word, $match, PREG_PATTERN_ORDER);
    for ($j = 0; $j < $n; $j++) {
        $word = str_replace($match[0][$j], sprintf("&#x%04X;", $match[1][$j]), $word);
    }
    return($word);
}
This function converts all HTML-style hexadecimal character references to decimal ones.
ex) Hello &#x03BF; &#x25CA; Hex -> Hello &#959; &#9674; Hex
function h2d($word) {
    $n = preg_match_all("/&#x([0-9a-fA-F]+?);/", $word, $match, PREG_PATTERN_ORDER);
    for ($j = 0; $j < $n; $j++) {
        $word = str_replace($match[0][$j], sprintf("&#%u;", hexdec($match[1][$j])), $word);
    }
    return($word);
}
mnc
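The d2h()/h2d() pair above can also be written with preg_replace_callback(). This does the conversion in one pass instead of a preg_match_all() plus str_replace() loop. It is a sketch under the assumption of PHP 5.3 or later (for anonymous functions). dec_to_hex_refs() and hex_to_dec_refs() are hypothetical names.

```php
<?php
// Decimal references (&#959;) become zero-padded hexadecimal ones (&#x03BF;).
function dec_to_hex_refs($text) {
    return preg_replace_callback('/&#(\d+);/', function ($m) {
        return sprintf('&#x%04X;', $m[1]);
    }, $text);
}

// Hexadecimal references (&#x3BF; or &#x03BF;) become decimal ones (&#959;).
function hex_to_dec_refs($text) {
    return preg_replace_callback('/&#x([0-9a-fA-F]+);/', function ($m) {
        return sprintf('&#%u;', hexdec($m[1]));
    }, $text);
}

echo dec_to_hex_refs('Hi &#959; &#9674; Dec'), "\n";
echo hex_to_dec_refs('Hello &#x03BF; &#x25CA; Hex'), "\n";
```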
PREG_OFFSET_CAPTURE always provides byte offsets rather than character offsets, even when you are using the Unicode /u modifier.
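This is a small check of that behaviour. It also shows one way to turn a byte offset into a character offset without the mbstring extension: count the UTF-8 characters before the match with '/./su'. The sample string is made up, and the two-argument form of preg_match_all() assumes PHP 5.4 or later.

```php
<?php
// "\xC3\xA9" is "é" in UTF-8: one character, two bytes.
$subject = "\xC3\xA9x";
preg_match_all('/x/u', $subject, $m, PREG_OFFSET_CAPTURE);
$byteOffset = $m[0][0][1];

// Count the characters that precede the match to get a character offset.
// '/./su' matches exactly one UTF-8 character (newlines included) per match.
$charOffset = preg_match_all('/./su', substr($subject, 0, $byteOffset));

echo "byte offset: $byteOffset, character offset: $charOffset\n";
```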
phektus
If you'd like to include DOUBLE QUOTES in a regular expression for use with preg_match_all, try ESCAPING THRICE, as in: \\\"
For example, the pattern:
'/<table>[\s\w\/<>=\\\"]*<\/table>/'
should be able to match:
<table> <row> <col align="left" valign="top">a</col> <col align="right" valign="bottom">b</col> </row> </table>
... with everything inside those table tags.
I'm not really sure why this is so, but I tried just the double quote and one or even two escape characters and it wouldn't work. In my frustration I added another one, and then it worked.
(Inside a single-quoted PHP string, \\\" reaches PCRE as \\", so the character class gains a literal backslash as well as the double quote. A plain " is enough unless the subject text itself contains backslash-escaped quotes, for example input escaped by magic_quotes_gpc.)
php
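This sketch tests the triple-escaping note above. A plain " matches the sample markup directly; the extra backslash only matters once the subject contains \" sequences. Here addslashes() stands in for magic_quotes_gpc-style input, which is an assumption about where the original subject came from.

```php
<?php
// A double quote needs no escaping inside a single-quoted PHP string.
$plain = '/<table>[\s\w\/<>="]*<\/table>/';
// '\\\"' inside single quotes reaches PCRE as \\" : a literal backslash plus a quote.
$withBackslash = '/<table>[\s\w\/<>=\\\"]*<\/table>/';

$html = '<table> <row> <col align="left" valign="top">a</col> '
      . '<col align="right" valign="bottom">b</col> </row> </table>';
// The same markup with backslash-escaped quotes, as addslashes() or
// magic_quotes_gpc would produce it.
$escaped = addslashes($html);

$plainOnHtml = preg_match_all($plain, $html, $m1);
$plainOnEscaped = preg_match_all($plain, $escaped, $m2);
$backslashOnEscaped = preg_match_all($withBackslash, $escaped, $m3);

echo "$plainOnHtml $plainOnEscaped $backslashOnEscaped\n";
```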
If you want to get a string with only the 'normal' characters, this may be better:
$clean = preg_replace('/\W+/', '', $dirty);
\W is the opposite of \w and matches any character that is not a letter, digit or underscore; it also respects the current locale. Use [^0-9a-zA-Z_]+ instead of \W if you need ASCII-only matching.
fabriceb
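This is a quick check of the \W note above. It uses an ASCII-only sample string (an assumption), so the result does not depend on the current locale. Note that \W keeps the underscore, because \w counts it as a word character.

```php
<?php
$dirty = "Hello, World! 42_x";
// \W+ removes every run of non-word characters; "_" is a word character.
$clean = preg_replace('/\W+/', '', $dirty);
// ASCII-only version, independent of the current locale.
$asciiOnly = preg_replace('/[^0-9a-zA-Z_]+/', '', $dirty);
echo $clean, "\n", $asciiOnly, "\n";
```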
If you just want to find out how many times a string contains another simple string, don't use preg_match_all like I did before I found the substr_count function. Use
<?php $nrMatches = substr_count('foobarbar', 'bar'); ?>
instead. Hope this helps some other people like me who tend to overcomplicate things :-)
master
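This sketch compares substr_count() with the preg_match_all() equivalent from the note above. The regex version needs preg_quote() so that characters like "." in the needle stay literal. Neither function counts overlapping occurrences. The two-argument preg_match_all() call assumes PHP 5.4 or later.

```php
<?php
$haystack = 'foobarbar';
$needle = 'bar';

$viaSubstr = substr_count($haystack, $needle);
$viaRegex = preg_match_all('/' . preg_quote($needle, '/') . '/', $haystack);

// "aaa" contains "aa" only once for both functions (no overlapping matches).
$overlapSubstr = substr_count('aaa', 'aa');
$overlapRegex = preg_match_all('/aa/', 'aaa');

echo "$viaSubstr $viaRegex $overlapSubstr $overlapRegex\n";
```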
I use the following to clean out unwanted characters and keep only the allowed ones in the string:
<?php
// "+" rather than "*": a "*" pattern also returns many empty matches.
preg_match_all("/[a-zA-Z0-9]+/", $string_in, $string_out_array);
$string_out = "";
for ($i = 0; $i < count($string_out_array[0]); $i++) {
    $string_out .= $string_out_array[0][$i];
}
?>
mail
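The cleaning loop above can be shortened with implode(), and a single preg_replace() that deletes the disallowed characters gives the same result. The sample input is made up. The last two lines show why "+" is preferable to "*": with "*" the pattern also matches the empty string, so empty matches pile up between the real ones.

```php
<?php
$string_in = "a-b c!123";

// With "+", each match is a non-empty run of allowed characters.
preg_match_all('/[a-zA-Z0-9]+/', $string_in, $parts);
$string_out = implode('', $parts[0]);

// Deleting the disallowed characters directly gives the same string.
$same = preg_replace('/[^a-zA-Z0-9]+/', '', $string_in);

// "*" also yields empty matches; "+" does not.
$starCount = preg_match_all('/[a-zA-Z0-9]*/', 'a-b');
$plusCount = preg_match_all('/[a-zA-Z0-9]+/', 'a-b');

echo "$string_out $same $starCount $plusCount\n";
```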
I reworked the autoCloseTags function by connum at DONOTSPAMME dot googlemail dot com:
<?php
/**
 * close all open xhtml tags at the end of the string
 *
 * @author Milian Wolff <http://milianw.de>
 * @param string $html
 * @return string
 */
function closetags($html) {
    # put all opened tags into an array
    # ([^>]* keeps each match inside one tag; (?<!/) skips self-closing tags such as <br />)
    preg_match_all("#<([a-z]+)( [^>]*)?(?<!/)>#iU", $html, $result);
    $openedtags = $result[1];
    # put all closed tags into an array
    preg_match_all("#</([a-z]+)>#iU", $html, $result);
    $closedtags = $result[1];
    $len_opened = count($openedtags);
    # all tags are closed
    if (count($closedtags) == $len_opened) {
        return $html;
    }
    $openedtags = array_reverse($openedtags);
    # close tags
    for ($i = 0; $i < $len_opened; $i++) {
        if (!in_array($openedtags[$i], $closedtags)) {
            $html .= '</' . $openedtags[$i] . '>';
        } else {
            unset($closedtags[array_search($openedtags[$i], $closedtags)]);
        }
    }
    return $html;
}
?>
sam
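The closetags() note relies on its opening-tag pattern to skip self-closing tags. This sketch, on a made-up sample string, compares a lookahead (?!/) before ">" with a lookbehind (?<!/). The lookahead is tested just before the ">" it precedes, so the next character is always ">" and the check never fails. The lookbehind version, combined with [^>]* so a match cannot run past the end of a tag, does exclude <br />.

```php
<?php
$html = '<p>Line one<br />Line two';

// Lookahead: checked where the next character is already ">", so it never rejects.
$lookahead = "#<([a-z]+)( .*)?(?!/)>#iU";
// Lookbehind: rejects a ">" that directly follows "/".
$lookbehind = "#<([a-z]+)( [^>]*)?(?<!/)>#iU";

preg_match_all($lookahead, $html, $a);
preg_match_all($lookbehind, $html, $b);

print_r($a[1]); // includes "br"
print_r($b[1]); // only "p"
```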
Here's something I made a while ago to colorize long regular expressions. I can't guarantee it'll work for everything/everyone, but it helps me a lot and might help someone else.
Usage:
<?php echo highlight_regexp("/^[0-9]{2}:[0-9]{2}[apAP]$/"); ?>
<?php
function highlight_regexp($pattern) {
    $colors = array(
        "/" => "red",
        "(" => "green",
        ")" => "green",
        "[" => "blue",
        "]" => "blue",
        "{" => "orange",
        "}" => "orange"
    );
    $specialchars = array("?", "+", "*", ".", "|");
    $space = " ";
    // Initialize state up front to avoid undefined-variable notices.
    $return = "";
    $tier = 0;
    $skip = 0;
    $inbrackets = 0;
    $spaceover = "";
    $skipspaceover = "";
    for ($i = 0; $i < strlen($pattern); $i++) {
        $spacing = "";
        if ($skip) {
            $show = 1;
            $skip = 0;
        } else switch ($pattern[$i]) {
            case "/":
            case "(":
            case "[":
            case "{":
                if ($skip) {
                    $show = 1;
                    $skip = 0;
                } else {
                    $tier++;
                    if ($pattern[$i] == "/") $tier = 0;
                    for ($j = 0; $j < $tier; $j++) $spacing .= $space;
                    $pattern[$i] == "{" or $return .= " $spacing";
                    $return .= "<font color=" . $colors[$pattern[$i]] . "><b>" . $pattern[$i] . "</b></font>";
                    if ($pattern[$i] == "(") $spaceover = " $spacing$space";
                    else {
                        if ($pattern[$i] == "[") $inbrackets = 1;
                        $spaceover = "";
                    }
                }
                $show = 0;
                break;
            case ")":
            case "]":
            case "}":
                if ($skip) {
                    $show = 1;
                    $skip = 0;
                } else {
                    for ($j = 0; $j < $tier; $j++) $spacing .= $space;
                    if ($pattern[$i] == ")") $return .= " $spacing";
                    elseif ($pattern[$i] == "]") $inbrackets = 0;
                    $return .= "<font color=" . $colors[$pattern[$i]] . "><b>" . $pattern[$i] . "</b></font>\n";
                    $spaceover = " $spacing";
                    $tier--;
                }
                $show = 0;
                break;
            default:
                $show = 1;
                break;
        }
        if ($show) {
            if (!$inbrackets && in_array($pattern[$i], $specialchars)) {
                $skipspaceover = 1;
                $preextra = "<font style='font-weight:bold;color:red'>";
                $postextra = "</font>";
                $replace = "";
            } elseif ($pattern[$i] == " ") {
                $preextra = "<i style='font-size:10px'>";
                $replace = "(space)";
                $postextra = "</i>";
            } else $preextra = $postextra = $replace = $skipspaceover = "";
            if ($spaceover && !$skipspaceover) {
                $return .= $spaceover;
                $spaceover = "";
            }
            $return .= $preextra . ($replace ? $replace : $pattern[$i]) . $postextra;
        }
    }
    return $return;
}
?>
phpnet
Here's some fleecy code to 1. validate RFC 2822 conformity of address lists and 2. extract the address specification (the part commonly known as 'email'). I wouldn't suggest using it for input form email checking, but it might be just what you want for other email applications. I know it can be optimized further, but that part I'll leave up to you nutcrackers.
The total length of the resulting regex is about 30000 bytes. That's because it accepts comments. You can remove comment support by setting $cfws to $fws, and it shrinks to about 6000 bytes.
Conformity checking refers absolutely and strictly to RFC 2822. Have fun, and email me if you have any enhancements!
<?php
function mime_extract_rfc2822_address($string)
{
    //rfc2822 token setup
    $crlf           = "(?:\r\n)";
    $wsp            = "[\t ]";
    $text           = "[\\x01-\\x09\\x0B\\x0C\\x0E-\\x7F]";
    $quoted_pair    = "(?:\\\\$text)";
    $fws            = "(?:(?:$wsp*$crlf)?$wsp+)";
    $ctext          = "[\\x01-\\x08\\x0B\\x0C\\x0E-\\x1F" .
                      "!-'*-[\\]-\\x7F]";
    $comment        = "(\\((?:$fws?(?:$ctext|$quoted_pair|(?1)))*" .
                      "$fws?\\))";
    $cfws           = "(?:(?:$fws?$comment)*(?:(?:$fws?$comment)|$fws))";
    //$cfws         = $fws; //an alternative to comments
    $atext          = "[!#-'*+\\-\\/0-9=?A-Z\\^-~]";
    $atom           = "(?:$cfws?$atext+$cfws?)";
    $dot_atom_text  = "(?:$atext+(?:\\.$atext+)*)";
    $dot_atom       = "(?:$cfws?$dot_atom_text$cfws?)";
    $qtext          = "[\\x01-\\x08\\x0B\\x0C\\x0E-\\x1F!#-[\\]-\\x7F]";
    $qcontent       = "(?:$qtext|$quoted_pair)";
    $quoted_string  = "(?:$cfws?\"(?:$fws?$qcontent)*$fws?\"$cfws?)";
    $dtext          = "[\\x01-\\x08\\x0B\\x0C\\x0E-\\x1F!-Z\\^-\\x7F]";
    $dcontent       = "(?:$dtext|$quoted_pair)";
    $domain_literal = "(?:$cfws?\\[(?:$fws?$dcontent)*$fws?]$cfws?)";
    $domain         = "(?:$dot_atom|$domain_literal)";
    $local_part     = "(?:$dot_atom|$quoted_string)";
    $addr_spec      = "($local_part@$domain)";
    $display_name   = "(?:(?:$atom|$quoted_string)+)";
    $angle_addr     = "(?:$cfws?<$addr_spec>$cfws?)";
    $name_addr      = "(?:$display_name?$angle_addr)";
    $mailbox        = "(?:$name_addr|$addr_spec)";
    $mailbox_list   = "(?:(?:(?:(?<=:)|,)$mailbox)+)";
    $group          = "(?:$display_name:(?:$mailbox_list|$cfws)?;$cfws?)";
    $address        = "(?:$mailbox|$group)";
    $address_list   = "(?:(?:^|,)$address)+";

    //output length of string (just so you see how f**king long it is)
    echo(strlen($address_list) . " ");

    //apply expression
    preg_match_all("/^$address_list$/", $string, $array, PREG_SET_ORDER);

    return $array;
}
?>
han jun kwang me -at- hjk.ikueb.com
Here's a simple function to retrieve attribute values of HTML tags:
<?php
function getAttribs($t, $a, $s) {
    preg_match_all("/(<".$t." .*?".$a.".*?=.*?\")(.*?)(\".*?>)/", $s, $m);
    return $m[2];
}
?>
where $t is the tag name (e.g. img), $a is the attribute you are looking for (e.g. src) and $s is the HTML string.
ino
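getAttribs() above puts .*? between the tag name and the attribute, and "." also matches ">", so a match can run on into a later tag. Below is a stricter sketch that assumes double-quoted attribute values. get_attribute_values() is a hypothetical name. [^>]*? keeps each match inside one tag, and the \s before the attribute name stops "src" from matching inside "data-src".

```php
<?php
// Hypothetical helper: values of a double-quoted attribute on a given tag.
function get_attribute_values($tag, $attr, $html) {
    $pattern = '/<' . preg_quote($tag, '/') . '\b[^>]*?\s'
             . preg_quote($attr, '/') . '\s*=\s*"([^"]*)"/i';
    preg_match_all($pattern, $html, $m);
    return $m[1];
}

$html = '<img src="a.png" alt="x"><p>text</p><img alt="y" src="b.png">';
print_r(get_attribute_values('img', 'src', $html));
```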
Extract all email addresses from a text file (the /i modifier lets the letter ranges match uppercase too):
preg_match_all("/[-a-z0-9\._]+@[-a-z0-9\._]+\.[a-z]{2,4}/i", file_get_contents('1.txt'), $email);
print_r($email);
mr davin
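This runs the same pattern on an inline string instead of 1.txt; the sample addresses are made up. Without /i, an address whose domain starts with a capital letter is not matched at all. Also note that [a-z]{2,4} cuts longer top-level domains short.

```php
<?php
$text = "Contact Bob@Example.com or alice@test.org. Museum: x@y.museum";

// /i makes the a-z ranges case-insensitive.
preg_match_all('/[-a-z0-9._]+@[-a-z0-9._]+\.[a-z]{2,4}/i', $text, $email);
print_r($email[0]); // the last address comes back truncated to x@y.muse

// Without /i the capitalised domain cannot match, so that address is skipped.
preg_match_all('/[-a-z0-9._]+@[-a-z0-9._]+\.[a-z]{2,4}/', $text, $lower);
print_r($lower[0]);
```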
<?php
// Returns an array of strings where the start and end are found
function findinside($start, $end, $string) {
    // [^\.)]+ stops a capture at the first "." or ")", so one match cannot run across sentences
    preg_match_all('/' . preg_quote($start, '/') . '([^\.)]+)' . preg_quote($end, '/') . '/i', $string, $m);
    return $m[1];
}

$start = "mary has";
$end = "lambs.";
$string = "mary has 6 lambs. phil has 13 lambs. mary stole phil's lambs. now mary has all the lambs.";

$out = findinside($start, $end, $string);
print_r($out);

/* Results in
Array
(
    [0] =>  6
    [1] =>  all the
)
The captured strings keep their surrounding spaces: " 6 " and " all the ".
*/
?>
aaron
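This sketch is a variant of findinside(). It uses a lazy (.*?) instead of [^\.)]+, so the text between the markers may itself contain "." or ")". The /s modifier lets a match span newlines, and trim() drops the surrounding spaces. find_between() is a hypothetical name.

```php
<?php
// Hypothetical variant of findinside() using a lazy quantifier.
function find_between($start, $end, $string) {
    $pattern = '/' . preg_quote($start, '/') . '(.*?)' . preg_quote($end, '/') . '/is';
    preg_match_all($pattern, $string, $m);
    return array_map('trim', $m[1]);
}

$string = "mary has 6 lambs. phil has 13 lambs. mary stole phil's lambs. now mary has all the lambs.";
print_r(find_between("mary has", "lambs.", $string));
print_r(find_between('(', ')', 'f(a.b) g(c)'));
```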
<?php
/* Finds email addresses and URLs in a body of text, and adds <a> tags
 * around them. It will also obfuscate the email address with some
 * JavaScript so that bots don't recognize the email address there.
 *
 * Include this JavaScript code. It tells the browser how to decode
 * the email address:
 *
 * function swapPairs(s) {
 *     var res = "";
 *     for (var i = 0; i < s.length; i++) {
 *         var ch = s.charCodeAt(i);
 *         res += String.fromCharCode((ch & 0xF0) + ((ch & 0x0C) >> 2) + ((ch & 0x03) << 2));
 *     }
 *     return res;
 * }
 */
function ActivateLinks($text)
{
    $matches = array();

    // Find all email addresses in the text
    // regex based on http://www.regular-expressions.info/email.html
    // ("\." requires a dot before the top-level domain)
    $regex = '/\b([a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,4})\b/';
    preg_match_all($regex, $text, $matches);

    if (count($matches[0]) > 0) {
        foreach ($matches[0] as $email) {
            $obfuscate = '<a href="mailto:' . $email . '">' . $email . '</a>';
            $encrypted = "";
            for ($i = 0; $i < strlen($obfuscate); $i++) {
                $ch = ord(substr($obfuscate, $i, 1));
                $encrypted .= chr(($ch & 0xF0) + (($ch & 0x0C) >> 2) + (($ch & 0x03) << 2));
            }
            $replace = '<script>document.write(swapPairs("' . $encrypted . '"))</script>';
            $text = str_replace($email, $replace, $text);
        }
    }

    // Find all http or ftp links in the text
    // regex from http://fundisom.com/phparadise/php/string_handling/autolink
    $text = preg_replace('/(http|ftp)+(s)?:(\/\/)((\w|\.)+)(\/)?(\S+)?/i', '<a href="\0">\4</a>', $text);

    return $text;
}
?>
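The obfuscation in ActivateLinks() works because the bit transform is its own inverse. It swaps bits 0-1 with bits 2-3 of each byte, so applying it twice restores the input. The same swapPairs() logic can therefore encode on the server and decode in the browser. swap_pairs() below is a hypothetical PHP port used only to demonstrate that property; the mailto link is a made-up sample.

```php
<?php
// Hypothetical PHP port of swapPairs(): swaps bits 0-1 with bits 2-3 of every byte.
function swap_pairs($s) {
    $out = "";
    for ($i = 0; $i < strlen($s); $i++) {
        $ch = ord($s[$i]);
        $out .= chr(($ch & 0xF0) + (($ch & 0x0C) >> 2) + (($ch & 0x03) << 2));
    }
    return $out;
}

$link = '<a href="mailto:someone@example.com">someone@example.com</a>';
$encoded = swap_pairs($link);
// Applying the transform a second time gives back the original link.
var_dump(swap_pairs($encoded) === $link);
```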