
Regular Expression Functions (POSIX Extended)

Introduction

Tip:

PHP also supports regular expressions with a Perl-compatible syntax through the PCRE functions. Those functions support non-greedy matching, assertions, conditional subpatterns, and a number of other features not supported by the POSIX-extended regular expression syntax.

Warning:

These regular expression functions are not binary-safe. The PCRE functions are.

Regular expressions are used for complex string manipulation. PHP uses the POSIX extended regular expressions as defined by POSIX 1003.2. For a full description of POSIX regular expressions, see the regex man pages included in the regex directory in the PHP distribution. They are in manpage format, so you'll want to do something along the lines of man /usr/local/src/regex/regex.7 in order to read them.

Requirements

No external libraries are needed to build this extension.

Installation

Warning:

Do not change the TYPE unless you know what you are doing.

To enable regexp support, configure PHP with --with-regex[=TYPE]. TYPE can be one of system, apache, php. The default is to use php.

The Windows version of PHP has built-in support for this extension. You do not need to load any additional extension in order to use these functions.

Runtime Configuration

This extension has no configuration directives defined in php.ini.

Resource Types

This extension has no resource types defined.

Predefined Constants

This extension has no constants defined.

Examples

Example 1890. Regular Expression Examples

<?php
// Returns true if "abc" is found anywhere in $string.
ereg("abc", $string);            

// Returns true if "abc" is found at the beginning of $string.
ereg("^abc", $string);

// Returns true if "abc" is found at the end of $string.
ereg("abc$", $string);

// Returns true if client browser is Netscape 2, 3 or MSIE 3.
eregi("(ozilla.[23]|MSIE.3)", $_SERVER["HTTP_USER_AGENT"]);

// Places three space separated words into $regs[1], $regs[2] and $regs[3].
ereg("([[:alnum:]]+) ([[:alnum:]]+) ([[:alnum:]]+)", $string, $regs);

// Put a <br /> tag at the beginning of $string.
$string = ereg_replace("^", "<br />", $string);

// Put a <br /> tag at the end of $string.
$string = ereg_replace("$", "<br />", $string);

// Get rid of any newline characters in $string.
$string = ereg_replace("\n", "", $string);
?>


See Also

For regular expressions in Perl-compatible syntax have a look at the PCRE functions. The simpler shell style wildcard pattern matching is provided by fnmatch().

Table of Contents

ereg_replace — Replace regular expression
ereg — Regular expression match
eregi_replace — Replace regular expression case insensitive
eregi — Case insensitive regular expression match
split — Split string into array by regular expression
spliti — Split string into array by regular expression case insensitive
sql_regcase — Make regular expression for case insensitive match

Code Examples / Notes » ref.regex

17-may-2007 11:55

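To exclude a word, you can derive a pattern from the following (the (?!...) negative lookahead is PCRE syntax, so it needs the preg_* functions rather than ereg).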
This checks that no table tag appears between my open and close td tags.
<td>((?!<table>).{1})*</td>
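A minimal sketch of the pattern above run through preg_match(), assuming the whole cell sits on one line. The helper name cellHasNoTable() and the # delimiter are choices made for this illustration, not part of the original note.

```php
<?php
// Hypothetical helper: true if $html contains a <td>...</td> cell
// with no <table> tag between the opening and closing td tags.
function cellHasNoTable($html)
{
    // (?!<table>). consumes one character only when "<table>" does not start there.
    return preg_match('#<td>((?!<table>).)*</td>#', $html) === 1;
}

var_dump(cellHasNoTable('<td>plain text</td>'));       // bool(true)
var_dump(cellHasNoTable('<td><table></table></td>'));  // bool(false)
```

The {1} in the original is redundant, so the sketch drops it. By default the dot does not match a newline, so multi-line cells would need the s modifier.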


php

To add to tgt's tip for metacharacters.
To test for a whole word, use [[:<:]]yourword[[:>:]]
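The [[:<:]] and [[:>:]] markers belong to the POSIX library used by ereg. With the PCRE functions, the usual counterpart is the \b word-boundary assertion. A small sketch, where containsWord() is a made-up helper name:

```php
<?php
// Hypothetical helper: true if $word occurs in $text as a whole word.
function containsWord($word, $text)
{
    // preg_quote() escapes any regex metacharacters in $word; \b marks a word boundary.
    return preg_match('/\b' . preg_quote($word, '/') . '\b/', $text) === 1;
}

var_dump(containsWord('cat', 'the cat sat'));  // bool(true)
var_dump(containsWord('cat', 'concatenate'));  // bool(false)
```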


tgt

Tip!
Metacharacters in regular expressions are useful and easy to use.
The following is a set of special values that denote certain common ranges. They have the advantage that they also take the locale into account, i.e. any variant of the local language/coding system. Each one is written inside a bracket expression, for example [[:digit:]].
[:digit:]    Only the digits 0 to 9.
[:alnum:]    Any alphanumeric character 0 to 9 OR A to Z or a to z.
[:alpha:]    Any alpha character A to Z or a to z.
[:blank:]    Space and TAB characters only.
[:xdigit:]   Any hexadecimal digit 0 to 9, A to F or a to f.
[:punct:]    Punctuation symbols such as . , " ' ? ! ; :
[:print:]    Any printable character.
[:space:]    Any space characters.
[:graph:]    Any printable character except the space.
[:upper:]    Any alpha character A to Z.
[:lower:]    Any alpha character a to z.
[:cntrl:]    Any control character.
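The same class names are accepted by the PCRE functions when they appear inside a bracket expression. A short sketch using preg_match() and preg_match_all(); the helper name isAlnum() and the sample strings are invented for the example.

```php
<?php
// Hypothetical helper: true if $s is non-empty and purely alphanumeric.
function isAlnum($s)
{
    // The class name goes inside a bracket expression: [[:alnum:]], not [:alnum:].
    return preg_match('/^[[:alnum:]]+$/D', $s) === 1;
}

var_dump(isAlnum('abc123'));   // bool(true)
var_dump(isAlnum('abc 123'));  // bool(false), the space is not alphanumeric

// Count the hexadecimal digits in a string with [[:xdigit:]].
$hexCount = preg_match_all('/[[:xdigit:]]/', 'ff-00-zz', $matches);
echo $hexCount, "\n";          // 4
```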


edward z. yang

The fact that 'regex' functions are not binary safe has some very important security implications for people who are using ereg to validate their input data.
Suppose I have an expression:
<?php
$pattern = '^[[:alnum:]]*$';
?>
This should match any number of alphanumeric characters, right? Well, if the string you're matching is not binary, sure. However, say we have a null-byte tossed in the string:
<?php
$string = chr(0) . "<script>alert('xss')</script>";
echo ereg($pattern, $string);
?>
Will return true. Note that it is trivially easy to inject null bytes into PHP parameters:
index.php?content=%00ASCII
Scary. So unless you really know what you're doing, just use the PCRE preg_* functions.
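For comparison, here is a sketch of the same check written with preg_match(), which treats the null byte as ordinary data in the subject. The / delimiters and the D modifier (so that $ matches only at the very end) are additions made for this example.

```php
<?php
$pattern = '/^[[:alnum:]]*$/D';
$string  = chr(0) . "<script>alert('xss')</script>";

$attack = preg_match($pattern, $string);   // the null byte is not alphanumeric
$clean  = preg_match($pattern, 'abc123');

var_dump($attack);  // int(0)
var_dump($clean);   // int(1)
```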


spiceee

sorry to be picky here but saying ^ is beginning of a line or $ is end of line is rather misleading, if you're working on a daily basis with regexes.
it might be that it is most of the time correct BUT in some occasions you'd be better off to think of ^ as "start of string" and $ as "end of string".
there are ways to make your regex engine forget about your system's notion of a newline, it's what is commonly referred to as multiline regexes...
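A short illustration of the difference using the PCRE functions, where the m modifier switches ^ and $ from "string" to "line" anchors. The sample text is made up.

```php
<?php
$text = "first line\nsecond line";

// Without the m modifier, ^ matches only at the start of the string.
$stringAnchor = preg_match('/^second/', $text);

// With m, ^ also matches right after each newline.
$lineAnchor = preg_match('/^second/m', $text);

var_dump($stringAnchor);  // int(0)
var_dump($lineAnchor);    // int(1)
```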


bps7j

Something that really got me: I'm used to using Perl's regexps, and so I used \s to check for a whitespace character in a password on a website. My PHP book (Wrox Press, Professional PHP Programming) agreed with me that this is exactly the same as [ \r\n\t\f\v], but it's NOT. In fact, what it did was keep anyone from joining the site if they put an 's' in their password! So beware, check for subtle differences between what you're used to and PHP.
[[:space:]] works fine, by the way.
I'm going to use the pcre functions from now on... I like Perl :o)


david

Sadly, the POSIX regexp evaluator (PHP 4.1.2) does not seem to support multi-character collating sequences, even though such sequences are included in the man-page documentation.
Specifically, the man-page discusses the expression "[[.ch.]]*c" which matches the first five characters of "chchcc".  Running this expression in ereg_replace generates the error "Warning: REG_ECOLLATE".  (Running an equivalent expression with only one character between the periods does work, however.)
Multi-character collating sequences are not supported!
This is really, really too bad, because it would have provided a simple way to exclude words from the target.
I'm going to go learn PCRE, now.  :-(


luciano_at_braziliantranslation.net

mholdgate wrote a very nice quick reference guide in the next page (http://www.php.net/manual/en/function.ereg.php), but I felt it could be improved a little:
________________
^            Start of line
$            End of line
n?           Zero or one occurrence of character 'n'
n*           Zero or more occurrences of character 'n'
n+           One or more occurrences of character 'n'
n{2}         Exactly two occurrences of 'n'
n{2,}        Two or more occurrences of 'n'
n{2,4}       From 2 to 4 occurrences of 'n'
.            Any single character
()           Parentheses to group expressions
(.*)         Zero or more occurrences of any single character, i.e., anything!
(n|a)        Either 'n' or 'a'
[1-6]        Any single digit in the range between 1 and 6
[c-h]        Any single lower case letter in the range between c and h
[D-M]        Any single upper case letter in the range between D and M
[^a-z]       Any single character EXCEPT any lower case letter between a and z.
             Pitfall: the ^ symbol only acts as an EXCEPT rule if it is the
             very first character inside a range, and it denies the
             entire range including the ^ symbol itself if it appears again
             later in the range. Anywhere else inside a range, it is
             treated as a regular ^ symbol. Outside a range, it means
             "start of line" and never negates what follows.
             In other words, you cannot deny a word with ^undesired_word
             or a group with ^(undesired_phrase).
             Read more detailed regex documentation to find out what is
             necessary to achieve this.
[_4^a-zA-Z]  Any single character which can be the underscore or the
             number 4 or the ^ symbol or any letter, lower or upper case
?, +, * and the {} count parameters can be appended not only to a single character, but also to a group () or a range [].
therefore,
^.{2}[a-z]{1,2}_?[0-9]*([1-6]|[a-f])[^1-9]{2}a+$
would mean:
^.{2}          = A line beginning with any two characters,
[a-z]{1,2}     = followed by either 1 or 2 lower case letters,
_?             = followed by an optional underscore,
[0-9]*         = followed by zero or more digits,
([1-6]|[a-f])  = followed by either a digit between 1 and 6 OR a lower case letter between a and f,
[^1-9]{2}      = followed by any two characters except digits between 1 and 9 (0 is possible),
a+$            = followed by one or more occurrences of 'a' at the end of a line.
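To see the breakdown above in action, the pattern can be tried against a string built piece by piece. This sketch uses preg_match() with / delimiters added. The two test strings are invented for the example.

```php
<?php
// The example pattern, wrapped in / delimiters for preg_match().
$pattern = '/^.{2}[a-z]{1,2}_?[0-9]*([1-6]|[a-f])[^1-9]{2}a+$/';

// "XY" + "ab" + "_" + "1234" + "c" + "00" + "aa"
$good = preg_match($pattern, 'XYab_1234c00aa');

// Same string, but with "12" where two characters other than 1-9 are required.
$bad = preg_match($pattern, 'XYab_1234c12aa');

var_dump($good);  // int(1)
var_dump($bad);   // int(0)
```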


regex

It's easy to exclude characters, but excluding words with a regular expression is a bit more tricky. For parentheses there is no equivalent to the ^ for brackets. The only way I've found to exclude a string is to proceed by inverse logic: accept all the words that do NOT correspond to the string. So if you want to accept all strings except those _beginning_ with "abc", you'd have to accept any string that matches one of the following:
 ^(ab[^c])
 ^(a[^b]c)
 ^(a[^b][^c])
 ^([^a]bc)
 ^([^a]b[^c])
 ^([^a][^b]c)
 ^([^a][^b][^c])
which, put together, gives the regex
 ^(ab[^c]|a[^b]c|a[^b][^c]|[^a]bc|[^a]b[^c]|[^a][^b]c|[^a][^b][^c])
Note that this won't work to detect the word "abc" anywhere in a string. You need to have some way of anchoring the inverse word match
like: ^(a[^b]|[^a]b|[^a][^b])   ;"ab" not at beginning of line
 or: (a[^b]|[^a]b|[^a][^b])$   ;"ab" not at end of line
 or: 123(a[^b]|[^a]b|[^a][^b]) ;"ab" not after "123"
I don't know why "(abc){0,0}" is an invalid syntax. It would've made all this much simpler.


Slightly off-topic, here's a regex date validator (format yyyy-mm-dd, remove all spaces and linefeeds):
 ^(19|20)([0-9]{2}-((0[13-9]|1[0-2])-(0[1-9]|[12][0-9]|30)|
 (0[13578]|1[02])-31|02-(0[1-9]|1[0-9]|2[0-8]))|([2468]0|
 [02468][48]|[13579][26])-02-29)$


moc dot liamtoh

In PCRE, \s matches whitespace, both outside and inside a character class:
preg_match('/\s/', ' ');   // match
preg_match('/[\s]/', ' '); // match
(In the POSIX functions the backslash is not an escape inside a bracket expression, so [\s] there matches a backslash or the letter 's'.)
Within a character class, [:space:] stands for any single whitespace character:
$pattern = '/[[:space:]]/';
$subject = "space tab\tnewline\n";
preg_match_all($pattern, $subject, $out); // == 3
To match a hyphen from within a character class, it must either be first or last; otherwise, it will act as a range operator.
Example: To match a blank string or a string containing only uppercase letters, underscores, spaces, and hyphens:
preg_match('/^[A-Z_ -]*$/', $subject)
To match any whitespace, not just spaces:
preg_match('/^[A-Z_[:space:]-]*$/', $subject)
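The hyphen rule is easy to check. In the sketch below the first pattern keeps the hyphen last, while the second (a deliberately wrong variant written for this example) puts it between a space and an underscore, where it forms a range that happens to include the digits.

```php
<?php
$hyphenLast  = '/^[A-Z_ -]*$/';   // hyphen is literal
$hyphenRange = '/^[A-Z -_]*$/';   // " -_" is a range from space (0x20) to underscore (0x5F)

$a = preg_match($hyphenLast, 'FOO_BAR-BAZ QUX');  // only allowed characters
$b = preg_match($hyphenLast, '123');              // digits are rejected
$c = preg_match($hyphenRange, '123');             // digits fall inside the accidental range

var_dump($a);  // int(1)
var_dump($b);  // int(0)
var_dump($c);  // int(1)
```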


franck569

if you want to exclude a WORD, use this :
[^[WORD]]{0}
@++, Franck569.


03-feb-2002 03:02

if you are looking for the abbreviations like tab, carriage return, regex-class definitions  
you should look here:
http://elvin.dstc.edu.au/doc/regex.html
some excerpts:
control characters
\a    bell
\b    backspace
\f    form feed
\n    line feed
\r    carriage return
\t    horizontal tab
\v    vertical tab
class example
\cLu  all uppercase letters


trucex

I was having a ton of issues with other people's phone number validation expressions, so I made my own. It works with most US phone numbers, including those with extensions. It matches any of the following formats:
5551234567
555 1234567
555 123 4567
555 123-4567
555-1234567
555-123-4567
555123-4567
(555)1234567
(555)123 4567
(555)123-4567
(555) 1234567
(555) 123-4567
(555) 123 4567
And any of the following extensions can be added with or without a space between them and the number:
x123
x.123
x. 123
x 123
ext.123
ext. 123
ext 123
ext123
Extensions support between 1 and 5 digits.
Here is the expression:
$regex = '^[(]?[2-9]{1}[0-9]{2}[) -]{0,2}' . '[0-9]{3}[- ]?' . '[0-9]{4}[ ]?' . '((x|ext)[.]?[ ]?[0-9]{1,5})?$';
Enjoy!
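The expression above is written for ereg(). A sketch of trying it with preg_match() instead, assuming only that / delimiters are wrapped around it. The sample numbers are taken from or modelled on the lists above.

```php
<?php
// The expression above with / delimiters added for preg_match().
$regex = '/^[(]?[2-9]{1}[0-9]{2}[) -]{0,2}' . '[0-9]{3}[- ]?' . '[0-9]{4}[ ]?' . '((x|ext)[.]?[ ]?[0-9]{1,5})?$/';

$results = array();
$samples = array(
    '(555) 123-4567',         // listed format
    '555-123-4567 ext. 123',  // listed format plus extension
    '5551234567x12345',       // five-digit extension, no space
    '155-123-4567',           // area code may not start with 1
    '555-123-456',            // one digit short
);
foreach ($samples as $number) {
    $results[$number] = preg_match($regex, $number);
    echo $number, ' => ', $results[$number] ? 'valid' : 'invalid', "\n";
}
```

The first three samples are reported as valid and the last two as invalid.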


mina86

I tested how fast POSIX and Perl regular expressions are, and here are the results:
          | POSIX Extended  | Perl-Compatible |  POSIX - Perl
----------+-----------------+-----------------+----------------
    match |    0.1296420097 |    0.1006720066 |   0.0289700031
  match i |    0.1204010248 |    0.1101620197 |   0.0102390051
  replace |    0.1896649599 |    0.1298999786 |   0.0597649813
replace i |   10.6998120546 |    0.1453789473 |  10.5544331074
So, as you can see, preg_* functions are faster than ereg* functions. You can find source code of my test script here: http://mina86.home.staszic.waw.pl/temp/regexp-speed-test.txt


paper

I have also experienced the same problem as bps7j@yahoo.com had been experiencing, except I did not recognize the problem until after many hours of debugging.
"\s" does not seem to represent spaces, however "[[:space:]]" does.
Another problem I was having was matching dashes/hyphens '-'. You must escape them "\-" and place them at the end of a bracket expression.
Example: To match a blank string or a string containing only uppercase letters, underscores, spaces, and hyphens:
^([A-Z_\-]|[[:space:]])*$
Hope this saves someone some time from debugging like I was. :)


nate -at- theklaibers -dot- com

I am using a regex with the same thought process in mind as the earlier phone number. However, I have also implemented it to allow the '1', so a number like
1 222 222 2222 would still be valid as well (along with all of the other combinations).
In my regex, I pull out the matches, not the exact string. So if someone were to forget a bracket, it wouldn't matter to the actual output, as it is stripped from that match.
So, if you put in 222) 233 3454, the matches would only pull out 1=>222, 2=>233, 3=>3454
This has been very helpful in tweaking my regex.
Thanks,
Nate


nothing

His regular expression is correct, the ^ is to check for the beginning of the string. It is just looking for delimiter characters, try putting slashes around it.
"/<regex>/"


stringer

Hey trucex. Cool phone number function but your $regex produces the following error. Warning: No ending delimiter '^' found
Instead of:
$regex = '^[(]?[2-9]{1}[0-9]{2}[) -]{0,2}' . '[0-9]{3}[- ]?' . '[0-9]{4}[ ]?' . '((x|ext)[.]?[ ]?[0-9]{1,5})?$';
I think it should be:
$regex = '^[(]?[2-9]{1}[0-9]{2}[) -]{0,2}' . '[0-9]{3}[- ]?' . '[0-9]{4}[ ]?' . '((x|ext)[.]?[ ]?[0-9]{1,5})?$^';
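The warning quoted above appears when an ereg-style pattern is passed to a preg_* function, which takes the first character as a delimiter. Appending a second ^ silences the warning, but both carets then act as delimiters and the start anchor is lost. A sketch of the difference, with a made-up input string:

```php
<?php
$body = '[(]?[2-9]{1}[0-9]{2}[) -]{0,2}[0-9]{3}[- ]?[0-9]{4}[ ]?((x|ext)[.]?[ ]?[0-9]{1,5})?$';

$caretDelimited = '^' . $body . '^';   // carets are delimiters here, not anchors
$slashDelimited = '/^' . $body . '/';  // slashes delimit, ^ anchors the start

$input = 'call 555-123-4567';          // leading text should make this invalid

$loose  = preg_match($caretDelimited, $input);
$strict = preg_match($slashDelimited, $input);

var_dump($loose);   // int(1), the number is found anywhere before the end
var_dump($strict);  // int(0), the string does not start with a phone number
```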


regex

Follow-up to my previous post:
Some simple optimization allowed me to realize that excluding a word at the beginning of a string has a degree of complexity O(n) rather than O(n^2). I only had to follow the logic:
if str[0] != badword[0] then OK
else
 if str[1] != badword[1] then OK
 else
   if str[2] != badword[2] then OK
   else ...
So excluding the word 'abc' at the beginning of a string is much simpler than I had made it out to be:
 ^([^a]|a[^b]|ab[^c])
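A quick check of this alternation with preg_match(), next to a PCRE negative lookahead for comparison. The lookahead is not available in ereg and is shown here only as an assumption about what one might use with the preg_* functions. The test strings are invented.

```php
<?php
$alternation = '/^([^a]|a[^b]|ab[^c])/';  // the pattern above, delimited
$lookahead   = '/^(?!abc)/';              // PCRE-only alternative

$results = array();
foreach (array('abcdef', 'abxdef', 'xyz', 'ab') as $s) {
    $results[$s] = array(preg_match($alternation, $s), preg_match($lookahead, $s));
    printf("%-7s alternation=%d lookahead=%d\n", $s, $results[$s][0], $results[$s][1]);
}
```

The two agree except on inputs shorter than the excluded word: 'ab' does not begin with 'abc', yet the alternation rejects it because each branch needs a character that differs from the word.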


robin

Ever wondered how to exclude "[" and "]"?
Here it goes: "[^][]". Extra characters to exclude can be added right in the middle like this: "[^]fobar[]".
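The same bracket expression also works with the PCRE functions, where a "]" placed first in a class (after the ^) is taken literally. A small sketch, with hasNoBrackets() as a made-up helper name:

```php
<?php
// Hypothetical helper: true if $s contains neither "[" nor "]".
function hasNoBrackets($s)
{
    // [^][] reads as: negated class containing "]" (literal because it comes first) and "[".
    return preg_match('/^[^][]*$/D', $s) === 1;
}

var_dump(hasNoBrackets('plain text'));  // bool(true)
var_dump(hasNoBrackets('item[0]'));     // bool(false)
```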


bart

Dario seems to have made a nice tutorial about regular expressions:
http://www.phpbuilder.com/columns/dario19990616.php3
Thanks Dario! ...


annie

Another nice tutorial about regular expressions: http://www.mkssoftware.com/docs/man5/regexp.5.asp

ajd

A minor tweak to trucex' phone validator, because some people use a dot separator between the area code, exchange and four-digit block.
Posted here for your copy-and-paste convenience.
$regex = '^[(]?[2-9]{1}[0-9]{2}[) .-]{0,2}' . '[0-9]{3}[- .]?' . '[0-9]{4}[ ]?' . '((x|ext)[.]?[ ]?[0-9]{1,5})?$';


goran

@regex - off-topic date validator yyyy-mm-dd (MySQL date format)
try validating 2008-01-31 and it will fail.
Perhaps using core PHP functions and simple solutions could be a better approach:
function validateDate($date)
{
    // Check the yyyy-mm-dd shape first, then let checkdate() verify the calendar date.
    if (preg_match('/^(\d{4})-(\d{2})-(\d{2})$/', $date, $datebit)
        && checkdate((int) $datebit[2], (int) $datebit[3], (int) $datebit[1])) {
        return $date;
    }
    // Wrong format or impossible date (e.g. 2008-02-30).
    return false;
}

