Regular Expression Functions (Perl-Compatible)
The syntax for patterns used in these functions closely resembles Perl. The expression must be enclosed in delimiters, for example a forward slash (/). Any character can be used as the delimiter as long as it is not alphanumeric or a backslash (\). If the delimiter character has to be used in the expression itself, it needs to be escaped with a backslash. Since PHP 4.0.4, you can also use Perl-style (), {}, [], and <> matching delimiters. See Pattern Syntax for a detailed explanation. The ending delimiter may be followed by various modifiers that affect the matching. See Pattern Modifiers. PHP also supports regular expressions with a POSIX-extended syntax through the POSIX-extended regex functions.
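The delimiter rules above can be sketched as follows. This is a minimal illustration, not part of the original manual text; the sample URL and variable names are hypothetical.

```php
<?php
// Hypothetical sample subject, chosen only for illustration.
$url = 'http://example.com/docs/index.html';

// Slash as delimiter: every literal slash inside the pattern must be escaped.
preg_match('/^http:\/\/([^\/]+)\//', $url, $m1);

// A different delimiter (#) avoids the escaping entirely.
preg_match('#^http://([^/]+)/#', $url, $m2);

// Perl-style matching delimiters, followed by the i (case-insensitive) modifier.
preg_match('{^HTTP://([^/]+)/}i', $url, $m3);

// preg_quote() also escapes the delimiter when it is given as the second argument.
$literal = 'a/b';
$quoted  = preg_quote($literal, '/');
$found   = preg_match('/' . $quoted . '/', 'path a/b here');

var_dump($m1[1], $m2[1], $m3[1], $quoted, $found);
```

All three patterns capture the same host name; which delimiter to pick is mostly a question of which character appears least often in the pattern.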
Note:
This extension maintains a global per-thread cache of compiled regular expressions (up to 4096).
Warning:
You should be aware of some limitations of PCRE. Read http://www.pcre.org/pcre.txt for more info.
Beginning with PHP 4.2.0 these functions are enabled by default. You can
disable the pcre functions with
The Windows version of PHP has built-in support for this extension. You do not need to load any additional extensions in order to use these functions.
The behaviour of these functions is affected by settings in php.ini.
Table 243. PCRE Configuration Options
Here's a short explanation of the configuration directives.
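One directive that a user note further down refers to is pcre.backtrack_limit (PHP 5.2 and later). As a rough sketch of how it behaves, the example below lowers the limit on purpose and then checks preg_last_error(). The pattern, the subject and the tiny limit are illustrative assumptions, and the pcre.jit call assumes PHP 7 or later, where JIT compilation can change how the limit is counted.

```php
<?php
// Sketch only: assumes PHP 5.2+ for pcre.backtrack_limit and preg_last_error().
// pcre.jit exists from PHP 7; on older versions this ini_set() just returns false.
ini_set('pcre.jit', '0');

$oldLimit = ini_get('pcre.backtrack_limit');
ini_set('pcre.backtrack_limit', '1000');   // deliberately tiny, for demonstration

// Nested quantifiers force heavy backtracking: the subject has no "!" or "?".
$result = preg_match('/(?:\D+|<\d+>)*[!?]/', 'foobar foobar foobar');

if ($result === false && preg_last_error() === PREG_BACKTRACK_LIMIT_ERROR) {
    $status = 'backtrack limit exhausted';
} elseif ($result === 0) {
    $status = 'no match';
} else {
    $status = 'matched or other error';
}
echo $status, "\n";

ini_set('pcre.backtrack_limit', $oldLimit); // restore the previous setting
```

Comparing the return value with === false matters here, because a plain "no match" returns 0 and would otherwise look the same as a failure caused by the limit.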
The constants below are defined by this extension, and will only be available when the extension has either been compiled into PHP or dynamically loaded at runtime.
Table 244. PREG constants
Example 1714. Examples of invalid patterns
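As an illustration of the delimiter rules from the introduction, here are a few patterns that violate them. These are hypothetical samples chosen for this sketch, not necessarily the patterns of the original example. Each call is expected to raise a warning and return false, so the warning is suppressed with @ and only the return value is inspected.

```php
<?php
// Illustrative invalid patterns (assumptions, not the manual's original list).
$invalid = [
    "/href='(.*)'",       // no ending delimiter
    "1-\\d3-\\d3-\\d4|",  // starts with an alphanumeric character
    "\\d+\\",             // a backslash cannot serve as a delimiter
];

$results = [];
foreach ($invalid as $pattern) {
    // @ hides the E_WARNING so that the return value can be checked directly.
    $results[] = @preg_match($pattern, 'test');
}

var_dump($results);
```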
Code Examples / Notes » ref.pcre
richardh
There's a printable PDF PCRE cheat sheet available here: http://www.phpguru.org/article.php?ne_id=67
It covers the common metacharacters, quantifiers, pattern modifiers, character classes and assertions, with short explanations.
steve
Something to bear in mind is that regex is actually a declarative programming language, like Prolog: your regex is a set of rules which the regex interpreter tries to match against a string. During this matching, the interpreter will assume certain things, and continue assuming them until it comes up against a failure to match, which then causes it to backtrack. Regex assumes "greedy matching" unless explicitly told not to, which can cause a lot of backtracking. A general rule of thumb is that the more backtracking, the slower the matching process.
It is therefore vital, if you are trying to optimise your program to run quickly (and if you can't do without regex), to optimise your regexes to match quickly. I recommend the use of a tool such as "The Regex Coach" to debug your regex strings.
http://weitz.de/files/regex-coach.exe (Windows installer)
http://weitz.de/files/regex-coach.tgz (Linux tar archive)
nickspring
A regular expressions tutorial in Russian is available at http://www.pcre.ru
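steve's point above about greedy matching and backtracking can be sketched as follows. The HTML fragment and variable names are hypothetical; the example only shows which text each variant captures, not how fast it runs.

```php
<?php
// Hypothetical HTML fragment used only for illustration.
$html = '<b>one</b> and <b>two</b>';

// Greedy: .* runs to the end of the string, then backtracks to the LAST </b>.
preg_match('/<b>(.*)<\/b>/', $html, $greedy);

// Lazy (ungreedy): .*? grows one character at a time and stops at the FIRST </b>.
preg_match('/<b>(.*?)<\/b>/', $html, $lazy);

// Negated character class: the engine cannot run past "<", so there is
// very little to backtrack over.
preg_match('/<b>([^<]*)<\/b>/', $html, $negated);

var_dump($greedy[1], $lazy[1], $negated[1]);
```

The greedy form captures everything between the first opening tag and the last closing tag, while the other two stop at the first closing tag. A negated class is often the cheaper choice when the terminating character is known.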
biju
Regular expression tutorials from non-PHP sites:
http://www.amk.ca/python/howto/regex/
http://sitescooper.org/tao_regexps.html
http://www.english.uga.edu/humcomp/perl/regex2a.html
http://www.english.uga.edu/humcomp/perl/regexps.html
http://www.english.uga.edu/humcomp/perl/regular_expressions.HTML
http://www.english.uga.edu/humcomp/perl/
http://java.sun.com/docs/books/tutorial/extra/regex/
http://gnosis.cx/publish/programming/regular_expressions.html
http://www.zvon.org/other/PerlTutorial/Books/Book1/
http://it.metr.ou.edu/regex/
http://www.regular-expressions.info/
misc
PCRE faster than POSIX RE? Not always.
In a recent search-engine project here at Cynergi, I had a simple loop with a few cute ereg_replace() functions that took 3min to process data. I changed that 10-line loop into 100 lines of hand-written replacement code and the loop now took 10s to process the same data! This opened my eyes to what can *IN SOME CASES* be very slow regular expressions.
Lately I decided to look into Perl-compatible regular expressions (PCRE). Most pages claim PCRE are faster than POSIX, but a few claim otherwise. I decided on benchmarks of my own. My first few tests confirmed PCRE to be faster, but... the results were slightly different from what others were getting, so I decided to benchmark every case of RE usage I had in an 8000-line secure (and fast) Webmail project here at Cynergi to check it out.
The results? Inconclusive! Sometimes PCRE *are* faster (sometimes by a factor greater than 100x!), but some other times POSIX RE are faster (by a factor of 2x). I have yet to find a rule for when one or the other is faster. It's not only about search data size, amount of data matched, or "RE compilation time", which would show when you repeated the function often: one would *always* be faster than the other. But I didn't find a pattern here. Truth be told, I also didn't take the time to look into the source code and analyse the problem.
I can give you some examples, though. The POSIX RE
([0-9]{4})/([0-9]{2})/([0-9]{2})[^0-9]+ ([0-9]{2}):([0-9]{2}):([0-9]{2})
is 30% faster in POSIX than when converted to PCRE (even if you use \d and \D and non-greedy matching). On the other hand, a similarly complex PCRE pattern
/[0-9]{1,2}[ \t]+[a-zA-Z]{3}[ \t]+[0-9]{4}[ \t]+[0-9]{1,2}:[0-9]{1,2}(:[0-9]{1,2})?[ \t]+[+-][0-9]{4}/
is 2.5x faster in PCRE than in POSIX RE. Simple replacement patterns like
ereg_replace( "[^a-zA-Z0-9-]+", "", $m );
are 2x faster in POSIX RE than PCRE.
And then we get confused again because a POSIX RE pattern like
(^|\n|\r)begin-base64[ \t]+[0-7]{3,4}[ \t]+......
is 2x faster as POSIX RE, but the case-insensitive PCRE
/^Received[ \t]*:[ \t]*by[ \t]+([^ \t]+)[ \t]/i
is 30x faster than its POSIX RE version!
When it comes to case sensitivity, PCRE has so far seemed to be the best option. But I found some really strange behaviour from ereg/eregi. On a very simple POSIX RE
(^|\r|\n)mime-version[ \t]*:
I found eregi() taking 3.60s (just a number in a test benchmark), while the corresponding PCRE took 0.16s! But if I used ereg() (case-sensitive) the POSIX RE time went down to 0.08s! So I investigated further. I tried to make the POSIX RE case-insensitive itself. I got as far as this:
(^|\r|\n)[mM][iI][mM][eE]-vers[iI][oO][nN][ \t]*:
This version also took 0.08s. But if I try to apply the same rule to any of the 'v', 'e', 'r' or 's' letters that are not changed, the time is back to the 3.60s mark, and not gradually, but immediately so! The test data didn't have any "vers" in it, other "mime" words in it or any "ion" that might be confusing the POSIX parser, so I'm at a loss.
Bottom line: always benchmark your PCRE / POSIX RE to find the fastest!
Tests were performed with PHP 5.1.2 under Windows, from the command line.
Pedro Freire
cynergi.com
stronk7
One comment about 5.2.x and the pcre.backtrack_limit:
Note that this setting wasn't present in previous PHP releases, and the behaviour (or limit) under those releases was, in practice, higher, so all these PCRE functions were able to "capture" longer strings.
With the arrival of the setting, defaulting to 100000 (less than 100K), you won't be able to match/capture strings over that size using, for example, "ungreedy" modifiers. So, in a lot of situations, you'll need to raise that (very small IMO) limit.
The worst part is that PHP simply won't match/capture those strings over pcre.backtrack_limit and it will be 100% silent about that (I think that throwing some NOTICE/WARNING when the limit is reached could help developers a lot).
There are a lot of people suffering from this changed behaviour, from what I've read on forums, bugs and so on.
Hope this note helps, ciao :-)
hrz
If you're venturing into new regular expression territory with a lack of useful examples then it would pay to get familiar with this page: http://www.pcre.org/man.txt
ned baldessin
If you want to perform regular expressions on Unicode strings, the PCRE functions will NOT be of any help unless the pattern uses the u (UTF-8) modifier. The alternative is the Multibyte extension: mb_ereg(), mb_eregi(), mb_ereg_replace() and so on.
When doing so, be careful to set the default text encoding to the same encoding used by the text you are searching and replacing in. You can do that with the mb_regex_encoding() function. You will probably also want to set the default encoding for the other mb_* string functions with mb_internal_encoding().
So when dealing with, say, French text, I start with these:
<?php
mb_internal_encoding('UTF-8');
mb_regex_encoding('UTF-8');
setlocale(LC_ALL, 'fr-fr');
?>
lgandras
I read this part, but I couldn't understand a single word because I first need to know basic regular expressions. Somebody posted a link for Perl, which is almost like PHP, but here is one totally dedicated to PHP: http://weblogtoolscollection.com/regex/regex.php
gokul
I came across this nice tutorial on regular expressions in Perl: http://perldoc.perl.org/perlretut.html
tabac
Hello bermi
<?php
if (preg_match("/((a+)?)+/", "a")) {
    echo "Matched";
}
?>
A segfault is always bad, but realize what you are asking here: "Is there one or more occurrences of zero or one sequences of one or more 'a'?" Considering the backtracking algorithm used, the RE engine must consider an infinite sequence of sub-matches of which all but one have a length of zero. This is a bug, but it is in line with the famous "ls -l /usr/../*/../*/../*/../*/../*" bug.
hfuecks
Good PCRE tutorial at http://www.tote-taste.de/X-Project/regex/ - well explained but still in depth
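Regarding tabac's note above on /((a+)?)+/: one way to sidestep that kind of pattern is to flatten the nested quantifiers. The sketch below assumes a reasonably current PHP (7.1 or later for the type hints), where the nested form is expected to run without crashing; full_match() is a hypothetical helper written for this example. It compares the full match of the nested pattern with that of the simpler /a*/, which accepts the same strings.

```php
<?php
// Hypothetical helper: returns the full match (group 0), or null on no match or error.
function full_match(string $pattern, string $subject): ?string
{
    return preg_match($pattern, $subject, $m) === 1 ? $m[0] : null;
}

$nested = '/((a+)?)+/';  // pattern from the note: optional group inside a repeated group
$flat   = '/a*/';        // same accepted strings, expressed with a single quantifier

$same = [];
foreach (['a', 'aaab'] as $subject) {
    // Record whether both patterns produce the same overall match.
    $same[$subject] = full_match($nested, $subject) === full_match($flat, $subject);
}

var_dump($same);
```

The flat pattern gives the engine only one way to consume a run of "a" characters, so there are no empty iterations to consider. Note that the two patterns differ in their capture groups, so the rewrite only applies when the sub-matches are not needed.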