Tidy Functions
Tidy is a binding for the Tidy HTML clean and repair utility, which lets you not only clean and otherwise manipulate HTML documents but also traverse the document tree. To use Tidy, you need libtidy installed, available from the tidy homepage » http://tidy.sourceforge.net/. Tidy is currently available for PHP 4.3.x and PHP 5 as a PECL extension from » http://pecl.php.net/package/tidy.
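To illustrate the tree traversal mentioned above, here is a minimal sketch, assuming PHP 5 or later with the tidy extension loaded. The helper name `outline` and the sample markup are made up for illustration. It walks the nodes below `<body>` through `hasChildren()` and the `child` property, and records each element's `name` indented by its depth:

```php
<?php
// Hypothetical helper: record the element names in a subtree, indented by depth.
function outline($node, $depth, array &$lines)
{
    // Only element nodes (start tags and empty elements) are listed;
    // other node types are not listed, but their children (if any) are still visited.
    if ($node->type === TIDY_NODETYPE_START || $node->type === TIDY_NODETYPE_STARTEND) {
        $lines[] = str_repeat('  ', $depth) . $node->name;
    }
    if ($node->hasChildren()) {
        foreach ($node->child as $child) {
            outline($child, $depth + 1, $lines);
        }
    }
}

$lines = array();
if (extension_loaded('tidy')) {
    // Sample markup, made up for this sketch.
    $tidy = tidy_parse_string('<ul><li>one</li><li>two <em>2</em></li></ul>');
    outline(tidy_get_body($tidy), 0, $lines);
}
echo implode("\n", $lines), "\n";
```

For the sample list, `$lines` should start with `body`, followed by the nested `ul`, `li` and `em` entries. Comparing `type` with the node-type constants keeps text and comment nodes out of the listing.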
Note:
Tidy 1.0 is only for PHP 4.3.x, while Tidy 2.0 is only for PHP 5. If » PEAR is available on your *nix-like system, you can use the pear installer to install the tidy extension with the following command: pecl install tidy. You can always download the tar.gz package and install tidy by hand:
Example 2532. tidy install by hand in PHP 4.3.x
gunzip tidy-xxx.tgz
Windows users can download the extension DLL from » http://pecl4win.php.net/ext.php/php_tidy.dll.
In PHP 5 you need only to compile using the
The behaviour of these functions is affected by the settings listed in Table 321, Tidy Configuration Options.
Here's a short explanation of the configuration directives.
Note:
The properties marked with * are only available since PHP 5.1.0.
The constants below are defined by this extension, and will only be available when the extension has either been compiled into PHP or dynamically loaded at runtime.
The following constants are defined:
Table 322. tidy tag constants
Table 323. tidy attribute constants
Table 324. tidy nodetype constants
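As a rough illustration of the tag constants, here is a sketch assuming PHP 5 with the tidy extension loaded; `count_tags` is a hypothetical helper, not part of the extension. A node's `id` property is compared with a `TIDY_TAG_*` constant while recursing through the tree (only element nodes carry an `id`):

```php
<?php
// Hypothetical helper: count the elements in a tidy tree whose tag id equals $tagId.
function count_tags($node, $tagId)
{
    // Text, comment and root nodes have no tag id, hence the isset() check.
    $count = (isset($node->id) && $node->id === $tagId) ? 1 : 0;
    if ($node->hasChildren()) {
        foreach ($node->child as $child) {
            $count += count_tags($child, $tagId);
        }
    }
    return $count;
}

$links = 0;
if (extension_loaded('tidy')) {
    $tidy = tidy_parse_string('<p><a href="/a">a</a> <a href="/b">b</a></p>');
    $links = count_tags(tidy_get_root($tidy), TIDY_TAG_A);
}
echo $links, "\n";
```

With the extension loaded, the sample fragment yields 2, one for each `<a>` element.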
This simple example shows basic Tidy usage.
Example 2533. Basic Tidy usage
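The code for this example did not survive here. As a stand-in, here is a minimal sketch of basic usage, assuming PHP 5's object-oriented interface (`tidy::parseString()`, `tidy::cleanRepair()`). The input string and option values are illustrative only:

```php
<?php
// Minimal sketch: parse a broken HTML fragment, repair it and print the result.
$html = '<p>Unclosed paragraph<b>bold text';

if (extension_loaded('tidy')) {
    $tidy = new tidy();
    // Options are libtidy configuration names; 'utf8' is the input encoding.
    $tidy->parseString($html, array('indent' => true, 'output-xhtml' => true), 'utf8');
    $tidy->cleanRepair();
    $output = (string) $tidy;   // casting a tidy object yields the document markup
} else {
    $output = $html;            // extension missing: leave the input untouched
}
echo $output, "\n";
```

Echoing the `tidy` object directly works the same way as the explicit `(string)` cast.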
Code Examples / Notes
shuster
Valid XHTML STRICT
<?php
if (function_exists('tidy_repair_string')) {
    $xhtml = tidy_repair_string($xhtml, array(
        'output-xhtml' => true,
        'show-body-only' => true,
        'doctype' => 'strict',
        'drop-font-tags' => true,
        'drop-proprietary-attributes' => true,
        'lower-literals' => true,
        'quote-ampersand' => true,
        'wrap' => 0
    ), 'raw');
}
?>
tonygambone
Using PHP 5.1.2 on Win32/IIS, I noticed that even with "output-xhtml: yes," tidy was adding the deprecated name attribute to form tags (using the value of the id attribute). Grabbing the latest dll from the snaps link at the top of the page fixed this.
mohan
To those who need to install libtidy on Mac OS X, here is a guide that worked for me:
If you're on Mac OS X, you'll need to tell the Makefile that you use ranlib:
$ export RANLIB=ranlib
Change to the directory with the Makefile in it and run make. This example uses the GNU make Makefile:
$ cd tidy/build/gmake/
$ make
if [ ! -d ./obj ]; then mkdir ./obj; fi
gcc -o obj/access.o ...
... etc etc etc ...
Install the libs, headers and the tidy executable:
$ sudo make install
If you're on Mac OS X, you'll have to run ranlib again on the installed lib:
$ sudo ranlib /usr/local/lib/libtidy.a
guillaume
To install Tidy for PHP 5 correctly on Ubuntu, follow this link: http://ubuntuforums.org/showthread.php?t=195636
In fact, you need to run "make clean" before running "make" and "make install".
paul cook
To get libtidy and PHP 5.0.5 compiled on OS X Tiger, this is what I needed to do:
1) Download and unpack the tidy source.
2) cd tidy-source-dir
3) >> /bin/sh build/gnuauto/setup.sh
4) Then you can configure/make/make install as normal.
The PHP build generates errors because of tidy, so I needed to edit the platform.h file like this (use your favorite command-line editor):
5) >> sudo emacs /usr/local/include/platform.h
6) Comment out line 508, which was causing the 'duplicate "unsigned"' error in the PHP build.
7) configure/make/make install PHP as normal using --with-tidy=/usr/local
Restart Apache and everything works now. HTH someone.
19-feb-2005 11:47
There is an HTML/XHTML validator based on tidy at http://validator.aborla.net/. It is released under the LGPL.
jon dowland bugs
Rough installation instructions for Debian testing:
Use Debian's apt package manager to install the required development packages:
$ apt-get install php4-dev php4-pear libtidy-dev
Then use pear to install tidy:
$ pear install tidy
Note: I did /not/ have success installing the tarball locally. Only with this method was the .so put in the correct place.
I also had to add an entry to php.ini:
$ echo extension=tidy.so >> /etc/php4/apache/php.ini
$ apachectl restart
...and you're done.
13-jan-2005 07:20
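After installing by any of the routes in these notes, a quick way to confirm that PHP actually loaded the extension is a small script like the one below. It is a minimal sketch; run it under the same SAPI whose php.ini you edited (e.g. Apache rather than the CLI). `tidy_get_release()` returns the release date of the libtidy library PHP is using:

```php
<?php
// Quick check that the tidy extension is active after editing php.ini.
if (extension_loaded('tidy')) {
    // tidy_get_release() reports the release date of the linked libtidy.
    $status = 'tidy loaded, libtidy release ' . tidy_get_release();
} else {
    $status = 'tidy not loaded: check the extension= line in php.ini';
}
echo $status, "\n";
```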
Just in case anyone else has been having problems using the tidy extension in PHP 4.3.10, here is a working example:
$html = '<HTML><HEAD></HEAD><BODY>Hello World</BODY></HTML>';
$config = array('indent' => TRUE, 'output-xhtml' => TRUE, 'wrap' => 80);
tidy_set_encoding('UTF8');
foreach ($config as $key => $value) {
    tidy_setopt($key, $value);
}
tidy_parse_string($html);
tidy_clean_repair();
echo tidy_get_output();
The resulting HTML should be similar to:
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<title></title>
</head>
<body>
Hello World
</body>
</html>
tom
It should be noted that the examples on this page apply ONLY to PHP 5. None of the functions in the manual apply to PHP 4: the names are the same, but the arguments differ for some of them (e.g. tidy_parse_string). If you wish to use tidy in PHP 4.3.x, you can use the following example instead:
<?php
// Assumes output buffering was started earlier in the script with ob_start().
$tidyhtml = ob_get_contents();
if (function_exists('tidy_parse_string')) {
    tidy_set_encoding('iso-8859-1');
    tidy_parse_string($tidyhtml);
    tidy_setopt('output-xhtml', TRUE);
    tidy_setopt('indent', TRUE);
    tidy_setopt('indent-spaces', 2);
    tidy_setopt('wrap', 200);
    tidy_clean_repair();
    $tidyhtml = tidy_get_output();
}
ob_end_clean();
echo $tidyhtml;
?>
Hope that helps somebody.
bill dot mccuistion
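For comparison, here is a rough PHP 5 equivalent of the PHP 4 example above, written as a sketch. In PHP 5, tidy_parse_string() takes the configuration array and the encoding as arguments and returns a tidy object, replacing the separate tidy_setopt() and tidy_set_encoding() calls of the PHP 4 API. The buffered markup is a made-up placeholder, and 'latin1' is libtidy's name for ISO-8859-1:

```php
<?php
ob_start();
// Placeholder page content; in practice this is whatever the script prints.
echo '<html><body><p>Buffered <b>output</body></html>';
$tidyhtml = ob_get_contents();
ob_end_clean();

if (function_exists('tidy_parse_string')) {
    $config = array(
        'output-xhtml'  => true,
        'indent'        => true,
        'indent-spaces' => 2,
        'wrap'          => 200,
    );
    // PHP 5: options and encoding go straight to the parser.
    $tidy = tidy_parse_string($tidyhtml, $config, 'latin1');
    tidy_clean_repair($tidy);
    $tidyhtml = tidy_get_output($tidy);
}
echo $tidyhtml;
```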
Installing tidy on Fedora Core 2 required three libraries: tidy, tidy-devel and libtidy, all of which I found at http://rpm.pbone.net. Then, finally, I could run "./configure --with-tidy". Hope this helps someone out. This was "REALLY" hard (for me) to figure out, as it was not clearly documented anywhere else.
doodleelephant
I'm installing PHP 5.0.2 on Red Hat Linux (I forget the version; Enterprise WS 3, I think). I had trouble installing libtidy: the PHP configure step consistently complained that it could not find 'libtidy'. I finally got a clue about how to install it from build/gnuauto/readme.txt. This is how I finally got it to install (after lots of trial and error):
First, don't get the binary distribution off tidy.sf.net. It's not what you want; you need the source distribution. Command by command, this is what I did:
=======
wget http://tidy.sourceforge.net/src/tidy_src.tgz
tar -xzf tidy_src.tgz
cd tidy
/bin/sh build/gnuauto/setup.sh
./configure --prefix=/usr
make
make install
cd [php source directory]
./configure --with-tidy=/usr --[other extensions]
make
make install
=======
Tada. Finally, configuring PHP no longer complains about the tidy installation. The info I needed was tucked away in the build/gnuauto/readme.txt file in the tidy directory, which took me a while to find. Hope my trials can help others save time.
Doodleelephant
info att tcknetwork doot com
I have been searching for an easy way to check an entire website for HTML/XHTML correctness (no errors, compliant, etc.), and tidy is very useful for that:
<?php
/** already checked pages */
$e = array();
/** web pages to check */
$t = array("/web/test.com/");
/** forbidden extensions (typically linked resources) */
$x = explode(",", "jpg,gif,png,doc,xls,pdf");

echo "<pre>";
while (!empty($t)) {
    // already checked or a resource => skip
    if (in_array($t[0], $e) || in_array(substr($t[0], -3), $x)) {
        array_shift($t);
    } else {
        $c = array_shift($t);
        $e[] = $c;
        $t = array_merge($t, ck($c));
    }
}
echo "</pre>";

/** check_validity($url, $server)
    return: list of the internal links of the page */
function ck($u, $s = "http://127.0.0.1")
{
    $c = array("indent" => 1, "output-xhtml" => 1, "accessibility-check" => 3);
    $t = tidy_parse_string(file_get_contents($s . $u), $c);
    tidy_clean_repair($t);
    if (tidy_error_count($t)) {
        // we have errors, display them
        echo "FAIL " . htmlentities($u) . " (" . tidy_error_count($t) . " errors)\n";
        echo htmlentities(tidy_get_error_buffer($t)) . "\n";
    } else {
        // all right
        echo "OK " . htmlentities($u) . "\n";
    }
    // return all the links inside the page
    return gl(tidy_get_root($t), substr($u, -1) == "/" ? $u : dirname($u) . "/");
}

/** get_links($tidynode, $baseurl)
    return: list of the links */
function gl($t, $b)
{
    $r = array();
    if (!$t->hasChildren()) {
        return $r;
    }
    foreach ($t->child as $e) {
        if ($e->name == "a") {
            // a link: keep internal ones only
            if (isset($e->attribute["href"])) {
                $h = $e->attribute["href"];
                if (substr($h, 0, 4) != "http") { // skip external links
                    $r[] = sp(substr($h, 0, 1) == "/" ? $h : $b . $h);
                }
            }
        } else {
            // not a link, search recursively inside
            $r = array_merge($r, gl($e, $b));
        }
    }
    return $r;
}

/** simplify_path($path)
    return: simplified path */
function sp($p)
{
    do {
        $o = $p;
        $p = str_replace(array("//", "/./"), "/", $p);
        // collapse "dir/../" (the dots must be escaped in the pattern)
        $p = preg_replace("#/[^/]+/\.\./#", "/", $p);
    } while ($o != $p);
    return $p;
}
?>
Limitation: this does not detect JavaScript-generated links. Consider set_time_limit(0) if you have a lot of web pages.
matteo dot contri
I had many problems with a JavaScript that grabs mouse events on images, and tidy (obviously). I found this solution: 'output-xhtml' => false, and everything works again!
patatraboum
<?php
//
// The tidy tree of your favorite !
// For PHP 5 (CGI)
// Thanks to john@php.net
//
$file = "http://www.php.net";
//
$cns = get_defined_constants(true);
$tidyCns = array("tags" => array(), "types" => array());
foreach ($cns["tidy"] as $cKey => $cVal) {
    if ($cPos = strpos($cKey, $cStr = "TAG"))
        $tidyCns["tags"][$cVal] = "$cStr : " . substr($cKey, $cPos + strlen($cStr) + 1);
    elseif ($cPos = strpos($cKey, $cStr = "TYPE"))
        $tidyCns["types"][$cVal] = "$cStr : " . substr($cKey, $cPos + strlen($cStr) + 1);
}
$tidyNext = array();
//
echo "<html><head><meta http-equiv='Content-Type' content='text/html; charset=windows-1252'><title>Tidy Tree :: $file</title></head>";
echo "<body><pre>";
//
tidyTree(tidy_get_root(tidy_parse_file($file)), 0);
//
function tidyTree($tidy, $level) {
    global $tidyCns, $tidyNext;
    $tidyTab = array();
    $tidyKeys = array("type", "value", "id", "attribute");
    foreach ($tidy as $pKey => $pVal) {
        if (in_array($pKey, $tidyKeys))
            $tidyTab[array_search($pKey, $tidyKeys)] = $pVal;
    }
    ksort($tidyTab);
    foreach ($tidyTab as $pKey => $pVal) {
        switch ($pKey) {
            case 0:
                // only text nodes get their value printed
                $value = ($pVal == TIDY_NODETYPE_TEXT);
                echo indent(true, $level) . $tidyCns["types"][$pVal] . "\n";
                break;
            case 1:
                if ($value) {
                    echo indent(false, $level) . "VALUE : " . str_replace("\n", "\n" . indent(false, $level), $pVal) . "\n";
                }
                break;
            case 2:
                echo indent(false, $level) . $tidyCns["tags"][$pVal] . "\n";
                break;
            case 3:
                if ($pVal != NULL) {
                    echo indent(false, $level) . "ATTRIBUTES : ";
                    foreach ($pVal as $aKey => $aVal)
                        echo "$aKey=$aVal ";
                    echo "\n";
                }
        }
    }
    if ($tidy->hasChildren()) {
        $level++;
        $i = 0;
        $tidyNext[$level] = true;
        echo indent(false, $level) . "\n";
        foreach ($tidy->child as $child) {
            $i++;
            if ($i == count($tidy->child))
                $tidyNext[$level] = false;
            tidyTree($child, $level);
        }
    } else
        echo indent(false, $level) . "\n";
}
//
function indent($tidyType, $level) {
    global $tidyNext;
    $indent = "";
    for ($i = 1; $i <= $level; $i++) {
        if ($i < $level || !$tidyType) {
            if ($tidyNext[$i]) $str = "|  ";
            else $str = "   ";
        } else
            $str = "+--";
        $indent = $indent . $str;
    }
    return $indent;
}
//
echo "</pre></body></html>";
//
?>