strip_tags
Strip HTML and PHP tags from a string
(PHP 4, PHP 5)
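Below is a minimal sketch of typical usage, assuming PHP 5 or later. The input string is illustrative. Called with one argument, strip_tags() removes every tag, including HTML comments. The optional second argument lists the tags to keep.

```php
<?php
// Minimal strip_tags() usage; the sample string is illustrative.
$text = '<p>Test paragraph.</p><!-- Comment --> <a href="#fragment">Other text</a>';

// Remove all tags; the HTML comment is removed as well.
$plain = strip_tags($text);

// Keep <p> and <a>: the second argument lists the allowed tags.
$kept = strip_tags($text, '<p><a>');

echo $plain, "\n"; // Test paragraph. Other text
echo $kept, "\n";  // <p>Test paragraph.</p> <a href="#fragment">Other text</a>
```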
Example 2452. strip_tags() example

This example outputs:

Test paragraph. Other text

Code Examples / Notes » strip_tags

joris878
[ Editor's Note: This functionality will be natively supported in a future release of PHP. Most likely 5.0 ]

This routine removes all attributes from a given tag except the attributes specified in the array $attr.

<?php
function stripeentag($msg, $tag, $attr)
{
    $lengthfirst = 0;
    while (strstr(substr($msg, $lengthfirst), "<$tag ") != "") {
        $imgstart = $lengthfirst + strpos(substr($msg, $lengthfirst), "<$tag ");
        $partafterwith = substr($msg, $imgstart);
        $img = substr($partafterwith, 0, strpos($partafterwith, ">") + 1);
        $img = str_replace(" =", "=", $msg); // bug: should operate on $img, not $msg
        $out = "<$tag";
        // bug: $atr is a typo for $attr, and starting at 1 skips $attr[0]
        for ($i = 1; $i <= count($atr); $i++) {
            $val = filter($img, $attr[$i] . "=", " "); // filter() is not defined in this note
            if (strlen($val) > 0)
                $attr[$i] = " " . $attr[$i] . "=" . $val;
            else
                $attr[$i] = "";
            $out .= $attr[$i];
        }
        $out .= ">";
        $partafter = substr($partafterwith, strpos($partafterwith, ">") + 1);
        $msg = substr($msg, 0, $imgstart) . $out . $partafter;
        $lengthfirst = $imgstart + 3;
    }
    return $msg;
}
?>

jausions
To sanitize any user input, you should also consider PEAR's HTML_Safe package: http://pear.php.net/package/HTML_Safe

mrmaxxx333
To remove everything between script tags, including the script tags themselves, I use this:

<?php
// ereg_replace() takes no delimiters or modifiers such as "isU";
// this pattern is a PCRE pattern, so preg_replace() is required.
$description = preg_replace("~<script[^>]*>.+</script[^>]*>~isU", "", $description);
?>

It hasn't been extensively tested, but it works.

I also ran into trouble with a href tags: I wanted to pull the URL out of them. This turns <a href="blah.com">welcome to blah</a> into "welcome to blah (blah.com)":

<?php
$string = preg_replace('/<a\s+.*?href="([^"]+)"[^>]*>([^<]+)<\/a>/is', '\2 (\1)', $string);
?>

christianbecke
To kangaroo232002 at yahoo dot co dot uk: as far as I understand, what you report is not a bug in strip_tags() but a bug in your HTML. You should use alt='Go &gt;' instead of alt='Go >'. I suppose your HTML displays all right in browsers, but that does not mean it is correct. It just shows that browsers are more forgiving than strip_tags() about characters that are not properly escaped as entities.

jon780 -at- gmail.com
To eric at direnetworks dot com, regarding the 1024-character limit: you could simply ltrim() the first 1024 characters, run them through strip_tags(), append them to a new string, and remove them from the original. Repeat this in a loop until the original string has length 0.

bermi ferrer
This is Salavert's function, improved with suggestions from this page, as it has been refactored for the Akelos Framework (http://www.akelos.org) by Jose Salavert. Please note that the "u" modifier needs to be lowercase. This function also removes self-closing tags (XHTML <br /> <hr />) and works if the text contains line breaks.

<?php
function strip_selected_tags($text, $tags = array())
{
    $args = func_get_args();
    $text = array_shift($args);
    $tags = func_num_args() > 2 ? array_diff($args, array($text)) : (array)$tags;
    foreach ($tags as $tag) {
        if (preg_match_all('/<' . $tag . '[^>]*>((\\n|\\r|.)*)<\/' . $tag . '>/iu', $text, $found)) {
            $text = str_replace($found[0], $found[1], $text);
        }
    }
    return preg_replace('/(<(' . join('|', $tags) . ')(\\n|\\r|.)*\/>)/iu', '', $text);
}
?>

birwin
This is an upgrade to the illegal-characters script by robt. This script handles the input even if one or all of the fields contain arrays. Of course, another loop could be added to handle arrays nested within arrays, but if you are savvy enough to be using nested arrays, you don't need me to rewrite the program.

<?php
function screenForm($ary_check_for_html)
{
    // check array - reject if any content contains HTML.
    foreach ($ary_check_for_html as $field_value) {
        if (is_array($field_value)) {
            // if the field value is an array, step through it
            foreach ($field_value as $field_array) {
                $stripped = strip_tags($field_array);
                if ($field_array != $stripped) {
                    // something in the field value was HTML
                    return false;
                }
            }
        } else {
            $stripped = strip_tags($field_value);
            if ($field_value != $stripped) {
                // something in the field value was HTML
                return false;
            }
        }
    }
    return true;
}
?>

tony freeman
This is a slightly altered version of tREXX's code. The difference is that this one simply removes the unwanted attributes (rather than flagging them as forbidden).

<?php
function removeEvilAttributes($tagSource)
{
    $stripAttrib = "' (style|class)=\"(.*?)\"'i";
    $tagSource = stripslashes($tagSource);
    $tagSource = preg_replace($stripAttrib, '', $tagSource);
    return $tagSource;
}

function removeEvilTags($source)
{
    $allowedTags = '<a> <b><h1><h2><h3><h4><i>' .
        '<img><li><ol> <strong><table>' .
        '<tr><td><th><u><ul>';
    $source = strip_tags($source, $allowedTags);
    // note: the /e modifier was deprecated in PHP 5.5 and removed in PHP 7
    return preg_replace('/<(.*?)>/ie', "'<'.removeEvilAttributes('\\1').'>'", $source);
}

$text = '<p style="Normal">Saluton el <a href="#?" class="xsarial">Esperanto-lando</a><img src="my.jpg" alt="Saluton" width=100 height=100>';
$text = removeEvilTags($text);
var_dump($text);
?>

eric
The strip_tags() function in both PHP 4.3.8 and 5.0.2 has a maximum tag length of 1024 (probably in many more versions, but these are the only two I tested). If you process a tag over this limit, strip_tags() will not return that line, as if it were an illegal tag. I noticed this problem while trying to parse a PayPal encrypted link button (<input type="hidden" name="encrypted" value="encryptedtext">, with <input> as an allowed tag), which is 2702 characters long. I can't think of any workaround other than parsing each tag to find its length and only sending it to strip_tags() if it is under 1024. At that point, though, I might as well be stripping the tags myself.
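A hand-written stripper of the kind eric mentions might look like the sketch below. It assumes PHP 7 or later, and the function name strip_tags_walk() is hypothetical. The function scans each tag up to its closing ">" while skipping quoted attribute values. As a result, it has no tag-length limit and does not stop at a ">" inside a value such as alt="Go >". Tags whose names appear in $allowed are copied through whole. The sketch does not special-case comments, PHP blocks or a bare "<" in ordinary text, so treat it as an illustration rather than a sanitizer.

```php
<?php
// Sketch of a hand-written tag stripper (hypothetical name strip_tags_walk()).
// Each tag is scanned up to its closing ">" while quoted attribute values are
// skipped, so tag length is unlimited and alt="Go >" does not end the tag.
// Tags whose names are listed in $allowed are copied through unchanged.
// Not handled: comments, PHP blocks, or a bare "<" in ordinary text.
function strip_tags_walk(string $html, array $allowed = []): string
{
    $allowed = array_map('strtolower', $allowed);
    $out = '';
    $len = strlen($html);
    $i = 0;
    while ($i < $len) {
        if ($html[$i] !== '<') {
            $out .= $html[$i];   // ordinary text: copy it
            $i++;
            continue;
        }
        // Find the ">" that closes this tag, ignoring any inside quotes.
        $j = $i + 1;
        $quote = null;
        while ($j < $len) {
            $c = $html[$j];
            if ($quote !== null) {
                if ($c === $quote) {
                    $quote = null;   // closing quote
                }
            } elseif ($c === '"' || $c === "'") {
                $quote = $c;         // opening quote
            } elseif ($c === '>') {
                break;
            }
            $j++;
        }
        $tag = substr($html, $i, $j - $i + 1);
        // Keep the tag only if its name is in the allowed list.
        if (preg_match('#^</?\s*([a-zA-Z][a-zA-Z0-9]*)#', $tag, $m)
            && in_array(strtolower($m[1]), $allowed, true)) {
            $out .= $tag;
        }
        $i = $j + 1;   // continue after the tag (or past the end if unterminated)
    }
    return $out;
}
```

Because the scan works byte by byte and only looks at ASCII delimiters, it is safe on UTF-8 input.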
computer
Thanks for the strip_selected_tags code, Jose. :-) Peace, Charlie

dougal
strip_tags() appears to become nauseated at the sight of a <!DOCTYPE> declaration (at least in PHP 4.3.1). You might want to do something like:

$html = str_replace('<!DOCTYPE', '<DOCTYPE', $html);

before processing with strip_tags().

chrisj
strip_tags() doesn't recognize that CSS inside <style> tags is not document text. To fix this, do something like the following:

$htmlstring = preg_replace("'<style[^>]*>.*</style>'siU", '', $htmlstring);

guy
strip_tags() will NOT remove HTML entities such as …
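Because strip_tags() leaves entities untouched, a common follow-up when you need plain text is to decode them afterwards. The sketch below assumes PHP 7 or later and UTF-8 text, and plain_text() is a hypothetical helper name. Decoding can turn escaped markup such as &lt;b&gt; back into a literal <b>, so escape the result again before putting it into HTML.

```php
<?php
// Sketch: strip tags, then decode entities such as &amp; and &nbsp;.
// plain_text() is a hypothetical helper; the result is plain text, not safe HTML.
function plain_text(string $html): string
{
    return html_entity_decode(strip_tags($html), ENT_QUOTES | ENT_HTML5, 'UTF-8');
}

echo plain_text('<p>Fish &amp; chips</p>'), "\n"; // Fish & chips
```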
isaac schlueter php
steven --at-- acko --dot-- net pointed out that you can't make strip_tags() allow comments. With this function, you can: just pass <!--> as one of the allowed tags. Easy as pie: hide the comment delimiters, strip, and then put them back.

<?php
function strip_tags_c($string, $allowed_tags = '')
{
    $allow_comments = (strpos($allowed_tags, '<!-->') !== false);
    if ($allow_comments) {
        // escape the comment delimiters so strip_tags() leaves them alone
        $string = str_replace(array('<!--', '-->'), array('&lt;!--', '--&gt;'), $string);
        $allowed_tags = str_replace('<!-->', '', $allowed_tags);
    }
    $string = strip_tags($string, $allowed_tags);
    if ($allow_comments) {
        // restore the comment delimiters
        $string = str_replace(array('&lt;!--', '--&gt;'), array('<!--', '-->'), $string);
    }
    return $string;
}
?>

anonymous
Users can still put attributes such as CSS inside allowed tags. For example, if you strip all tags except <b>, a user can still write <b style="color: red; font-size: 45pt">Hello</b>, which might be undesired. Maybe BBCode would be an alternative.

daneel
This removes the attributes of a tag except those specified in an array. It is a correction of the useful routine by joris878 (which didn't seem to work as posted), plus an example. When will PHP support this natively? Sorry for my English; I hope everybody understands.

<?php
function stripeentag($msg, $tag, $attr)
{
    $lengthfirst = 0;
    while (strstr(substr($msg, $lengthfirst), "<$tag ") != "") {
        $imgstart = $lengthfirst + strpos(substr($msg, $lengthfirst), "<$tag ");
        $partafterwith = substr($msg, $imgstart);
        $img = substr($partafterwith, 0, strpos($partafterwith, ">") + 1);
        $img = str_replace(" =", "=", $msg); // note: works on the whole $msg, not just the tag in $img
        $out = "<$tag";
        for ($i = 0; $i <= (count($attr) - 1); $i++) {
            $long_val = strpos($img, " ", strpos($img, $attr[$i] . "=")) - (strpos($img, $attr[$i] . "=") + strlen($attr[$i]) + 1);
            $val = substr($img, strpos($img, $attr[$i] . "=") + strlen($attr[$i]) + 1, $long_val);
            if (strlen($val) > 0)
                $attr[$i] = " " . $attr[$i] . "=" . $val;
            else
                $attr[$i] = "";
            $out .= $attr[$i];
        }
        $out .= ">";
        $partafter = substr($partafterwith, strpos($partafterwith, ">") + 1);
        $msg = substr($msg, 0, $imgstart) . $out . $partafter;
        $lengthfirst = $imgstart + 3;
    }
    return $msg;
}

$message = "<font size=\"10\" face=\"tahoma\" color=\"#DD0000\" >salut</font>";
// we want only the "color" attribute
$message = stripeentag($message, "font", array("color"));
echo $message;
?>

info
Please note that the function supplied by daneel at neezine dot net is not a good way of avoiding XSS attacks. A string like

<font size=">>" <script>alert("foo")</script> face="tahoma" color="#DD0000">salut</font>

will be sanitized to

<font>>" <script>alert("foo")</script> face="tahoma" color="#DD0000">salut</font>

which is a pretty good XSS. If you need XSS cleaning, you might want to consider the Pixel-Apes XSS cleaner: http://pixel-apes.com/safehtml

nyks
Note for BRYN at drumdatabse dot com (http://www.php.net/manual/fr/function.strip-tags.php#52085): I've changed your script to handle more cases.

- The outer WHILE loop repeats the inner WHILE loop so that strip_tags() also catches HTML tags that substr() cut in two (and that strip_tags() therefore did not recognize).
- There are no more bugs with substr($textstring,0,1024): previously, when the loop ran for the second, third, fourth... time and $textstring was shorter than 1024 characters, it returned an error.

<?php
function strip_tags_in_big_string($textstring)
{
    // repeat until no tags remain (catches tags split across chunk boundaries)
    while ($textstring != strip_tags($textstring)) {
        $safetext = ''; // reset on each pass so content is not duplicated
        while (strlen($textstring) != 0) {
            if (strlen($textstring) > 1024) {
                $otherlen = 1024;
            } else {
                $otherlen = strlen($textstring);
            }
            $temptext = strip_tags(substr($textstring, 0, $otherlen));
            $safetext .= $temptext;
            $textstring = substr_replace($textstring, '', 0, $otherlen);
        }
        $textstring = $safetext;
    }
    return $textstring;
}
?>

ashley
leathargy at hotmail dot com wrote: "it seems we're all overlooking a few things: 1) if we replace "</ta</tableble>" by removing </table, we're not better off..."

I beat this with the following ($input contains the data):

<?php
while ($input != strip_tags($input)) {
    $input = strip_tags($input);
}
?>

This strips tags repeatedly until none are left :)

leathargy
It seems we're all overlooking a few things:

1) If we replace "</ta</tableble>" by removing "</table", we're no better off. Try a char-by-char comparison and replace the offending characters with *s, because then this example becomes "</ta******ble>", which is not problematic. Also, with a char-by-char approach you can skip whitespace and kill stuff like "< table>"... just make sure <&bkspTable> doesn't work...

2) No browser treats { as < (as far as I know).

3) Because of statement 2, we can do:

<?php
$remove = array("<?", "<", "?>", ">");
$change = array("{[pre]}", "{[", "{/pre}", "]}");
$repairSeek = array("{[pre]}", "{[/pre]}", "{[b]}", "{[/b]}", "{[br]}"); // and so forth...
$repairChange = array("<pre>", "</pre>", "<b>", "</b>", "<br>");        // and so forth...
$maltags = array("{[", "]}");
$nontags = array("{", "}");
$unclean = $input; // get the variable from somewhere...
$unclean = str_replace($remove, $change, $unclean);
$unclean = str_replace($repairSeek, $repairChange, $unclean);
$clean = str_replace($maltags, $nontags, $unclean);
// end example
?>

4) We can further improve the above by using explode() (for our ease):

<?php
function purifyText($unclean, $fixme)
{
    $remove = explode("\n", $fixme['remove']);
    // ... and so forth for each of the above arrays...
    // or you could just pass the arrays, or a giant string
    // put the code from 3) here...
    return $clean;
}
?>

php
Instead of removing tags that you don't want, sometimes you might want to reject the input outright:

<?php
// note: eregi() was deprecated in PHP 5.3 and removed in PHP 7; use preg_match() with the i modifier there
$disalowedtags = array("script", "object", "iframe", "image", "applet", "meta", "form", "onmouseover", "onmouseout");
foreach ($_GET as $varname)
    foreach ($disalowedtags as $tag)
        if (eregi("<[^>]*" . $tag . "*\"?[^>]*>", $varname))
            die("stop that");
foreach ($_POST as $varname)
    foreach ($disalowedtags as $tag)
        if (eregi("<[^>]*" . $tag . "*\"?[^>]*>", $varname))
            die("stop that");
?>

soapergem
In my prior comment I made a mistake that needs correcting. Please change the forward slashes that begin and end my regular expression to a different character, such as the at sign (@). Here's what it should read:

$regex = '@</?\w+((\s+\w+(\s*=\s*';
$regex .= '(?:".*?"|\'.*?\'|[^\'">\s]+))?)+';
$regex .= '\s*|\s*)/?>@i';

(There were forward slashes embedded in the regular expression itself, so using them to begin and end the expression would have caused a parse error.)

debug
If you wish to steal quotes:

$quote = explode("\n",
    str_replace(array('document.writeln(\'', '\')', ';'), '',
        strip_tags(file_get_contents('http://www.quotationspage.com/data/1mqotd.js'))
    )
);

Use $quote[2] and $quote[3]. It gives you a quote a day.

cz188658
If you want to remove XHTML tags like <br /> (single, unpaired tags), then in the allowable_tags parameter you must include the tag

Jiri

erwin
If you want to disable HTML, you can easily replace all instances of < and >, which will stop all HTML code from working.
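Instead of replacing < and > by hand, PHP's htmlspecialchars() converts <, >, & and (with ENT_QUOTES) both quote characters into entities. The browser then displays the markup as text rather than interpreting it. Below is a minimal sketch, assuming UTF-8 input; the sample string is illustrative.

```php
<?php
// Sketch: neutralise markup by escaping it instead of stripping it.
$input = '<b>bold</b> & <script>alert("x")</script>';
$safe = htmlspecialchars($input, ENT_QUOTES, 'UTF-8');
echo $safe, "\n";
// &lt;b&gt;bold&lt;/b&gt; &amp; &lt;script&gt;alert(&quot;x&quot;)&lt;/script&gt;
```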
@dada
If you only want the text within the tags, you can use this function:

<?php
function showtextintags($text)
{
    $text = preg_replace("/(\<script)(.*?)(script>)/si", "dada", "$text");
    $text = strip_tags($text);
    $text = str_replace("<!--", "<!--", $text);
    $text = preg_replace("/(\<)(.*?)(--\>)/mi", "" . nl2br("\\2") . "", $text);
    return $text;
}
?>

It will show all the text without tags and (!!!) without JavaScript.

metric
I tried using the strip_selected_tags function that salavert created. It works really well for single-line text, but if the text contains hard returns, it can't find the closing tag. I altered the line that shifts the text into a variable so that it removes line breaks in every OS style (\r\n, \n and \r):

$text = preg_replace("/\r\n|\n|\r/", "", array_shift($args));

lucahomer
I think the regular expression posted at function.strip-tags.php#51383 is not correct:

<?php
$disalowedtags = array("font");
foreach ($_GET as $varname)
    foreach ($disalowedtags as $tag)
        if (eregi("<[^>]*" . $tag . "*\"?[^>]*>", $varname))   // <--- this line
            die("stop that");
?>

This code also rejects links like <a href=font.php>test</a>, because the word "font" appears between "<" and ">". I changed the regular expression to this:

if (eregi("(<|</)" . $tag . "*\"?[^>]*>", $varname))

bye Luca

elgios
I think the new function works, but it doesn't remove PHP tags, only HTML!!

<?php
function theRealStripTags2($string)
{
    $tam = strlen($string); // $tam holds the number of chars in the string
    $newstring = "";        // $newstring will be returned
    $tag = 0;
    /* if $tag == 0 => copy the char from $string to $newstring
       if $tag > 0  => don't copy: we found one or more '<' and need to find '>'.
       If we found 3 '<', we need to find all 3 '>'. */
    /* I am a C programmer; walking a string is natural for me and more efficient. */
    for ($i = 0; $i < $tam; $i++) {
        // If I find a '<', increment $tag and continue without copying
        if ($string[$i] == '<') {
            $tag++;
            continue;
        }
        // If I find a '>', decrement $tag and continue
        if ($string[$i] == '>') {
            if ($tag) {
                $tag--;
            }
            /* $tag is never negative. If the string is "<b>test</b>>"
               (an error, of course), $tag stops at 0. */
            continue;
        }
        // if $tag is 0, we can copy
        if ($tag == 0) {
            $newstring .= $string[$i]; // simple copy, only one char
        }
    }
    return $newstring;
}

echo theRealStripTags2("<tag>test</tag>"); // returns "test"
?>

bazzy
I think bryn and john780 are missing the point. eric at direnetworks wasn't suggesting an overall string limit of 1024 characters. He meant that individual tags over 1024 characters long will fail to be stripped (e.g., in his case, what sounds like a really long encrypted <a href> tag).

The functions that pass strings through strip_tags() 1024 characters at a time aren't necessary and are actually counterproductive. If a tag spans the break point (i.e., it is opened before the 1024th character and closed after it), only the opening tag is removed. That leaves a mess of text up to the closing tag.

I'm only mentioning this because I spent ages working out a better way to deal with tags spanning the break. Then I went back, read eric's post, and realised the subsequent posts were misleading. Hopefully this will save others the same headaches :)

blackjackdevel
I slightly modified mrmaxxx333's function, which didn't work with href values in single quotes. I also removed or modified some syntax. I tested it here and it works:

$String = "<a href='blah.com'>welcome to blah</a>";
$msgStrip = preg_replace('/<a\s+.*?[href=]["|\']([^"\']+)["|\']>{1}([^<]+)<\/a>/is', '\2 (\1)', $String);

It will output: welcome to blah (blah.com)

matthieu larcher
I noticed some problems with the strip_selected_tags() function below: sometimes big chunks of content were removed... Here is a modified version that should work better.

<?php
function strip_selected_tags($text, $tags = array())
{
    $args = func_get_args();
    $text = array_shift($args);
    $tags = func_num_args() > 2 ? array_diff($args, array($text)) : (array)$tags;
    foreach ($tags as $tag) {
        while (preg_match('/<' . $tag . '(|\W[^>]*)>(.*)<\/' . $tag . '>/iusU', $text, $found)) {
            $text = str_replace($found[0], $found[2], $text);
        }
    }
    return preg_replace('/(<(' . join('|', $tags) . ')(|\W.*)\/>)/iusU', '', $text);
}
?>

lucky760
I needed a way to allow user comments to contain hyperlinks as the only allowed HTML tags. This is easy enough to accomplish, but I also needed a way to convert full URLs into hyperlinks, and this complicated things a bit. The functions below are not very elegant, but they do the job.

Function strip_tags_except() works similarly to the strip_selected_tags() function defined a few times on this page. The difference is that you specify the tags to allow, and all others are stripped. The third parameter, $strip, controls what happens to disallowed tags. When TRUE, it removes their "<" and ">"; when FALSE, it converts them to "&lt;" and "&gt;" respectively.

Function url_to_link() simply converts full URLs into equivalent hyperlinks. It takes into account that users may end a URL with a character that's not actually part of the address.

When using both, call url_to_link() before strip_tags_except(). Here's an example as we are using it on http://www.VideoSift.com:

<?php
$summary = url_to_link($summary);
$summary = strip_tags_except($summary, array('a'), FALSE);
?>

Here are the function definitions:

<?php
function strip_tags_except($text, $allowed_tags, $strip = TRUE)
{
    if (!is_array($allowed_tags))
        return $text;
    if (!count($allowed_tags))
        return $text;
    $open = $strip ? '' : '&lt;';
    $close = $strip ? '' : '&gt;';
    preg_match_all('!<\s*(/)?\s*([a-zA-Z]+)[^>]*>!', $text, $all_tags);
    array_shift($all_tags);
    $slashes = $all_tags[0];
    $all_tags = $all_tags[1];
    foreach ($all_tags as $i => $tag) {
        if (in_array($tag, $allowed_tags))
            continue;
        $text = preg_replace('!<(\s*' . $slashes[$i] . '\s*' . $tag . '[^>]*)>!', $open . '$1' . $close, $text);
    }
    return $text;
}

function url_to_link($text)
{
    $text = preg_replace('!(^|([^\'"]\s*))' .
        '([hf][tps]{2,4}:\/\/[^\s<>"\'()]{4,})!mi', '$2<a href="$3">$3</a>', $text);
    $text = preg_replace('!<a href="([^"]+)[\.:,\]]">!', '<a href="$1">', $text);
    $text = preg_replace('!([\.:,\]])</a>!', '</a>$1', $text);
    return $text;
}
?>

bfmaster_duran
I made this function with regular expressions to remove some style properties from tags, based on other examples here ;D

<?php
function removeAttributes($htmlText)
{
    // remove classes from html tags
    $stripAttrib = "'\\s(class)=\"(.*?)\"'i";
    $htmlText = stripslashes($htmlText);
    $htmlText = preg_replace($stripAttrib, '', $htmlText);
    // remove font-size, color, font-family and line-height from style attributes
    $stripAttrib = "/(font\-size|color|font\-family|line\-height):\\s" .
        "(\\d+(\\x2E\\d+\\w+|\\W)|\\w+)(;|)(\\s|)/i";
    $htmlText = stripslashes($htmlText); // was stripslashes($tagSource), an undefined variable
    $htmlText = preg_replace($stripAttrib, '', $htmlText);
    // remove style attributes left empty by the preg_replace above (style="")
    $htmlText = str_replace(" style=\"\"", '', $htmlText);
    return $htmlText;
}

function removeEvilTags($source)
{
    // calls removeAttributes() (the original called an undefined removeEvilAttributes());
    // the /e modifier requires PHP < 7
    return preg_replace('/<(.*?)>/ie', "'<'.removeAttributes('\\1').'>'", $source);
}
?>

Usage:

<?php
$text = '<p style="line-height: 150%; font-weight: bold" class="MsoNormal"><span style="font-size: 10.5pt; line-height: 150%; font-family: Verdana">Com o compromisso de pioneirismo e aprimoramento, características da Oftalmoclínica, novos equipamentos foram adquiridos para exames e diagnósticos ainda mais precisos:</span>';
// This text is in Brazilian Portuguese ;D
echo htmlentities(removeEvilTags($text)) . "\r\n";
// This returns:
// <p style="font-weight: bold"><span>Com o compromisso de pioneirismo e aprimoramento, características da Oftalmoclínica, novos equipamentos foram adquiridos para exames e diagnósticos ainda mais precisos:</span>
?>

W0oT! This is fantastic! If you find an error, please report it to me by mail ;D (Y)

rodt
I have used this function successfully to prevent bots from inserting HTML into web forms. Put the fields' contents into an array, then pass the array to this function as an argument. It returns false if HTML is included, and true if there is no HTML in any of the array's values. Hope it's helpful to someone.

<?php
/* Checks that there is no HTML in any of the provided fields.
   $ary_check_for_html = array to check for HTML content. */
function screenForm($ary_check_for_html)
{
    // check array - reject if any content contains HTML.
    foreach ($ary_check_for_html as $field_value) {
        $stripped = strip_tags($field_value);
        if ($field_value != $stripped) {
            // something in the field value was HTML
            return false;
        }
    }
    return true;
}
?>

php
I have had a similar problem to kangaroo232002 at yahoo dot co dot uk when stripping tags from HTML containing JavaScript. The JavaScript can obviously contain '>' and '<' as comparison operators. strip_tags() sees these as HTML tags, leading to undesired results. To christianbecke at web dot de: this can be third-party HTML, so although perhaps not always 'correct', that's how it is!

xyexz
I have found that this function sometimes removes only the first angle bracket of a tag and leaves the rest of the tag in the string, which obviously isn't what I'm looking for. Example:

<?php
// Returns "tag>test/tag>"
echo strip_tags("<tag>test</tag>");
?>

I'm running strip_tags() on a string imported from XML, so perhaps it has something to do with that. If you've run into the same issue, I've written a function to fix it once and for all!

<?php
function theRealStripTags($string)
{
    // while there are tags left to remove
    while (strstr($string, '>')) {
        // find the position of the first "<"
        $currentBeg = strpos($string, '<');
        // find the position of the first ">"
        $currentEnd = strpos($string, '>');
        // if there is text before the first "<", save it in $tmpStringBeg
        $tmpStringBeg = @substr($string, 0, $currentBeg);
        // if there is text after the first ">", save it in $tmpStringEnd
        $tmpStringEnd = @substr($string, $currentEnd + 1, strlen($string));
        // cut the tag out of the string
        $string = $tmpStringBeg . $tmpStringEnd;
    }
    return $string;
}

// Returns "test"
echo theRealStripTags('<tag>test</tag>');
?>

magdolen
I edited the strip_selected_tags function that salavert created so that it also strips single tags (XHTML only). Here it is, also with metric's modification:

<?php
function strip_selected_tags($text, $tags = array())
{
    $args = func_get_args();
    // metric edit
    $text = preg_replace("/\r\n|\n|\r/", "", array_shift($args));
    $tags = func_num_args() > 2 ? array_diff($args, array($text)) : (array)$tags;
    foreach ($tags as $tag) {
        if (preg_match_all('/<' . $tag . '[^>]*>(.*)<\/' . $tag . '>/iU', $text, $found)) {
            $text = str_replace($found[0], $found[1], $text);
        }
        // hrax edit
        if (preg_match_all('/<' . $tag . '.*\/>/iU', $text, $found)) {
            $text = str_replace($found[0], "", $text);
        }
    }
    return $text;
}
?>

isaac schlueter php
I am creating a rendering plugin for a CMS (http://b2evolution.net) that wraps certain bits of text in acronym tags. The problem is that if you have something like this:

<a href="http://www.php.net" title="PHP is cool!">PHP</a>

then the plugin will mangle it into:

<a href="http://www.<acronym title="PHP: Hypertext Processor">php</acronym>.net" title="<acronym title="PHP: Hypertext Processor">PHP</acronym> is cool!>PHP</a>

This function strips out tags that occur within other tags. It's not useful in many situations, but it was an interesting puzzle. I started out using preg_replace(), but it got ridiculously complicated when there were line breaks and multiple instances in the same tag. The CMS does its XHTML validation before the content gets to the plugin. So we can be fairly sure that the content is well-formed, except for the tags inside other tags.

<?php
if (!function_exists('antiTagInTag')) {
    // $content is the string to be anti-tagintagged, and $format sets the format of the internals.
    function antiTagInTag($content = '', $format = 'htmlhead')
    {
        if (!function_exists('format_to_output')) {
            // Use the external function if it exists, or fall back on just strip_tags.
            function format_to_output($content, $format)
            {
                return strip_tags($content);
            }
        }
        $tags = array();
        $newtags = array();
        $tagend = -1;
        for ($tagstart = strpos($content, '<', $tagend + 1);
             $tagstart !== false && $tagstart < strlen($content);
             $tagstart = strpos($content, '<', $tagend)) {
            // got the start of a tag. Now find the proper end!
            $walker = $tagstart + 1;
            $open = 1;
            while ($open != 0 && $walker < strlen($content)) {
                $nextopen = strpos($content, '<', $walker);
                $nextclose = strpos($content, '>', $walker);
                if ($nextclose === false) {
                    // ERROR! Open waka without close waka!
                    // echo '<code>Error in antiTagInTag - malformed tag!</code> ';
                    return $content;
                }
                if ($nextopen === false || $nextopen > $nextclose) {
                    // No more opens, but there was a close; or, a close happens before the next open.
                    // walker goes to the close+1, and open decrements
                    $open--;
                    $walker = $nextclose + 1;
                } elseif ($nextopen < $nextclose) {
                    // an open before the next close
                    $open++;
                    $walker = $nextopen + 1;
                }
            }
            $tagend = $walker;
            if ($tagend > strlen($content)) {
                $tagend = strlen($content);
            } else {
                $tagend--;
                $tagstart++;
            }
            $tag = substr($content, $tagstart, $tagend - $tagstart);
            $tags[] = '<' . $tag . '>';
            $newtag = format_to_output($tag, $format);
            $newtags[] = '<' . $newtag . '>';
        }
        $content = str_replace($tags, $newtags, $content);
        return $content;
    }
}
?>

sébastien
Hmm, it seems that your function "theRealStripTags" won't behave correctly in some cases, for example:

<?php
theRealStripTags("<!-- I want to put a <div>tag</div> -->");
theRealStripTags("<!-- Or a carrot > -->");
theRealStripTags("<![CDATA[what about this! It's to protect from HTML characters like <tag>, > and so on in XML, no?]]> -->");
?>

geersc
Hi, I made the following adjustments to the "stripeentag()" function listed here. Improvements are always welcome. Regards, Chris

<?php
function strip_attributes($msg, $tag, $attr, $suffix = "")
{
    $lengthfirst = 0;
    while (strstr(substr($msg, $lengthfirst), "<$tag ") != "") {
        $tag_start = $lengthfirst + strpos(substr($msg, $lengthfirst), "<$tag ");
        $partafterwith = substr($msg, $tag_start);
        $img = substr($partafterwith, 0, strpos($partafterwith, ">") + 1);
        $img = str_replace(" =", "=", $img);
        $out = "<$tag";
        for ($i = 0; $i < count($attr); $i++) {
            if (empty($attr[$i])) {
                continue;
            }
            $long_val =
                (strpos($img, " ", strpos($img, $attr[$i] . "=")) === FALSE) ?
                strpos($img, ">", strpos($img, $attr[$i] . "=")) - (strpos($img, $attr[$i] . "=") + strlen($attr[$i]) + 1) :
                strpos($img, " ", strpos($img, $attr[$i] . "=")) - (strpos($img, $attr[$i] . "=") + strlen($attr[$i]) + 1);
            $val = substr($img, strpos($img, $attr[$i] . "=") + strlen($attr[$i]) + 1, $long_val);
            if (!empty($val)) {
                $out .= " " . $attr[$i] . "=" . $val;
            }
        }
        if (!empty($suffix)) {
            $out .= " " . $suffix;
        }
        $out .= ">";
        $partafter = substr($partafterwith, strpos($partafterwith, ">") + 1);
        $msg = substr($msg, 0, $tag_start) . $out . $partafter;
        $lengthfirst = $tag_start + 3;
    }
    return $msg;
}
?>

trexx www.trexx.ch
Here's a fairly fast solution to remove unwanted tags AND unwanted attributes within the allowed tags:

<?php
/**
 * Allow these tags
 */
$allowedTags = '<h1><b><i><a><ul><li><pre><hr><blockquote><img>';

/**
 * Disallow these attributes/prefix within a tag
 */
$stripAttrib = 'javascript:|onclick|ondblclick|onmousedown|onmouseup|onmouseover|' .
    'onmousemove|onmouseout|onkeypress|onkeydown|onkeyup';

/**
 * @return string
 * @param string
 * @desc Strip forbidden tags and delegate tag-source check to removeEvilAttributes()
 */
function removeEvilTags($source)
{
    global $allowedTags;
    $source = strip_tags($source, $allowedTags);
    // note: the /e modifier requires PHP < 7 (use preg_replace_callback() on newer versions)
    return preg_replace('/<(.*?)>/ie', "'<'.removeEvilAttributes('\\1').'>'", $source);
}

/**
 * @return string
 * @param string
 * @desc Strip forbidden attributes from a tag
 */
function removeEvilAttributes($tagSource)
{
    global $stripAttrib;
    return stripslashes(preg_replace("/$stripAttrib/i", 'forbidden', $tagSource));
}

// Will output: <a href="forbiddenalert(1);" target="_blank" forbidden =" alert(1)">test</a>
echo removeEvilTags('<a href="javascript:alert(1);" target="_blank" onMouseOver = "alert(1)">test</a>');
?>

dontknowwhat
Here's a quickie that strips out only specific tags. I'm using it to clean up FrontPage and Word code in included third-party code, which shouldn't carry all that extra header information.

<?php
$contents = "Your HTML string";

// Part 1
// This array is for single tags and their closing counterparts
$tags_to_strip = array("html", "body", "meta", "link", "head");
foreach ($tags_to_strip as $tag) {
    $contents = preg_replace("/<\/?" . $tag . "(.|\s)*?>/", "", $contents);
}

// Part 2
// This array is for stripping opening and closing tags AND what's in between
$tags_and_content_to_strip = array("title");
foreach ($tags_and_content_to_strip as $tag) {
    $contents = preg_replace("/<" . $tag . ">(.|\s)*?<\/" . $tag . ">/", "", $contents);
}
?>

cesar
Here is a recursive function for strip_tags(), like the one shown on the stripslashes() manual page.

<?php
function strip_tags_deep($value)
{
    return is_array($value) ?
        array_map('strip_tags_deep', $value) :
        strip_tags($value);
}

// Example
$array = array('<b>Foo</b>', '<i>Bar</i>', array('<b>Foo</b>', '<i>Bar</i>'));
$array = strip_tags_deep($array);

// Output
print_r($array);
?>

bermi ferrer
Here is a faster and tested version of strip_selected_tags. The previous example had a small bug that has now been fixed.

<?php
function strip_selected_tags($text, $tags = array())
{
    $args = func_get_args();
    $text = array_shift($args);
    $tags = func_num_args() > 2 ? array_diff($args, array($text)) : (array)$tags;
    foreach ($tags as $tag) {
        if (preg_match_all('/<' . $tag . '[^>]*>([^<]*)<\/' . $tag . '>/iu', $text, $found)) {
            $text = str_replace($found[0], $found[1], $text);
        }
    }
    return preg_replace('/(<(' . join('|', $tags) . ')(\\n|\\r|.)*\/>)/iu', '', $text);
}
?>

bryn -at- drumdatabase dot net
Further to john780's idea for working around the 1024-character limit of strip_tags(): it's a good one, but I don't think ltrim() is the right function for the job. I wrote this simple function to get around the limit (I'm a newbie, so there may be a problem or a better way of doing it!):

<?php
function strip_tags_in_big_string($textstring)
{
    $safetext = '';
    while (strlen($textstring) != 0) {
        $temptext = strip_tags(substr($textstring, 0, 1024));
        $safetext .= $temptext;
        $textstring = substr_replace($textstring, '', 0, 1024);
    }
    return $safetext;
}
?>

Hope someone finds it useful.

chuck
Caution: HTML created by Word may contain the sequence '<?xml...'. Apparently strip_tags() treats this like <?php and removes the remainder of the input string: not just the XML tag, but all input that follows.
anonymous user
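A workaround sketch for chuck's warning above: remove any "<?xml" construct before calling strip_tags(). The approach assumes these constructs never contain a '>' of their own. The example uses a Word-style `<?xml:namespace ... />` tag, and the name strip_tags_word_safe() is hypothetical.

```php
<?php
// Hypothetical helper: drop "<?xml" constructs, then strip tags.
// [^>]* stops at the first '>', so both an XML declaration and a Word-style
// <?xml:namespace ... /> tag are removed. Assumption: such constructs never
// contain a '>' of their own.
function strip_tags_word_safe(string $html): string
{
    $html = preg_replace('/<\?xml[^>]*>/i', '', $html);
    return strip_tags($html);
}

echo strip_tags_word_safe('<p>A</p><?xml:namespace prefix = o ns="urn:x" /><p>B</p>');
// prints AB
```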
Be aware that tags constitute visual whitespace, so stripping them may leave the resulting text looking misjoined. For example, "<strong>This is a bit of text</strong><p />Followed by this bit" displays as separate paragraphs, but simply stripping the tags gives "This is a bit of textFollowed by this bit". That may not be what you want, e.g. if you are creating an excerpt for an RSS description field. The workaround is to force whitespace around the tags before stripping, using something like this:
<?php
$text = getTheText(); // placeholder: however you obtain the HTML
// Pad every tag with spaces so adjacent blocks don't run together.
$text = preg_replace('/</', ' <', $text);
$text = preg_replace('/>/', '> ', $text);
$desc = html_entity_decode(strip_tags($text));
// Turn newlines and tabs into spaces, then collapse runs of whitespace into one space.
$desc = preg_replace('/[\n\r\t]/', ' ', $desc);
$desc = trim(preg_replace('/\s+/', ' ', $desc));
?>
kangaroo232002
After wondering why the following was indexed by my trawler despite stripping all text in tags (and punctuation), "» valign left align middle border 0 src go gif name search1 onclick search", please take a quick look at what produced it:
<DIV style="position: absolute; TOP:22%; LEFT:68%;"><input type="image" alt="Go >" valign="left" align="middle" border=0 src="go.gif" name="search1" onClick="search()"></div>...
Looking at this closely, you can see that even though 'Go >' is enclosed in quotation marks, strip_tags() still treats the right-facing chevron as the end of the input tag, and treats everything after it as text. Not sure if this has been fixed in later versions; I'm using v4.3.3. Good hunting.
uersoy
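christianbecke's reply near the top of these notes points to the real fix for kangaroo232002's report: write the '>' inside the attribute as &gt;. A minimal sketch of doing that when the markup is generated, assuming the alt text comes from a PHP string.

```php
<?php
// Escape attribute values while building the markup, so the '>' in
// "Go >" becomes "&gt;" and cannot be mistaken for the end of the tag.
$alt  = htmlspecialchars('Go >', ENT_QUOTES, 'UTF-8'); // "Go &gt;"
$html = '<input type="image" alt="' . $alt . '" src="go.gif" name="search1">Search';

echo strip_tags($html); // prints Search
```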
admin at automapit dot com's function is great; it cleans everything I don't need :). But there is a small problem: the strip-style-tags line should come before the strip-HTML-tags line. Otherwise, the HTML-tags pattern removes the <style> and </style> tags themselves, and the CSS between them stays in the output as text.
<?php
function html2txt($document)
{
    $search = array('@<script[^>]*?>.*?</script>@si',  // Strip out javascript
                    '@<style[^>]*?>.*?</style>@siU',    // Strip style tags properly
                    '@<[\/\!]*?[^<>]*?>@si',            // Strip out HTML tags
                    '@<![\s\S]*?--[ \t\n\r]*>@'         // Strip multi-line comments including CDATA
    );
    $text = preg_replace($search, '', $document);
    return $text;
}
?>
jeremysfilms.com
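A short demonstration of the ordering problem uersoy describes. It applies the two patterns from html2txt() in both orders to a hypothetical sample string. preg_replace() applies an array of patterns one after another.

```php
<?php
// Hypothetical sample input: a style element followed by a paragraph.
$html = '<style>p { color: red; }</style><p>Hi</p>';

$tagPattern   = '@<[\/\!]*?[^<>]*?>@si';        // "Strip out HTML tags" pattern
$stylePattern = '@<style[^>]*?>.*?</style>@siU'; // "Strip style tags properly" pattern

// Tags first: <style> and </style> are already gone when the style pattern runs.
$tagsFirst  = preg_replace([$tagPattern, $stylePattern], '', $html);
// Style first: the whole style element, CSS included, is removed, then the tags.
$styleFirst = preg_replace([$stylePattern, $tagPattern], '', $html);

echo $tagsFirst, "\n";  // p { color: red; }Hi
echo $styleFirst, "\n"; // Hi
```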
A simple little function for blocking tags by replacing the '<' and '>' characters with their HTML entities. It's good for simple posting systems where you don't want to risk stripping text that merely looks like a tag, or where you just want everything to show literally instead of being interpreted as HTML. Note that it does not escape & or quotes; htmlspecialchars() escapes those as well.
<?php
function block_tags($string)
{
    $replaced_string = str_replace('<', '&lt;', $string);
    $replaced_string = str_replace('>', '&gt;', $replaced_string);
    return $replaced_string;
}

echo block_tags('<b>HEY</b>'); // Outputs &lt;b&gt;HEY&lt;/b&gt;, which a browser displays as <b>HEY</b>
?>
pierresyraud
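The built-in counterpart to jeremysfilms.com's block_tags() above is htmlspecialchars(). With ENT_QUOTES it also escapes & and both kinds of quotes. A minimal sketch, assuming UTF-8 input. The wrapper name block_tags_builtin() is hypothetical.

```php
<?php
// Hypothetical wrapper: htmlspecialchars() escapes <, >, & and,
// with ENT_QUOTES, both single and double quotes.
function block_tags_builtin(string $string): string
{
    return htmlspecialchars($string, ENT_QUOTES, 'UTF-8');
}

echo block_tags_builtin('<b>HEY</b> & "you"');
// prints &lt;b&gt;HEY&lt;/b&gt; &amp; &quot;you&quot;
```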
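An inverse of strip_tags(): it strips all the text and keeps only the HTML tags!
<?php
function strip_text($a)
{
    // $a[$i] replaces the original $a{$i} syntax, which PHP 8 no longer accepts.
    $i = -1;
    $n = '';
    $ok = 1; // 1 = outside a tag, 0 = inside a tag
    while (isset($a[++$i])) {
        if ($ok && $a[$i] != '<') {
            continue;          // text outside a tag: drop it
        } elseif ($a[$i] == '>') {
            $ok = 1;           // end of tag: keep the '>' and leave tag mode
            $n .= '>';
            continue;
        } elseif ($a[$i] == '<') {
            $ok = 0;           // start of a tag
        }
        if (!$ok) {
            $n .= $a[$i];      // inside a tag: keep the character
        }
    }
    return $n;
}
?>
anonymous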
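An alternative sketch of pierresyraud's strip_text() above. It swaps the character loop for preg_match_all(), which collects every complete "<...>" tag and joins the matches. One behaviour differs: an unterminated '<' at the end of the input is dropped, whereas the loop keeps it. The name strip_text_regex() is hypothetical.

```php
<?php
// Hypothetical regex version: keep only complete "<...>" tags.
function strip_text_regex(string $html): string
{
    preg_match_all('/<[^>]*>/', $html, $matches);
    return implode('', $matches[0]); // $matches[0] holds the full matches in order
}

echo strip_text_regex('<p>Hello <b>world</b></p>'); // prints <p><b></b></p>
```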
A different approach to cleaning up HTML would be to first escape all unsafe characters (& to &amp;, < to &lt;, > to &gt;), and then to unescape matching pairs of tags (e.g. "&lt;b&gt;hello&lt;/b&gt;" => "<b>hello</b>") if they are identified as safe. This backwards approach should be safer, because a tag that is not identified correctly ends up in an escaped state. So if a user enters invalid HTML, or tags that are unsupported or unwanted, they are shown as plain text instead of being stripped away. This is good, because the characters "<" and ">" might have been used in a different way (e.g. to make a text arrow: "a <=> b"). This is the case in most forums (apart from the fact that they use "[tag]"-tags instead of "<tag>"-tags).
balluche arobase free.fr
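A minimal sketch of the escape-first approach described in the note above. Several assumptions are not from the note: the whitelist is attribute-free `<b>`, `<i>`, `<em>` and `<strong>`, only properly paired tags are restored, and the function name escape_then_allow() is hypothetical.

```php
<?php
// Hypothetical helper for the "escape everything, then unescape safe pairs" idea.
function escape_then_allow(string $input, array $allowed = ['b', 'i', 'em', 'strong']): string
{
    // Step 1: escape &, <, > and quotes, so nothing is interpreted as markup.
    $out = htmlspecialchars($input, ENT_QUOTES, 'UTF-8');

    // Step 2: turn matching escaped pairs such as "&lt;b&gt;x&lt;/b&gt;" back
    // into real tags. Repeat until nothing changes, so nested pairs like
    // <b><i>x</i></b> are restored one level per pass.
    $names   = implode('|', array_map('preg_quote', $allowed));
    $pattern = '/&lt;(' . $names . ')&gt;(.*?)&lt;\/\1&gt;/is';
    do {
        $before = $out;
        $out = preg_replace($pattern, '<$1>$2</$1>', $out);
    } while ($out !== $before);

    return $out; // anything not restored stays escaped and shows as plain text
}

echo escape_then_allow('<b>hello</b> and a <=> b <script>x</script>');
// prints <b>hello</b> and a &lt;=&gt; b &lt;script&gt;x&lt;/script&gt;
```

A tag that carries attributes, such as `<b onclick="...">`, does not match the whitelist pattern. It therefore stays escaped, which is the fail-safe behaviour the note argues for.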
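<?php
// balluche: 22/01/04: Remove even bad tags
function strip_bad_tags($html)
{
    // A '<', an optional '/', anything up to a '>', then zero or more '>'.
    // Because the closing '>' is optional, an unclosed "<tag" is removed up to the end of the string.
    $s = preg_replace("@</?[^>]*>*@", "", $html);
    return $s;
}
?>
dumb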
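Two illustrative inputs for balluche's strip_bad_tags() above. The function is copied here so the block runs on its own, and the sample strings are hypothetical. The second input shows a side effect: a bare '<' in ordinary text is also treated as the start of a tag.

```php
<?php
// Same pattern as balluche's function above.
function strip_bad_tags($html)
{
    return preg_replace("@</?[^>]*>*@", "", $html);
}

// An unclosed tag at the end is removed together with everything after '<'.
echo strip_bad_tags('Hello <b>world</b> <broken'), "\n"; // "Hello world "
// Caveat: "< 2 and 3 >" is matched as if it were a tag.
echo strip_bad_tags('1 < 2 and 3 > 2'), "\n";            // "1  2"
```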
/* 15Jan05
Within <textarea>, browsers automatically render and display certain "HTML entities" and "HTML entity codes" as characters: &lt; shows as <, &amp; shows as &, etc.
Browsers also automatically change any "HTML entity codes" entered in a <textarea> into the resulting display characters BEFORE UPLOADING. There's no way to change this, which makes it difficult to edit HTML in a <textarea>.
"HTML entity codes" (i.e. &lt; to represent "<", &amp; to represent "&", &nbsp; to represent " ") can be used instead. Therefore, we need to "HTML-entitize" the data for display, which changes the raw/displayed characters into their HTML entity code equivalents before they are shown in a <textarea>.

How would I get a textarea to contain "&lt;" as a literal string of characters and not have it display a "<"?
&amp;lt; is indeed the correct way of doing that. And if you wanted to display that, you'd need to use &amp;amp;lt;. That's just how HTML entities work.

htmlspecialchars() is a subset of htmlentities().
The reverse (i.e. changing HTML entity codes into displayed characters) is done with html_entity_decode().

Google ns_quotehtml and see http://aolserver.com/docs/tcl/ns_quotehtml.html
See also http://www.htmlhelp.com/reference/html40/entities/
*/
webmaster
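In PHP, the "HTML-entitize" step from dumb's note above is typically htmlspecialchars(), and html_entity_decode() reverses it. A minimal sketch, assuming UTF-8 text and ENT_QUOTES on both sides. The sample string and the textarea name are hypothetical.

```php
<?php
// Escape raw text before placing it inside <textarea>, so the browser
// displays exactly the original characters (including a literal "&lt;").
$raw = '<p>Use &lt; for "<" &amp; more</p>';

$escaped = htmlspecialchars($raw, ENT_QUOTES, 'UTF-8');
echo '<textarea name="body">' . $escaped . '</textarea>', "\n";

// html_entity_decode() undoes exactly one level of escaping.
var_dump(html_entity_decode($escaped, ENT_QUOTES, 'UTF-8') === $raw); // bool(true)
```

Note that the existing "&lt;" becomes "&amp;lt;" in the escaped text, which matches the note's point about needing &amp;lt; to show a literal "&lt;".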
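<?php
function remove_tag($tag, $data)
{
    // Find each opening "<tag" (case-insensitive). stripos() replaces the
    // original eregi() call, which was removed in PHP 7.
    // Note: "<b" also matches "<br", "<body", etc.
    while (($it = stripos($data, "<" . $tag)) !== false) {
        $end = stripos($data, "</" . $tag . ">", $it);
        if ($end === false) {
            break; // no closing tag: stop here, since the original could loop forever
        }
        $it2 = $end + strlen($tag) + 3; // position just past "</tag>"
        $temp = substr($data, 0, $it);
        $temp2 = substr($data, $it2);
        $data = $temp . $temp2;
    }
    return $data;
}
?>
This code removes every occurrence of the specified tag, together with its content, from the given haystack.
10-aug-2005 08:08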
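A regex-based sketch of webmaster's remove_tag() above, which swaps in a single preg_replace() for the loop. The `\b` word boundary keeps "b" from also matching `<br>` or `<body>`, and the "s" flag lets the content span several lines. Like the loop, it does not handle nested elements of the same tag. The name remove_tag_regex() is hypothetical.

```php
<?php
// Hypothetical one-call version: remove each <tag ...>...</tag> element.
function remove_tag_regex(string $tag, string $data): string
{
    $t = preg_quote($tag, '/');
    // Lazy .*? stops at the first matching closing tag.
    return preg_replace('/<' . $t . '\b[^>]*>.*?<\/' . $t . '\s*>/is', '', $data);
}

echo remove_tag_regex('b', "x<br>y<b>z\nw</b>!"); // prints x<br>y!
```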
<?php
/**
 * Removes the specified tags from the text, together with everything between
 * each opening tag and its closing tag. Each tag requires a closing tag; if the
 * latter is not found, everything after the opening tag is removed.
 * Typical usage: removeTags($html, array('script', 'body', 'html')), tag names all lower case.
 * (Originally posted as a public static class method; shown here as a plain function.)
 */
function removeTags($text, $tags_array)
{
    $length = strlen($text);
    $pos = 0;
    $tags_array = array_flip($tags_array);
    while ($pos < $length && ($pos = strpos($text, '<', $pos)) !== false) {
        // The tag name ends at the first space or '>', whichever comes first.
        $dlm_pos = strpos($text, ' ', $pos);
        $dlm2_pos = strpos($text, '>', $pos);
        if ($dlm_pos === false || ($dlm2_pos !== false && $dlm2_pos < $dlm_pos)) {
            $dlm_pos = $dlm2_pos;
        }
        if ($dlm_pos === false) {
            break; // unterminated tag at the end of the text
        }
        $which_tag = strtolower(substr($text, $pos + 1, $dlm_pos - ($pos + 1)));
        $tag_length = strlen($which_tag);
        if (!isset($tags_array[$which_tag])) { // not one of the tags to remove
            ++$pos;
            continue;
        }
        // Find the end.
        $sec_tag = '</' . $which_tag . '>';
        $sec_pos = stripos($text, $sec_tag, $pos + $tag_length);
        // Remove everything after the opening tag if its closing tag is not found.
        if ($sec_pos === false) {
            $sec_pos = $length - strlen($sec_tag);
        }
        $rmv_length = $sec_pos - $pos + strlen($sec_tag);
        $text = substr_replace($text, '', $pos, $rmv_length);
        // Update the length. $pos is not advanced: another tag may start right where the removed one did.
        $length -= $rmv_length;
    }
    return $text;
}
?>
david
<?php
/**
 * strip_selected_tags ( string str [, string strip_tags[, strip_content flag]] )
 * ---------------------------------------------------------------------
 * Like strip_tags() but inverse; the strip_tags tags will be stripped, not kept.
 * strip_tags: string with tags to strip, ex: "<a> <quote>" etc.
 * strip_content flag: TRUE will also strip everything between open and closed tag
 */
function strip_selected_tags($str, $tags = "", $stripContent = false)
{
    preg_match_all("/<([^>]+)>/i", $tags, $allTags, PREG_PATTERN_ORDER);
    foreach ($allTags[1] as $tag) {
        $tag = preg_quote($tag, '/');
        if ($stripContent) {
            // "s" flag: the content may span several lines
            $str = preg_replace("/<" . $tag . "\\b[^>]*>.*<\/" . $tag . ">/iUs", "", $str);
        }
        // \b keeps "a" from also matching <abbr>, <address>, ...
        $str = preg_replace("/<\/?" . $tag . "\\b[^>]*>/iU", "", $str);
    }
    return $str;
}
?>
salavert at~ akelos
<?php
/**
 * Works like PHP function strip_tags, but it only removes selected tags.
 * Example:
 * strip_selected_tags('<b>Person:</b> <strong>Salavert</strong>', 'strong') => <b>Person:</b> Salavert
 */
function strip_selected_tags($text, $tags = array())
{
    $args = func_get_args();
    $text = array_shift($args);
    $tags = func_num_args() > 2 ? array_diff($args, array($text)) : (array)$tags;
    foreach ($tags as $tag) {
        if (preg_match_all('/<' . $tag . '[^>]*>(.*)<\/' . $tag . '>/iU', $text, $found)) {
            $text = str_replace($found[0], $found[1], $text);
        }
    }
    return $text;
}
?>
Hope you find it useful,
Jose Salavert
admin
<?php
function html2txt($document)
{
    $search = array('@<script[^>]*?>.*?</script>@si',  // Strip out javascript
                    '@<[\/\!]*?[^<>]*?>@si',            // Strip out HTML tags
                    '@<style[^>]*?>.*?</style>@siU',    // Strip style tags properly
                    '@<![\s\S]*?--[ \t\n\r]*>@'         // Strip multi-line comments including CDATA
    );
    $text = preg_replace($search, '', $document);
    return $text;
}
?>
This function turns HTML into text: it strips tags, comments spanning multiple lines (including CDATA), and anything else that gets in its way. It's a Frankenstein function I made from bits picked up on my travels through the web; thanks to the many who have unwittingly contributed!