mb_convert_encoding
Convert character encoding
(PHP 4 >= 4.0.6, PHP 5)
Example 1392. mb_convert_encoding() example
Code Examples / Notes » mb_convert_encoding
tom class
Why did you use the PHP HTML encoding functions? mbstring has its own encoding which is (as far as I have tested it) much more useful: HTML-ENTITIES.
Example:
$text = mb_convert_encoding($text, 'HTML-ENTITIES', 'UTF-8');
aofg
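A possible alternative to the HTML-ENTITIES tip above, in case it helps: since PHP 8.2 the HTML-ENTITIES pseudo-encoding is deprecated in mbstring, so a sketch based on mb_encode_numericentity() may age better. The helper name utf8_to_numeric_entities is made up for this example, and it emits numeric entities (&#233;) rather than named ones (&eacute;).

```php
<?php
// Hypothetical helper: turn every non-ASCII character of a UTF-8 string
// into a decimal numeric entity. ASCII (including < > &) is left untouched.
function utf8_to_numeric_entities(string $text): string
{
    // Conversion map: code points 0x80..0x10FFFF, offset 0, mask 0x1FFFFF.
    $map = [0x80, 0x10FFFF, 0, 0x1FFFFF];
    return mb_encode_numericentity($text, $map, 'UTF-8');
}

echo utf8_to_numeric_entities("caf\u{e9} <b>"), "\n"; // caf&#233; <b>
```

Like the HTML-ENTITIES conversion, this does not escape markup characters, so htmlspecialchars() is still needed if the text must be safe to embed in HTML.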
When converting Japanese strings to ISO-2022-JP or JIS on PHP >= 5.2.1, you can use "ISO-2022-JP-MS" instead. Kishu-izon (platform-dependent) characters are converted correctly with this encoding, just as they are with eucJP-win or SJIS-win.
stephan van der feest
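To add to the Flash conversion comment below, here's how I convert back from what I've stored in a database after converting from Flash HTML text field output, in order to load it back into a Flash HTML text field:
function htmltoflash($htmlstr)
{
    // Decode entities as ISO-8859-1 (stated explicitly, since the default charset differs between PHP versions), then convert to UTF-8.
    $text = mb_convert_encoding(
        html_entity_decode($htmlstr, ENT_COMPAT, "ISO-8859-1"),
        "UTF-8", "ISO-8859-1");
    // Escape angle brackets, then turn the escaped <br /> tags into newlines.
    return str_replace("&lt;br /&gt;", "\n",
        str_replace("<", "&lt;",
            str_replace(">", "&gt;", $text)));
}
petruzanauticoyahoo?com!ar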
Maybe I'm not getting something, but this code:
<?php
print mb_detect_encoding("ñ");
print "<br/>";
print mb_convert_encoding("ñ", "UTF-8");
?>
Will yield this output:
UTF-8
Ã±
So, was the string encoded in UTF-8 or wasn't it?
phpdoc
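On the "ñ" question above, one likely explanation, offered as a guess about the poster's setup: when the third argument is omitted, mb_convert_encoding() takes the source encoding from mb_internal_encoding(), and if that was ISO-8859-1 the UTF-8 bytes get encoded a second time. A small sketch that makes the source encoding explicit so the effect is visible in hex:

```php
<?php
$s = "\u{f1}"; // "ñ" as UTF-8 bytes: 0xC3 0xB1

// Treating the UTF-8 bytes as ISO-8859-1 re-encodes each byte separately.
$double = mb_convert_encoding($s, 'UTF-8', 'ISO-8859-1');

// Naming the real source encoding leaves the string unchanged.
$same = mb_convert_encoding($s, 'UTF-8', 'UTF-8');

echo bin2hex($s), "\n";      // c3b1
echo bin2hex($double), "\n"; // c383c2b1
echo bin2hex($same), "\n";   // c3b1
```

So the string was UTF-8 all along; passing the source encoding explicitly avoids depending on the internal encoding setting.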
I'd like to share some code to convert Latin diacritics to their traditional 7bit representation, for example:
- à, ç, é, î, ... to a, c, e, i, ...
- ß to ss
- ä, Ä, ... to ae, Ae, ...
- ë, ... to e, ...
(mb_convert "7bit" would simply delete any offending characters.)
I might have missed your country's typographic conventions; correct me then.
<?php
/**
 * @param string $text     line of encoded text
 * @param string $from_enc encoding of $text, e.g. UTF-8, ISO-8859-1
 *
 * @return string 7bit representation
 */
function to7bit($text, $from_enc)
{
    // Turn every non-ASCII character into a named or numeric HTML entity.
    $text = mb_convert_encoding($text, 'HTML-ENTITIES', $from_enc);
    // Reduce the entities to ASCII; the last pattern keeps only the first letter of any remaining entity.
    $text = preg_replace(
        array('/&szlig;/', '/&(..)lig;/', '/&([aouAOU])uml;/', '/&(.)[^;]*;/'),
        array('ss', "$1", "$1" . 'e', "$1"),
        $text);
    return $text;
}
?>
Enjoy :-)
Johannes
volker
Hey guys. For everybody who's looking for a function that converts an ISO string to UTF-8 or a UTF-8 string to ISO, here's your solution:
public function encodeToUtf8($string)
{
    return mb_convert_encoding($string, "UTF-8",
        mb_detect_encoding($string, "UTF-8, ISO-8859-1, ISO-8859-15", true));
}
public function encodeToIso($string)
{
    return mb_convert_encoding($string, "ISO-8859-1",
        mb_detect_encoding($string, "UTF-8, ISO-8859-1, ISO-8859-15", true));
}
These functions work fine for me. Give them a try.
stephan van der feest
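A remark on the detection-based helpers above (encodeToUtf8/encodeToIso): as far as I can tell, ISO-8859-1 and ISO-8859-15 cannot be told apart by byte validity alone, so in practice the decision comes down to "valid UTF-8 or not". A minimal sketch under that assumption follows; the function name to_utf8_guess is hypothetical and not part of mbstring.

```php
<?php
// Hypothetical helper: return $s as UTF-8, guessing between UTF-8 and ISO-8859-1.
function to_utf8_guess(string $s): string
{
    // Valid UTF-8 (which includes plain ASCII) is returned untouched.
    if (mb_check_encoding($s, 'UTF-8')) {
        return $s;
    }
    // Assumption: anything that is not valid UTF-8 is ISO-8859-1.
    return mb_convert_encoding($s, 'UTF-8', 'ISO-8859-1');
}

echo bin2hex(to_utf8_guess("\xF1")), "\n";   // c3b1 (Latin-1 "ñ" converted)
echo bin2hex(to_utf8_guess("\u{f1}")), "\n"; // c3b1 (already UTF-8, unchanged)
```

This also sidesteps the case where strict detection finds no match and returns false instead of an encoding name.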
Here's a tip for anyone using Flash and PHP to store HTML output submitted from a Flash text field in a database or elsewhere. Flash submits its HTML special characters in UTF-8, so you can use the following function to convert those into HTML entities:
function utf8html($utf8str)
{
    // The charset is passed explicitly because htmlentities() no longer defaults to ISO-8859-1 on newer PHP versions.
    return htmlentities(
        mb_convert_encoding($utf8str, "ISO-8859-1", "UTF-8"),
        ENT_COMPAT, "ISO-8859-1");
}
mac.com@nemo
For those wanting to convert from another character set (UTF-8 in this example) to MacRoman, use iconv():
<?php
$string = iconv('UTF-8', 'macintosh', $string);
?>
('macintosh' is the IANA name for the MacRoman character set.)
jamespilcher1 - hotmail
Be careful when converting from ISO-8859-1 to UTF-8. Even if you explicitly specify the character encoding of a page as ISO-8859-1 (via headers and strict XML declarations), Windows 2000 will ignore that and interpret the page in whatever character set it has natively installed.
For example, I wrote char #128 into a page with character encoding ISO-8859-1, and it displayed in Internet Explorer (and Mozilla) as a euro symbol. It should have displayed a box, denoting that char #128 is undefined in ISO-8859-1. The problem was that the page was being displayed as "Windows: western europe" (my native character set).
This led to confusion when I tried to convert this euro to UTF-8 via mb_convert_encoding(). IE displays UTF-8 correctly, and because PHP correctly converted #128 into a box in UTF-8, IE would show a box. So all I saw was mb_convert_encoding() converting a euro symbol into a box. It took me a long time to figure out what was going on.
david hull
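The char #128 behaviour described above can be reproduced by converting the same byte from two different source encodings. This is only a sketch, and it assumes the browser was in effect treating the page as Windows-1252, which is the usual Western European Windows code page.

```php
<?php
$byte = "\x80"; // char #128

// Read as ISO-8859-1, 0x80 maps to the control character U+0080.
$asLatin1 = mb_convert_encoding($byte, 'UTF-8', 'ISO-8859-1');

// Read as Windows-1252, 0x80 is the euro sign U+20AC.
$asCp1252 = mb_convert_encoding($byte, 'UTF-8', 'Windows-1252');

echo bin2hex($asLatin1), "\n"; // c280
echo bin2hex($asCp1252), "\n"; // e282ac
```

If the data really came from a Windows-1252 source, naming that encoding as the source keeps the euro sign intact.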
As an alternative to Johannes's suggestion for converting strings from other character sets to a 7bit representation without just deleting Latin diacritics, you might try this:
<?php
$text = iconv($from_enc, 'US-ASCII//TRANSLIT', $text);
?>
The only disadvantage is that it does not convert "ä" to "ae", but it handles punctuation and other special characters better.
-- David
lanka
Another example of recoding without mbstring enabled (Russian koi -> win). If the input is already in the win encoding, recode() returns the string unchanged.
<?php
// Returns 0 for win, 1 for koi.
function detect_encoding($str)
{
    $win = 0;
    $koi = 0;
    // Heuristic: bytes 225..254 count towards win, bytes 193..222 towards koi.
    for ($i = 0; $i < strlen($str); $i++) {
        $c = ord($str[$i]);
        if ($c > 224 && $c < 255) $win++;
        if ($c > 192 && $c < 223) $koi++;
    }
    return ($win < $koi) ? 1 : 0;
}

// Recodes koi to win.
function koi_to_win($string)
{
    // Lookup tables indexed by (byte - 128); bytes 128..191 map to themselves.
    // $kw maps koi to win.
    $kw = array_merge(range(128, 191), array(
        254, 224, 225, 246, 228, 229, 244, 227, 245, 232, 233, 234, 235, 236, 237, 238,
        239, 255, 240, 241, 242, 243, 230, 226, 252, 251, 231, 248, 253, 249, 247, 250,
        222, 192, 193, 214, 196, 197, 212, 195, 213, 200, 201, 202, 203, 204, 205, 206,
        207, 223, 208, 209, 210, 211, 198, 194, 220, 219, 199, 216, 221, 217, 215, 218));
    // $wk is the reverse table (win to koi); it is not used by this function.
    $wk = array_merge(range(128, 191), array(
        225, 226, 247, 231, 228, 229, 246, 250, 233, 234, 235, 236, 237, 238, 239, 240,
        242, 243, 244, 245, 230, 232, 227, 254, 251, 253, 255, 249, 248, 252, 224, 241,
        193, 194, 215, 199, 196, 197, 214, 218, 201, 202, 203, 204, 205, 206, 207, 208,
        210, 211, 212, 213, 198, 200, 195, 222, 219, 221, 223, 217, 216, 220, 192, 209));
    $end = strlen($string);
    // A for loop instead of do/while, so an empty string is handled safely.
    for ($pos = 0; $pos < $end; $pos++) {
        $c = ord($string[$pos]);
        if ($c >= 128) {
            $string[$pos] = chr($kw[$c - 128]);
        }
    }
    return $string;
}

function recode($str)
{
    $enc = detect_encoding($str);
    if ($enc == 1) {
        $str = koi_to_win($str);
    }
    return $str;
}
?>
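For comparison, when mbstring is available the same koi -> win recoding can be left to mb_convert_encoding(). This is a different technique from the lookup tables above, and it assumes the two encodings meant are KOI8-R and Windows-1251.

```php
<?php
// Build a KOI8-R sample from UTF-8 escapes so this file itself stays ASCII.
$utf8 = "\u{41f}\u{440}\u{438}\u{432}\u{435}\u{442}"; // Russian for "hello"
$koi  = mb_convert_encoding($utf8, 'KOI8-R', 'UTF-8');

// The recoding step itself: KOI8-R bytes to Windows-1251 bytes.
$win = mb_convert_encoding($koi, 'Windows-1251', 'KOI8-R');

echo bin2hex($win), "\n"; // cff0e8e2e5f2
```

Unlike recode() above, this does not try to detect the input encoding; the caller has to know that the input is KOI8-R.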