|
mb_decode_numericentity
Decode HTML numeric string reference to character
(PHP 4 >= 4.0.6, PHP 5)
Example 1395. convmap example

$convmap = array (

Code Examples / Notes » mb_decode_numericentity

donovan
Note that, at the time of writing, mb_decode_numericentity() appears to work only with decimal entities, not hexadecimal ones. Knowing this would have saved me a good hour of debugging. If you need to decode hex entities, first convert them all to decimal entities with a combination of the preg_replace() and hexdec() functions.

andrew simpson
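As a sketch of donovan's suggestion above (hex entities rewritten as decimal before decoding), the following assumes PHP 7 or later and uses preg_replace_callback() rather than plain preg_replace(). The helper name hex_entities_to_decimal() and the convmap values are illustrative assumptions, not part of the original note.

```php
<?php
// Hypothetical helper: rewrite hexadecimal entities (&#x41;) as decimal
// ones (&#65;) and leave everything else untouched.
function hex_entities_to_decimal(string $text): string
{
    return preg_replace_callback(
        '/&#x([0-9a-fA-F]+);/',
        function (array $m): string {
            // $m[1] holds the hex digits without the "&#x" prefix.
            return '&#' . hexdec($m[1]) . ';';
        },
        $text
    );
}

// Assumed convmap: accept every code point, no offset, wide mask.
$convmap = [0x0, 0x10FFFF, 0, 0x1FFFFF];

$decimal = hex_entities_to_decimal('&#x41;&#66;');
echo mb_decode_numericentity($decimal, $convmap, 'UTF-8'), "\n"; // AB
```

Doing the rewrite as a separate step keeps the decoding call itself unchanged, so the same convmap can be reused for text that only ever contained decimal entities.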
Many web browsers tend to upload high-order characters as numeric HTML entities. Here is some simple code to convert such entities within a block of text into proper UTF-8 characters:

<?php
// Callback for both patterns below. $m[0] is the whole entity; $m[1], when
// present, holds the hex digits, which are converted to a decimal entity first.
function utf8_entity_decode($m) {
    $entity = isset($m[1]) ? '&#' . hexdec($m[1]) . ';' : $m[0];
    $convmap = array(0x0, 0x10000, 0, 0xfffff);
    return mb_decode_numericentity($entity, $convmap, 'UTF-8');
}

// Decode decimal HTML entities added by the web browser.
// preg_replace_callback() avoids the /e modifier, which was removed in PHP 7.
$body = preg_replace_callback('/&#\d{2,5};/u', 'utf8_entity_decode', $body);

// Decode hex HTML entities added by the web browser.
$body = preg_replace_callback('/&#x([a-fA-F0-9]{2,8});/u', 'utf8_entity_decode', $body);
?>

dev
Just two great functions for daily use:

/* Converts numeric HTML entities into characters */
function my_numeric2character($t) {
    $convmap = array(0x0, 0x2FFFF, 0, 0xFFFF);
    return mb_decode_numericentity($t, $convmap, 'UTF-8');
}

/* Converts characters into numeric HTML entities */
function my_character2numeric($t) {
    $convmap = array(0x0, 0x2FFFF, 0, 0xFFFF);
    return mb_encode_numericentity($t, $convmap, 'UTF-8');
}

print my_numeric2character('&#8217; &#7936; &#226;');
print my_character2numeric(' â ');

php
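The convmap arrays in dev's two functions above are flat lists of four integers per group: start code point, end code point, offset, and mask. As a small sketch of how the start value changes the result, the following raises it to 0x80 so that ASCII stays readable and only non-ASCII characters become entities. The specific convmap values here are assumptions chosen for illustration, and the "\u{...}" escapes assume PHP 7 or later.

```php
<?php
// Assumed convmap: encode U+0080..U+10FFFF only, with no offset and a mask
// wide enough to keep every code point intact.
$convmap = [0x80, 0x10FFFF, 0, 0x1FFFFF];

// "\u{E2}" is the character U+00E2 (a with circumflex).
$encoded = mb_encode_numericentity("a \u{E2} b", $convmap, 'UTF-8');
echo $encoded, "\n"; // a &#226; b

// Decoding with the same convmap restores the original string.
$decoded = mb_decode_numericentity($encoded, $convmap, 'UTF-8');
echo $decoded, "\n";
```

With a start value of 0x0, as in the note, the letters and spaces would be turned into entities as well.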
Here are functions to convert hankaku (half-width) to zenkaku (full-width) characters, and vice versa, in Japanese text.

<?php
// Supported characters:
// (space)
// !#$%&()*+,./0123456789:;<=>?@
// ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`
// abcdefghijklmnopqrstuvwxyz{|}
// (Katakana isn't supported.)

function f_han2zen ($string,$encoding = null) {
    if (is_null($encoding)) $encoding = mb_internal_encoding();
    $convmap = array(
        0x20,0x20,0x3000-0x20,0xffff, // Space
        0x21,0x7e,0xff01-0x21,0xffff);
    $temp = mb_encode_numericentity($string,$convmap,$encoding);
    $convmap = array(0,0xffff,0,0xffff);
    return mb_decode_numericentity($temp,$convmap,$encoding);
}

function f_zen2han ($string,$encoding = null) {
    if (is_null($encoding)) $encoding = mb_internal_encoding();
    $convmap = array(
        0x3000,0x3000,-(0x3000-0x20),0xffff, // Space
        0xff01,0xff5e,-(0xff01-0x21),0xffff);
    $temp = mb_encode_numericentity($string,$convmap,$encoding);
    $convmap = array(0,0xffff,0,0xffff);
    return mb_decode_numericentity($temp,$convmap,$encoding);
}

// Sample usage:
f_han2zen("test","shift_jis");
f_han2zen("test","utf-8");
?>

dirk
Using utf8_decode() on its own causes problems with every character outside the ISO-8859-1 charset. You can solve this by calling mb_encode_numericentity() first:

// convert $text from UTF-8 to ISO-8859-1
$convmap = array(0xFF, 0x2FFFF, 0, 0xFFFF);
$text = mb_encode_numericentity($text, $convmap, "UTF-8");
$text = utf8_decode($text);

The mb_encode_numericentity() call turns every character from 0xFF upward into a numeric entity; utf8_decode() then converts the rest, 0x80 to 0xFE. |
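utf8_decode() is deprecated as of PHP 8.2, so the same two-step idea from dirk's note can be sketched with mb_convert_encoding() instead. This is only an illustration under stated assumptions: the helper name utf8_to_latin1_with_entities(), the lower bound of 0x100, and the wider mask are choices made here, not taken from the note, and the code assumes PHP 7 or later.

```php
<?php
// Hypothetical helper: convert UTF-8 text to ISO-8859-1, keeping characters
// that ISO-8859-1 cannot represent as numeric entities.
function utf8_to_latin1_with_entities(string $text): string
{
    // Assumed convmap: everything from U+0100 upward becomes an entity.
    $convmap = [0x100, 0x10FFFF, 0, 0x1FFFFF];
    $text = mb_encode_numericentity($text, $convmap, 'UTF-8');

    // The characters that remain all fit in ISO-8859-1.
    return mb_convert_encoding($text, 'ISO-8859-1', 'UTF-8');
}

// U+00E9 (e acute) fits in ISO-8859-1; U+20AC (euro sign) does not.
$latin1 = utf8_to_latin1_with_entities("caf\u{E9} \u{20AC}5");
echo bin2hex($latin1), "\n";
```

In the result, the e acute is the single byte 0xE9, while the euro sign survives as the ASCII text `&#8364;`.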