Fixing odd characters
By Iain Cuthbertson
One of the biggest complaints that web developers have is their lack of control over end users.
When will we be able to tell them a) not to paste from Microsoft Word, or b) if they must, to paste it into something that doesn’t do formatting first?
Word has a nasty habit of converting quotes into “smart” quotes, as well as messing up characters such as the euro symbol (€).
This is the fix I came up with that allows end users to continue to paste from such programs and bring their weird charset issues with them:
function fixChars($text)
{
    // Patterns are raw byte sequences: the three-byte UTF-8 forms come first,
    // followed by their single-byte Windows-1252 equivalents.
    $chars = array(
        "/\xe2\x82\xac/" => "€",
        "/\xe2\x80\x99/" => "'",
        "/\xe2\x80\x9c/" => '"', // open quotes
        "/\xe2\x80\x9d/" => '"', // close quotes
        "/\xe2\x80\x93/" => "–", // en dash
        "/\x80/" => "€",
        "/\x92/" => "'",
        "/\x93/" => '"', // open quotes
        "/\x94/" => '"', // close quotes
        "/\x96/" => "–", // en dash
        "/\xa3/" => "£",
    );
    $find = array_keys($chars);
    $replace = array_values($chars);
    return preg_replace($find, $replace, $text);
}
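As a rough illustration of why the table lists each character twice, here is a minimal sketch using a cut-down, two-entry map (the variable names and sample strings are made up for this example, not part of the function above). It assumes the pasted text arrives either as UTF-8 or as Windows-1252. When preg_replace is given arrays, it applies the pairs in order, so the three-byte form is tried before the single-byte one.

```php
<?php
// Hypothetical cut-down map: one character, in both encodings.
$chars = array(
    "/\xe2\x80\x99/" => "'", // right single quote as UTF-8 (three bytes)
    "/\x92/"         => "'", // right single quote as Windows-1252 (one byte)
);

$find    = array_keys($chars);
$replace = array_values($chars);

// The same word as it might arrive from two different sources.
$utf8Input   = "It\xe2\x80\x99s"; // pasted as UTF-8
$cp1252Input = "It\x92s";         // pasted as Windows-1252

$fromUtf8   = preg_replace($find, $replace, $utf8Input);
$fromCp1252 = preg_replace($find, $replace, $cp1252Input);

echo $fromUtf8, "\n";   // It's
echo $fromCp1252, "\n"; // It's
```

Both inputs end up with a plain ASCII apostrophe, which is the behaviour the full table aims for across all of its entries.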
Comments
D. Rimron says: 4th August 2011 at 5:14 pm
http://www.liamdelahunty.com/tips/remove_special_characters.php 🙂
-Dx
Iain says: 4th August 2011 at 5:20 pm
That looks a little more comprehensive (though missing the UK pound symbol) 🙂
I did it myself as there is more than one weird set of chars to cope with.
Could always be extended.