{"id":215,"date":"2007-09-10T19:44:58","date_gmt":"2007-09-10T18:44:58","guid":{"rendered":"http:\/\/www.nivas.hr\/blog\/2007\/09\/10\/mysql-utf8-latin2-utf8-weirdness\/"},"modified":"2007-09-10T19:55:24","modified_gmt":"2007-09-10T18:55:24","slug":"mysql-utf8-latin2-utf8-weirdness","status":"publish","type":"post","link":"https:\/\/www.nivas.hr\/blog\/2007\/09\/10\/mysql-utf8-latin2-utf8-weirdness\/","title":{"rendered":"mysql utf8 latin2 utf8 weirdness"},"content":{"rendered":"<p>This one is a keeper, have to write it down somewhere in case I forget it by tomorrow morning. :) Today I was doing routine deploy of one of our new websites to production server. Everything went smoothly, except the fact I couldn&#8217;t see any of Croatian letters on the site texts (which were stored in mysql)!! We did migration to UTF8 some time ago, and this was highly unexpected. So I started to dig into the dump. :)<\/p>\n<p>Looking at the mysqldump of a database I was a bit surprised to see Croatian letters screwed up beyond recognition. For example letter &#8220;\u0161&#8221; was represented as &#8220;\u00c4\u0105\u00c4O\u201e&#8221; (4 bytes), &#8220;\u010d&#8221; was &#8220;\u00c4\u008dO&#8221; (two bytes)&#8230; etc. OMG! All our tables are &#8220;CHARSET=utf8&#8221;, we force &#8220;AddDefaultCharset utf-8&#8221; in Apache and all of our markup uses &#8220;charset=utf-8&#8221; for content type encoding. BUT, we unintentionally left one very bad call in our config file which got triggered if site was running on development server &#8211; &#8220;SET NAMES &#8216;latin2&#8217; COLLATE &#8216;latin2_croatian_ci&#8221;. The production config had SET NAMES utf8, but development didn&#8217;t. Oh boy. <\/p>\n<p>Ok so now I had a real mess &#8211; utf8 tables with utf8 data in them stored as re-encoded latin2 but in utf8!? I tried converting the dump from ISO8859-2 to UTF8 but that just made things worse, some of characters now used 6 bytes which isn&#8217;t good. The solution was pretty straight forward. I converted the dump from UTF8 to ISO-8859-2 and I got real UTF8 again. I imported the converted dump and changed for good config files to &#8220;SET NAMES &#8216;utf8&#8217; COLLATE &#8216;utf8_general_ci&#8221;.<br \/>\niconv to the rescue:<\/p>\n<p>[ftf w=&#8221;400&#8243; h=&#8221;150&#8243;]<br \/>\nmysqldump -u root -p db > weirdo-dump.sql<br \/>\niconv -f UTF-8 -t ISO_8859-2\/\/TRANSLIT weirdo-dump.sql > dump-latin2-aka-realUTF8.sql<br \/>\nmysql -u root -p db < dump-latin2-aka-realUTF8.sql[\/ftf]\n\n<\/p>\n","protected":false},"excerpt":{"rendered":"<p>This one is a keeper, have to write it down somewhere in case I forget it by tomorrow morning. :) Today I was doing routine deploy of one of our new websites to production server. Everything went smoothly, except the fact I couldn&#8217;t see any of Croatian letters on the site texts (which were stored&#8230;<\/p>\n","protected":false},"author":3,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":[],"categories":[1,3,7],"tags":[],"_links":{"self":[{"href":"https:\/\/www.nivas.hr\/blog\/wp-json\/wp\/v2\/posts\/215"}],"collection":[{"href":"https:\/\/www.nivas.hr\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.nivas.hr\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.nivas.hr\/blog\/wp-json\/wp\/v2\/users\/3"}],"replies":[{"embeddable":true,"href":"https:\/\/www.nivas.hr\/blog\/wp-json\/wp\/v2\/comments?post=215"}],"version-history":[{"count":0,"href":"https:\/\/www.nivas.hr\/blog\/wp-json\/wp\/v2\/posts\/215\/revisions"}],"wp:attachment":[{"href":"https:\/\/www.nivas.hr\/blog\/wp-json\/wp\/v2\/media?parent=215"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.nivas.hr\/blog\/wp-json\/wp\/v2\/categories?post=215"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.nivas.hr\/blog\/wp-json\/wp\/v2\/tags?post=215"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}