MySQL feature request/proposition: Croatian utf8 collation (utf8_croatian_ci)

Author: seven April 28, 2009

We use the MySQL database for pretty much everything nowadays. It's the de facto standard for horizontally scaled web sites and it's used by the biggest players in the industry. But one thing is lacking, and it is very important for our regional market: proper Croatian collation support for utf8 charsets. Without it, MySQL server can't be considered a choice for, e.g., a government migration to an open-source platform in the near future.

We tried implementing it on our own a couple of times, but without any luck. The problem lies in the fact that the Croatian language (Serbian and Bosnian too) has digraph letters (single letters consisting of two characters: lj, nj and dž). Without proper support for those, we will never be able to sort things right (a-b-c-č-ć-d-dž-đ-...i-j-k-l-lj-m-n-nj-...u-v-z-ž).
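To make the sorting problem concrete, here is a minimal application-side sketch in Python of the order described above. It is not how MySQL implements collations, just an illustration of why digraphs need special handling. The names `ALPHABET`, `RANK`, `DIGRAPHS` and `croatian_key` are made up for this example, and placing unknown characters after the alphabet is an arbitrary assumption.

```python
import unicodedata

# Croatian Latin alphabet in dictionary order; digraphs count as single letters.
ALPHABET = ["a", "b", "c", "č", "ć", "d", "dž", "đ", "e", "f", "g", "h", "i",
            "j", "k", "l", "lj", "m", "n", "nj", "o", "p", "r", "s", "š", "t",
            "u", "v", "z", "ž"]
RANK = {letter: i for i, letter in enumerate(ALPHABET)}
DIGRAPHS = ("dž", "lj", "nj")


def croatian_key(text):
    """Build a sort key that treats dž, lj and nj as single letters."""
    # NFKC expands single-code-point digraphs (e.g. U+01C6) into two letters
    # and composes letter + combining caron into one character.
    s = unicodedata.normalize("NFKC", text).lower()
    key = []
    i = 0
    while i < len(s):
        pair = s[i:i + 2]
        if pair in DIGRAPHS:
            token, i = pair, i + 2
        else:
            token, i = s[i], i + 1
        # Assumption: characters outside the alphabet (q, w, x, y, digits,
        # spaces...) sort after it, ordered by code point.
        key.append((0, RANK[token]) if token in RANK else (1, ord(token)))
    return key


words = ["njuška", "nos", "ljeto", "luk", "džep", "dan", "đak",
         "čaj", "cesta", "ćup"]
print(sorted(words, key=croatian_key))
```

A plain code-point sort would put "džep" after "dan" only by accident and would place "ljeto" before "luk". The key above puts each digraph right after its base letter. Note that the sketch treats every "nj", "lj" or "dž" pair as a digraph, which is wrong for the rare words where the two letters are separate (for example "injekcija"), and sorting in the application does not help with `ORDER BY` inside the database, which is exactly why a server-side collation is needed.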

What does it take to implement a Croatian utf8 collation? It takes modifying the source code beyond our knowledge (we tried creating a new collation with Vietnamese as a base, handling digraphs as a pair of a basic Latin letter + an accented Latin letter).

AFAIK the countries which would benefit from the same implementation (alongside Croatia) are: Bosnia, Serbia (for the Latin script) and Montenegro (for the Latin script). So please, if you can, spread the word! I think that support for this would be appreciated by thousands of MySQL developers in our region who are now forced to use hacks from the '90s to get the correct sort order. :)

I've submitted an S4 feature request to MySQL - http://bugs.mysql.com/44523 - and I've posted a feature request/proposition on the official MySQL dev forum, so we will see what happens. It certainly wouldn't hurt if you signed in to bugs.mysql.com and the MySQL dev forum and replied to my feature request and topic with "Yes please" or something similar. It's free, and it can make a difference. :)

Author
seven
CEO/CTO at Nivas®
Neven Jacmenović has been passionately involved with computers since the late '80s, the age of Atari and Commodore Amiga. As one of the internet industry pioneers in Croatia, since the '90s he has been involved in the making of many award-winning, innovative and successful online projects. He is an experienced full-stack web developer, analyst and system engineer. In his spare time, Neven is transforming his retro-futuristic passion into various golang, Adobe Flash and JavaScript/WebGL projects.

    5 thoughts on “MySQL feature request/proposition: Croatian utf8 collation (utf8_croatian_ci)”

  • The solution already exists here; the only issue is “dž”, as only two-byte combinations are allowed. I have personally tested the solution and it works as advertised (apart from the above-mentioned “dž”, which was not an issue in my case).

  • Hi Berislav,
    Look who created that thread (me) and the problems we’ve encountered with this solution – http://forums.mysql.com/read.php?103,192187,216993#msg-216993

    Believe me, that’s not working as we would want it to. The only solution is to implement proper support.

  • I agree and added a comment on your feature request – first but I hope not the only one :)
