Apr 28

We use the MySQL database for pretty much everything nowadays. It's the de facto standard for horizontally scaled websites, and it's used by the biggest players in the industry. But one thing is lacking, and it is very important for our regional market: proper Croatian collation support for the utf8 charset. Without it, MySQL server can't be considered a choice for, e.g., a government migration to an open-source platform in the near future.

We tried implementing it on our own a couple of times, but without any luck. The problem lies in the fact that the Croatian language (Serbian and Bosnian too) has digraphs (single letters made up of two characters: lj, nj and dž). Without proper support for those, we will never be able to sort things right (a-b-c-č-ć-d-dž-đ-...i-j-k-l-lj-m-n-nj-...u-v-z-ž).
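To make the ordering problem concrete, here is a minimal Python sketch of the sort order we are after. It is only an illustration done on the application side, not how MySQL implements collations. The names `ALPHABET`, `croatian_key` and `croatian_sorted` are made up for this example, and putting characters outside the Croatian alphabet (q, w, x, y, digits) at the end is an assumption, not an official rule.

```python
# Croatian alphabet in collation order; "dž", "lj" and "nj" count as single letters.
ALPHABET = [
    "a", "b", "c", "č", "ć", "d", "dž", "đ", "e", "f", "g", "h",
    "i", "j", "k", "l", "lj", "m", "n", "nj", "o", "p", "r", "s",
    "š", "t", "u", "v", "z", "ž",
]
RANK = {letter: position for position, letter in enumerate(ALPHABET)}
DIGRAPHS = ("dž", "lj", "nj")


def croatian_key(word):
    """Turn a word into a list of alphabet positions (case-insensitive)."""
    text = word.lower()
    key = []
    i = 0
    while i < len(text):
        pair = text[i:i + 2]
        if pair in DIGRAPHS:
            # Greedy match: treat the two characters as one letter.
            key.append(RANK[pair])
            i += 2
        else:
            ch = text[i]
            # Assumption: anything outside the alphabet sorts after "ž".
            key.append(RANK.get(ch, len(ALPHABET) + ord(ch)))
            i += 1
    return key


def croatian_sorted(words):
    """Sort words using the digraph-aware key above."""
    return sorted(words, key=croatian_key)


words = ["ljeto", "luk", "more", "lopta"]
print(sorted(words))           # code point order puts "ljeto" before "lopta"
print(croatian_sorted(words))  # digraph-aware order puts "ljeto" after "luk"
```

The greedy match is a simplification: in words such as "injekcija" or "nadživjeti" the n+j and d+ž are two separate letters, and telling those cases apart needs a dictionary, which is out of scope here.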

What does it take to implement a Croatian utf8 collation? It takes modifying the source code, which is beyond our knowledge (we tried creating a new collation with Vietnamese as a base for handling digraphs as a pair of a basic Latin letter + an accented Latin letter).

AFAIK the countries which would benefit from the same implementation (alongside Croatia) are: Bosnia, Serbia (for the Latin script) and Montenegro (for the Latin script). So please, if you can, spread the word! I think that support for this would be appreciated by thousands of MySQL developers in our region who are now forced to use hacks from the '90s to get the correct sort order. :)

I've submitted an S4 feature request to MySQL (http://bugs.mysql.com/44523) and I've posted a feature request/proposition on the official MySQL dev forum, so we will see what happens. It certainly wouldn't hurt if you signed in to bugs.mysql.com and the MySQL dev forum and replied to my feature request and topic with "Yes please" or something similar. It's free, and it can make a difference. :)

5 Responses to “MySQL feature request/proposition: Croatian utf8 collation (utf8_croatian_ci)”

  1. Berislav Lopac Says:

    The solution already exists here; the only issue is “dž”, as only two-byte combinations are allowed. I have personally tested the solution and it works as advertised (apart from the above-mentioned “dž”, which was not an issue in my case).

  2. seven Says:

    Hi Berislav,
    Look who created that thread (me) and problems we’ve encountered with this solution – http://forums.mysql.com/read.php?103,192187,216993#msg-216993

    Believe me, that’s not working as we would want it to. The only solution is to implement proper support.

  3. puzz Says:

    I agree and added a comment on your feature request – first but I hope not the only one :)

  4. seven Says:

    much obliged puzz!

  5. Nivas.hr blog » Blog Archive » “Imamo Hrvatsku!” – MySQL patch which implements full Croatian ordering in utf8_croatian_ci and ucs2_croatian_ci collations Says:

    [...] of months ago, I started an open initiative to finally add support to MySQL for proper ordering using Croatian alphabet by [...]
