forum

Cunliffe D J (AT) djcunlif at GLAM.AC.UK
Wed Feb 27 10:21:48 UTC 2008


Hello All,

Just a small contribution to this fascinating insight into language
diversity, from the Welsh language, Cymraeg.

A particular challenge when dealing with Welsh is the set of digraph
letters, each of which is composed of two characters: ch, dd, ff, ng,
ll, ph, rh, th.

The Welsh Language Board suggests that the 'Combining Grapheme Joiner'
can be used to "stick" the two characters together. They note that this
is fairly obscure!
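
For anyone who has not met this character, here is a minimal Python
sketch of what it looks like at the code point level. It only shows that
U+034F exists, that inserting it adds one code point, and that NFC
normalization leaves it in place. Whether any given application then
treats the sequence as a single letter is not something this shows. The
variable names are purely illustrative.

```python
import unicodedata

CGJ = "\u034f"  # U+034F COMBINING GRAPHEME JOINER

plain = "ll"              # two ordinary characters
joined = "l" + CGJ + "l"  # same two characters with the joiner between them

print(unicodedata.name(CGJ))     # official Unicode character name
print(len(plain), len(joined))   # code point counts: 2 and 3

# The joiner has no decomposition, so NFC leaves the sequence unchanged.
survives_nfc = unicodedata.normalize("NFC", joined) == joined
print(survives_nfc)
```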

Do any other languages face this problem? Has the 'Combining Grapheme
Joiner' actually been built into any applications?

There are a number of interesting issues around sort orders, how to sort
Welsh words and English words together (differently for different
audiences) and character counts. If you are interested, there is an
excellent document discussing these issues and wider issues around the
design of bilingual software, from the Welsh Language Board: 
http://www.bwrdd-yr-iaith.org.uk/cynnwys.php?pID=109&langID=2&nID=2063
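
To make the sorting and counting issues concrete, here is a rough Python
sketch (not taken from the Board's document). It assumes the usual
29-letter Welsh alphabet order and splits words greedily into letters,
treating every digraph spelling as one letter. That greedy rule is a
simplification: it will be wrong for words where, say, n and g are
separate letters, and it ignores accented vowels. The function names
welsh_letters and welsh_sort_key are made up for illustration.

```python
# Welsh alphabet in dictionary order; digraphs count as single letters.
WELSH_ALPHABET = [
    "a", "b", "c", "ch", "d", "dd", "e", "f", "ff", "g", "ng", "h", "i",
    "j", "l", "ll", "m", "n", "o", "p", "ph", "r", "rh", "s", "t", "th",
    "u", "w", "y",
]
DIGRAPHS = {letter for letter in WELSH_ALPHABET if len(letter) == 2}
RANK = {letter: i for i, letter in enumerate(WELSH_ALPHABET)}


def welsh_letters(word):
    """Split a word into Welsh letters, greedily preferring digraphs."""
    word = word.lower()
    letters = []
    i = 0
    while i < len(word):
        pair = word[i:i + 2]
        if pair in DIGRAPHS:
            letters.append(pair)
            i += 2
        else:
            letters.append(word[i])
            i += 1
    return letters


def welsh_sort_key(word):
    """Sort key by Welsh letter rank; unknown characters sort last."""
    return [RANK.get(letter, len(WELSH_ALPHABET)) for letter in welsh_letters(word)]


words = ["llan", "lwc", "chwech", "cwm"]

print(sorted(words))                      # plain code point order
print(sorted(words, key=welsh_sort_key))  # c before ch, l before ll
print(len("chwech"), len(welsh_letters("chwech")))  # characters vs letters
```

In plain code point order "chwech" sorts before "cwm" and "llan" before
"lwc"; with the Welsh key the order within each pair is reversed,
because ch follows c and ll follows l. The last line shows the
character count problem: six characters, four Welsh letters.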

Cheers,

Daniel.


-----Original Message-----
From: Indigenous Languages and Technology
[mailto:ILAT at LISTSERV.ARIZONA.EDU] On Behalf Of William J Poser
Sent: 27 Chwefror 2008 00:15
To: ILAT at LISTSERV.ARIZONA.EDU
Subject: Re: [ILAT] forum

Andrew,

I agree except that it DOES matter whether a character is available
precomposed. The problem of multiple representations is indeed solved
by the use of normalization, though it is taking a while for
normalization libraries to become available for all languages and for
all software that should be using them to use them. But even with
normalization, it is an additional pain to process text in which some
characters require two or three codepoints while some require only one.
Not that it can't be done, but it makes life more difficult.
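
A small Python sketch of the point, using the standard unicodedata
module. The choice of characters is only an example: w with circumflex
has a precomposed form (U+0175), while q with circumflex, as far as I
know, does not, so it stays at two codepoints even after NFC.

```python
import unicodedata

precomposed = "\u0175"   # w with circumflex as a single codepoint
decomposed = "w\u0302"   # w followed by COMBINING CIRCUMFLEX ACCENT

# Without normalization the two representations compare unequal.
print(precomposed == decomposed)

# NFC composes the sequence into the precomposed character.
nfc = unicodedata.normalize("NFC", decomposed)
print(nfc == precomposed, len(nfc))

# No precomposed form is available here, so two codepoints remain.
no_precomposed = unicodedata.normalize("NFC", "q\u0302")
print(len(no_precomposed))
```

So normalization settles the comparison, but code that counts or
indexes "characters" still has to cope with a mix of one-codepoint and
multi-codepoint letters.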

Bill


