
William J Poser wjposer at LDC.UPENN.EDU
Wed Feb 27 11:27:03 UTC 2008


The "Combining Grapheme Joiner" U+034F is not used to create digraphs.
If you want a sequence of characters to be treated as a single letter, you've
got to declare that somewhere - there is nothing in the text itself that
has that effect. The "Combining Grapheme Joiner" comes into play when
there is an ambiguity as to the treatment of a character sequence. For example,
in Welsh <ll> may represent two /l/s, as in "Williams", or a single sound,
a voiceless lateral fricative, as in "Lloyd". The Welsh convention in this case
is that when <ll> represents a single sound it should be treated as a digraph
but that when it represents /ll/ it should be treated as a sequence of
two <l>s. (Curiously, this convention does not apply to all ambiguous
letter pairs - <ng> is supposed to be treated as a single letter even when
it represents the sequence of two sounds as in "Bangor". I must say that I
find this inconsistency very odd.) The problem is, then, that in ordinary
Welsh text there is no distinction between mono-segmental and bi-segmental
<ll> and so no way for a sorting program to know what to do. A bi-segmental
<ll> sequence may be distinguished from a mono-segmental one by putting a
CGJ character (U+034F) between the two <l>s. In other words, it
serves to indicate that a sequence is NOT a digraph in situations in which
there is no visual distinction. (Compare this to the Catalan treatment of
<ll>, where <ll> represents the palatal lateral and geminate /l/ is written
with two <l>s separated by a raised dot. CGJ is equivalent to the Catalan
raised dot, but invisible.)
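
To make the sorting point concrete, here is a rough sketch in Python of a
sort key that treats Welsh digraphs as single letters unless a CGJ sits
between the two characters. The names (letters, sort_key, WELSH_ALPHABET) are
made up for illustration, the alphabet table is simplified, and the test
strings are artificial rather than real Welsh words. A real system would
more likely use a tailored Unicode collation library than a hand-rolled
key like this.

```python
CGJ = "\u034f"  # U+034F COMBINING GRAPHEME JOINER

# Simplified Welsh alphabet in dictionary order (assumption: no accented
# vowels, no borrowed letters). Digraphs count as single letters.
WELSH_ALPHABET = [
    "a", "b", "c", "ch", "d", "dd", "e", "f", "ff", "g", "ng", "h", "i",
    "j", "l", "ll", "m", "n", "o", "p", "ph", "r", "rh", "s", "t", "th",
    "u", "w", "y",
]
RANK = {letter: i for i, letter in enumerate(WELSH_ALPHABET)}
DIGRAPHS = {letter for letter in WELSH_ALPHABET if len(letter) == 2}


def letters(word):
    """Split a word into Welsh letters; a CGJ blocks digraph formation."""
    word = word.lower()
    out = []
    i = 0
    while i < len(word):
        if word[i] == CGJ:
            # The CGJ is not a letter itself. It only keeps its two
            # neighbours from being adjacent, so they cannot pair up.
            i += 1
            continue
        pair = word[i:i + 2]
        if pair in DIGRAPHS:
            out.append(pair)
            i += 2
        else:
            out.append(word[i])
            i += 1
    return out


def sort_key(word):
    """Key for sorted(); characters outside the table sort last."""
    return [RANK.get(letter, len(RANK)) for letter in letters(word)]


# Artificial strings: "all" with a digraph, "al" + CGJ + "l" without one.
words = ["all", "alm", "al" + CGJ + "l"]
ordered = sorted(words, key=sort_key)
```

With this key, the CGJ-separated string sorts as a + l + l and so comes
before "alm", while plain "all" sorts as a + ll and comes after it, since
the letter ll follows l in the alphabet.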

Bill


