forum

William J Poser wjposer at LDC.UPENN.EDU
Wed Feb 27 17:56:31 UTC 2008


As I understand it, in Welsh <ng> sometimes represents the velar nasal,
as in <Cangen>, and sometimes the sequence of velar nasal followed by
voiced velar stop, as in <Bangor>. This is just like the difference
in English between <singer> and <finger>. So the sequence <ng> is ambiguous
in the same way that <ll> is: both can represent either a single sound
or a sequence of two sounds. 

One might think, therefore, that <ng> and <ll> would be treated the same
way. However, according to the "Bilingual Software Standards and Guidelines",
<ll> sorts differently depending on whether it represents one sound or two,
while <ng> always sorts the same way, whether it represents one sound or
two. Several other digraphs are treated like <ng>, and several others like
<ll>. Why the two sets behave differently is a mystery to me, though perhaps
it makes sense to an expert on Welsh or its history.

As for a "digraph joiner", I don't know if that has ever been considered.
I can't speak for the Unicode Consortium, but it seems to me that both
the "Combining Grapheme Joiner" and its putative opposite, the "digraph joiner",
are contrary to the spirit of Unicode.  That is, the basic idea in Unicode
is to encode only visibly different characters, not "markup". Both CGJ
and the "digraph joiner" are used essentially as markup. That is, they are
used to provide information not present in normal text, and not visible,
to software processing the text. I don't know the history of the inclusion
of CGJ in Unicode, but since it seems to run contrary to the general approach,
I suspect that it was in a sense a mistake: either something that got past
review when it shouldn't have, or something put in under strong pressure
from an interested party.
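
To make the "markup" point concrete, here is a minimal Python sketch of how
sorting software might consume an invisible CGJ (U+034F) placed between the
two letters of a would-be digraph. It is an illustration only, not the
procedure given in the Guidelines: the names welsh_letters and sort_key are
hypothetical, the greedy digraph matching is an assumption, accented vowels
are ignored, and the test strings are made up rather than real Welsh words.

```python
# Sketch only: a digraph-aware sort key in which an invisible CGJ acts as
# markup telling the software NOT to read two letters as one digraph.
CGJ = "\u034f"  # COMBINING GRAPHEME JOINER

# Welsh alphabet order; the digraphs count as single letters.
WELSH_ALPHABET = [
    "a", "b", "c", "ch", "d", "dd", "e", "f", "ff", "g", "ng", "h", "i",
    "j", "l", "ll", "m", "n", "o", "p", "ph", "r", "rh", "s", "t", "th",
    "u", "w", "y",
]
RANK = {letter: i for i, letter in enumerate(WELSH_ALPHABET)}
DIGRAPHS = {letter for letter in WELSH_ALPHABET if len(letter) == 2}


def welsh_letters(word):
    """Split a word into alphabet letters (hypothetical helper).

    Two adjacent characters are read as a digraph unless a CGJ sits
    between them; the CGJ itself is dropped from the result.
    """
    text = word.lower()
    letters = []
    i = 0
    while i < len(text):
        if text[i] == CGJ:
            i += 1  # invisible markup: skip it, it only blocks the pairing
            continue
        pair = text[i:i + 2]
        if pair in DIGRAPHS:
            letters.append(pair)
            i += 2
        else:
            letters.append(text[i])
            i += 1
    return letters


def sort_key(word):
    """Map each letter to its alphabet rank; unknown characters sort last."""
    return [RANK.get(letter, len(WELSH_ALPHABET)) for letter in welsh_letters(word)]


# Made-up strings: the first two normally render identically on screen.
words = ["alla", "al" + CGJ + "la", "alma"]
ordered = sorted(words, key=sort_key)
```

With these made-up strings, the CGJ-marked form sorts as l + l and lands
before "alma", while the unmarked form sorts as the single letter ll and
lands after it, even though the two look the same to a reader. That is the
sense in which the character carries information for software rather than
for the eye.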

Bill


