forum
Cunliffe D J (AT)
djcunlif at GLAM.AC.UK
Wed Feb 27 11:48:41 UTC 2008
Hello All,
Thanks for that clarification, Bill - my coffee-starved,
exam-paper-writing brain had hallucinated a "not" when I was reading the
WLB document. As you might be able to tell, this isn't really my field!
Given that there isn't a "digraph joiner", could there be one, or is
this just a kludgy solution to a problem that should be solved another
way?
My Welsh isn't very good, but I am not sure I understand the point about
<ng> always being treated as a single letter. Bangor is used as an
example of when not to treat "ng" as <ng> in the WLB document.
Thanks also to Andrew for pointing out that the Conrad Taylor report is
indeed rather long in the tooth.
I'm off now to meditate on the expression "A little knowledge is a
dangerous thing" :-)
Daniel.
-----Original Message-----
From: Indigenous Languages and Technology
[mailto:ILAT at LISTSERV.ARIZONA.EDU] On Behalf Of William J Poser
Sent: 27 February 2008 11:27
To: ILAT at LISTSERV.ARIZONA.EDU
Subject: Re: [ILAT] forum
The "Combining Grapheme Joiner" U+034F is not used to create digraphs.
If you want a sequence of characters to be treated as a single letter,
you've got to declare that somewhere - there is nothing in the text
itself that has that effect. The "Combining Grapheme Joiner" comes into
play when there is an ambiguity as to the treatment of a character
sequence. For example, in Welsh <ll> may represent two /l/s, as in
"Williams", or a single sound, a voiceless lateral fricative, as in
"Lloyd". The Welsh convention in this case is that when <ll> represents
a single sound it should be treated as a digraph but that when it
represents /ll/ it should be treated as a sequence of two <l>s.
(Curiously, this convention does not apply to all ambiguous letter
pairs - <ng> is supposed to be treated as a single letter even when it
represents the sequence of two sounds as in "Bangor". I must say that I
find this inconsistency very odd.)
The problem is, then, that in ordinary Welsh text there is no
distinction between mono-segmental and bi-segmental <ll> and so no way
for a sorting program to know what to do. A bi-segmental <ll> sequence
may be distinguished from a mono-segmental sequence by putting a CGJ
code between the codes for the two <l>s. In other words, it serves to
indicate that a sequence is NOT a digraph in situations in which there
is no visual distinction. (Compare this to the Catalan treatment of
<ll>, where <ll> represents the palatal lateral and geminate /l/ is
written with two <l>s separated by a raised dot. CGJ is equivalent to
the Catalan raised dot, but invisible.)
Bill
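As a rough illustration of the sorting behaviour described above, here
is a minimal Python sketch. It is not taken from the WLB document or
from any real collation library: the names welsh_letters and
welsh_sort_key are made up for this example, the digraph list is simply
the two-letter entries of the Welsh alphabet, and accented vowels are
not handled. It assumes the text has already been marked up with U+034F
wherever a letter pair is NOT meant to be a digraph.

```python
CGJ = "\u034f"  # COMBINING GRAPHEME JOINER, invisible in rendered text

# Welsh alphabet in dictionary order; two-character entries are digraphs.
WELSH_ALPHABET = [
    "a", "b", "c", "ch", "d", "dd", "e", "f", "ff", "g", "ng", "h",
    "i", "j", "l", "ll", "m", "n", "o", "p", "ph", "r", "rh", "s",
    "t", "th", "u", "w", "y",
]
DIGRAPHS = {letter for letter in WELSH_ALPHABET if len(letter) == 2}
RANK = {letter: i for i, letter in enumerate(WELSH_ALPHABET)}


def welsh_letters(word):
    """Split a word into letters (hypothetical helper).

    Two adjacent characters that form a digraph become one letter.
    A CGJ between them keeps them apart and is itself dropped.
    """
    text = word.lower()
    letters = []
    i = 0
    while i < len(text):
        if text[i] == CGJ:
            i += 1  # the joiner carries no letter of its own
            continue
        pair = text[i:i + 2]
        if pair in DIGRAPHS:
            letters.append(pair)
            i += 2
        else:
            letters.append(text[i])
            i += 1
    return letters


def welsh_sort_key(word):
    """Sort key: alphabet position of each letter (unknown letters last)."""
    return [RANK.get(letter, len(WELSH_ALPHABET))
            for letter in welsh_letters(word)]


print(welsh_letters("Lloyd"))           # <ll> read as one letter
print(welsh_letters("Wil\u034fliams"))  # CGJ forces two separate <l>s
print(sorted(["llaw", "lwc"], key=welsh_sort_key))
```

Without the CGJ, this sketch reads the <ll> in "Williams" as a digraph,
which is exactly the ambiguity a sorting program cannot resolve on its
own. The same mechanism would in principle work for <ng> in a word like
"Bangor", if the convention being followed calls for that.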