[Lexicog] Tone languages in Toolbox/Shoebox
Mike Maxwell
maxwell at LDC.UPENN.EDU
Tue Apr 11 19:29:56 UTC 2006
neduchi at netscape.net wrote:
>> In what sense do these programs handle the dotted vowel + tone mark as
>> two characters? Are they displaying the tone marks to the right of the
>> vowel, or is the problem s.t. more subtle than this?
>
>
> Please have a look at the link below. It is a sample of an arbitrarily
> tone-marked Igbo text I put together with Andrew Cunningham. You can
> switch between the three different fonts used: Arial Unicode MS,
> CODE2000 and Doulos SIL. Try out the fonts and observe the location of
> the sub-dots and the tone marks:
> http://www.openroad.net.au/languages/african/igbo/sample.html
>
> I would like to see the sub-dotted and tone-marked characters
> 'compactly' displayed with the tone marks as ONE composite whole and
> not as two or three estranged neighbours.
OK, now I'm beginning to understand the problem you're seeing!
Yes, this appears to be a rendering problem, not a Unicode problem per
se. That is to say, either there's a problem with the font, or with the
technology that displays the font (I'm not sure which).
Let me summarize the rendering issues I see, and let me know if I'm
missing something.
First, the accent is much too low over upper case vowels. It's also too
far to the left over the lower and upper case 'i/I' (these appear in the
sample paragraph, but not in the list of sample characters). Also, the
dot under the upper case 'U' is too far to the right (both in the
undotted U in the para, and the dotted U in the sample chars), and the
dot under the lower case 'i' is much too far to the left (in fact,
almost under the preceding letter).
Also, the upper case N with grave (U+01F8) shows up as a box in many
apps (it looks OK in Firefox).
(I also see a dot _over_ n/N in the sample chars--is that correct?)
Some of these problems would be solved by using pre-composed chars.
(That is, many of the chars in the sample para appear to be in NFD
normalization, rather than NFC.) For example, the grave vowels without
dots would probably look just fine if they used the pre-composed
equivalents. (If you are going to use a decomposed character, the grave
accented 'i' should probably be produced with the dotless-i, U+0131.
This unfortunately doesn't solve the problem of the grave accent being
too far to the left.)
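As a rough illustration of the NFD-to-NFC point, here is a minimal Python sketch using the standard library's unicodedata module. The helper name to_nfc and the sample strings are mine, not taken from the sample page, and I am assuming the text is available as an ordinary Unicode string.

```python
import unicodedata

def to_nfc(text: str) -> str:
    """Return text in Normalization Form C (pre-composed where possible)."""
    return unicodedata.normalize("NFC", text)

# Hypothetical sample input: base letters followed by COMBINING GRAVE ACCENT.
decomposed_e = "e\u0300"
decomposed_N = "N\u0300"

composed_e = to_nfc(decomposed_e)   # becomes the single code point U+00E8
composed_N = to_nfc(decomposed_N)   # becomes the single code point U+01F8

for ch in (composed_e, composed_N):
    # Print the code point and its Unicode name for inspection.
    print(f"U+{ord(ch):04X}", unicodedata.name(ch))
```

Whether the pre-composed forms actually display better still depends on the font and the renderer, so this only removes one possible source of trouble.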
The dot under problem is more difficult, because there are few
pre-composed dot-under characters (maybe none, I can't remember), and
certainly no pre-composed characters having both the dot under and an
acute or grave. But the fact that the dots on these characters don't
show up in the right position is a font/rendering issue, which hopefully
will get fixed. FWIW, the problem is noted at the wikipedia page
(http://en.wikipedia.org/wiki/UniCode#Ready-made_versus_composite_characters).
Of course that's no help right now...
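One way to check which combinations have pre-composed forms is to normalize to NFC and see which combining marks are left over. The sketch below does that with Python's unicodedata module; the function name leftover_marks is hypothetical, and the inputs are examples I made up rather than text from the sample page. As far as I can tell from running this kind of check, dot-below vowels such as U+1ECB do exist pre-composed, but a dot-below vowel with a grave on top still needs a separate combining accent, so correct placement there remains up to the font and renderer.

```python
import unicodedata

def leftover_marks(text: str) -> list[str]:
    """Names of combining marks that remain after NFC normalization."""
    nfc = unicodedata.normalize("NFC", text)
    # unicodedata.combining() returns a nonzero class for combining marks.
    return [unicodedata.name(ch) for ch in nfc if unicodedata.combining(ch)]

# Hypothetical examples: 'o' + dot below, and 'i' + dot below + grave.
dot_only = leftover_marks("o\u0323")
dot_and_grave = leftover_marks("i\u0323\u0300")

print(dot_only)        # marks left after composing o + dot below
print(dot_and_grave)   # marks left after composing i + dot below + grave
```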
In sum, this appears to me to be a rendering issue, not a Unicode issue
per se. It also appears to be a somewhat different question than the
original posters brought up, who I believe were asking for tools to do
phonology and/or morphology.
Mike Maxwell