[Lexicog] Tone languages in Toolbox/Shoebox

Mike Maxwell maxwell at LDC.UPENN.EDU
Tue Apr 11 19:29:56 UTC 2006


neduchi at netscape.net wrote:
>> In what sense do these programs handle the dotted vowel + tone mark as
>> two characters? Are they displaying the tone marks to the right of the
>> vowel, or is the problem something more subtle than this?
> 
> 
> Please have a look at the link below. It is a sample of an arbitrarily 
> tone-marked Igbo text I put together with Andrew Cunningham. You can 
> switch between the three different fonts used:  Arial Unicode MS, 
> CODE2000 and Doulos SIL. Try out the fonts and observe the location of 
> the sub-dots and the tone marks:
> http://www.openroad.net.au/languages/african/igbo/sample.html
> 
> I would like to see the sub-dotted and tone-marked characters 
> 'compactly' displayed with the tone marks as ONE composite whole and 
> not as two or three estranged neighbours.

OK, now I'm beginning to understand the problem you're seeing!

Yes, this appears to be a rendering problem, not a Unicode problem per 
se.  That is to say, either there's a problem with the font, or with the 
technology that displays the font (I'm not sure which).

Let me summarize the rendering issues I see; let me know if I'm 
missing something.

First, the accent is much too low over upper case vowels.  It's also too 
far to the left over the lower and upper case 'i/I' (these appear in the 
sample paragraph, but not in the list of sample characters).  Also, the 
dot under the upper case 'U' is too far to the right (both in the 
undotted U in the para, and the dotted U in the sample chars), and the 
dot under the lower case 'i' is much too far to the left (in fact, 
almost under the preceding letter).

Also, the upper case N with grave (U+01F8) shows up as a box in many 
apps (it looks OK in Firefox).

(I also see a dot _over_ n/N in the sample chars--is that correct?)

Some of these problems would be solved by using pre-composed chars. 
(That is, many of the chars in the sample para appear to be in NFD 
normalization, rather than NFC.)  For example, the grave vowels without 
dots would probably look just fine if they used the pre-composed 
equivalents.  (If you are going to use a decomposed character, the grave 
accented 'i' should probably be produced with the dotless-i, U+0131. 
This unfortunately doesn't solve the problem of the grave accent being 
too far to the left.)
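
To check whether a piece of text is composed or decomposed, a small
Python sketch along these lines might help.  It uses only the standard
unicodedata module; the helper names (describe, to_nfc) and the sample
strings are illustrative assumptions, not taken from the sample page.

```python
import unicodedata

def describe(text):
    """List each code point in text as 'U+XXXX NAME'."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch, '<unnamed>')}" for ch in text]

def to_nfc(text):
    """Return text in NFC (canonical composition)."""
    return unicodedata.normalize("NFC", text)

# Decomposed (NFD-style) input: base letter followed by COMBINING GRAVE ACCENT.
samples = ["e\u0300", "N\u0300"]

for decomposed in samples:
    composed = to_nfc(decomposed)
    print("before:", describe(decomposed))
    print("after: ", describe(composed))
```

If the "after" list is shorter than the "before" list, a pre-composed
equivalent exists and the text was not in NFC to begin with.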

The dot-under problem is more difficult.  There are a few pre-composed 
dot-under characters (e.g. U+1ECD, small o with dot below), but 
certainly no pre-composed characters having both the dot under and an 
acute or grave.  But the fact that the dots on these characters don't 
show up in the right position is a font/rendering issue, which hopefully 
will get fixed.  FWIW, the problem is noted at the wikipedia page 
(http://en.wikipedia.org/wiki/UniCode#Ready-made_versus_composite_characters). 
Of course that's no help right now...
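
For what it's worth, the question of which dot-under combinations have
pre-composed forms can be tested directly.  The sketch below (Python,
standard unicodedata module only; the function name nfc_names and the
choice of vowels are assumptions for illustration) normalizes a vowel
plus COMBINING DOT BELOW, with and without a grave accent, and reports
what is left after NFC.

```python
import unicodedata

DOT_BELOW = "\u0323"  # COMBINING DOT BELOW
GRAVE = "\u0300"      # COMBINING GRAVE ACCENT

def nfc(text):
    """Return text in NFC (canonical composition)."""
    return unicodedata.normalize("NFC", text)

def nfc_names(text):
    """Return the Unicode names of the code points left after NFC."""
    return [unicodedata.name(ch) for ch in nfc(text)]

for base in "iouIOU":
    dotted = base + DOT_BELOW
    dotted_grave = base + DOT_BELOW + GRAVE
    # One name means a pre-composed character exists; more than one means
    # the renderer still has to position a combining mark itself.
    print(base, "+ dot:        ", nfc_names(dotted))
    print(base, "+ dot + grave:", nfc_names(dotted_grave))
```

As far as I can tell, the dotted vowels alone come back as a single
character, while the dotted vowels with a tone mark always keep the
tone mark as a separate combining character, so those still depend on
the font and the rendering engine to stack the marks properly.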

In sum, this appears to me to be a rendering issue, not a Unicode issue 
per se.  It also appears to be a somewhat different question than the 
original posters brought up, who I believe were asking for tools to do 
phonology and/or morphology.

    Mike Maxwell


 