[Lexicog] Tone languages in Toolbox/Shoebox

Greg Aumann Greg_Aumann at SIL.ORG
Tue Apr 11 12:25:50 UTC 2006


neduchi at netscape.net wrote:
> As I have been made to understand (and as David has also further 
> confirmed in his last mail), the problem lies with the font 
> (developers):
> "What they should do is design the fonts to 'know how to' correctly 
> combine the special "combining characters" with the preceding 
> characters. "  
> [http://groups.yahoo.com/group/lexicographylist/message/3016]
> 
> I prefer such a solution, because it should then work well with any 
> Unicode-aware software. It surely would take care of the 
> sorting/searching issues also raised by the initial poster. But I do 
> NOT know how to achieve it!
> 
Not all the blame can be laid at the feet of font developers. Quite a
bit also belongs at the feet of the system programmers, who have been
slow to support the complex behaviours that Unicode requires.

Your example uses Latin script, which has two complex behaviours:
diacritics and ligatures. For these behaviours to be supported there
needs to be: 1) information in the font, 2) a Latin shaping engine, and
3) an application that actually uses the shaping engine.
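
To see what the font and shaping engine have to cope with, it can help
to look at how a tone-marked Latin letter is actually stored. The
following is a minimal sketch using only Python's standard unicodedata
module; the sample syllable and the helper name describe are
illustrative assumptions, not taken from the original discussion.

```python
import unicodedata

def describe(text: str) -> list[tuple[str, str, int]]:
    """Return (code point, character name, canonical combining class) per character."""
    return [
        (f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>"), unicodedata.combining(ch))
        for ch in text
    ]

# Hypothetical sample: "e" with a dot below and a high-tone acute accent,
# stored as a base letter followed by two combining marks.
syllable = "e\u0323\u0301"

for cp, name, ccc in describe(syllable):
    # A non-zero combining class marks a diacritic that the font and
    # shaping engine must position relative to the base letter.
    print(cp, name, ccc)
```

The three code points display as one letter. Whether the acute sits
correctly above the e and clears the dot below is decided by the font's
positioning data and the shaping engine, not by the stored text.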

As you have been told, there is often a problem with 1). But 2) is
often a problem too. 3) is less often a problem, but the more
significant the application (e.g. a word processor or browser), the more
likely it is to do its own thing (sometimes for the better, sometimes not).

Unfortunately, as you have found, the state of software and font support
for Unicode is not very good. Nevertheless, Unicode is the best solution
and support for it is slowly improving. Apart from waiting for it to
improve, the other thing to do is to understand the issues and arrange
things to reduce the problems as much as you can.

As far as fonts go, Doulos SIL and Charis SIL have all the information
needed to place diacritics correctly and to form the ligatures. I
think Code2000 is also pretty good in this respect. Arial Unicode MS
is not, and I suspect it is unlikely to be upgraded. Microsoft has been
upgrading its other fonts and distributing them with recent versions
of its software, but the names have stayed the same, so I think many
people haven't noticed. Also, you will not have seen the benefit of many
of the improvements if your system software doesn't have a Latin shaping
engine.

So if you have Microsoft Office 2003 and use Doulos SIL in it, you
will see correct handling of diacritics and ligatures. If you have
Windows XP Service Pack 2, you may also see this in other
applications such as Notepad, WordPad, Toolbox etc. But Word 2004 on
the Macintosh doesn't handle diacritics and ligatures correctly. If you
use the most recent GNOME desktop on Linux, it will work correctly.
OpenOffice.org on Windows XP SP2 should work, but on Linux it does not.
In short, the situation is complex and varies widely. The situation for
each script is also different on each platform: for example, diacritics
in Latin script are handled completely separately from diacritics in
Arabic script.

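There are also often bugs in shaping engines and renderers. So
sometimes NFC text (precomposed characters where they exist) displays
correctly but NFD text (base letters plus combining marks) does not,
even though the standard says the two are canonically equivalent.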
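
Canonical equivalence also matters for the sorting and searching issues
raised in the quoted message: two strings can be equivalent yet differ
code point by code point, so a naive comparison fails. A minimal sketch
using Python's standard unicodedata.normalize; same_text is a
hypothetical helper, and normalizing all data to one form before
comparing is one possible workaround, not something prescribed above.

```python
import unicodedata

# Hypothetical sample: "é" as one precomposed code point vs. "e" + combining acute
precomposed = "\u00e9"
decomposed = "e\u0301"

# The raw strings differ code point by code point...
print(precomposed == decomposed)  # False

# ...but they are canonically equivalent, so they match after normalization.
print(unicodedata.normalize("NFC", decomposed) == precomposed)  # True
print(unicodedata.normalize("NFD", precomposed) == decomposed)  # True

# Two marks typed in either order are put into canonical order by NFD
# (dot below, class 220, before acute, class 230).
print(unicodedata.normalize("NFD", "e\u0301\u0323") == "e\u0323\u0301")  # True

def same_text(a: str, b: str) -> bool:
    """Compare two strings after normalizing both to NFC."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)
```

Picking one form (NFC or NFD) and applying it consistently to stored data
and search queries avoids false mismatches; which form displays best
still depends on the renderer bugs described above.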

I know this isn't very encouraging, but it is better to know the
situation so that you can work around it as well as possible.

If you really want to know more of the gory details, the
http://scripts.sil.org website has lots of useful information.

Greg
