Corpora: non-alphabetic language databases

Mike Maxwell mike_maxwell at sil.org
Thu Nov 30 13:45:13 UTC 2000


>3.) SIL international are developing a font rendering
>engine called Graphite which should be able to be
>embedded in corpus processing systems.

As an SIL member, but not one directly involved with Graphite (or non-Roman
font rendering systems in general), I'll just say a couple things:

1. For info on the Graphite project, see
http://www.sil.org/computing/graphite/.

2. The question of writing systems for signed languages came up a couple
weeks ago.  As I understand it, the de facto standard for writing among
native "speakers" of American Sign Language (as opposed to researchers) is
"Sign Writing"; see http://www.signwriting.org/.  The developers there have
a (proprietary) system for keying in and rendering (displaying) Sign Writing.
I don't believe that system is "in" Unicode yet, although I could be wrong.

3. Font rendering problems come up in a variety of "non-Roman" writing
systems, including alphabetic ones--namely, in any writing system where the
shape or positioning of glyphs (the displayed form of characters) is context
sensitive, or the direction of writing is not strictly left-to-right and
top-down.  This includes things like the two forms of lower case sigma in
Greek (one word-final, the other elsewhere), Arabic-based systems
(right-to-left), systems in which vowel letters appear above or below the
consonant letters (Massoretic Hebrew, I think, and many alphabetic systems
of SE Asia), etc.  Even the IPA transcription system has some of
this--diacritics which otherwise appear below a base character instead
appear above characters that have a descender (e.g. the voiceless symbol
when used with the engma).  In widely-used writing systems (e.g. Greek and
Arabic), these problems have been addressed at the operating system level;
thus, there are Middle Eastern versions of Microsoft Windows.  The problem
is worse with writing systems which are not commercially viable, for which
there are at present few solutions.

4. Apple has some solutions for the above issues, and Microsoft is now
putting more effort into it as well.  But at present, I think it's safe to
say that no one covers all the writing systems.

5. Treat all the above as non-official, uninformed speculation on my part
:-).  In case of disagreement, there will be a recount.
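
To make point 3 a bit more concrete, here is a minimal Python sketch of three of the context-sensitive behaviors mentioned there: choosing the word-final form of Greek sigma, detecting right-to-left characters, and placing the IPA voiceless ring above a base character that has a descender. It has nothing to do with how Graphite or any operating system actually does rendering. The function names and the DESCENDERS list are my own assumptions for illustration, and the list is deliberately incomplete.

```python
import re
import unicodedata

# Hypothetical, partial list of base characters with a descender.
DESCENDERS = set("ŋɡjpqy")

RING_BELOW = "\u0325"  # COMBINING RING BELOW (usual IPA voiceless mark)
RING_ABOVE = "\u030A"  # COMBINING RING ABOVE (used when the base has a descender)


def final_sigma(text):
    """Replace a lower case sigma with its word-final form.

    Only a sigma that follows a word character and is not itself
    followed by one is treated as word-final.
    """
    return re.sub(r"(?<=\w)σ(?!\w)", "ς", text)


def direction(ch):
    """Return 'rtl' for strongly right-to-left characters, else 'ltr'.

    Uses the Unicode bidirectional class: 'R' (e.g. Hebrew letters)
    and 'AL' (Arabic letters) are treated as right-to-left.
    """
    return "rtl" if unicodedata.bidirectional(ch) in ("R", "AL") else "ltr"


def voiceless(base):
    """Attach the voiceless ring, above or below depending on the base."""
    mark = RING_ABOVE if base in DESCENDERS else RING_BELOW
    return base + mark


if __name__ == "__main__":
    print(final_sigma("σοφοσ σασ"))
    print(direction("ا"), direction("a"))
    print(voiceless("ŋ"), voiceless("n"))
```

Note that this only picks characters; actually drawing the ring in the right place over or under the base is still the job of the font and the rendering engine, which is exactly where the trouble starts for less widely used writing systems.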

                                         Mike Maxwell
                                         SIL
                                         Mike_Maxwell at sil.org


