[Lexicog] UNICODE

Mike Maxwell maxwell at LDC.UPENN.EDU
Tue Sep 13 14:55:31 UTC 2005


Jimm GoodTracks wrote:
> What are the thoughts of those who are well into their dictionary work
> and may be confronted with the task of redoing it all over again in
> Unicode fonts?

This story is not about my own dictionary, but about one I consulted on.  
At the LDC, Yiwola Awoyale compiled a large dictionary of Yoruba in 
Shoebox, using a hacked (home-made) font.  I recently wrote a simple 
encoding converter that changed this unique encoding into Unicode.
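
For anyone wondering what such a converter amounts to: it is essentially a 
substitution table applied to every character of the database.  Here is a 
minimal sketch in Python; the legacy characters, Unicode targets, file 
names, and source encoding are all made up for illustration, since the 
real table for Yiwola's font was larger and different.

    # Minimal sketch of a hacked-font-to-Unicode converter.
    # The mapping is hypothetical; a real table would cover every
    # character slot the hacked font reassigned.  The file names and
    # the "cp1252" source encoding are also just placeholders.
    LEGACY_TO_UNICODE = {
        "\u00E7": "\u1EB9",  # slot the hacked font drew as e-with-dot-below
        "\u00F5": "\u1ECD",  # ... as o-with-dot-below
        "\u015F": "\u1E63",  # ... as s-with-dot-below
    }

    def convert(text):
        return "".join(LEGACY_TO_UNICODE.get(ch, ch) for ch in text)

    with open("dictionary.db", encoding="cp1252") as src:
        data = convert(src.read())
    with open("dictionary-unicode.db", "w", encoding="utf-8") as dst:
        dst.write(data)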

It wasn't a question of re-doing anything; it was simply a question of 
running the encoding converter over the dictionary and opening the result 
in a Unicode-aware editor (we used Toolbox) to make sure things came 
through correctly.  (As it turned out, there were some difficulties in the 
resulting Unicode, having to do with stacked diacritics that didn't appear 
correctly in the Arial Unicode MS font.  So I modified the converter and we 
ran it again, using a different way of representing the stacked diacritics. 
For the techies here, the better visual result was obtained with a 
non-normalized Unicode representation.)
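
To make the normalization point concrete, here is a small Python 
illustration.  The code points are real, but I am reconstructing the kind 
of case involved rather than quoting the actual converter output.

    # Two canonically equivalent spellings of Yoruba o with dot below
    # plus acute accent.  Some fonts of that era stacked one better
    # than the other; which spelling we settled on is not shown here.
    import unicodedata

    nfc_form = "\u1ECD\u0301"  # o-with-dot-below + combining acute (NFC)
    alt_form = "\u00F3\u0323"  # o-with-acute + combining dot below (not normalized)

    # Same abstract character, different code point sequences:
    assert unicodedata.normalize("NFC", alt_form) == nfc_form
    print(repr(nfc_form), repr(alt_form))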

The other issue we had to deal with was the keyboard setup.  Yiwola had 
been using one keyboard program, but Toolbox doesn't work with that, so we 
had to install Keyman and produce a key mapping that conformed to the way 
Yiwola was used to typing.  Last I heard there were some other minor 
issues with this, but I expect them to be solved.
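
For those unfamiliar with keyboard programs like Keyman: the heart of such 
a setup is a table of typed sequences that get replaced by Unicode 
characters as you type.  The sketch below shows the idea in Python with 
hypothetical trigger sequences; it is not Keyman's source format, and not 
Yiwola's actual conventions.

    # Toy model of a key mapping: typed ASCII sequences become
    # Unicode characters.  The trigger sequences are hypothetical.
    SEQUENCES = {
        "s;": "\u1E63",  # s with dot below
        "e;": "\u1EB9",  # e with dot below
        "o;": "\u1ECD",  # o with dot below
    }

    def apply_keystrokes(typed):
        out = typed
        for seq, char in SEQUENCES.items():
            out = out.replace(seq, char)
        return out

    print(apply_keystrokes("as;e;"))  # -> a + s-with-dot-below + e-with-dot-below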

> Is it not unlike the large nations imposing their national language
> on the minority languages, Tagalog, English, Japanese, et al., on the
> individual Filipino, the Native American, and Spanish/Chinese Americans
> or the Ainu?  The plan for a standard is well meant, but devaluation
> sets the course for the minority community language to become an
> endangered language, and with that, a whole culture, world view, and way
> of thinking.  Perhaps it is not the same thing.

I agree with the last sentence: I don't see standardising on Unicode as 
devaluation in any way.  Quite the opposite: it is a way for minority 
languages to gain access to computational tools despite the fact that the 
languages in question do not have "market value."  So you can use Unicode 
to help preserve minority languages.  It is also a way to avoid splintering, 
where there are different--competing--ways of representing texts in a 
language.  Here's a comment on splintering in Ethiopic encodings (for 
languages like Amharic and Tigrinya):

    The task of describing formatting practices in
    Ethiopia is one on par with describing the shapes
    of clouds in Ethiopia.
    (http://www.abyssiniacybergateway.net/fidel/l10n/)

That is, in the past it has been difficult to share electronic versions of 
Ethiopic data among different users precisely because there was no 
standard.  When (or maybe if) Unicode becomes a standard for Ethiopic, this 
problem will go away, at least for new documents.

There's of course no reason that Unicode has to be the standard for any 
particular language, but it has the best chance.  There have been other 
attempts to develop standards for a language or for a group of languages; 
some have been successful (e.g. Thai), others have not (ISCII, for Indic 
languages).  But I see no reason not to go with Unicode as a standard.

   Mike Maxwell

