[Lexicog] UNICODE
Mike Maxwell
maxwell at LDC.UPENN.EDU
Tue Sep 13 14:55:31 UTC 2005
Jimm GoodTracks wrote:
> What are the thoughts of those who are well into their dictionary work
> and may be confronted with the task of redoing it all over again in the
> Unicode fonts?
This story is not about my dictionary, but I was a consultant on it. At
the LDC, Yiwola Awoyale compiled a large dictionary of Yoruba in Shoebox,
using a hacked (home-made) font. I recently wrote a simple encoding
converter that turned this one-off encoding into Unicode.
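In case it helps to see what that amounts to, here's a rough sketch of such a
converter in Python. The legacy code points, the mapping, and the file names
below are invented for illustration; the real values came from the hacked font
itself (and from the way Yiwola actually typed).

    # Sketch only: map the hacked font's code points to real Unicode.
    # The legacy slots below are made up; the real ones came from the font.
    LEGACY_TO_UNICODE = {
        0xE1: "\u1EB9",   # e with dot below
        0xE2: "\u1ECD",   # o with dot below
        0xE3: "\u1E63",   # s with dot below
        0xC1: "\u0301",   # combining acute (high tone)
        0xC0: "\u0300",   # combining grave (low tone)
    }

    def convert(legacy: bytes) -> str:
        # Anything not in the table passes through as ordinary Latin-1.
        return "".join(LEGACY_TO_UNICODE.get(b, chr(b)) for b in legacy)

    with open("yoruba_dict.db", "rb") as f:        # hypothetical file names
        text = convert(f.read())
    with open("yoruba_dict_utf8.db", "w", encoding="utf-8") as f:
        f.write(text)

The whole job is a table lookup; the work is in getting the table right.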
It wasn't a question of re-doing anything; it was simply a question of
running the encoding converter over the dictionary and opening the result
in a Unicode-aware editor (we used Toolbox) to make sure things came
through correctly. (As it turns out, there were some difficulties in the
resulting Unicode, having to do with stacked diacritics that didn't appear
correctly in the Arial Unicode MS font. So I modified the converter and we
ran it again, using a different way of representing the stacked diacritics.
For the techies here, the better visual result was obtained with a
non-normalized Unicode representation.)
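To make that last point concrete: the same letter-plus-tone combination can be
spelled in several canonically equivalent ways in Unicode, and fonts differ in
how well they stack the marks for each spelling. The following Python lines
(illustration only; the actual sequences we settled on may differ) show three
spellings of a Yoruba e with a dot below and a high tone:

    import unicodedata

    nfc = unicodedata.normalize("NFC", "e\u0323\u0301")  # U+1EB9 (e-dot-below) + U+0301
    nfd = unicodedata.normalize("NFD", nfc)              # e + U+0323 + U+0301
    alt = "\u00E9\u0323"                                  # e-acute + U+0323, not normalized

    for label, s in [("NFC", nfc), ("NFD", nfd), ("non-normalized", alt)]:
        codes = " ".join(f"U+{ord(c):04X}" for c in s)
        same = unicodedata.normalize("NFD", s) == nfd
        print(f"{label:15} {codes}   canonically equivalent: {same}")

All three are canonically equivalent, so the choice among them is purely about
how a given font renders the stack, not about what the text says.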
The other issue we had to deal with was the keyboard setup. Yiwola had
been using one keyboard program, but Toolbox doesn't work with that. So we
had to install Keyman and produce a key mapping that conformed to the way
Yiwola was used to typing. Last I heard there were some other minor
issues with this, but I expect them to be solved.
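For what it's worth, a key mapping (in Keyman or anything else) just turns the
keystroke sequences a typist already knows into the Unicode characters they
want. This is not Keyman syntax, and the sequences below are invented, but it
shows the idea:

    # Illustration only: turn typed ASCII sequences into Unicode output,
    # roughly the way a Keyman rule does.  The sequences are invented.
    RULES = {
        "e.": "\u1EB9",    # e with dot below
        "o.": "\u1ECD",    # o with dot below
        "s.": "\u1E63",    # s with dot below
        "e/": "e\u0301",   # e with high tone
        "e\\": "e\u0300",  # e with low tone
    }

    def apply_rules(keystrokes: str) -> str:
        out = ""
        for ch in keystrokes:
            candidate = (out[-1:] + ch) if out else ch
            if candidate in RULES:
                out = out[:-1] + RULES[candidate]
            else:
                out += ch
        return out

    print(apply_rules("o.de."))   # -> ọdẹ

The real mapping, of course, has to follow the sequences Yiwola was already
used to, not ones someone else finds convenient.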
> Is it not unlike the large nations imposing their national language
> on the minority languages, Tagalog, English, Japanese, et al., on the
> individual Filipino, the Native American and Spanish/Chinese Americans
> or the Ainu. The plan for a standard is well meant, but devaluation
> sets the course for the minority community language to become an
> endangered language, and with that, a whole culture world view and way
> of thinking. Perhaps it is not the same thing.
I agree with the last sentence: I don't see standardising on Unicode as
devaluation in any way. Quite the opposite: it is a way for minority
languages to gain access to computational tools despite the fact that the
languages in question do not have "market value." So you can use Unicode
to preserve the minority languages. It is also a way to avoid splintering,
where there are different--competing--ways of representing texts in the
language. Here's a comment on splintering in Ethiopic encodings (for
languages like Amharic and Tigrinya):
The task of describing formatting practices in
Ethiopia is one on par with describing the shapes
of clouds in Ethiopia.
(http://www.abyssiniacybergateway.net/fidel/l10n/)
That is, in the past it has been difficult to share electronic versions of
Ethiopic data among different users precisely because there was no
standard. When (or maybe if) Unicode becomes a standard for Ethiopic, this
problem will go away, at least for new documents.
There's of course no reason that Unicode has to be the standard for any
particular language, but it has the best chance. There have been other
attempts to develop standards for a language or for a group of languages;
some have been successful (e.g. Thai), others have not (ISCII, for Indic
languages). But I see no reason not to go with Unicode as a standard.
Mike Maxwell