AW: Corpora: non-alphabetic language databases
Thomas Schmidt
thomas.schmidt at uni-hamburg.de
Thu Nov 30 11:59:38 UTC 2000
The Unicode standard is indeed a promising solution for representing
non-alphabetic characters of any kind. Concerning the original question: I
don't know much about sign languages, but I wouldn't be surprised if the
Unicode Consortium has taken or will take these into account. If it hasn't,
the design of the Unicode standard leaves room for user-defined symbols, so
it should be possible, for instance, to encode alphabetic and sign language
symbols within one document.
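
To illustrate the point about user-defined symbols, here is a minimal Python
sketch. It assumes the "room for user-defined symbols" refers to Unicode's
private-use code points (U+E000 to U+F8FF in the Basic Multilingual Plane).
Assigning U+E000 to a sign language symbol is purely hypothetical; such an
assignment is a private agreement and needs a matching font to display.

```python
import unicodedata

# Hypothetical assignment: U+E000 stands for one sign language symbol.
# Unicode itself gives private-use code points no meaning.
SIGN_SYMBOL = "\ue000"


def is_private_use(ch: str) -> bool:
    """Return True if ch has the Unicode general category 'Co' (private use)."""
    return unicodedata.category(ch) == "Co"


# Alphabetic text and a user-defined symbol in the same string.
text = "HELLO " + SIGN_SYMBOL

# Both kinds of character survive a UTF-8 round trip unchanged.
encoded = text.encode("utf-8")
assert encoded.decode("utf-8") == text

# List the code points in the text that are user-defined.
private = [f"U+{ord(c):04X}" for c in text if is_private_use(c)]
print(private)
```
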
The Unicode homepage is at
http://www.unicode.org/
-----Original Message-----
From: Simon G. J. Smith [SMTP:smithsgj at eee.bham.ac.uk]
Sent: Thursday, 30 November 2000 12:34
To: corpora at hd.uib.no
Subject: Re: Corpora: non-alphabetic language databases
Paula
Have a look at www.chinesecomputing.com
Are you a student of one of these languages? Take a look at a website from
one of the countries, without character-reading software running, and you
will see that each character is represented by two single-byte characters,
usually obscure things like ^ or ` and others that are not on the QWERTY
keyboard at all.
My understanding is this: order of database entry is not based on any
phonetic system, nor on any arrangement of radicals or character
components, but on a standard (for Chinese, usually one of Big-5 or GB
(Guo-Biao)) which maps each character onto a pair of bytes that display as
a seemingly arbitrary pair of characters. With the advent of the Unicode
standard, a one-to-one mapping is also now possible, but implementations
are rare.
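
The mapping described above can be sketched in Python. The choice of the
character 中 is arbitrary, and Python's `big5` and `gb2312` codecs stand in
here for the Big-5 and GB standards. The Latin-1 decoding step is an
assumption about what a browser without Chinese support effectively does
with the bytes.

```python
# One Chinese character, encoded under the two legacy standards.
ch = "中"

big5_bytes = ch.encode("big5")    # two bytes under Big-5
gb_bytes = ch.encode("gb2312")    # two different bytes under GB

# Without Chinese-aware software, each byte is shown as a separate
# single-byte character; Latin-1 is assumed here as the fallback.
big5_as_seen = big5_bytes.decode("latin-1")
gb_as_seen = gb_bytes.decode("latin-1")

# Unicode gives the same character one code point, whatever the source.
code_point = f"U+{ord(ch):04X}"

print("Big-5:  ", big5_bytes.hex(), "displays as", big5_as_seen)
print("GB:     ", gb_bytes.hex(), "displays as", gb_as_seen)
print("Unicode:", code_point)
```

The two standards assign different byte pairs to the same character, so a
corpus has to record which encoding each text uses before entries can be
compared or sorted.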
I'm not an expert: perhaps there's one around who would care to add their
comments?