Eric Brunner-Williams in Portland Maine
brunner at NIC-NAA.NET
Sun Sep 23 13:10:25 UTC 2001
> > There is more than one code point allocated for the character "A",
> > e.g.,
> > code point U+0041, U+0391, U+0410.
> true, if you are from the camp that believes that the Latin "A" is exactly
> the same character as Cyrillic and Greek characters with the same shape.
> and that would be like trying to collapse all the indic languages into a
> single script.
> or the compression of some ethiopic characters with some japanese kana, or
> even the compression of the two kana scripts.
> a political nightmare.
In the IDN WG of the IETF, which I've been contributing to (or against) for
about a year, there is the issue of "misleadingly similar characters". I do
not happen to think this issue is as important as extending the current label
space beyond "-", "0"-"9", "A-Za-z" (aka "LDH"), the ASCII subset, but mileage
varies.
Incidentally, I personally think that characters have properties other than
their glyph, e.g., sort order. I do OS work, the UniCadettes do printers.
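
To make the "misleadingly similar characters" point concrete, here is a minimal sketch in Python (my choice of language; nothing in the thread prescribes it). It prints the three code points quoted above and checks each against the LDH repertoire. The helper names describe() and is_ldh() are hypothetical, made up for illustration.

```python
import unicodedata

# The three look-alike capitals from the quoted message:
# Latin A, Greek Alpha, Cyrillic A.
LOOKALIKES = ["\u0041", "\u0391", "\u0410"]

# The LDH repertoire: letters, digits, hyphen (ASCII only).
LDH = set("abcdefghijklmnopqrstuvwxyz"
          "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
          "0123456789-")


def describe(ch):
    """Return the code point and Unicode character name of one character."""
    return "U+%04X %s" % (ord(ch), unicodedata.name(ch))


def is_ldh(label):
    """True if a non-empty label uses only LDH characters."""
    return len(label) > 0 and all(c in LDH for c in label)


for ch in LOOKALIKES:
    print(describe(ch), "| LDH" if is_ldh(ch) else "| not LDH")
```

The three characters may render identically, yet they compare unequal and carry different names, and only the first is a legal character in a DNS label today.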
> the current reality is that it's very unlikely that additional precomposed
> characters will be encoded, or at least that's my current understanding.
> Although it isn't relevant to most languages, it is relevant to a range of
> african languages for instance.
As I mentioned above, characters have properties other than appearance. For
Central African Syllabics (Dr. Nii Quaynor's project, and incidentally host in
Ghana of next Spring's ICANN meeting), the offered encoding is likely to be
in two distinct code pages -- one in the Latin pages, for characters with
glyphs similar to or borrowed from a European language, and one elsewhere,
for characters with unique glyphs. This has non-trivial consequences for the
authors of collation (sort) algorithms for CAS, as well as for the authors of
ASCII Compatible Encoding (ACE) algorithms (one fundamental technique) for DNS
labels containing names in CAS characters.
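
A rough sketch of both consequences, in Python. The assumptions are loud ones: the alphabet below is a hypothetical stand-in (Latin letters plus ɛ, ŋ, ɔ, which are encoded outside the basic Latin block), not CAS itself, and Punycode, via Python's built-in "punycode" codec, stands in for whichever ACE the WG settles on. The names alphabet_key(), to_ace() and from_ace() are invented for illustration.

```python
# Hypothetical alphabet order for illustration only: most letters sit in the
# basic Latin block, while the letters for open e, eng and open o are
# encoded elsewhere.
ALPHABET = "abcde\u025bfghijklmn\u014bo\u0254pqrstuvwxyz"
RANK = {ch: i for i, ch in enumerate(ALPHABET)}


def alphabet_key(word):
    """Sort key following ALPHABET order (KeyError on unknown characters)."""
    return [RANK[ch] for ch in word]


def to_ace(label):
    """Encode a label as ASCII using Punycode, an assumed stand-in ACE."""
    return label.encode("punycode").decode("ascii")


def from_ace(ace):
    """Decode a Punycode string back to the original label."""
    return ace.encode("ascii").decode("punycode")


words = ["pa", "\u0254a", "oa"]

# Plain code point order pushes the out-of-block letter past "z".
by_code_point = sorted(words)

# Alphabet order keeps open o next to "o", where a reader expects it.
by_alphabet = sorted(words, key=alphabet_key)

print(by_code_point)
print(by_alphabet)
print([to_ace(w) for w in words])
```

The point is only that a naive code point sort splits one alphabet in two, and that an ACE has to carry the out-of-block letters through an ASCII-only label and back without loss.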
> and the situation was complicated initially, when Microsoft decided not to
> support the Latin script in early versions of Uniscribe.
The relations between the UTC, SC2, SC22, and national standards bodies over
the past decade are more complex than simply proprietary advantaging policy
by a vendor.
> Over time it has become somewhat easier to support minority languages in
> Unicode. Although, unfortunately, it generally requires the most recent
> versions of the operating systems.
At last, a tail fin that has legs.
Eric (a once and future OS developer)
Endangered-Languages-L Forum: endangered-languages-l at cleo.murdoch.edu.au
Web pages http://cleo.murdoch.edu.au/lists/endangered-languages-l/
Subscribe/unsubscribe and other commands: majordomo at cleo.murdoch.edu.au