forum

Andrew Cunningham lang.support at GMAIL.COM
Tue Feb 26 23:45:04 UTC 2008


Hi Bill,

I'd be inclined to go the other way: instead of <U+00E1 U+0328>, I'd
use <U+0105 U+0301>. It has the benefit of being normalised according
to Unicode Normalization Form C, and would probably get better
rendering that way as well.
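
(If anyone wants to check that sort of thing, Python's unicodedata
module will show it; a quick sketch, purely as an illustration:)

    import unicodedata

    decomposed = "\u00E1\u0328"   # a-acute + combining ogonek
    composed   = "\u0105\u0301"   # a-ogonek + combining acute

    # Both sequences canonically normalise to the same NFC form,
    # which is <U+0105 U+0301>.
    print(unicodedata.normalize("NFC", decomposed) == composed)  # True
    print(unicodedata.normalize("NFC", composed) == composed)    # True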

And it doesn't matter whether there is a precomposed version or not;
part of the limitation is input systems. Take a language like
Vietnamese as an example. Every possible character exists as a single
precomposed character in Unicode. Then look at what Microsoft's
keyboard layout does. Its keyboard system isn't sophisticated enough,
so it generates precomposed characters for the vowels and then uses
combining diacritics for the five tone marks, whereas third-party
software uses only precomposed characters.

Assuming precomposed Apachean characters were available: using MSKLC
to create a keyboard layout, it would still be easier to use
non-precomposed characters. With something like Keyman you'd have a
choice.

If data and search strings are normalised before attempting a match,
searching shouldn't be too much of a problem. The problem is that most
commercially viable languages don't need it, so developers don't
bother including such features.
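
Just to illustrate the idea (a minimal sketch; the helper name is
mine, not from any real product):

    import unicodedata

    def nfc_contains(haystack, needle):
        # Normalise both sides to NFC before comparing, so that
        # precomposed and combining-sequence spellings of the same
        # text match each other.
        return (unicodedata.normalize("NFC", needle)
                in unicodedata.normalize("NFC", haystack))

    # <U+0105 U+0301> stored in the data, <U+00E1 U+0328> typed in the
    # search box: the match still succeeds after normalisation.
    print(nfc_contains("b\u0105\u0301d", "\u00E1\u0328"))  # True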

But as Bill indicated, there are font issues, and there are font
rendering and text layout issues.

Anyone on Windows XP SP2 (if you've switched on complex language
support), Vista or Linux (using GNOME with the Graphite integration in
Pango) should be fine. It's just a case then of having the right
OpenType or Graphite fonts.

Mac OS X is more problematic, because Apple uses AAT fonts and ATSUI,
but many application developers try to roll out and use their own
OpenType solutions instead. That makes for a lot of confusion.

And Adobe, regardless of platform, uses its own font rendering and
text layout libraries.

On older versions of Windows you get mixed results, but there are a
couple of options.

As far as I can tell, the only real issue with the Apachean languages
is which Unicode codepoint to use for the glottal.

Everything else is relatively straightforward.

I've started compiling notes for our African clients, who have many of
the same issues re fonts and font rendering. It needs a lot of work
and is not complete by a long shot, but what little is there may be of
interest.

I need to develop notes on treating Latin as a complex script,
especially with programs like Microsoft Word. As soon as you start
using combining diacritics it's a complex script, so the complex
script font options kick in rather than the standard font options. Fun
on some of the newer versions of Word.

http://www.openroad.net.au/dev/wiki/doku.php/african/start

Hopefully I'll get time to work on it next week.


Andrew

-- 
Andrew Cunningham
Vicnet Research and Development Coordinator
State Library of Victoria
Australia

andrewc at vicnet.net.au
lang.support at gmail.com


