forum

Mia Kalish MiaKalish at LEARNINGFORPEOPLE.US
Thu Feb 28 21:38:42 UTC 2008


Hi, Andrew, 
You wrote: 
This is a software issue: collation routines should be able to
work on multiple characters, not just single characters.

If data is normalised, and you have software that has properly
implemented Unicode collation and allows you to specify
language-specific collation, it should be possible to sort a letter
that includes a combining diacritic correctly. After all, some
languages need to be able to sort digraphs and trigraphs correctly as
well.

It's a software limitation, not a Unicode issue.
------------------------------------------------------------------

I don't think this is a multiple-character issue; I think it is a sequencing
issue. I don't know whether "insert" puts the characters into ascending
sequence; I believe the codes are stored in the order they are inserted.
There is also the cultural-linguistic intersection: the internal numeric
sequence may not be what is wanted for the language. In Apachean, for
example, the glottal has to sort first. So overall, I think the concept of
"correctly" is culturally dependent.
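
To make the point about numeric sequence concrete, here is a minimal Python sketch. It rests on assumptions that are not from this thread: the glottal is written as U+02BC, the rest of the alphabet is plain a-z, and the sample strings are made up for illustration (they are not real words). It only shows that code-point order and a language-specific order can disagree.

```python
import unicodedata

# Assumption for illustration: the glottal stop is written as U+02BC
# (MODIFIER LETTER APOSTROPHE) and must sort before every other letter.
GLOTTAL = "\u02bc"
ORDER = GLOTTAL + "abcdefghijklmnopqrstuvwxyz"  # hypothetical alphabet order
RANK = {ch: i for i, ch in enumerate(ORDER)}

def sort_key(word):
    # Decompose so base letters and combining marks are separate code points.
    decomposed = unicodedata.normalize("NFD", word.lower())
    # Primary level: rank of each base letter; unknown letters go last.
    primary = [RANK.get(ch, len(ORDER))
               for ch in decomposed
               if not unicodedata.combining(ch)]
    # The decomposed string itself breaks ties between equal primary keys.
    return (primary, decomposed)

words = ["abc", GLOTTAL + "a", "za"]  # made-up strings, not real words

by_code_point = sorted(words)               # raw numeric order
by_tailored = sorted(words, key=sort_key)   # glottal-first order

print(by_code_point)
print(by_tailored)
```

With raw code-point sorting the glottal-initial string lands at the end, because U+02BC is numerically larger than any ASCII letter; with the tailored key it comes first.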
.....................................
More from Andrew: 
Likewise, a font design issue more than a Unicode issue.

I tend to distinguish between things that the UTC need to do to get
things right and things that developers haven't got right (including
font developers).
---------------------------------------------------------------------
I'm a little confused on this one. If we imagine power users who are using
the Combine function in UC to create composite characters, then there aren't
really any font developers directly involved.
What I noticed when I was modifying fonts is that you cannot simply copy
and paste the components, which is essentially the same as what the Combine
function does.
I don't know how you would create a diacritic that would be in the right
position for the wide "a" and also, without horizontal adjustment, in the
right position for the "i".
There's also the serifs issue: serif vowels tend to be much more regular in
size than non-serif vowels. That isn't really a font design issue
either, unless we want to eliminate one or the other (just kidding).
Can you explain what you see here? How do you see the designers and software
people solving this issue so it becomes automated?

Thanks, 
Mia 


-----Original Message-----
From: Indigenous Languages and Technology [mailto:ILAT at LISTSERV.ARIZONA.EDU]
On Behalf Of Andrew Cunningham
Sent: Tuesday, February 26, 2008 4:54 PM
To: ILAT at LISTSERV.ARIZONA.EDU
Subject: Re: [ILAT] forum

Hi Mia,

On 27/02/2008, Mia Kalish <MiaKalish at learningforpeople.us> wrote:

>  So when you go to sort, you get
>  a weird sequence of output, and the average user can't grok
>  what's happening.

This is a software issue: collation routines should be able to
work on multiple characters, not just single characters.

If data is normalised, and you have software that has properly
implemented Unicode collation and allows you to specify
language-specific collation, it should be possible to sort a letter
that includes a combining diacritic correctly. After all, some
languages need to be able to sort digraphs and trigraphs correctly as
well.

It's a software limitation, not a Unicode issue.
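
A rough Python sketch of what "working on multiple characters" can mean in practice. Everything specific here is an assumption for illustration: the tiny alphabet, the choice of "ch" as a digraph, and "a" plus combining ogonek (U+0328) as a separate letter. A full implementation of the Unicode Collation Algorithm with language tailoring (for example, the one in ICU) does far more than this.

```python
import unicodedata

# Hypothetical alphabet: "a" + ogonek is its own letter after "a",
# and the digraph "ch" is a single letter sorting after "c".
ELEMENTS = ["a", "a\u0328", "b", "c", "ch", "d", "h", "o"]
RANK = {e: i for i, e in enumerate(ELEMENTS)}
MAX_LEN = max(len(e) for e in ELEMENTS)

def collation_key(text):
    # Normalise first, so precomposed and combining spellings compare equal.
    s = unicodedata.normalize("NFD", text)
    key = []
    i = 0
    while i < len(s):
        # Greedy match: try the longest collation element first.
        for n in range(MAX_LEN, 0, -1):
            chunk = s[i:i + n]
            if chunk in RANK:
                key.append(RANK[chunk])
                i += len(chunk)
                break
        else:
            # Unknown code point: sort after the whole alphabet.
            key.append(len(ELEMENTS) + ord(s[i]))
            i += 1
    return key

words = ["cha", "co", "ca"]
print(sorted(words))                     # code-point order
print(sorted(words, key=collation_key))  # "ch" treated as one letter

# Precomposed U+0105 and "a" + U+0328 yield the same key after NFD.
print(collation_key("\u0105") == collation_key("a\u0328"))
```

The key treats a base letter plus combining mark, or a digraph, as one sortable unit, so the result does not depend on how the text happened to be typed or stored.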

>  As for your third paragraph about the Unicode consortium: It is pretty much
>  an exact rendition of the conversation(s) I had with them specifically about
>  this issue. They seem to really, truly believe that if the glyph looks
>  right, the character is also right (character = glyph + code).
>  Now, is a cedilla different from the little hook? Whoever he is, he doesn't
>  go in the same horizontal location for all the vowels. When we develop the
>  glyphs by hand, even if we use the Unicode/Font glyphs that are already
>  existing, we have to eyeball them in so they look nice. It also makes a
>  difference whether the fonts have serifs or not (que serif, serif, whatever
>  will b, we'll c ... )

Likewise, a font design issue more than a Unicode issue.

I tend to distinguish between things that the UTC need to do to get
things right and things that developers haven't got right (including
font developers).



Andrew
-- 
Andrew Cunningham
Vicnet Research and Development Coordinator
State Library of Victoria
Australia

andrewc at vicnet.net.au
lang.support at gmail.com


