[Lexicog] Sorting

Koontz John E john.koontz at COLORADO.EDU
Sun Mar 21 19:24:13 UTC 2004


On Sat, 20 Mar 2004, Mike Maxwell wrote:
> This kind of sort program is dependent on the order of the code points in
> your encoding being standard.

Cases where the collating order of a character set encoding matches the
desired sorting order are probably more the exception than the rule, though
character sets do attempt, within reason, to match some sort of default order.

There are probably various ways to handle sorting, but Bob Hsu used to
discuss it in terms of sorting handles, which are transformations of the
sorted elements into character strings for which the collating sequence
does match the desired sorting order.  For example, if you want upper and
lower case to be treated the same, convert upper case to lower case in the
handle.  If you want a-acute to be treated like a, convert a-acute to a in
the handle.  If you want a-acute to be treated like a, except that where
two words differ in that respect, a-acute follows a, then convert a-acute to
a, but append a 1 to the end of the word for each a and a 2 for each
a-acute.  If you want ch to be treated as a single letter following c, map
a to a, b to b, c to c, ch to d, d to e, etc.
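
To make the handle idea concrete, here is a rough Python sketch that combines
the rules above. The function name make_handle and the sample word list are
made up for illustration, and the sketch assumes lower-case a-z plus a
precomposed a-acute is all that needs handling. It is one possible way to
build such handles, not a description of any particular sorting program.

```python
import unicodedata


def make_handle(word):
    """Build a sorting handle whose plain code point order gives the
    desired order (hypothetical rules: ignore case, ch follows c,
    a-acute sorts as a but after it when words are otherwise equal)."""
    # Assumption: normalise to precomposed form so a-acute is one character.
    text = unicodedata.normalize("NFC", word).lower()
    primary = []   # the letters that decide the main order
    tiebreak = []  # digits appended at the end: 1 per a, 2 per a-acute
    i = 0
    while i < len(text):
        if text.startswith("ch", i):
            primary.append("d")          # ch takes the slot right after c
            i += 2
            continue
        c = text[i]
        if c == "\u00e1":                # a-acute
            primary.append("a")
            tiebreak.append("2")
        elif c == "a":
            primary.append("a")
            tiebreak.append("1")
        elif "d" <= c <= "z":
            primary.append(chr(ord(c) + 1))  # shift d..z up to make room for ch
        else:
            primary.append(c)
        i += 1
    return "".join(primary) + "".join(tiebreak)


words = ["dama", "Chico", "casa", "cuna", "m\u00e1s", "mas", "ala"]
ordered = sorted(words, key=make_handle)
```

Here sorted() compares only the handles, so "Chico" lands between "cuna" and
"dama", and "mas" comes just before its accented twin.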

Ideally the sorting program will generate these handles on the fly as it
needs them, based on your sorting rules.  If you don't have access to such
a clever sorting program, you can always create the handles yourself and
make sure the sorting program sorts on them rather than on the nominal
key.  You then have to delete them from some kinds of output, of course.
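
The do-it-yourself route can be sketched in Python along these lines: prefix
each key with its handle, let a plain sort work on the combined lines, then
strip the handles off again. The names simple_handle and sort_with_handles
are hypothetical, the handle rule here is just case folding, and the tab
separator assumes that neither handles nor keys contain a tab.

```python
def simple_handle(key):
    """Hypothetical handle rule for illustration: ignore case only."""
    return key.lower()


def sort_with_handles(keys, handle=simple_handle):
    """Sort keys using a sorter that only compares raw strings."""
    # Prefix each key with its handle, separated by a tab. The tab sorts
    # below letters and digits, so a shorter handle comes first.
    decorated = [handle(k) + "\t" + k for k in keys]
    decorated.sort()  # stands in for the "dumb" sorting program
    # Delete the handles again before output.
    return [line.split("\t", 1)[1] for line in decorated]


result = sort_with_handles(["banana", "Apple", "cherry"])
```

A plain code point sort of the same three keys would put "Apple" first only
by accident of its capital letter; with the handles the order no longer
depends on case.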



