Off-topic Re: Sorting by Case

James E. Clapp jeclapp at WANS.NET
Tue Mar 14 05:45:06 UTC 2000


A. Vine wrote:
>
> The question of uppercase/lowercase order was specifically about
> English though.  This has nothing to do with Unicode, the standard.
> The list is convenient for other i18n issues.

I'm still not sure I fully understand the purpose of the question (you
once explained to me what i18n stands for, but I'm embarrassed to say
I've forgotten)--but from the variety of answers received so far it
seems clear that (a) different sorting schemes are appropriate for
different purposes, and (b) even for a specific purpose, opinions may
differ as to what sort order is best.

If the people on that other list are debating what sort order to
incorporate into applications, I would urge that a considerable variety
of sorting algorithms be made available at the user's option.  I've been
annoyed by word-processing and spreadsheet programs that will sort
things for me, but only in one or two ways chosen by the developer,
neither of which ever seems to suit my needs.

At a minimum, in addition to being able to choose whether upper- or
lowercase letters should come first, one should be able to choose among
(1) ASCII sort, (2) "dictionary" sort (phrases treated as one long
string of letters; spaces ignored), and (3) "telephone book" sort
(word-by-word sort: "Smith Samuel" comes before "Smithe Alice").
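
To make the three schemes concrete, here is a minimal Python sketch. The function names are hypothetical, and folding case with lower() is an assumption made for illustration only.

```python
# Three candidate sort keys; names and case handling are illustrative.

def ascii_key(s):
    # Plain code-point comparison: space sorts before every letter,
    # and uppercase letters sort before lowercase ones.
    return s

def dictionary_key(s):
    # Letter-by-letter: the phrase is treated as one long string,
    # with spaces ignored and case folded.
    return s.replace(" ", "").lower()

def telephone_key(s):
    # Word-by-word: compare the first words, then the second, and so on.
    return s.lower().split()

names = ["Smithe Alice", "Smith Zoe", "Smith Samuel"]

print(sorted(names, key=ascii_key))
# ['Smith Samuel', 'Smith Zoe', 'Smithe Alice']
print(sorted(names, key=dictionary_key))
# ['Smithe Alice', 'Smith Samuel', 'Smith Zoe']
print(sorted(names, key=telephone_key))
# ['Smith Samuel', 'Smith Zoe', 'Smithe Alice']
```

For this small list the ASCII and telephone-book orders happen to agree, because a space sorts before any letter; they diverge once case or punctuation enters the picture.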

But in real life there are more complications than that.  For example,
in some situations you want punctuation marks treated as sortable
characters; in others you want them ignored (so that "phat" will come
before "Ph.D.").  And there's always the problem of numbers, which
occasionally one might want to treat as characters but more often one
would like treated as numbers, so that 2 will come before 10 instead of
the other way around.  And sometimes people would like to put all the
listings beginning with numbers before those beginning with letters, and
sometimes they'd rather put those beginning with numbers at the end.
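
The punctuation and number cases can be sketched in Python as follows. Both function names are hypothetical, "punctuation" is assumed to mean the ASCII set in string.punctuation, and only the digits 0-9 are treated as numbers.

```python
import re
import string

# Translation table that deletes ASCII punctuation (an assumption;
# a real application would need a broader definition).
_PUNCT = str.maketrans("", "", string.punctuation)

def ignore_punct_key(s):
    # Drop punctuation and fold case before comparing.
    return s.translate(_PUNCT).lower()

def natural_key(s, numbers_first=True):
    # Split into alternating text and digit runs. Because the pattern
    # has a capture group, digit runs land at the odd indices.
    key = []
    for i, part in enumerate(re.split(r"([0-9]+)", s)):
        if not part:
            continue
        if i % 2 == 1:
            # Digit run: compare by numeric value, ranked before or
            # after text depending on the option.
            key.append((0 if numbers_first else 2, int(part), ""))
        else:
            key.append((1, 0, part.lower()))
    return key

print(sorted(["Ph.D.", "phat"], key=ignore_punct_key))
# ['phat', 'Ph.D.']
print(sorted(["item 10", "item 2"]))
# ['item 10', 'item 2']
print(sorted(["item 10", "item 2"], key=natural_key))
# ['item 2', 'item 10']
print(sorted(["apple", "3 pears"], key=lambda s: natural_key(s, numbers_first=False)))
# ['apple', '3 pears']
```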

So my ultimate fantasy (which really should be perfectly doable) is that
(A) a few basic sort options should be choosable with radio buttons,
probably (1) ASCII vs. non-ASCII sort, and if non-ASCII is chosen, then
(2)(a) dictionary order vs. telephone book order, (b) caps before lc vs.
lc before caps, (c) numbers before letters vs. letters before numbers,
(d) numbers sorted numerically vs. numbers sorted character by
character, and (e) punctuation marks treated as characters vs. ignored;
and (B) a "customize" option be provided as an alternative to the
foregoing pre-programmed choices, whereby the user can specify the
desired order or treatment for each character, including punctuation
marks.
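
One way the radio-button choices in (A) might map onto code is a single options record that builds a sort key. This is only a sketch under stated assumptions: SortOptions, make_key, and every field name are hypothetical, punctuation means ASCII punctuation, and the caps-vs-lowercase choice is applied as a tie-breaker between otherwise equal entries.

```python
import re
import string
from dataclasses import dataclass

@dataclass
class SortOptions:              # hypothetical names throughout
    ascii_order: bool = False   # (1) ASCII vs. non-ASCII sort
    word_by_word: bool = True   # (a) telephone book vs. dictionary
    upper_first: bool = True    # (b) caps before lc vs. lc before caps
    numbers_first: bool = True  # (c) numbers before letters
    numeric: bool = True        # (d) numbers sorted by value
    ignore_punct: bool = True   # (e) punctuation ignored

_PUNCT = str.maketrans("", "", string.punctuation)

def make_key(opts):
    if opts.ascii_order:
        return lambda s: s      # plain code-point order

    def chunks(text):
        # Alternating text/digit runs; digit runs are at odd indices.
        out = []
        for i, part in enumerate(re.split(r"([0-9]+)", text)):
            if not part:
                continue
            if i % 2 == 1:
                rank = 0 if opts.numbers_first else 2
                out.append((rank, int(part), "") if opts.numeric
                           else (rank, 0, part))
            else:
                out.append((1, 0, part.lower()))
        return out

    def key(s):
        text = s.translate(_PUNCT) if opts.ignore_punct else s
        if opts.word_by_word:
            primary = [chunks(w) for w in text.split()]
        else:
            primary = chunks(text.replace(" ", ""))
        # Tie-breaker: code-point order puts caps first; swapcase flips it.
        tiebreak = s if opts.upper_first else s.swapcase()
        return (primary, tiebreak)

    return key

print(sorted(["smith", "Smith"], key=make_key(SortOptions())))
# ['Smith', 'smith']
print(sorted(["smith", "Smith"], key=make_key(SortOptions(upper_first=False))))
# ['smith', 'Smith']
print(sorted(["file10", "file2"], key=make_key(SortOptions(numeric=False))))
# ['file10', 'file2']
```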

This is off the top of my head, so before sitting down to write the code
I'd want to think it through!  But the point is, any sort algorithm has
to have what amounts to an internal table specifying the sort order;
instead of trying to guess what the user will want or dictate what the
user *should* want, why not provide a nice user-friendly way for the
*user* to specify the sort order?
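
The "internal table" idea can be sketched directly: the user supplies the characters in the desired order, plus a set of characters to ignore, and the sort key is looked up from that table. The function make_table_key and its parameters are hypothetical, and placing unlisted characters after all listed ones is an assumption.

```python
import string

def make_table_key(order, ignored=""):
    # Rank of each character is its position in the user's table.
    rank = {ch: i for i, ch in enumerate(order)}
    unknown = len(rank)          # unlisted characters sort last
    skip = set(ignored)

    def key(s):
        # The character itself is kept as a second element so that two
        # different unlisted characters still compare deterministically.
        return [(rank.get(ch, unknown), ch) for ch in s if ch not in skip]

    return key

# Example table: space, then digits, then letters with lowercase
# immediately before uppercase (aAbBcC...); punctuation is ignored.
letters = "".join(l + u for l, u in
                  zip(string.ascii_lowercase, string.ascii_uppercase))
order = " " + string.digits + letters
key = make_table_key(order, ignored=string.punctuation)

print(sorted(["Ph.D.", "phat", "Zebra", "apple"], key=key))
# ['apple', 'phat', 'Ph.D.', 'Zebra']
```

Because the table ranks individual characters, case is decided at the first differing letter rather than as a tie-breaker; whether that is the desired behavior is exactly the kind of choice the user would be making.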

And since we're talking about Unicode here, the same principle could be
used for a more generalized feature allowing users to dictate their
preferred sort order for characters in other languages as well.
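
As a small illustration of the same table approach applied to another language, the sketch below uses the Swedish alphabet, in which å, ä, and ö follow z in that order. The helper table_key is hypothetical, and the strings are assumed to use precomposed characters.

```python
def table_key(alphabet):
    # Rank each character by its position in the supplied alphabet;
    # anything not listed sorts after all listed characters.
    rank = {ch: i for i, ch in enumerate(alphabet)}
    return lambda s: [rank.get(ch, len(rank)) for ch in s.lower()]

swedish = "abcdefghijklmnopqrstuvwxyzåäö"
words = ["öl", "ärm", "zon", "ål"]

print(sorted(words))
# ['zon', 'ärm', 'ål', 'öl']   (code-point order puts ä before å)
print(sorted(words, key=table_key(swedish)))
# ['zon', 'ål', 'ärm', 'öl']   (table order: å, ä, ö)
```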

James E. Clapp



More information about the Ads-l mailing list