digraphs and sorting

James Crippen jcrippen at GMAIL.COM
Sat Jul 28 23:20:20 UTC 2012


On Wed, Jul 25, 2012 at 7:55 AM, Gary Holton <gmholton at alaska.edu> wrote:
> I know this list doesn't get a lot of traffic, so apologies in advance
> for spamming you with this query.

Not to worry; after all, people wouldn’t be subscribed if they didn’t
want to receive things.

> For years I've accepted without
> question the orthodoxy which sorts dictionary entries by digraph
> rather than by single characters.

Good!

> This makes obvious sense, since
> digraphs such as th or even trigraphs such as tł' are single phonemes
> and hence shouldn't be relegated to secondary status within a
> dictionary. On the other hand, we also know that many languages do
> just fine treating digraphs as separate characters for the purposes of
> dictionary sorting (e.g., English has no "th" section; Malay has no
> "ng" section). So, my question is, does anyone know of any usability
> studies -- or even just subjective account --  comparing the relative
> advantages of each approach within a language maintenance situation?

As an English speaker I find it confusing to have to remember any kind
of alphabetical order that isn’t monographic. I think that, for people
who are primarily literate in English (even if it’s not their first
language), following the English pattern of only sorting by monographs
is what is expected. Doing so would obey the Principle of Least
Surprise, meaning that the largest number of users would be least
surprised by the decision to only sort by single letters.

For Tlingit I’ve been using the following alphabetical order:

a (b) ch d e g g̱ h i j k ḵ l (m) n o s t u w x x̱ y ÿ ʼ

Thus a word like dleit ‘snow’ will precede dzeit ‘ladder’ naturally
because l < z. A word like káasʼ ‘algae’ will precede kʼáasʼ ‘missing
tooth’ because á < ʼ. A word like kwáash ‘humpback salmon’ will
precede kʼwáash ‘genital area’ because w < ʼ. Vowels with acute accents
are either treated like plain vowels or sorted after them. People seem
to find this easy enough to follow, and it’s very easy for me to
implement in various software.
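
A minimal Python sketch of this kind of single-letter sort might look
like the following. It is an illustration only, not the implementation
referred to above, and the names ORDER and sort_key are hypothetical.
It makes three assumptions: the list entry "ch" is ranked as the single
letter c, any letter missing from the list (such as z) ranks after all
listed letters, and acute accents are used only as a tie-breaker (the
second of the two options mentioned).

```python
import unicodedata

# Alphabetical order as quoted in the message, written in decomposed form.
# Assumption: the entry "ch" is ranked as the single letter "c".
ORDER = ["a", "b", "c", "d", "e", "g", "g\u0331", "h", "i", "j", "k",
         "k\u0331", "l", "m", "n", "o", "s", "t", "u", "w", "x",
         "x\u0331", "y", "y\u0308", "\u02bc"]
RANK = {letter: i for i, letter in enumerate(ORDER)}

ACUTE = "\u0301"                 # combining acute accent
ATTACHED = {"\u0331", "\u0308"}  # macron below, diaeresis: part of the letter


def sort_key(word):
    """Return a key that sorts letter by letter, with accents as tie-breaker."""
    letters = []   # one entry per letter, e.g. "k" or "k\u0331"
    accents = []   # 1 where the letter carries an acute accent, else 0
    for ch in unicodedata.normalize("NFD", word.lower()):
        if ch == ACUTE:
            if accents:
                accents[-1] = 1
        elif ch in ATTACHED and letters:
            letters[-1] += ch
        else:
            letters.append(ch)
            accents.append(0)
    # Assumption: letters not in ORDER (e.g. "z") rank after all listed ones.
    primary = [RANK.get(l, len(ORDER) + ord(l[0])) for l in letters]
    return (primary, accents)


# The example words from the message; \u02bc is the letter ʼ, \u00e1 is á.
words = ["dzeit", "dleit", "k\u02bc\u00e1as\u02bc", "k\u00e1as\u02bc"]
print(sorted(words, key=sort_key))
```

Because the key compares one letter at a time, no special handling of
dl, dz, kʼ, or kʼw is needed; those sequences simply fall out of the
order of their individual letters.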

I think that the insistence on sorting polygraphs specially in
languages that don’t have such a tradition or aren’t embedded in a
surrounding one (a local major language) is a conceit of linguists. In
general, people expect to follow the same rules as the major language
because those are the ones they have invested the most time in
learning. Nonlinguists think in terms of ‘letters’ of a language, as
is abundantly obvious to anyone who’s taught an introductory
linguistics course. Linguists on the other hand tend to think in terms
of phonemes, thus treating combinations of letters as unitary. Since
that is not what most people seem to expect, when designing a
dictionary for nonlinguists it’s generally more appropriate to follow
the nonlinguist’s expectations. Linguists are more flexible as well,
so when confronted with such a system they can learn to adapt no
matter how ‘unscientific’ they might think it is. (Such issues are
more a matter of aesthetics than science, and it’s wise to admit this
early on rather than trying to come up with post hoc justifications.)

There’s also a traditional linguistics bias that should be kept in
mind when looking at older work. Not too long ago, the typical
Americanist linguist insisted on using one phonetic symbol for each
phoneme. Thus we had ƛ for tł, c̓ for tsʼ, ʒ for dz, and so forth.
Alphabetizing that sort of system was trivial because it was
inherently monographic. But in the shift to polygraphic
representation, the old analytical habits refused to die. People
insisted that tł had to be sorted separately from the t section
because in their minds that was still ‘really’ ƛ even though it was
spelled funny. There’s no real excuse for this kind of reasoning
anymore, since very few linguists actually use symbols like ƛ and c̓
today. The last outpost of this seems to me to be in the Salishan
language family, where some languages have actually had these
ensconced in their official orthographies.



More information about the Athapbasckan-L mailing list