Alphabetizing

Arnold Zwicky zwicky at STANFORD.EDU
Tue Nov 8 15:52:42 UTC 2011


On Nov 8, 2011, at 6:10 AM, Charles C Doyle wrote:
>
> Yesterday my daughter-in-law called me with a question about my third-grader grandson's homework.  The assignment was to alphabetize a list of words, and the list included the four items girl/girl's/girls/girls'.  (My daughter-in-law made clear that both the academic career of my grandson and the family's standing in the community were at stake, since the parents of the other third-graders were also depending on my answer.)
>
> I failed.  I could tell her that there exist various styles of alphabetizing, that certain traditional "rules" obtain, one of which is "Ignore apostrophes"--but the rules I am aware of don't fully address the case at hand.  I could tell her that if the Microsoft Corporation is asked to "sort" the words alphabetically, they will appear in the order in which I have listed them above, which seems reasonable--but not, as far as I can determine, "authoritative."
>
> Any suggestions?  (I don't recall that third grade used to be this hard!)

this strikes me as an absurd assignment for any grade, but certainly for the third grade.

the assumption seems to be that there is a single right way to do alphabetization, while in fact there are many competing styles of alphabetization, with accompanying "rules".  letter-by-letter vs. word-by-word?  disregard capitalization or order upper case before lower case or order lower case before upper case?  treat numerals as coming before alphabetic characters, or after them, or as if they were spelled out in letters?  disregard punctuation or order punctuation marks before alphanumerics (or after them)?  treat the prefixes Mc and Mac as equivalent or as ordered letter-by-letter?  disregard internal spaces, or extend "nothing before something" to the case of internal spaces?  and so on.
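
to make one of these choices concrete, here is a minimal Python sketch of letter-by-letter vs. word-by-word ordering.  the place names and the two key functions are illustrative assumptions, not taken from any particular style guide; both keys disregard capitalization.

```python
def letter_by_letter(entry):
    # Disregard internal spaces and compare the remaining letters.
    return entry.lower().replace(" ", "")

def word_by_word(entry):
    # Compare word by word; a shorter first word sorts before a longer
    # one that begins with it ("nothing before something").
    return tuple(entry.lower().split())

entries = ["New York", "Newark", "New Zealand"]

lbl = sorted(entries, key=letter_by_letter)
wbw = sorted(entries, key=word_by_word)

print(lbl)  # ['Newark', 'New York', 'New Zealand']
print(wbw)  # ['New York', 'New Zealand', 'Newark']
```

the same three entries come out in two different orders, and neither order is wrong; they just follow different conventions.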

i believe that the Microsoft ordering is character-by-character, following ASCII order: as a result, nothing comes before something, and punctuation marks (like the apostrophe, ASCII 39) come before alphabetic characters (which start at ASCII 65, with upper case before lower case).  this gives the ordering of the four words above.
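
a quick way to check this is to sort the four words by raw character code.  the sketch below uses Python's built-in string comparison, which goes code point by code point and agrees with ASCII for these characters; it illustrates the character-by-character scheme and is not a claim about what any Microsoft product actually runs.

```python
words = ["girls'", "girls", "girl's", "girl"]

# Python compares strings one code point at a time; a string that is a
# prefix of another sorts first ("nothing before something").
by_code_point = sorted(words)

print(by_code_point)                 # ['girl', "girl's", 'girls', "girls'"]
print(ord("'"), ord("A"), ord("a"))  # 39 65 97
```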

this scheme is definitely *not* traditional, but it has the virtue of always giving a clear answer without human judgment; it's eminently automatizable.  the results are not always attractive; for instance, the algorithm doesn't disregard initial _the_, _a_, or _an_ in titles, since these are just character sequences.
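
the point about titles can be sketched in Python.  the titles, the ARTICLES tuple, and the title_key helper are all hypothetical, chosen only to show the difference between plain character ordering and an ordering that skips an initial article.

```python
ARTICLES = ("the ", "a ", "an ")  # hypothetical list of initial articles to skip

def title_key(title):
    # Lower-case the title and drop one leading article, if present.
    lowered = title.lower()
    for article in ARTICLES:
        if lowered.startswith(article):
            return lowered[len(article):]
    return lowered

titles = ["The Hobbit", "A Tale of Two Cities", "Emma", "An Ideal Husband"]

plain = sorted(titles)                    # raw character-by-character order
skipping = sorted(titles, key=title_key)  # initial articles disregarded

print(plain)     # ['A Tale of Two Cities', 'An Ideal Husband', 'Emma', 'The Hobbit']
print(skipping)  # ['Emma', 'The Hobbit', 'An Ideal Husband', 'A Tale of Two Cities']
```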

the traditional rule "ignore punctuation marks" doesn't discriminate between _girls_, _girl's_, and _girls'_, so you'd have to either tolerate arbitrary orderings among them or call a special rule into play just for cases where the general rule fails to provide a unique ordering.
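
here is a minimal Python sketch of that tie.  the ignore_punct helper is hypothetical, and the tie-break used (fall back to raw character order) is just one possible special rule, not a traditional one.

```python
import string

def ignore_punct(word):
    # Drop every ASCII punctuation character, including the apostrophe.
    return "".join(ch for ch in word if ch not in string.punctuation)

words = ["girls'", "girl's", "girls", "girl"]

# girls, girl's and girls' all collapse to the same key, "girls".
keys = {w: ignore_punct(w) for w in words}

# Python's sort is stable, so tied items simply keep their input order.
ignoring = sorted(words, key=ignore_punct)

# One possible special rule: break ties by raw character order.
with_tiebreak = sorted(words, key=lambda w: (ignore_punct(w), w))

print(ignoring)       # ['girl', "girls'", "girl's", 'girls']
print(with_tiebreak)  # ['girl', "girl's", 'girls', "girls'"]
```

without the tie-break, the order of the three tied words depends entirely on the order in which they happened to be listed.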

arnold

------------------------------------------------------------
The American Dialect Society - http://www.americandialect.org


