[Lexicog] Sorting; Shoebox; Hsubox

Koontz John E john.koontz at COLORADO.EDU
Mon Mar 22 23:11:34 UTC 2004


On Mon, 22 Mar 2004, David Frank wrote:
> I've learned one term from you -- sorting handle -- and now another one:
> fold point.

Handle I learned from Bob Hsu - a professor of linguistics at the U of
Hawaii who has since retired to Washington state.  I don't know if he made
it up or learned it from the literature on sorting.  Bob had a nice set of
tools written in Spitbol that were much used in producing Pacific Area and
Salish dictionaries in the 1970s - 1980s.  After Shoebox came out some
people called Dr. Hsu's tools Hsubox.

Fold point I made up, but I'm not sure I don't have some sort of
subconscious model for it that I've forgotten.

> But an automatic way of doing this "folding" was not what we were
> interested in. For our purposes in working on a dictionary of St. Lucian
> Creole, whenever we had a phrase, we had a judgment call to make, to
> determine which was the most salient word in the phrase, and use that as
> the basis for sorting. We could adjust the alphabetic key in the
> database record so that it sorted on the right word in the phrase.

So you could use a system of "invisible markers," but not some form of
regular expression or other matching.

> Back when I was doing automated sorting, I had a sort routine I had written
> that sorted on the nonprinting alphabetic key rather than the headword.
> Coming up with a programming routine for sorting on something other than the
> first field in a record is not too difficult.

> I bet there are programs available for sorting on any specified field in a
> set of records. If anyone needs such a program, let that fact be known and I
> bet there are subscribers to this list who can tell us where to find them.

I used something called OptTech Sort (name by recollection) that we had
gotten to make use of the Hsu tools, which required Spitbol and a powerful
sorting utility.  I seem to recall that OptTech would let you sort
multiline records, though you had to insert some sort of interrecord
marker and then take it back out.  The problem was that you had to be able
to specify the location of the key in absolute terms or relative to a
special character (a field marker).  One gimmick was to copy the key to the
start of the record before doing the sort and then delete it afterward.  So
quite a lot of massaging was called for.
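
The copy-the-key-to-the-front gimmick can be sketched in a few lines of
Python.  This is only an illustration, not the OptTech or Hsu code: it
assumes records are separated by blank lines, that fields look like SFM
lines, and that the sort key lives in a field called \sk (a made-up marker
name chosen for the example).

```python
def sort_records(text, key_marker="\\sk"):
    """Sort blank-line-separated multiline records on one field.

    Assumptions (not from any real tool): records are separated by a blank
    line, each field is one line of the form '<marker> <value>', and
    key_marker names a hypothetical sort-key field.
    """
    records = [r for r in text.strip().split("\n\n") if r.strip()]

    def key_of(record):
        # Find the key field wherever it sits inside the record.
        for line in record.splitlines():
            marker, _, value = line.partition(" ")
            if marker == key_marker:
                return value.strip().lower()
        return ""  # records lacking the key field sort to the top

    # "Copy the key to the front": pair each record with its key.
    # The index keeps the sort stable and avoids comparing record bodies.
    decorated = [(key_of(r), i, r) for i, r in enumerate(records)]
    decorated.sort()
    # "Delete it again": drop the key and keep only the record text.
    return "\n\n".join(r for _, _, r in decorated)
```

Calling sort_records() on a file's contents returns the same records,
unchanged internally, in key order.  The key never has to be written into
the record itself, which is the part that took the massaging.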

When Shoebox came out - it used to sort only on the headword, I think, and
that may still be a limitation - I started extracting a copy of the desired
field to the front of the record, importing into Shoebox, sorting, and then
deleting the extra field from the head of the record and reimporting into
Shoebox. The same dodge as with OptTech Sort, and kind of roundabout, but
the code is all easier to write than a full-fledged sort with all the
features of Shoebox's sort.

> I don't know either whether Shoebox (or Toolbox) can sort on something other
> than the headword field in a dictionary database. I also don't know whether
> sorting can be turned on and off. In my brief, ill-fated encounter with
> Shoebox, the whole database was sorted according to the key word and I
> didn't know how to turn it off. I bet somebody could tell us.

Sounds like that would be a useful post, but maybe this behavior is a
feature, as they say.

> I use mailmerge format ...

I'm not sure what this is.  A format for creating merged letters for Word?

Hsu's format was very much like SFM, but, as I recall, it used unslashed
fieldnames and put a period before the keyfield's name.  The keyfield had
to be first.  He had a concept of subentries - looking forward to XML's
nesting ability - that had multiple dots before the subfield key.

One feature I very much liked in Hsubox was the tool that would construct
an index from crossreferences to headwords.  Crossreferences were marked
in the text of entries with an asterisk.  This was an early approach to
having XML markers like <xref>...</xref> around words or phrases in the
record.
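
A rough Python sketch of that kind of index builder follows.  The exact
Hsubox syntax is not given here, so the sketch assumes the asterisk comes
immediately before a single cross-referenced word; the function name and
the dictionary-of-entries input are likewise inventions for the example.

```python
import re
from collections import defaultdict

# Assumed syntax: an asterisk directly before one word marks a crossreference.
XREF = re.compile(r"\*(\w+)")


def build_xref_index(entries):
    """Map each cross-referenced word to the headwords that point at it.

    entries: dict of headword -> entry text (a hypothetical input shape).
    Returns a dict of target word -> alphabetized list of citing headwords.
    """
    index = defaultdict(set)
    for headword, text in entries.items():
        for target in XREF.findall(text):
            index[target].add(headword)
    # Alphabetize both the targets and the headwords listed under each.
    return {target: sorted(heads) for target, heads in sorted(index.items())}
```

A target that turns up in the index but has no entry of its own is a
dangling crossreference, so the same pass is handy for consistency
checking.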

Other invaluable tools would produce a census of fieldnames (for finding
anomalies in field structures) or a tabular report with certain fields in
certain width columns, very useful for consistency editing.
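
The fieldname census is simple to approximate.  The sketch below is not
Hsu's tool; it assumes SFM-style input in which every field starts a line
with a backslash marker, and just counts how often each marker occurs.

```python
from collections import Counter


def field_census(text):
    """Count occurrences of each field marker in SFM-style text.

    Assumes (for illustration) that a field begins on a line whose first
    character is a backslash and that the marker ends at the first space.
    """
    counts = Counter()
    for line in text.splitlines():
        if line.startswith("\\"):
            marker = line.split(None, 1)[0]
            counts[marker] += 1
    return counts
```

Markers that occur only once or twice in a large file are usually the
anomalies worth a look, for example a mistyped \gee alongside hundreds of
\ge fields.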


