[Lexicog] Sorting

William J Poser billposer at ALUM.MIT.EDU
Mon Mar 22 22:10:40 UTC 2004


David Frank wrote:

   I bet there are programs available for sorting on any specified field in a
   set of records. If anyone needs such a program, let that fact be known and I
   bet there are subscribers to this list who can tell us where to find them.

My msort program, previously mentioned on this list, does exactly this.
(Info at: http://www.cis.upenn.edu/~wjposer/software.html#msort).
Records can be single lines, double-newline-separated blocks, or
blocks of text terminated by a user-specified separator character.
By default, fields are separated by whitespace, except when the record
type is double-newline-separated block, in which case a field defaults
to a line. You can specify a different field separator.
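
As a rough illustration of these record and field conventions, here is a
small Python sketch. It is not msort's own code or command-line interface;
the function names and the "line"/"block"/"separator" mode names are
made up for the example.

```python
import re

def split_records(text, mode="line", separator=None):
    """Split text into records (hypothetical helper, not part of msort).

    mode "line":      one record per line
    mode "block":     records separated by one or more blank lines
    mode "separator": records delimited by a user-supplied string
    """
    if mode == "line":
        records = text.splitlines()
    elif mode == "block":
        records = re.split(r"\n\s*\n", text)
    elif mode == "separator":
        if not separator:
            raise ValueError("separator mode needs a non-empty separator")
        records = text.split(separator)
    else:
        raise ValueError(f"unknown record mode: {mode!r}")
    # Drop empty records and stray newlines at record edges.
    return [r.strip("\n") for r in records if r.strip()]

def split_fields(record, mode="line", field_sep=None):
    """Split one record into fields, mirroring the defaults described above."""
    if field_sep is not None:
        return record.split(field_sep)   # explicit field separator
    if mode == "block":
        return record.splitlines()       # block records: one field per line
    return record.split()                # otherwise: whitespace-separated

lines = split_records("dog n canine\ncat n feline\n")
blocks = split_records("\\lx dog\n\\ps n\n\n\\lx cat\n\\ps n\n", mode="block")
```

Here `split_fields(lines[0])` gives the three whitespace-separated fields of
the first line, while `split_fields(blocks[0], mode="block")` gives one field
per line of the first block.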

For each sort key you can specify a different field to sort on.
Key fields can be identified positionally, counting from the beginning
of the record or the end, or by their tag. Tags are matched against
regular expressions, not just fixed strings. This provides some
flexibility (e.g. it's okay if you've been inconsistent and used,
say, both "DEF" and "def") and also supports tricks like absorbing
into the tag variable amounts of whitespace between the tag proper
and the field content. This allows Shoebox-style databases to be sorted
however you want. In my own work, I not only sort on headwords and
inverse headwords (for English to X), but on things like stems and
semantic fields. I also do subsorts on things like part of speech.
It is occasionally useful to sort on other things, e.g. informant names,
or the page numbers or section numbers of a written source from which
you have entered material.
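
To make the idea of tag-based keys and subsorts concrete, here is a Python
sketch of the general technique. It does not show msort's actual options or
internals; the helper names, the sample records, and the 1-based positional
convention are assumptions made for the example.

```python
import re

def field_by_tag(record, tag_pattern):
    """Return the content of the first line whose tag matches tag_pattern.

    The pattern may absorb whitespace after the tag, so the returned
    content starts at the field text itself. Returns "" if no line matches.
    """
    rx = re.compile(tag_pattern)
    for line in record.splitlines():
        m = rx.match(line)
        if m:
            return line[m.end():]
    return ""

def field_by_position(fields, index):
    """Pick a field by position: 1 is the first field, -1 the last.

    (The 1-based convention is an assumption of this sketch.)
    """
    if index > 0 and index <= len(fields):
        return fields[index - 1]
    if index < 0 and -index <= len(fields):
        return fields[index]
    return ""

# Invented Shoebox-style records with inconsistent tag case and spacing.
records = [
    "\\lx run\n\\ps v\n\\def   to move fast",
    "\\lx apple\n\\ps n\n\\DEF a fruit",
    "\\lx run\n\\ps n\n\\def an act of running",
]

# Primary key: headword; secondary key (subsort): part of speech.
by_headword_then_pos = sorted(
    records,
    key=lambda r: (field_by_tag(r, r"\\lx\s+"), field_by_tag(r, r"\\ps\s+")),
)

# One pattern covers "def" and "DEF" and swallows the variable whitespace.
definitions = [field_by_tag(r, r"\\[Dd][Ee][Ff]\s+") for r in by_headword_then_pos]
# definitions == ["a fruit", "an act of running", "to move fast"]
```

Sorting on a tuple of keys is what gives the subsort: records with the same
headword fall back to comparing part of speech.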

My experience is that once you have a really powerful sorting
program, you discover that you can do all sorts of things with it.

Be warned, though, if this interests you, that at present msort doesn't
run under MS Windows or DOS.

Bill

--
Bill Poser, Linguistics, University of Pennsylvania
http://www.ling.upenn.edu/~wjposer/ billposer at alum.mit.edu






