[Lexicog] preparing Shoebox lexicons for publication/ export

List Facilitator lexicography2004 at YAHOO.COM
Fri Jan 16 01:36:25 UTC 2004


----- Original Message -----
From: "Koontz John E" <john.koontz at colorado.edu>
To: <lexicographylist at yahoogroups.com>
Sent: Thursday, January 15, 2004 5:17 PM
Subject: Re: [Lexicog] preparing Shoebox lexicons for publication/ export


> On Tue, 13 Jan 2004, mcswell2001 wrote:
> > Another issue I'm aware of is the ordering of fields.  While Shoebox
> > allows you to define a correct ordering, it does not enforce it, and
> > one can introduce errors in the order.  These are perhaps best found
> > by exporting to XML, and running an XML validating parser (in
> > conjunction with a DTD) over the exported file.  There remain issues
> > of how to fix the errors (in Shoebox vs. in the XML file).
>
> My involvement with Shoebox has probably been less extensive than yours,
> but one thing I remember doing was passing databases through AWK programs
> to generate keys on each line that would help me sort the fields in each
> record into some canonical order.  Afterwards the sorting keys had to be
> deleted to restore the database to a form that Shoebox was happy with.
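
The keyed-sort idea described above can be sketched in Python rather than AWK.
This is only an illustration under stated assumptions: records are flat (no
repeated sense groups whose internal order matters), every field starts with a
backslash marker, and the canonical order in CANONICAL_ORDER is a made-up
example, not a recommended standard. The function names are hypothetical.

```python
# Assumed canonical order of field markers, for illustration only.
CANONICAL_ORDER = ["lx", "ps", "ge", "de"]


def parse_fields(record_text):
    """Split one record into (marker, value) pairs.

    Lines that do not start with a backslash are treated as continuations
    of the preceding field.
    """
    fields = []
    for line in record_text.splitlines():
        if line.startswith("\\"):
            marker, _, value = line[1:].partition(" ")
            fields.append([marker, value])
        elif fields and line.strip():
            fields[-1][1] += "\n" + line
    return [(marker, value) for marker, value in fields]


def sort_record(record_text, order=CANONICAL_ORDER):
    """Return the record with its fields rearranged into canonical order."""
    rank = {marker: i for i, marker in enumerate(order)}
    fields = parse_fields(record_text)
    # Attach a temporary key (rank, original position) to every field.
    # Markers missing from the order list sort last, in their original order.
    keyed = [
        ((rank.get(marker, len(order)), pos), marker, value)
        for pos, (marker, value) in enumerate(fields)
    ]
    keyed.sort(key=lambda item: item[0])
    # Drop the keys again so the output is plain backslash-coded text.
    return "\n".join(f"\\{marker} {value}".rstrip() for _, marker, value in keyed)
```

The keys exist only inside sort_record, so nothing has to be deleted from the
database afterwards.
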
>
> A lesson I learned from Bob Hsu long ago is that you can do some very
> powerful things with special sorting "handles," generated from the data -
> things you can't usually do easily by sorting the data uninstrumented with
> these handles.  Handles were his solution to nonstandard collating
> sequences, among other things.
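
A minimal sketch of a sorting handle for a nonstandard collating sequence,
assuming a made-up alphabet in which the digraph "ch" is a single letter that
sorts between "a" and "b". The alphabet and the function names are
hypothetical, and this is not a reconstruction of Bob Hsu's software.

```python
# Hypothetical alphabet: "ch" counts as one letter, sorted after "a".
ALPHABET = ["a", "ch", "b", "d"]


def make_handle(word, alphabet=ALPHABET):
    """Turn a word into a tuple of letter ranks that sorts correctly."""
    rank = {letter: i for i, letter in enumerate(alphabet)}
    longest = max(len(letter) for letter in alphabet)
    handle = []
    i = 0
    while i < len(word):
        # Greedy longest match, so "ch" wins over any shorter letter.
        for size in range(longest, 0, -1):
            chunk = word[i:i + size]
            if chunk in rank:
                handle.append(rank[chunk])
                i += len(chunk)
                break
        else:
            # Unknown character: sort after all known letters, by code point.
            handle.append(len(alphabet) + ord(word[i]))
            i += 1
    return tuple(handle)


def sort_with_handles(words):
    """Decorate each word with its handle, sort, then strip the handles."""
    decorated = [(make_handle(word), word) for word in words]
    decorated.sort()
    return [word for _, word in decorated]
```

The handle is generated from the data, used for the sort, and then thrown
away, which is the same decorate-sort-undecorate pattern as the keyed lines
above.
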
>
> > Another question is how to do cross-language comparison of lexicons,
> > given that the fields different lexicographers use may not
> > correspond.  If everyone used MDF, that wouldn't be a problem, but
> > that's sort of like saying that if everyone used English, there
> > wouldn't be any problems.  (There was some discussion of this in the
> > EMELD '02 workshop.)
>
> I looked at MDF, but it didn't seem very well suited to Siouan languages.
> I think you need something more powerful than a list of atomic fieldnames
> to produce a one-size-fits-all scheme of standard fields.
>
> > I'm thinking of doing a paper for the upcoming LREC workshop on
> > minority language documentation, on the topic of how to prepare
> > Shoebox lexicons for publication or export, with emphasis on fixing
> > problems with (in)consistency.
>
> This sounds like an excellent topic.  My special bugbear in this line,
> right after getting the lexicographers to know and love standardization,
> and preventing them from substituting the "formatted for publication"
> version for the "formatted for data management" version, was
> recognizing that record and field structure for the latter was different
> from the record and field structure for the former.  Of course, this was
> in terms of SIL's SF.  I suppose today the publication format would be
> XML.
>
> Bob Hsu's dictionary software - ever since Bob Rankin dubbed it Hsubox
> I've been unable to remember the real name - contained special facilities
> for listing selected fields of a record in columns, to facilitate
> editorial work.  For example, you could generate a report with headwords
> and part of speech and (start of) definition and sort this in various ways
> to facilitate eyeballing it for anomalies.
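
A columnar report of this kind might look roughly like the following sketch.
It assumes each record starts with an \lx field and that only the first
occurrence of each marker matters for the report; the marker names, column
width and function names are illustrative choices, not features of any
existing tool.

```python
def read_records(text, record_marker="lx"):
    """Split backslash-coded text into one dict per record."""
    records = []
    for line in text.splitlines():
        if not line.startswith("\\"):
            continue  # this sketch ignores blank and continuation lines
        marker, _, value = line[1:].partition(" ")
        if marker == record_marker or not records:
            records.append({})
        # Keep only the first occurrence of each marker in a record.
        records[-1].setdefault(marker, value.strip())
    return records


def column_report(records, markers=("lx", "ps", "de"), width=20, sort_by="lx"):
    """Return one fixed-width line per record, sorted on the chosen field."""
    rows = sorted(records, key=lambda record: record.get(sort_by, ""))
    lines = []
    for record in rows:
        # Truncate long values so only the start of a definition is shown.
        cells = [record.get(m, "")[:width].ljust(width) for m in markers]
        lines.append(" ".join(cells).rstrip())
    return lines
```

Sorting on "ps" instead of "lx" puts records with no part of speech at the
top, which is one quick way to spot gaps.
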
>
> Somewhat along these lines I also used to use an SIL tool that printed a
> census of the field names in an SF database.
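
A field-name census is easy to approximate. The sketch below is not the SIL
tool mentioned above, just a stand-in that counts backslash markers; the
function names and the threshold for "rare" are assumptions.

```python
import re
from collections import Counter

# A marker is the run of non-space characters right after a leading backslash.
MARKER_RE = re.compile(r"\\(\S*)")


def marker_census(text):
    """Count how often each field marker occurs in backslash-coded text."""
    counts = Counter()
    for line in text.splitlines():
        match = MARKER_RE.match(line)
        if match:
            counts[match.group(1)] += 1
    return counts


def rare_markers(counts, threshold=1):
    """List markers seen at most `threshold` times, as likely typos."""
    return sorted(m for m, n in counts.items() if n <= threshold)
```

In a large database a marker that occurs only once or twice is often a
mistyped version of a common one, so the rare end of the census is the part
worth reading first.
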
>
> In working with the Siouan Archives I set up tools to produce censuses of
> the polygraphs used to encode the characters in a 64 character computer
> character set, and look at sequences of consonants.  This was intended to
> help me locate errors and inconsistencies in the keypunching.