[Lexicog] preparing Shoebox lexicons for publication/ export

List Facilitator lexicography2004 at YAHOO.COM
Fri Jan 16 01:30:23 UTC 2004


----- Original Message -----
From: "mcswell2001" <maxwell at ldc.upenn.edu>
To: <lexicographylist at yahoogroups.com>
Sent: Tuesday, January 13, 2004 12:45 PM
Subject: [Lexicog] preparing Shoebox lexicons for publication/ export


> I'm a former SIL member, now working at the Linguistic Data
> Consortium at the University of Pennsylvania.  Over the years, I've
> preferred to work on grammars--but dictionaries keep winding up on my
> desk instead.
>
> There are two projects here at LDC using Shoebox to compile lexicons,
> and as a consultant on these projects I was again reminded of how
> inconsistent users can be.  Some of the inconsistencies could have
> been cleared up by the use of features in Shoebox (v5) such as range
> sets.  But there are other sorts of problems that Shoebox (and so far
> as I know, Toolbox) doesn't help with.  One of these is spell
> correction; so I wrote an import-to-Word macro which, given the
> correspondence between fields and language, automatically assigns a
> language to each region of text, so that Word's spelling correctors
> can be applied.  (I hasten to add that as soon as one has done the
> spelling correction, the file is brought back into Shoebox.  No
> dictionary compilation inside Word!)
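
The field-to-language correspondence described above can be sketched outside Word as well. What follows is a minimal illustration, not the macro itself: it assumes Shoebox standard format with one `\marker content` field per line, and the `FIELD_LANGUAGE` table and `tag_languages` function are hypothetical names. The markers follow MDF conventions, but the language labels are placeholders for a project's own settings.

```python
# Hypothetical field-to-language table. Markers follow MDF conventions;
# the language labels are placeholders, not values from any real project.
FIELD_LANGUAGE = {"lx": "vernacular", "ge": "en", "de": "en", "gn": "national"}


def tag_languages(sfm_text, field_language, default="unknown"):
    """Return (marker, language, content) triples, one per field line."""
    tagged = []
    for line in sfm_text.splitlines():
        if not line.startswith("\\"):
            continue  # this sketch skips blank and continuation lines
        marker, _, content = line[1:].partition(" ")
        language = field_language.get(marker, default)
        tagged.append((marker, language, content.strip()))
    return tagged


# Made-up sample record, for illustration only.
sample = "\\lx wordform\n\\ps n\n\\ge book\n"
for marker, language, content in tag_languages(sample, FIELD_LANGUAGE):
    print(marker, language, content)
```

Fields whose marker is not in the table come back labelled `unknown`, so that they can be reviewed rather than spell-checked in the wrong language.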
>
> Another issue I'm aware of is the ordering of fields.  While Shoebox
> allows you to define a correct ordering, it does not enforce it, and
> one can introduce errors in the order.  These are perhaps best found
> by exporting to XML, and running an XML validating parser (in
> conjunction with a DTD) over the exported file.  There remain issues
> of how to fix the errors (in Shoebox vs. in the XML file).
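
As a lighter-weight alternative to the XML and DTD route mentioned above, field order can also be checked directly on the standard-format text. The sketch below is only an approximation. It assumes a single flat expected order (real MDF entries are hierarchical, with repeating senses, which a DTD handles better), that records begin at `\lx`, and that `EXPECTED_ORDER`, `split_records` and `order_errors` are hypothetical names.

```python
# Assumed flat ordering for illustration; a real project would list its own.
EXPECTED_ORDER = ["lx", "ps", "ge", "de"]


def split_records(sfm_text, record_marker="lx"):
    """Split standard-format text into per-record lists of field markers."""
    records, current = [], []
    for line in sfm_text.splitlines():
        if not line.startswith("\\"):
            continue  # ignore blank and continuation lines
        parts = line[1:].split(None, 1)
        if not parts:
            continue  # a lone backslash carries no marker
        marker = parts[0]
        if marker == record_marker and current:
            records.append(current)
            current = []
        current.append(marker)
    if current:
        records.append(current)
    return records


def order_errors(markers, expected_order):
    """Report fields that appear after a field they should precede."""
    rank = {m: i for i, m in enumerate(expected_order)}
    errors = []
    highest, highest_marker = -1, None
    for m in markers:
        if m not in rank:
            errors.append("unknown field \\" + m)
        elif rank[m] < highest:
            errors.append("\\" + m + " appears after \\" + highest_marker)
        else:
            highest, highest_marker = rank[m], m
    return errors


# Made-up sample: the second record has \ps after \ge.
sample = "\\lx one\n\\ps n\n\\ge first\n\\lx two\n\\ge second\n\\ps n\n"
for number, markers in enumerate(split_records(sample), start=1):
    for error in order_errors(markers, EXPECTED_ORDER):
        print("record", number, ":", error)
```

A check like this only reports the problems. Whether to fix them in Shoebox or in the exported file remains the open question raised above.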
>
> Another question is how to do cross-language comparison of lexicons,
> given that the fields different lexicographers use may not
> correspond.  If everyone used MDF, that wouldn't be a problem, but
> that's sort of like saying that if everyone used English, there
> wouldn't be any problems.  (There was some discussion of this in the
> EMELD '02 workshop.)
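
One way to make lexicons with differing field names comparable is a per-project crosswalk onto a shared set of markers. The sketch below is purely illustrative: `normalize_record` is a hypothetical name, and the local markers in `crosswalk` are invented and not taken from any actual lexicon. MDF markers are used as the target.

```python
def normalize_record(record, crosswalk):
    """Rename a record's markers using a project-specific crosswalk.

    Fields whose marker is missing from the crosswalk are returned
    separately so they can be reviewed instead of silently dropped.
    """
    mapped, unmapped = [], []
    for marker, content in record:
        if marker in crosswalk:
            mapped.append((crosswalk[marker], content))
        else:
            unmapped.append((marker, content))
    return mapped, unmapped


# Invented local markers mapped onto MDF-style targets.
crosswalk = {"w": "lx", "pos": "ps", "eng": "ge"}
record = [("w", "wordform"), ("pos", "n"), ("eng", "book"), ("src", "notebook 3")]
mapped, unmapped = normalize_record(record, crosswalk)
print(mapped)
print(unmapped)
```

The unmapped list is where the non-correspondence shows up: those fields have no agreed counterpart and need a human decision.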
>
> I'm thinking of doing a paper for the upcoming LREC workshop on
> minority language documentation, on the topic of how to prepare
> Shoebox lexicons for publication or export, with emphasis on fixing
> problems with (in)consistency.  (Before publishing, one might also
> want to look at coverage, agreement between the grammatical
> categories in the lexicon and those in published grammars for the
> language, etc., but this is outside of what I want to cover, as are
> basics of using MDF.) I am aware of work by Ken Zook (of SIL) on
> importing earlier Shoebox lexicons into LinguaLinks, the web pages
> for the KirrKirr project on importing into that tool, and the
> addition of the 'verify interlinear' feature to Toolbox.  Beyond
> this, I haven't seen much.
>
> Has anyone seen other sorts of problems for which additional tools or
> checks are needed in order to ensure consistency (or more generally,
> quality) in Shoebox lexicons?  I'll be happy to cite you, should this
> idea turn into a real (and accepted) paper.
>
>      Mike Maxwell
>


