[Lexicog] preparing Shoebox lexicons for publication/ export

List Facilitator lexicography2004 at YAHOO.COM
Tue Jan 20 19:14:10 UTC 2004


----- Original Message -----
From: "Mike Maxwell" <maxwell at ldc.upenn.edu>
To: <lexicographylist at yahoogroups.com>
Sent: Friday, January 16, 2004 11:53 AM
Subject: Re: [Lexicog] preparing Shoebox lexicons for publication/ export


> --- In lexicographylist at yahoogroups.com, Koontz John E
> <john.koontz at c...> wrote:
> > You can handle the hierarchical structure in the sorting key.
> > Each level has to contribute a counter which
> > is included in the key.  In a simple example, something like
> >
> > rrraaaxxx
> >
> > where rrr represents the record number, aaa represents a
> > subrecord, e.g., a definition, and xxx represents the
> > relative place of the current field
> > in the desired order of fields.
>
> Sorry, I should have been more explicit: unless I misunderstand, this
> sort method won't work when you have a _sequence_ of hierarchical
> fields, e.g.
>    \ex Yax bo'on ta hna'.
>       \ex_xltn I'm going to my house.
>    \ex Ya bal shba'at ta bey?
>       \ex_xltn Are you going on the trail?
> --because you want to keep the ex_xltn field with the _corresponding
> ex field.  I think the above sort method would give you the two ex
> fields, and then the two ex_xltn fields (or vice versa).
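Here is a minimal Python sketch of the two keys discussed above. The zero-padded rrraaaxxx key is built from a record number, a subrecord number, and the field's rank in a desired field order. FIELD_ORDER and its ranks are hypothetical. With that key alone, a stable sort puts both \ex fields ahead of both \ex_xltn fields, as Mike predicts. The sketch also adds one more counter level (ggg, incremented at each \ex), following John's point that each level contributes its own counter. That extra level keeps each translation next to its example.

```python
# Hypothetical ranks giving the desired order of fields in a subrecord.
FIELD_ORDER = {"lx": 1, "ps": 2, "ge": 3, "ex": 4, "ex_xltn": 5}

def naive_key(rec, sub, marker):
    """rrraaaxxx: record number, subrecord number, rank of the field."""
    return f"{rec:03d}{sub:03d}{FIELD_ORDER[marker]:03d}"

def grouped_key(rec, sub, group, marker):
    """rrraaagggxxx: an extra counter ggg numbers each example group."""
    return f"{rec:03d}{sub:03d}{group:03d}{FIELD_ORDER[marker]:03d}"

def sort_naive(fields, rec=1, sub=1):
    # sorted() is stable, so fields with equal keys keep their input order.
    return sorted(fields, key=lambda f: naive_key(rec, sub, f[0]))

def sort_grouped(fields, rec=1, sub=1):
    keyed, group = [], 0
    for marker, text in fields:
        if marker == "ex":
            group += 1  # each \ex starts a new example group
        keyed.append((grouped_key(rec, sub, group, marker), marker, text))
    return [(m, t) for _, m, t in sorted(keyed)]

fields = [("ex", "Yax bo'on ta hna'."), ("ex_xltn", "I'm going to my house."),
          ("ex", "Ya bal shba'at ta bey?"), ("ex_xltn", "Are you going on the trail?")]
print([m for m, _ in sort_naive(fields)])    # both \ex first, then both \ex_xltn
print([m for m, _ in sort_grouped(fields)])  # \ex, \ex_xltn, \ex, \ex_xltn
```

In this sketch, any non-example field that follows an \ex inherits that example's group number. A real key would therefore need to reset or skip the group counter for fields that sit outside examples.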
>
> [John:]
> >>> I looked at MDF, but it didn't seem very well suited to
> >>> Siouan languages.
> [Me:]
> >> Could you elaborate on this (probably to this list)?
> [John:]
> > Well, very succinctly, if you need a field to handle category A
> > derivations, where the system can handle all possible such
> > categories A across all possible languages, either you need a
> > prescient and impossibly long list of fields, A1, A2, ... or
> > you need to be able to name fields with something more
> > elaborate than a string of characters.  For example, a
> > field name like {category, "X"}, where "X" is determined by
> > the lexicographer.  You can implement a system like this with
> > fields of a form like
> >
> > cat "X" actual-contents
> >
> > which is flexible, but somewhat awkward.
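John's {category, "X"} idea can be approximated in plain SFM by treating the quoted label as part of the field's name. The Python sketch below splits a line such as `cat "X" actual-contents` into the pair (cat, X) plus its contents. Code can then dispatch on that pair rather than on an ever-growing list of markers A1, A2, ... The line shape, the regular expression, ParamField and field_key() are all assumptions made for illustration.

```python
import re
from typing import NamedTuple, Optional

class ParamField(NamedTuple):
    name: str      # the field marker, e.g. "cat"
    label: str     # the lexicographer-chosen "X"
    contents: str  # the actual contents

# Assumed line shape: optional backslash, marker, quoted label, contents.
# The label may not itself contain a double quote.
PARAM_FIELD = re.compile(r'^\\?(\w+)\s+"([^"]*)"\s*(.*)$')

def parse_param_field(line: str) -> Optional[ParamField]:
    m = PARAM_FIELD.match(line.strip())
    return ParamField(*m.groups()) if m else None

def field_key(f: ParamField):
    # The effective field name is the pair (marker, label), e.g. ("cat", "X").
    return (f.name, f.label)

print(parse_param_field('cat "X" actual-contents'))
```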
>
> OK, if I'm understanding you correctly (please correct me if I'm
> not!), what you want is a multi-part field.  Something like the
> following XMLish:
>
>     <derivative>
>         <category>Noun</category>
>         <form>orientation</form>
>         <otherStuff>blah blah</otherStuff>
>     </derivative>
>
> Would this work?  As you mention, SFMs are not particularly suited to
> this, because of the lack of a close marker.  (same situation as with
> example sentences and their translations)
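SFM has no close marker, so a converter has to infer where a multi-part field ends. Below is a minimal Python sketch using the standard xml.etree.ElementTree module. The markers \dv_cat, \dv_form and \dv_note are hypothetical stand-ins for whatever a given lexicon uses. Here \dv_cat opens a new <derivative>, and that group is closed implicitly by the next \dv_cat or by any marker not listed in CHILDREN. With START = "ex" and CHILDREN covering ex and ex_xltn, the same rule would group each \ex with its \ex_xltn.

```python
import xml.etree.ElementTree as ET

# Hypothetical SFM markers. \dv_cat opens a <derivative>; the markers in
# CHILDREN become its child elements. SFM has no close marker, so an open
# derivative is closed implicitly by the next \dv_cat or by any other marker.
START = "dv_cat"
CHILDREN = {"dv_cat": "category", "dv_form": "form", "dv_note": "otherStuff"}

def sfm_to_xml(lines):
    entry = ET.Element("entry")
    current = None  # the derivative currently being filled, if any
    for line in lines:
        marker, _, value = line.strip().lstrip("\\").partition(" ")
        if marker == START:
            current = ET.SubElement(entry, "derivative")  # open a new group
        if current is not None and marker in CHILDREN:
            ET.SubElement(current, CHILDREN[marker]).text = value
        else:
            current = None  # implicit close: this marker is outside the group
            ET.SubElement(entry, marker).text = value
    return entry

lines = [r"\lx orient", r"\dv_cat Noun", r"\dv_form orientation",
         r"\dv_note blah blah", r"\ge face"]
print(ET.tostring(sfm_to_xml(lines), encoding="unicode"))
```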
>
> > I suspect a better approach is to go with some sort of schema-based
> > approach, in which you define not a fixed set of fields in a fixed
> > arrangement, but a system for defining suitable structures.
> > In essence, something like an XML system for defining dictionary
> > entries.  Of course, you'd probably want to use XML itself.
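Here is a toy version of the schema-based idea. The permitted structure lives in a data table rather than in a fixed field inventory, and one generic validator checks entries against it. In practice one would probably use an XML schema language (a DTD, W3C XML Schema, or RELAX NG) together with a validating parser. The dict-based SCHEMA and validate() below are hypothetical stand-ins that only check which children an element may contain.

```python
import xml.etree.ElementTree as ET

# Hypothetical schema: element name -> set of allowed child element names.
# The structure is extended by editing this table, not the validator.
SCHEMA = {
    "entry": {"form", "pos", "gloss", "derivative"},
    "derivative": {"category", "form", "gloss"},
    "form": set(), "pos": set(), "gloss": set(), "category": set(),
}

def validate(elem, schema=SCHEMA, path=""):
    """Return a list of error messages; an empty list means the tree fits."""
    here = f"{path}/{elem.tag}"
    if elem.tag not in schema:
        return [f"{here}: undeclared element"]
    errors = []
    for child in elem:
        if child.tag not in schema[elem.tag]:
            errors.append(f"{here}: <{child.tag}> not allowed here")
        else:
            errors.extend(validate(child, schema, here))
    return errors

doc = ET.fromstring("<entry><form>x</form><derivative>"
                    "<category>augmentative</category><form>y</form>"
                    "</derivative></entry>")
print(validate(doc))
```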
>
> I may not be understanding here--if we did have a fixed set of fields
> which allowed for hierarchy of structure (like the hypothetical
> <derivative> field above), would this work?  Assuming of course that
> we could provide for every sort of structure you might want, which is
> of course impossible.  But we should be able to come close, or at
> least that's my bias.
>
> > The parts you can standardize are the things that recur,
> > like "form", "part of speech" ("character"?), and "gloss,"
> > rather than particular categories like "augmentative" or
> > "reflexive possessive."
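The split between standardized and language-particular parts can be sketched as a Python dataclass. The Derivative type and the by_category() helper are hypothetical. Form, part of speech, and gloss are fixed attributes that every entry shares. The category, by contrast, is an open string supplied by the lexicographer rather than a field name of its own.

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass(frozen=True)
class Derivative:
    # Standardized parts that recur across languages:
    form: str
    pos: str
    gloss: str
    # Language-particular label chosen by the lexicographer, e.g.
    # "augmentative" or "reflexive possessive"; not a fixed field name.
    category: str

def by_category(derivs):
    """Group derivative forms under their lexicographer-defined labels."""
    index = defaultdict(list)
    for d in derivs:
        index[d.category].append(d.form)
    return dict(index)
```

With this design, adding a new category requires no change to the record type, only a new label value.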
>
> The former are categories of general linguistic (or lexicographic)
> theory, the latter are language categories (which may of course be
> atoms of some theory, in the sense that they might be postulated as
> universals).  A standard would benefit from both sorts of ontologies.
>
> In fact, there is work on both sorts of standards/ ontologies: the
> general categories are in the model that I've worked on with my SIL
> colleagues, while the language categories are being worked on in the
> work that Terry Langendoen and his group are doing
> (http://saussure.linguistlist.org/cfdocs/emeld/documents/gold_draft4.doc).
> (Lots of other people are working on these two things, of
> course!)
>


