[Lexicog] preparing Shoebox lexicons for publication/ export

Tue Jan 20 19:15:56 UTC 2004

----- Original Message -----
From: "Koontz John E" <john.koontz at colorado.edu>
To: <lexicographylist at yahoogroups.com>
Sent: Saturday, January 17, 2004 1:43 AM
Subject: Re: [Lexicog] preparing Shoebox lexicons for publication/ export


> On Fri, 16 Jan 2004, Mike Maxwell wrote:
> > Sorry, I should have been more explicit: unless I misunderstand, this
> > sort method won't work when you have a _sequence_ of hierarchical
> > fields, e.g.
> >    \ex Yax bo'on ta hna'.
> >       \ex_xltn I'm going to my house.
> >    \ex Ya bal shba'at ta bey?
> >       \ex_xltn Are you going on the trail?
> > --because you want to keep the ex_xltn field with the _corresponding
> > ex field.  I think the above sort method would give you the two ex
> > fields, and then the two ex_xltn fields (or vice versa).
>
> It's been a while since I did this, so let's hope I get it right the first
> time!  Consider a book - typically a grammar! - in which the sections are
> numbered hierarchically, e.g., 1, 1.1, 1.1.1, 1.1.2, 1.2, 2, 2.1, etc.
> Such a numbering clearly imposes an ordering on the sections of the book.
> We take the form given to be equivalent to canonical 1.0.0, 1.1.0, 1.1.1,
> etc., and, if we lack a sorting engine that can handle such numbering we
> convert it to the form 100, 110, 111, etc.  In my simple case, each level
> has a single digit, but if we have ten chapters and up to 100 subsections,
> etc., then we have to allow two digits for chapters, 3 for subsections,
> and so on.
>
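
The padding step described above can be sketched in Python. This is only an illustration under stated assumptions: the name section_key and the widths parameter are invented here, and the widths have to be chosen by hand to fit the largest number expected at each level.

```python
def section_key(number, widths=(1, 1, 1)):
    """Turn a dotted section number such as "1.1.2" into a fixed-width key.

    Missing levels are filled with zeros ("1" is treated as 1.0.0), and each
    level is zero-padded to the width given for it (hypothetical parameter).
    """
    parts = [int(p) for p in number.split(".")]
    if len(parts) > len(widths):
        raise ValueError(f"{number!r} has more levels than widths allows")
    parts += [0] * (len(widths) - len(parts))
    for part, width in zip(parts, widths):
        if part >= 10 ** width:
            raise ValueError(f"{part} does not fit in {width} digit(s)")
    return "".join(f"{part:0{width}d}" for part, width in zip(parts, widths))


# Sections of a book that have got out of order.
sections = ["2.1", "1.2", "1", "1.1.2", "2", "1.1", "1.1.1"]
ordered = sorted(sections, key=section_key)
print(ordered)
```

With more chapters or subsections, wider fields are needed, for example section_key("10.100", widths=(2, 3, 3)).
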
> Now suppose that the book sections are out of order.  If we can deduce the
> correct section numbers and sort on them, we can get the book back in
> order.  Say the chapters are different languages, and the first section is
> always verbs, the second nouns, etc.
>
> The case of a dictionary database in which we don't like the order of the
> fields and some of the fields have a hierarchically determined order is
> analogous.  The main problem is coming up with a reasonable set of
> heuristics for attaching numbers.
>
> In the example you propose, let r be the index of a record, f the desired
> index of any fields or associated n-tuples of fields within a record, and
> s the desired index of a field within an n-tuple.  It never hurts to have
> an "original order" tie breaker n.  The key is then rfsn.
>
> The value of r is just the index of the records - 001, 002, etc., and
> increments as we reach each new "first field" of a record.  We assume that
> "first fields" are always first and God help us if they aren't.
>
> The value of f is, say 1 for the headword, 2 for the definition, and 3 for
> an (ex, ex_xltn) pair.  If we run into a definition within a record, we
> assign it a 2, while an ex or ex_xltn gets a 3.
>
> The value of s is 1 for a headword or definition, and 1 for ex but 2 for
> ex_xltn.
>
> The value of n allows us to make sure that ex and ex_xltn pairs remain in
> their original order, even if they end up being adjusted relative to a
> definition.
>
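
A rough Python sketch of assigning such keys follows. Several things in it are assumptions rather than part of the message: the markers lx (headword) and de (definition) are stand-ins, the headword and definition texts are placeholders, and the function name sort_fields is invented. It also adds a pair counter p, compared before s, on the heuristic that each ex_xltn belongs to the most recent ex. If every example field in a record shared one f and nothing else, two pairs would sort as ex, ex, ex_xltn, ex_xltn. So the key used here is (r, f, p, s, n), a variant of rfsn.

```python
# Desired position of each kind of field within a record (f).
F_RANK = {"lx": 1, "de": 2, "ex": 3, "ex_xltn": 3}
# Desired position within an (ex, ex_xltn) pair (s); everything else gets 1.
S_RANK = {"ex_xltn": 2}


def sort_fields(fields):
    """Reorder (marker, text) fields using keys of the form (r, f, p, s, n)."""
    keyed = []
    r = 0  # record index, bumped at each "first field" (assumed to be lx)
    p = 0  # index of the current example pair within the record
    for n, (marker, text) in enumerate(fields):  # n is the original order
        if marker == "lx":
            r += 1
            p = 0
        if marker == "ex":
            p += 1  # a new ex opens a new pair
        pair = p if marker in ("ex", "ex_xltn") else 0
        # Unknown markers sort last within their record.
        key = (r, F_RANK.get(marker, 9), pair, S_RANK.get(marker, 1), n)
        keyed.append((key, (marker, text)))
    keyed.sort(key=lambda item: item[0])
    return [field for _, field in keyed]


# One record in which the definition was typed after the examples.
fields = [
    ("lx", "(headword)"),
    ("ex", "Yax bo'on ta hna'."),
    ("ex_xltn", "I'm going to my house."),
    ("ex", "Ya bal shba'at ta bey?"),
    ("ex_xltn", "Are you going on the trail?"),
    ("de", "(definition)"),
]
for marker, text in sort_fields(fields):
    print(f"\\{marker} {text}")
```

The definition moves up behind the headword, and each translation stays directly after its own example.
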
> It may be that the problem you see is basically what I've referred to as
> coming up with a reasonable set of heuristics.  We have to assume, for
> example, that we know which ex and ex_xltn fields go together.  Typically
> I'd assume that they came in the order ex then ex_xltn and that the only
> problem was that sometimes the definition was after the examples by
> mistake.
>
> The next level of complications comes with an additional level of
> hierarchy.  In this example I've assumed there's only one definition, or,
> if there are multiple ones, that all definitions precede all examples.
> Obviously this is not very likely.  To handle this case we need something
> like r d (for definition) f (field in definition bundle) s n and now in
> setting up the heuristics we have to assume either that the various
> subfields within a definition bundle will all follow the definition field,
> or that there will be something about them that lets us decide which
> definition they go with.
>
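
The extra level can be sketched the same way. The assumptions here are the first heuristic mentioned above (fields in a definition bundle follow their definition field), hypothetical markers lx, ps (part of speech, taken to belong to the record as a whole) and de, an invented function name bundle_keys, and a pair counter p added so that several example pairs under one definition stay paired. The key is (r, d, f, p, s, n).

```python
RECORD_LEVEL = {"lx": 1, "ps": 2}                # fields owned by the record
BUNDLE_LEVEL = {"de": 1, "ex": 2, "ex_xltn": 2}  # fields owned by a definition


def bundle_keys(markers):
    """Return one (r, d, f, p, s, n) key per marker, in input order."""
    keys = []
    r = d = p = 0
    for n, marker in enumerate(markers):
        if marker == "lx":
            r, d, p = r + 1, 0, 0   # new record
        elif marker == "de":
            d, p = d + 1, 0         # new definition bundle
        elif marker == "ex":
            p += 1                  # new example pair in this bundle
        if marker in RECORD_LEVEL:
            # Record-level fields get d = 0, so they precede every bundle.
            keys.append((r, 0, RECORD_LEVEL[marker], 0, 1, n))
        else:
            s = 2 if marker == "ex_xltn" else 1
            pair = p if marker in ("ex", "ex_xltn") else 0
            keys.append((r, d, BUNDLE_LEVEL.get(marker, 9), pair, s, n))
    return keys


# The part of speech was typed in the middle of the record by mistake.
markers = ["lx", "de", "ex", "ex_xltn", "ps", "de", "ex", "ex_xltn"]
keys = bundle_keys(markers)
order = [marker for _, marker in sorted(zip(keys, markers))]
print(order)
```

The stray ps moves up behind the headword, while each definition keeps the examples that followed it.
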
> In short, the amount of disorder we can allow for without resorting to
> hand-assigning the keys is somewhat limited.  However, in my experience,
> slight disorders of fields are to be expected, if the entry scheme doesn't
> automate ordering, while major disorders are unlikely, but call for
> careful hand sorting if they do occur.
>
> > OK, if I'm understanding you correctly (please correct me if I'm
> > not!), what you want is a multi-part field.  Something like the
> > following XMLish:
> >
> >     <derivative>
> >         <category>Noun</category>
> >         <form>orientation</form>
> >         <otherStuff>blah blah</otherStuff>
> >     </derivative>
> >
> > Would this work?  As you mention, SFMs are not particularly suited to
> > this, because of the lack of a close marker.  (same situation as with
> > example sentences and their translations)
>
> Yes.
>
> > I may not be understanding here--if we did have a fixed set of fields
> > which allowed for hierarchy of structure (like the hypothetical
> > <derivative> field above), would this work?  Assuming of course that
> > we could provide for every sort of structure you might want, which is
> > of course impossible.  But we should be able to come close, or at
> > least that's my bias.
>
> It seems to me that the things you can most usefully standardize are
> elements of the hierarchy.  The minute you do anything that amounts to an
> assertion that "all languages have at least X" or "all languages have only X"
> or "all languages do X like Y" then you are making something that will not
> be useful when you are wrong.  Stick to things that assert this sort of
> thing about dictionaries instead.  You probably can't go far wrong in
> assuming that languages have nouns and verbs and sometimes adjectives, but
> the minute you assume any universal theory of grammar, phonology, or
> orthography you are indulging in planned obsolescence and, probably,
> inapplicability.
>
> Please allow for comparative dictionaries as well as bilingual and
> monolingual ones!