[Lexicog] preparing Shoebox lexicons for publication/ export
Linguist List mirror
lexicography at LISTSERV.LINGUISTLIST.ORG
Tue Jan 20 19:08:57 UTC 2004
----- Original Message -----
From: "Koontz John E" <john.koontz at colorado.edu>
To: <lexicographylist at yahoogroups.com>
Sent: Friday, January 16, 2004 10:07 AM
Subject: Re: [Lexicog] preparing Shoebox lexicons for publication/ export
> Per Mike Maxwell <maxwell at ldc.upenn.edu>
> > But unless I'm mistaken, it's hard to use 'awk' for sorting if there
> > is hierarchical structure (e.g. senses inside subentries,
> > translations of example sentences under the example sentences).
> > Hence my belief that XML tools would work better. (I'm still in the
> > learning curve on XML.)
>
> Actually, I did the sorting with another tool, not Unix sort, but a
> commercial DOS sorting utility. You can handle the hierarchical
> structure in the sorting key. Each level has to contribute a counter which
> is included in the key. In a simple example, something like
>
> rrraaaxxx
>
> where rrr represents the record number, aaa represents a subrecord, e.g.,
> a definition, and xxx represents the relative place of the current field
> in the desired order of fields.
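The rrraaaxxx key described above can be sketched in Python as follows. This is only an illustration under stated assumptions: the original work used a commercial DOS sorting utility, and the marker names in FIELD_ORDER, the three-digit width, and the function names are all hypothetical.

```python
# Hypothetical desired order of fields within a (sub)record; the marker
# names are illustrative and do not come from the original post.
FIELD_ORDER = ["lx", "ps", "sn", "ge", "xv", "xe"]


def sort_key(record_no, subrecord_no, marker, width=3):
    """Build an rrraaaxxx-style key from three zero-padded counters."""
    field_rank = FIELD_ORDER.index(marker)
    return (
        f"{record_no:0{width}d}"
        f"{subrecord_no:0{width}d}"
        f"{field_rank:0{width}d}"
    )


def sort_fields(fields):
    """Sort (record_no, subrecord_no, marker, text) tuples by their key.

    Because every counter is zero-padded to the same width, a plain
    string sort on the key restores the hierarchical order.
    """
    keyed = [(sort_key(r, a, m), m, text) for r, a, m, text in fields]
    keyed.sort(key=lambda item: item[0])
    return keyed


if __name__ == "__main__":
    scrambled = [
        (2, 1, "ge", "gloss of record 2, sense 1"),
        (1, 2, "ge", "gloss of record 1, sense 2"),
        (1, 0, "ps", "v"),
        (2, 0, "lx", "headword-2"),
        (1, 1, "ge", "gloss of record 1, sense 1"),
        (1, 0, "lx", "headword-1"),
    ]
    for key, marker, text in sort_fields(scrambled):
        print(key, "\\" + marker, text)
```

Any sorting tool that compares fixed-width strings would do the same job once each line carries such a key; the counters only need to be wide enough for the largest record, subrecord, and field rank.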
>
> > I have a printout of a draft dated 1994 by Bob on my desk, entitled
> > "Methods of Language Data Processing", which I stole from Bill Poser.
> > Despite the fact that several bridges have been gone under by lots of
> > water, Hsu's document still seems useful. Did he ever publish it?
>
> I think a manuscript on linguistic data processing has been dusted off and
> is undergoing revision. I haven't been in touch for a year or so.
>
> > > I looked at MDF, but it didn't seem very well suited to Siouan
> > > languages.
> >
> > Could you elaborate on this (probably to this list)?
>
> Well, very succinctly, if you need a field to handle category A
> derivations, where the system can handle all possible such categories A
> across all possible languages, either you need a prescient and impossibly
> long list of fields, A1, A2, ... or you need to be able to name fields
> with something more elaborate than a string of characters. For example, a
> field name like {category, "X"}, where "X" is determined by the
> lexicographer. You can implement a system like this with fields of a form
> like
>
> cat "X" actual-contents
>
> which is flexible, but somewhat awkward.
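A minimal sketch of reading a field of the form cat "X" actual-contents into a compound field name such as {category, "X"}. The regular expression, the optional leading backslash (as in standard-format markers), and the function name are assumptions made for illustration, not part of any existing tool.

```python
import re

# Assumed layout: the marker "cat" (optionally with a leading backslash),
# then the lexicographer-chosen category in double quotes, then the
# actual contents of the field.
CAT_FIELD = re.compile(r'^\\?cat\s+"([^"]*)"\s+(.*)$')


def parse_cat_field(line):
    """Return (("category", X), contents), or None if the line does not match."""
    match = CAT_FIELD.match(line.strip())
    if match is None:
        return None
    return ("category", match.group(1)), match.group(2)


if __name__ == "__main__":
    print(parse_cat_field('cat "augmentative" some-derived-form'))
    print(parse_cat_field('\\cat "reflexive possessive" another-form'))
    print(parse_cat_field("ge a gloss field"))
```

The awkwardness mentioned above shows up here: the category name lives inside the field contents, so every tool that reads the file has to know about this convention.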
>
> I suspect a better approach is to go with some sort of schema-based
> approach, in which you define not a fixed set of fields in a fixed
> arrangement, but a system for defining suitable structures. In essence,
> something like an XML system for defining dictionary entries. Of course,
> you'd probably want to use XML itself.
>
> The parts you can standardize are the things that recur, like "form",
> "part of speech" ("character"?), and "gloss," rather than particular
> categories like "augmentative" or "reflexive possessive."
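One way to picture this split between standardized and language-specific parts, sketched with Python's standard xml.etree.ElementTree module. The element and attribute names (entry, form, pos, gloss, derivation, category) are hypothetical and do not follow any published schema; the recurring parts are fixed element names, while the category is free text supplied by the lexicographer.

```python
import xml.etree.ElementTree as ET


def make_entry(form, pos, gloss, derivations):
    """Build one dictionary entry as an XML element.

    derivations is a list of (category, derived_form) pairs; the
    category is whatever label the lexicographer chooses.
    """
    entry = ET.Element("entry")
    ET.SubElement(entry, "form").text = form
    ET.SubElement(entry, "pos").text = pos
    ET.SubElement(entry, "gloss").text = gloss
    for category, derived_form in derivations:
        derivation = ET.SubElement(entry, "derivation", {"category": category})
        ET.SubElement(derivation, "form").text = derived_form
    return entry


if __name__ == "__main__":
    entry = make_entry(
        "headword",
        "v",
        "a gloss",
        [("augmentative", "derived-1"), ("reflexive possessive", "derived-2")],
    )
    print(ET.tostring(entry, encoding="unicode"))
```

Adding a new category for a particular language then needs no change to the set of element names, only a new attribute value.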
>
> > I guess I would say that archival storage format would be XML;
> > publication format (in the sense of something people would look at,
> > or something an NLP program would use) would be some transform of
> > that.
>
> Sorry, yes, that's a much better way to put it. What the user sees is a
> rendering or transform of the XML. But I meant that this would be
> generated from an XML format, and that this XML format underlying the user
> version would probably typically not be the XML (or maybe even some SF
> transformation of it) that the lexicographer edits.
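A minimal sketch of the "rendering or transform" idea, assuming a hypothetical entry format with form, pos, and gloss elements. A real publication pipeline would more likely use XSLT or a typesetting system; this only shows that the user-facing text is derived from the XML rather than edited directly.

```python
import xml.etree.ElementTree as ET


def render_entry(xml_text):
    """Render one <entry> element as a single line of plain text."""
    entry = ET.fromstring(xml_text)
    form = entry.findtext("form", default="")
    pos = entry.findtext("pos", default="")
    gloss = entry.findtext("gloss", default="")
    return f"{form} ({pos}) {gloss}"


if __name__ == "__main__":
    sample = "<entry><form>headword</form><pos>v</pos><gloss>a gloss</gloss></entry>"
    print(render_entry(sample))
```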
>
> > By "polygraphs", you mean character n-grams, right?
>
> Yes. I was generalizing "digraphs."
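To make "polygraphs" concrete, here is a small sketch that splits a word into units by longest match against a list of multi-character graphs. The function name and the sample polygraph list are hypothetical, and longest-match is only one possible policy; the original message does not say how polygraphs were recognized.

```python
def split_polygraphs(word, polygraphs):
    """Split word into units, preferring the longest matching polygraph.

    Characters that do not start any listed polygraph become
    single-character units.
    """
    ordered = sorted(polygraphs, key=len, reverse=True)
    units = []
    i = 0
    while i < len(word):
        for graph in ordered:
            if word.startswith(graph, i):
                units.append(graph)
                i += len(graph)
                break
        else:
            # No polygraph starts here: take one character.
            units.append(word[i])
            i += 1
    return units


if __name__ == "__main__":
    print(split_polygraphs("chath", ["ch", "th"]))
    print(split_polygraphs("tcha", ["ch", "tch"]))
```

Once a word is split this way, each unit can be mapped to its rank in the desired alphabet to build a sort key that treats a digraph or trigraph as a single letter.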