[Lexicog] preparing Shoebox lexicons for publication/ export

List Facilitator lexicography2004 at YAHOO.COM
Fri Jan 16 01:33:13 UTC 2004


----- Original Message -----
From: "Mike Maxwell" <maxwell at ldc.upenn.edu>
To: <lexicographylist at yahoogroups.com>
Sent: Wednesday, January 14, 2004 7:57 AM
Subject: Re: [Lexicog] preparing Shoebox lexicons for publication/ export


> --- In lexicographylist at yahoogroups.com, martin_hosken at s... wrote:
> > Dear Ron,
> > Good to hear that things are moving ahead at LDC.
>
> Actually, it's me (Mike) at LDC.  Ron is somewhere else.
>
> > I have a Shoebox to XML conversion program that will probably
> > do what you want here. As a quick plug, it has the following
> > features:
> >
> > 1. Automatically creates the DTD from the .typ hierarchy
> > 2. Conforms data to the DTD, inserting elements if needed.
> > 3. Converts data to Unicode (although the mapping process
> >    isn't easy to manage)
> > 4. Handles interlinear text, converting it into XML
> >    structure corresponding to the tree (rather than lines
> >    of text)
>
> Yes, I am interested.  Feature (1) was something I was going to look
> into implementing; it sounds like you did it for me :-).  You could
> email it to me (maxwell at ldc.upenn.edu), but it might make sense to
> upload it to the "Files" part of this Yahoo site.
>
> [re cross-language lexicon comparisons:]
> > Garbage In Garbage Out. You have to decide where your points
> > of contact will be and then write the code to do the
> > merging/comparison. But if you don't know what markup
> > the other guy is using, you have garbage.
>
> Exactly, which implies that there should be some sort of standard.
> MDF would be one sort of standard; the TEI schema (which you
> mentioned) would be another, although I believe that is directed more
> towards printed dictionaries.  There's the work that Nancy Ide and
> others have done, which I confess to not knowing much about (see
> http://acl.ldc.upenn.edu/acl2003/lingan/pdf/IdeLenci.pdf for a recent
> paper that touches on it.)  And there's the SIL FieldWorks model
> (post-LinguaLinks) that I reported on at the 2002 EMELD workshop
> (http://saussure.linguistlist.org/cfdocs/emeld/workshop/2002/presentations/maxwell/Modeling%20Lexical%20Entries%20in%20Bilingual%20Dictionaries.ppt).
> The thinking behind the latter was driven by a
> focus on multilingual (esp. bilingual) lexicons of minority
> languages, although I suspect its application could be broader (but
> then I'm not a Real Lexicographer).
>
> > Next Monday we are having some informal discussions about
> > a working schema for dictionary typesetting in the Southeast
> > Asian area. Is there anything out there on this apart from
> > the TEI schema? Have you anything we could look at from LDC?
>
> There is an obvious need for this sort of thing, and for standard
> transforms into other structures (e.g. into lexc format for use with
> Xerox's recently released finite-state tools, HTML format, ...).  We
> have nothing at the LDC (most of our lexicons have a trivial
> structure, although the two that I mentioned in my earlier msg do
> not).
>
> The SIL model used to be accessible on the web.  I would expect it to
> be at http://fieldworks.sil.org/ModelDoc/indexLeft.html, but that
> link is broken right now.  I'll try to see what's the matter.
>
>         Mike Maxwell
>
>
>
>
>
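
A note on features 1 and 2 in the quoted list (a DTD derived from the
marker hierarchy, and data conformed to it by inserting missing
elements): the idea can be sketched roughly as below.  This is not
Martin's program, only a minimal Python illustration.  The PARENT table
is an assumed, simplified stand-in for the hierarchy a .typ file would
define, the function names are made up for the example, and the two
sample entries are invented.

```python
import xml.etree.ElementTree as ET

# ASSUMPTION: a simplified marker hierarchy (child marker -> parent marker),
# standing in for what would be read from a Shoebox .typ file.
PARENT = {"ps": "lx", "ge": "ps", "xv": "ps", "xe": "xv"}
ROOT_MARKER = "lx"  # the marker that starts a new record

# Invented sample data; the second entry deliberately lacks a \ps field.
SAMPLE = "\\lx kitab\n\\ps n\n\\ge book\n\\lx qalam\n\\ge pen\n"


def parse_sfm(text):
    """Return a list of (marker, value) pairs from backslash-coded lines."""
    fields = []
    for line in text.splitlines():
        line = line.rstrip()
        if line.startswith("\\"):
            marker, _, value = line[1:].partition(" ")
            fields.append([marker, value.strip()])
        elif line and fields:
            # A non-marker line continues the previous field.
            fields[-1][1] += " " + line.strip()
    return [tuple(f) for f in fields]


def ancestors(marker):
    """Return the chain of markers from the record marker down to `marker`."""
    chain = [marker]
    while chain[0] in PARENT:
        chain.insert(0, PARENT[chain[0]])
    return chain


def sfm_to_xml(text):
    """Nest flat fields according to PARENT, inserting missing parents."""
    root = ET.Element("database")
    stack = []  # currently open elements as (marker, element), outermost first
    for marker, value in parse_sfm(text):
        if marker != ROOT_MARKER and marker not in PARENT:
            continue  # markers outside the hierarchy are skipped in this sketch
        parents = ancestors(marker)[:-1]
        # Keep only the open elements that match the required parent chain.
        keep = 0
        while (keep < len(stack) and keep < len(parents)
               and stack[keep][0] == parents[keep]):
            keep += 1
        del stack[keep:]
        # Insert any parent elements the data did not supply.
        for missing in parents[keep:]:
            container = stack[-1][1] if stack else root
            stack.append((missing, ET.SubElement(container, missing)))
        container = stack[-1][1] if stack else root
        elem = ET.SubElement(container, marker)
        elem.text = value
        stack.append((marker, elem))
    return root


def make_dtd():
    """Build DTD element declarations (mixed content) from PARENT."""
    children = {}
    for child, parent in PARENT.items():
        children.setdefault(parent, []).append(child)
    lines = ["<!ELEMENT database (%s)*>" % ROOT_MARKER]
    for marker in [ROOT_MARKER] + list(PARENT):
        kids = children.get(marker)
        if kids:
            lines.append("<!ELEMENT %s (#PCDATA | %s)*>"
                         % (marker, " | ".join(kids)))
        else:
            lines.append("<!ELEMENT %s (#PCDATA)>" % marker)
    return "\n".join(lines)


if __name__ == "__main__":
    print(make_dtd())
    print(ET.tostring(sfm_to_xml(SAMPLE), encoding="unicode"))
```

In the sample, the second entry has no \ps field, so an empty <ps>
element is inserted to hold its \ge, which is the kind of repair that
feature 2 describes.  Unicode conversion and interlinear text (features
3 and 4) are not attempted here.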


