[Lexicog] preparing Shoebox lexicons for publication/ export
List Facilitator
lexicography2004 at YAHOO.COM
Fri Jan 16 01:31:52 UTC 2004
----- Original Message -----
From: <martin_hosken at sil.org>
To: <lexicographylist at yahoogroups.com>
Sent: Tuesday, January 13, 2004 8:53 PM
Subject: Re: [Lexicog] preparing Shoebox lexicons for publication/ export
>
>
>
> Dear Ron,
>
> Good to hear that things are moving ahead at LDC.
>
> >There are two projects here at LDC using Shoebox to compile lexicons,
> and as a consultant on these projects I was again reminded of how
> inconsistent users can be. Some of the inconsistencies could have
> been cleared up by the use of features in Shoebox (v5) such as range
> sets. But there are other sorts of problems that Shoebox (and so far
> as I know, Toolbox) doesn't help with. One of these is spell
> correction; so I wrote an import-to-Word macro which, given the
> correspondence between fields and language, automatically assigns a
> language to each region of text, so that Word's spelling correctors
> can be applied. (I hasten to add that as soon as one has done the
> spelling correction, the file is brought back into Shoebox. No
> dictionary compilation inside Word!)
>
> That works. An example of using the resources you have.
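The field-to-language correspondence described above can also be sketched outside Word. The Python below is a minimal illustration, not the macro itself: the marker-to-language table and the name tag_languages are hypothetical, and it assumes one field per line in Shoebox standard format (a backslash marker, a space, then the content).

```python
# Hypothetical correspondence between field markers and languages.
# The labels are placeholders, not codes taken from any real project.
FIELD_LANGUAGE = {"lx": "vernacular", "ge": "en", "gn": "national"}


def tag_languages(text, field_language=FIELD_LANGUAGE, default="unknown"):
    """Return (language, content) pairs, one per field line."""
    tagged = []
    for line in text.splitlines():
        if not line.startswith("\\"):
            continue  # this sketch skips blank and continuation lines
        marker, _, value = line[1:].partition(" ")
        tagged.append((field_language.get(marker, default), value.strip()))
    return tagged
```

Each region of text then carries a language label that a spelling checker for that language could be pointed at; fields with no entry in the table are labelled "unknown" rather than guessed.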
>
> >Another issue I'm aware of is the ordering of fields. While Shoebox
> allows you to define a correct ordering, it does not enforce it, and
> one can introduce errors in the order. These are perhaps best found
> by exporting to XML, and running an XML validating parser (in
> conjunction with a DTD) over the exported file. There remain issues
> of how to fix the errors (in Shoebox vs. in the XML file).
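A lighter check than full DTD validation can catch simple ordering errors before export. The sketch below is only an illustration under stated assumptions: EXPECTED_ORDER is a hypothetical flat ordering, not read from a real .typ file, records are assumed to start at \lx, and repeated groups (for example several senses) are not handled.

```python
# Hypothetical expected order of markers within one record.
EXPECTED_ORDER = ["lx", "ps", "ge", "de"]


def parse_records(text, record_marker="lx"):
    """Split standard-format text into records: lists of (marker, value)."""
    records = []
    for line in text.splitlines():
        if not line.startswith("\\"):
            continue  # this sketch ignores blank and continuation lines
        marker, _, value = line[1:].partition(" ")
        if marker == record_marker or not records:
            records.append([])  # a new record begins here
        records[-1].append((marker, value.strip()))
    return records


def order_errors(record, expected=EXPECTED_ORDER):
    """Report markers that occur later than the expected order allows."""
    rank = {m: i for i, m in enumerate(expected)}
    errors = []
    highest, highest_marker = -1, None
    for marker, _ in record:
        if marker not in rank:
            errors.append("unknown marker \\" + marker)
        elif rank[marker] < highest:
            errors.append("\\" + marker + " appears after \\" + highest_marker)
        else:
            highest, highest_marker = rank[marker], marker
    return errors
```

Because the errors are reported per record against the original markers, they can be fixed in Shoebox itself rather than in an exported file.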
>
> I have a Shoebox to XML conversion program that will probably do what you
> want here. As a quick plug, it has the following features:
>
> 1. Automatically creates the DTD from the .typ hierarchy
> 2. Conforms data to the DTD, inserting elements if needed
> 3. Converts data to Unicode (although the mapping process isn't easy to
> manage)
> 4. Handles interlinear text, converting it into an XML structure
> corresponding to the tree (rather than lines of text)
>
> From this, you can then use XSL or the like to restructure your data.
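The idea of conforming flat fields to a hierarchy, inserting elements where needed, can be sketched as follows. This is not the conversion program mentioned above, whose internals are not described here. It is a minimal Python illustration in which PARENT is a hypothetical child-to-parent table standing in for a .typ hierarchy, and the function names are invented for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical hierarchy: each marker names the marker it sits under.
PARENT = {"ps": "lx", "ge": "ps", "de": "ps"}


def _find_or_create(marker, stack, parent):
    """Return the open element for marker, creating it if it is missing."""
    for i in range(len(stack) - 1, -1, -1):
        if stack[i].tag == marker:
            del stack[i + 1:]  # close anything nested more deeply
            return stack[i]
    holder = _parent_element(marker, stack, parent)
    elem = ET.SubElement(holder, marker)  # inserted, empty element
    stack.append(elem)
    return elem


def _parent_element(marker, stack, parent):
    """Return the element that a field with this marker belongs under."""
    wanted = parent.get(marker)
    if wanted is None:
        del stack[1:]  # marker not in the hierarchy: attach to the root
        return stack[0]
    return _find_or_create(wanted, stack, parent)


def record_to_xml(fields, parent=PARENT):
    """Nest one record's (marker, value) pairs according to the hierarchy."""
    root = ET.Element(fields[0][0])
    root.text = fields[0][1]
    stack = [root]  # currently open elements, outermost first
    for marker, value in fields[1:]:
        holder = _parent_element(marker, stack, parent)
        elem = ET.SubElement(holder, marker)
        elem.text = value
        stack.append(elem)
    return root
```

Given a record in which \ge occurs before any \ps, the sketch inserts an empty ps element to hold it, so the output matches the hierarchy even when the input does not. The result can be serialized with ET.tostring(root, encoding="unicode") and restructured further with XSL.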
>
> There is no return-path converter (XML back to Shoebox) yet.
>
> >Another question is how to do cross-language comparison of lexicons,
> given that the fields different lexicographers use may not
> correspond. If everyone used MDF, that wouldn't be a problem, but
> that's sort of like saying that if everyone used English, there
> wouldn't be any problems. (There was some discussion of this in the
> EMELD '02 workshop.)
>
> Garbage In Garbage Out. You have to decide where your points of contact
> will be and then write the code to do the merging/comparison. But if you
> don't know what markup the other guy is using, you have garbage.
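Deciding on points of contact amounts to writing down, for each lexicon, which of its markers correspond to a shared label. A minimal Python sketch, assuming each lexicographer can supply such a mapping; the mappings, labels, and function name below are hypothetical:

```python
# Hypothetical mappings from each lexicon's own markers to shared labels.
MAP_A = {"lx": "headword", "ge": "gloss"}   # an MDF-style lexicon
MAP_B = {"w": "headword", "eng": "gloss"}   # a lexicon with its own markers


def normalize(record, mapping):
    """Rename a record's markers to shared labels.

    Returns (shared, unmapped): shared maps each label to its values,
    unmapped lists the markers for which no correspondence was given.
    """
    shared, unmapped = {}, []
    for marker, value in record:
        label = mapping.get(marker)
        if label is None:
            unmapped.append(marker)  # unknown markup is reported, not guessed
        else:
            shared.setdefault(label, []).append(value)
    return shared, unmapped
```

Two records normalized this way can be compared label by label. A long unmapped list is the sign that the other lexicon's markup is not yet understood well enough to compare.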
>
> Next Monday we are having some informal discussions about a working schema
> for dictionary typesetting in the Southeast Asian area. Is there anything
> out there on this apart from the TEI schema? Have you anything we could
> look at from LDC? There is no point in reinventing wheels on this. Does
> anyone else have anything?
>
> GB,
> Martin