[Lexicog] Re: Kirrkirr and Shoebox/Toolbox

Fri Aug 13 13:45:19 UTC 2004

Kenneth Keyes wrote:

> Thanks, Mike, for setting me straight.

Not sure that I'm straight; those were just my preliminary impressions...

> ...I would propose that an XML standard for
> lexicography and linguistic analysis include all the grammatical categories
> possible, such as are found in Thomas Payne's Describing Morphosyntax: A
> Guide for Field Linguists, for example.

The FieldWorks group was intending to incorporate the categories in a
number of works (such as the "Cambridge Textbooks in Linguistics"
series--I'm thinking particularly of the books with titles like "Tense",
"Aspect", "Number", "Person" and "Case").

Some of this work is being done by the ontology group at U of Arizona,
as the "GOLD ontology".  I don't keep up with the details, I'm afraid,
but the ontology is available for inspection at http://emeld.org/gold.
They don't seem to have a style sheet, so it's raw XML.  But if you
search for the string "*Begin Tense", you'll come to the section on the
various kinds of grammatical tense, and you can kind of read it without
too much pain :-).  There are of course entries for other categories
besides tense, like Aspect, Modality etc. etc.  I'm not sure where the
categories came from--some of the input was from the LinguaLinks help
files, but I'm pretty sure they've incorporated other stuff.
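
For anyone who would rather not scroll through the raw XML by hand, here
is a minimal sketch in Python of that string search.  It assumes you have
the ontology text in hand (e.g. saved from your browser and read into a
string); the function name and the sample text are made up for
illustration and are not the real GOLD markup.

```python
def find_marker_lines(text, marker="*Begin Tense"):
    """Return the 1-based numbers of the lines that contain `marker`."""
    return [
        number
        for number, line in enumerate(text.splitlines(), start=1)
        if marker in line
    ]


# Hypothetical stand-in for the ontology file; the real markup will differ.
sample = """<!-- *Begin Aspect -->
<Class ID="A"/>
<!-- *Begin Tense -->
<Class ID="B"/>
"""

# Report where the section marker occurs in the sample text.
print(find_marker_lines(sample))
```

Swapping the marker for "*Begin Aspect" or the like should find the other
sections, provided they are labeled the same way.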

> I do agree that a relational database delivers faster performance than an
> XML file. However, I have been very impressed with the work of Libronix
> software in the publication of Brill's Hebrew and Aramaic Lexicon of the Old
> Testament, and other lexicons, which is based on XML. It is slow, but
> powerful.

Yes, I keep asking whether the technology isn't getting good enough that
one could just store the entire db in an XML file, and do away with the
relational DB (which is, in some ways, a poor fit).  The answer, last
time I checked, was no--the performance still isn't there.  My _guess_
would be that the FW model, incorporating as it does not only a lexicon
but also interlinear text, morphology, and (some) phonology, is
considerably more complex than Brill's lexicon.  But since I'm not
familiar with Brill's lexicon, that could be way off...

> Speaking of interlinearization of texts, by the way, Lars Huttar wrote an
> excellent thesis and data model based on XML and XLT for discourse analysis
> of texts using Levinsohn-Longacre charting. I don't have the reference.

http://www.huttar.net/lars-kathy/thesis/thesis.pdf

Gary Simons was the chair of his thesis committee, so I imagine the work
will be incorporated (sooner or later...) into FieldWorks.

--
	Mike Maxwell
	Linguistic Data Consortium
	maxwell at ldc.upenn.edu
