[Lexicog] Re: Kirrkirr and Shoebox/Toolbox

Christopher Manning manning at CS.STANFORD.EDU
Fri Aug 13 17:49:29 UTC 2004


On 13 August 2004, Mike Maxwell <maxwell at ldc.upenn.edu> wrote:
 > > I do agree that a relational database delivers faster performance than an
 > > XML file. However, I have been very impressed with the work of Libronix
 > > software in the publication of Brill's Hebrew and Aramaic Lexicon of the Old
 > > Testament, and other lexicons, which is based on XML. It is slow, but
 > > powerful.
 >
 > Yes, I keep asking whether the technology isn't getting good enough that
 > one could just store the entire db in an XML file, and do away with the
 > relational DB (which is, in some ways, a poor fit).  The answer, last
 > time I checked, was no--the performance still isn't there.

This is an area that I wish I knew slightly better than I do, but let me
nevertheless offer a few thoughts.  Using a relational database is a
passable fit if your XML data is "data-centric" XML (something like
customer records written as XML) but if you have
"document-centric" XML (like dictionaries, texts, etc.), it is an
extremely poor fit.  My impression is that most programmers would
advise against stuffing your XML into a relational database in these
circumstances.  See for instance:

   http://www.xml.com/pub/a/2001/10/24/follow-yr-nose.html

(it's a little old, from 2001, but whatever).  (While I am not at all
involved in the projects, to my mind this is part of why FieldWorks has
always seemed an overcomplex beast, whereas Shoebox is simple and clean!)
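
To make the data-centric versus document-centric distinction concrete,
here is a minimal Python sketch.  The sample XML, the element names and
the row layout are all invented for illustration; this is not how any
particular product stores its data.  A data-centric record flattens to
one row, while a dictionary sense with mixed content has to be shredded
into one row per node, with ids that preserve document order.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data, invented for this illustration.
DATA_CENTRIC = "<customer><name>Ann</name><city>Perth</city></customer>"
DOC_CENTRIC = (
    "<sense>A kind of <i>small</i> bird; see <xref>wren</xref> also.</sense>"
)


def flatten_record(xml_text):
    """Data-centric: every child is a leaf field, so one row is enough."""
    root = ET.fromstring(xml_text)
    return {child.tag: child.text for child in root}


def shred(xml_text):
    """Document-centric: text and elements interleave, so each piece
    becomes its own (node_id, parent_id, kind, value) row.  Node ids are
    assigned in document order so the text can be reassembled."""
    root = ET.fromstring(xml_text)
    rows = []

    def walk(elem, parent_id):
        node_id = len(rows)
        rows.append((node_id, parent_id, "element", elem.tag))
        if elem.text:
            rows.append((len(rows), node_id, "text", elem.text))
        for child in elem:
            walk(child, node_id)
            # Text following a child element belongs to the parent.
            if child.tail:
                rows.append((len(rows), node_id, "text", child.tail))

    walk(root, None)
    return rows


if __name__ == "__main__":
    print(flatten_record(DATA_CENTRIC))
    for row in shred(DOC_CENTRIC):
        print(row)
```

Getting the one-sentence definition back out of the shredded form means
fetching every row and reassembling them in order, which is the kind of
overhead that makes the relational fit poor for dictionary entries.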

I really don't think performance can be cited as a reason not to keep
things as XML text in 2004.  E.g., it takes Kirrkirr 6 seconds to query
a 10 MB XML file (I think extremely few fieldwork data sets are larger
than this), running on a less-than-state-of-the-art computer (1.1 GHz
Pentium III-M).  (Kirrkirr mainly works over a text XML file, but
supplements text searching with a few indices.)
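
The "text file plus a few indices" idea can be sketched in a few lines
of Python.  This is an illustration under stated assumptions, not
Kirrkirr's actual implementation: the element names ENTRY, HW and GL,
the sample data and the function names are all hypothetical, and the
regular expressions assume entries are not nested and carry no
attributes.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical element names; a real dictionary's markup will differ.
ENTRY_RE = re.compile(rb"<ENTRY>.*?</ENTRY>", re.DOTALL)
HW_RE = re.compile(rb"<HW>(.*?)</HW>", re.DOTALL)

# Made-up sample data standing in for the dictionary file's bytes.
SAMPLE = (
    b"<DICTIONARY>\n"
    b"<ENTRY><HW>alpha</HW><GL>first letter</GL></ENTRY>\n"
    b"<ENTRY><HW>beta</HW><GL>second letter</GL></ENTRY>\n"
    b"</DICTIONARY>\n"
)


def build_index(data):
    """Map each headword to the (start, end) byte span of its entry."""
    index = {}
    for m in ENTRY_RE.finditer(data):
        hw = HW_RE.search(m.group(0))
        if hw:
            index[hw.group(1).decode("utf-8").strip()] = (m.start(), m.end())
    return index


def lookup(data, index, headword):
    """Indexed lookup: parse only the one entry the index points at."""
    span = index.get(headword)
    if span is None:
        return None
    start, end = span
    return ET.fromstring(data[start:end])


def full_text_search(data, needle):
    """Unindexed fallback: scan the text of every entry for a substring."""
    target = needle.encode("utf-8")
    hits = []
    for m in ENTRY_RE.finditer(data):
        hw = HW_RE.search(m.group(0))
        if hw and target in m.group(0):
            hits.append(hw.group(1).decode("utf-8").strip())
    return hits


if __name__ == "__main__":
    idx = build_index(SAMPLE)
    print(lookup(SAMPLE, idx, "beta").find("GL").text)
    print(full_text_search(SAMPLE, "letter"))
```

Headword lookups go through the index and touch only one entry; anything
the index does not cover falls back to a linear scan of the text, which
is where the several-second query time on a large file would come from.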

I think the real reasons not to want just a text file are the
traditional database advantages: allowing concurrent updates, doing
versioning and logging, offering powerful general query languages,
etc.

Between these two worlds is the world of "native XML databases", which
includes both commercial products like Tamino:

  http://www2.softwareag.com/Corporate/products/tamino/default.asp

and open source efforts like eXist:

  http://exist.sourceforge.net/

I think these may really be the right technology in 2004 (though
this is where I wish I knew a bit more than I do...).

Cheers,

Chris.

