Corpora: Using SARA to query other corpora than the BNC

Thomas Kuenneth tommi at linguistik.uni-erlangen.de
Thu Jun 21 09:14:51 UTC 2001


>     Well if you can't get the prophet to the mountain, why not just
> move the mountain to the prophet and reformat the corpora into a nice
> format like sgml using the BNC dtd. In this way we could use them
> with SARA. Reformatting corpora  is what we have to do to use many

The point is that SARA is far from being an ideal corpus query system! But you
are certainly right in saying that corpus data should be distributed in a
standardized format.

> other corpus access programs, so why not for SARA.

Because there are things that might have been done better. And although I do not
want to end up in a debate about operating systems, there are other platforms in
widespread use, and if I am not mistaken the client software is available for
Windows only. (I'd be happy to hear that there is a version that will compile
under HP-UX, and I am not talking about the server.)

> benefit sgml/xml-formatted corpora might inspire programmers to write
> "more flexible, more general" software for corpus analysis.

Metalanguages are ideal for interchange purposes, but I doubt that ANY software
will handle SGML data describing 100 million annotated word forms efficiently.
But that's another story.

Regards
Thomas
---
Thomas Kuenneth M.A.           Universitaet Erlangen-Nuernberg
Institut fuer Germanistik         Abteilung Computerlinguistik
Bismarckstr. 6  *  D-91054 Erlangen  *  Tel.: +49 9131 8529250
http://www.linguistik.uni-erlangen.de/~tommi


