Corpora: Using SARA to query other corpora than the BNC
Lou Burnard
lou.burnard at computing-services.oxford.ac.uk
Fri Jun 22 10:39:18 UTC 2001
On Thu, 21 Jun 2001, Thomas Kuenneth wrote:
> The point is that Sara is far from being an ideal corpus query system! But you
> are certainly right in saying that corpus data should be distributed in a
> standardized format.
Good to see that we have some agreement on that at any rate. I don't think
anyone (certainly not me) has ever claimed that SARA was an ideal corpus
query system. I do however claim that it's one of the best currently
around for handling XML-encoded corpora of more than trivial size.
If you have specific suggestions about facilities it lacks, or ways it
could be improved, I hope you'll share them.
> Because there are things that might have been done better. And although I do not want
> to end up in a debate about operating systems - there are other platforms in
> widespread use and if I am not mistaken the client software is available for
> Windows only (I'd be happy to hear that there is a version that will compile
> under HP/UX - and I am not talking about the server).
SARA was designed before Java made cross-platform development
(relatively) easy. At that time, common wisdom was that one should develop
platform-specific clients which could interact with platform-independent
servers, and that's the design we followed. The current SARA client can
only run on Windows, but there is no reason why clients should not be
developed for other platforms, as indeed Hans Martin and his colleagues
have impressively demonstrated.
> Meta languages are ideal for interchange purposes but I doubt that ANY software
> will handle SGML data describing 100 million annotated word forms efficiently.
> But that's another story.
On what grounds do you make this assertion? I suppose it all depends on what
you mean by "handle efficiently", but it's simply not true that NO
software can handle SGML data on that scale. And what would you advocate
as an alternative?
Lou