Corpora: Using SARA to query other corpora than the BNC

Thomas Kuenneth tommi at linguistik.uni-erlangen.de
Wed Jun 20 12:28:51 UTC 2001


In response to a posting from Lou Burnard, Wed, 20 Jun 2001:

> I am in the process of writing a brief guide to how this can be done,

I cannot refrain from making some remarks here. Sara is undoubtedly among the
most frequently used programs in this field (as the BNC plays an important role
in corpus linguistics).
Nonetheless, I doubt that using Sara to query other corpora is desirable. It is
well known that the software has been tailored to fit the structure of the BNC
(or vice versa, which does not really matter here). I am not sure that the
software is flexible enough to meet the requirements of many other corpora,
since corpus data per se has only a loosely defined structure: consider the
presence or absence of POS tags and base forms, or of meta information such as
titles of sample texts, legal information, dates of publication, and lists of
the categories the samples belong to.
And we must not forget that (at least) the Sara client has its weaknesses, such
as limits on the size of query results and being bound to a particular hardware
platform.
There are in fact a number of disadvantages and shortcomings of such proprietary
systems that I could address here (some of which I am going to address at a
conference soon), and they lead me to call for a more flexible, more general
approach.

In the future, users should be able to choose which corpus tool to use for
querying ANY corpus. Some programs are already available (though they have
limitations, too). Others are being developed right now.

Regards
Thomas Künneth
---
Thomas Kuenneth M.A.           Universitaet Erlangen-Nuernberg
Institut fuer Germanistik         Abteilung Computerlinguistik
Bismarckstr. 6  *  D-91054 Erlangen  *  Tel.: +49 9131 8529250
http://www.linguistik.uni-erlangen.de/~tommi

