Corpora: Using SARA to query other corpora than the BNC
Thomas Kuenneth
tommi at linguistik.uni-erlangen.de
Wed Jun 20 12:28:51 UTC 2001
in response to a posting from Lou Burnard Wed, 20 Jun 2001:
> I am in the process of writing a brief guide to how this can be done,
I cannot refrain from making some remarks here. Sara undoubtedly is among the
most frequently used programs in this field (as the BNC plays an important role
in corpus linguistics).
Nonetheless I doubt that the use of Sara for querying other corpora is
desirable. It is well known that the software has been tailored to fit the
structure of the BNC (or vice versa, which does not really matter here). I am
not too sure if the software is flexible enough to meet the requirements of many
other corpora, as we have to keep in mind that corpus data per se has only a
loosely defined structure: consider the presence or absence of POS tags and base
forms, or of meta information such as titles of sample texts, legal information,
dates of publication, lists of categories the samples belong to, ...
And we must not forget that (at least) the Sara client has its weaknesses, such
as limits on the size of query results and being bound to a particular hardware
platform.
There are in fact a number of disadvantages and shortcomings of such proprietary
systems that I could address here (some of which I am going to address at a
conference soon), and they lead me to call for a more flexible, more general
approach.
In the future the user should be able to choose which corpus tool to use for
querying ANY corpus. Some such programs are already available (though they have
some limitations, too). Others are being developed right now.
Regards
Thomas Künneth
---
Thomas Kuenneth M.A. Universitaet Erlangen-Nuernberg
Institut fuer Germanistik Abteilung Computerlinguistik
Bismarckstr. 6 * D-91054 Erlangen * Tel.: +49 9131 8529250
http://www.linguistik.uni-erlangen.de/~tommi