[Corpora-List] Corpora for EAP: Architecture...?
Eric Atwell
eric at comp.leeds.ac.uk
Mon Jan 16 10:26:07 UTC 2006
BootCat and WACKY (Web-as-Corpus Kool Ynitiative) tools are Perl scripts.
Does anyone know of equivalents in Python? For example, is anyone developing
web-as-corpus extras for the Python Natural Language Toolkit (NLTK)?
I want to set a Web-as-Corpus data-mining/analysis coursework exercise for
my "Technologies for Knowledge Management" module next semester;
these Computing undergrads are familiar with Python and Java, but not Perl.
Alternatively, when will the public web-based version of BootCat be
available?... (and will it cope with 70 computing students testing it?!)
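
To make the kind of Python equivalent asked about here concrete, below is a minimal sketch of just the first step of a seed-term workflow: turning a handful of seed terms into search queries. It assumes that queries are built by combining seed terms into small random tuples; that is an assumption for illustration, not something confirmed in this thread. The function name seed_queries, its parameters, and the example architecture terms are all hypothetical, and nothing here calls a search engine or downloads pages.

```python
import itertools
import random


def seed_queries(seeds, tuple_size=3, n_queries=5, rng=None):
    """Combine seed terms into random tuples, one search query per tuple.

    Hypothetical helper: the tuple-based strategy is an assumption,
    not a description of the actual BootCat scripts.
    """
    rng = rng or random.Random()
    # Every unordered combination of `tuple_size` distinct seed terms.
    combos = list(itertools.combinations(seeds, tuple_size))
    # Pick up to `n_queries` combinations without repetition.
    chosen = rng.sample(combos, min(n_queries, len(combos)))
    # Quote multi-word terms so they can be treated as phrases.
    return [
        " ".join(f'"{term}"' if " " in term else term for term in combo)
        for combo in chosen
    ]


# Example seed terms (made up for illustration).
seeds = ["cantilever", "load-bearing wall", "facade",
         "atrium", "buttress", "floor plan"]
queries = seed_queries(seeds, tuple_size=3, n_queries=4, rng=random.Random(0))
for query in queries:
    print(query)
```

Each query could then be passed to whatever search interface is available, with the returned pages fetched and stripped to plain text; those steps are left out because they depend on the service used.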
Eric Atwell, School of Computing, Leeds University
On Mon, 16 Jan 2006, Adam Kilgarriff wrote:
> Dear Nigel,
>
> Do you know BootCat tools? They allow you to prepare special-language
> corpora from web pages automatically. See
> http://sslmit.unibo.it/~baroni/bootcat.html
>
> We are currently preparing a web-service version of the tool, so that you
> can enter seed terms and produce a corpus in that area by clicking
> the Go button. Public version to follow before long. In the meantime, if
> you give me half a dozen relevant architecture terms (single words or
> multi-words, selected to avoid picking up non-architecture hits), I'll make
> a small sample corpus and point you to it.
>
> Adam Kilgarriff
>
> ...
>
> SERGE SHAROFF
> in my view the best option is to collect the corpus you want
> automatically using BootCat tools:
> http://wacky.sslmit.unibo.it/
>
--
Eric Atwell, Senior Lecturer, Language research group, School of Computing,
Faculty of Engineering, University of Leeds, LEEDS LS2 9JT, England
TEL: +44-113-2335430 FAX: +44-113-2335468 http://www.comp.leeds.ac.uk/eric