[Corpora-List] Corpora for EAP: Architecture...?

Eric Atwell eric at comp.leeds.ac.uk
Mon Jan 16 10:26:07 UTC 2006


BootCat and WACKY (Web-as-Corpus Kool Ynitiative) tools are Perl scripts.
Does anyone know of equivalents in Python?  E.g., is anyone developing
web-as-corpus extras for the Python Natural Language Toolkit (NLTK)?

I want to set a Web-as-Corpus data-mining/analysis coursework exercise for 
my "Technologies for Knowledge Management" module next semester; 
these Computing undergrads are familiar with Python and Java, but not Perl.

Alternatively, when will the public web-based version of BootCat be 
available?... (and will it cope with 70 computing students testing it?!)


Eric Atwell, School of Computing, Leeds University


On Mon, 16 Jan 2006, Adam Kilgarriff wrote:

> Dear Nigel,
>
> Do you know BootCat tools?  They allow you to prepare special-language
> corpora from web pages automatically.  See
> http://sslmit.unibo.it/~baroni/bootcat.html
>
> We are currently preparing a web-service version of the tool, so that you
> can enter ‘seed’ terms and then produce a corpus in that area by clicking
> the “go” button.  A public version will follow before long.  In the meantime,
> if you give me half a dozen relevant architecture terms (single words or
> multi-word terms, selected to avoid picking up non-architecture hits), I’ll
> make a small sample corpus and point you to it.
>
> Adam Kilgarriff
>
> ...
> 
> SERGE SHAROFF
> in my view the best option is to collect the corpus you want
> automatically using BootCat tools:
> http://wacky.sslmit.unibo.it/
>

-- 
Eric Atwell, Senior Lecturer, Language research group, School of Computing,
Faculty of Engineering, University of Leeds, LEEDS LS2 9JT, England
TEL: +44-113-2335430  FAX: +44-113-2335468  http://www.comp.leeds.ac.uk/eric

