Corpora: Brill's vs. CLAWS

E S Atwell eric at comp.leeds.ac.uk
Tue Jul 17 11:51:57 UTC 2001


One advantage of CLAWS tagging is that Lancaster University offers a
professional tagging service, so you can "outsource" your tagging; see
http://www.comp.lancs.ac.uk/computing/research/ucrel/claws/tagservice.html
 - you can tag a sample of 300 words free via a web browser and, if you
like what you see, contact Chris Needham, chris at comp.lancs.ac.uk, for a
quotation on delivery schedule and cost. Alternatively, you can buy a
site licence for GBP750 to set up and run the tagger yourself.

Using Brill's tagger is more like "Do-It-Yourself":
you can download the tagger software, free, from Eric Brill's homepage
http://research.microsoft.com/~brill/
then run it on your own texts yourself. Alternatively, you could try our
free email-server version: just email your text (plain ASCII, not
HTML/doc/etc., and not an attachment) to amalgam-tagger at comp.leeds.ac.uk
with Subject: Brown, and it should be returned with the tags supplied by
the standard Brill tagger. Either way, there is no equivalent of Chris
Needham to advise and guide you through the process: we do not have a
Project Manager to assist customers of this free service...
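
For anyone who wants to load returned text into their own tools, here is a
minimal Python sketch. It ASSUMES the tagged text arrives as
whitespace-separated word/TAG tokens; the layout of the reply is not
described in this message, so check it against what you actually receive.
The function name and the sample sentence are hypothetical.

```python
def parse_tagged(line):
    """Split a line of word/TAG tokens into (word, tag) pairs.

    Assumption: tokens are separated by whitespace and the tag follows
    the LAST slash, so a word such as "1/2" keeps its internal slash.
    """
    pairs = []
    for token in line.split():
        word, sep, tag = token.rpartition("/")
        if sep:
            pairs.append((word, tag))
        else:
            # No slash at all: keep the token and record a missing tag.
            pairs.append((token, None))
    return pairs


# Hypothetical sample line, not real tagger output.
sample = "The/at cat/nn sat/vbd ./."
pairs = parse_tagged(sample)
```

Splitting on the last slash rather than the first is a deliberate choice, so
that tokens containing a slash of their own are not cut in the wrong place.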

One advantage of Brill's tagger is greater flexibility in the tagsets: the
original version comes trained to apply the Brown Corpus tagset, but it can
be retrained on another tagged corpus to apply another tagset. You can "do
this yourself" with your own preferred tagged corpus. You could also try
the ICE-GB tagset on your own texts by using the amalgam-tagger service:
this time email your text to amalgam-tagger at comp.leeds.ac.uk with
Subject: ICE, or try other tagsets by changing the Subject to one or more of
Brown ICE LLC LOB Parts POW SEC UPenn
 - see http://www.comp.leeds.ac.uk/amalgam/amalgam/amalgtag3.html
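
The email request described above can be sketched in Python with the
standard-library email package. This only builds the message; it does not
send it. The function name and the example.org addresses are placeholders,
and joining several tagset names with spaces in the Subject is an
ASSUMPTION based on how the list above is written, so check the amalgam
web page for the exact format.

```python
from email.message import EmailMessage

# Tagset names as listed in this message.
KNOWN_TAGSETS = {"Brown", "ICE", "LLC", "LOB", "Parts", "POW", "SEC", "UPenn"}


def build_tagging_request(text, tagsets, sender, recipient):
    """Build a plain-text, attachment-free message asking for tagging.

    Assumption: multiple tagsets go in the Subject separated by spaces.
    """
    unknown = [t for t in tagsets if t not in KNOWN_TAGSETS]
    if not tagsets or unknown:
        raise ValueError(f"unsupported tagset request: {tagsets!r}")
    if not text.isascii():
        # The service asks for plain ASCII text.
        raise ValueError("text must be plain ASCII")

    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = " ".join(tagsets)
    msg.set_content(text)  # single text/plain body, no attachment
    return msg


# Placeholder addresses; substitute your own and the server's address.
request = build_tagging_request(
    "The cat sat on the mat.",
    ["Brown", "ICE"],
    sender="me@example.org",
    recipient="tagger@example.org",
)
```

The resulting message could then be handed to smtplib or pasted into a mail
client. Note that a request for BNC would be rejected by this check, which
matches the list of tagsets on offer.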

BNC is NOT one of the tagsets we offer, which is unfortunate, since this is
a strong candidate for de-facto standard for (British) English corpus-based
research, not only in Corpus Linguistics but also in Natural Language
Processing. Significantly, the BNC C5 and C7 tagsets are included in
Jurafsky and Martin, "Speech and Language Processing", Prentice-Hall 2000
- the standard textbook for final-year-undergrad/Masters-level NLP,
see http://www.cs.colorado.edu/~martin/slp.html
 - so you should find it easier to recruit researchers with knowledge of BNC.


Eric Atwell

--
Eric Atwell, Distributed Multimedia Systems MSc Tutor & SOCRATES Tutor
School of Computing, University of Leeds, LEEDS LS2 9JT
TEL: 0113-2335430  MOBILE: 0775-1039104 FAX: 0113-2335468
WWW: http://www.comp.leeds.ac.uk/eric  EMAIL: eric at comp.leeds.ac.uk

On Mon, 16 Jul 2001, Veronika Koller wrote:

> Dear list members,
> without wanting to trigger a bi-partisan discussion, I would still like to
> inquire about the advantages of Brill's tagger over CLAWS tagging service
> or vice versa. The situation at our department leading to this question is
> the following:
> So far, we have only worked with Cobuild's Bank of English and
> self-compiled corpora, using WordSmith Tools as a concordance program for
> the latter. Currently, however, we are planning to obtain several other
> corpora such as the BNC (incl. SARA), the Wolverhampton Corpus of Business
> English (by the way: what kind of concordance program would work best with
> that?) and ICE-GB (incl. ICE-CUP). We have had texts tagged by CLAWS and
> the result proved to be quite useful for our purposes. Since we would very
> much like to streamline our software resources as much as possible (which
> doesn't seem to be much anyway), we'd rather know about the respective
> (dis)advantages in advance. A helpful starting point might e.g. be if
> someone could provide a sample text tagged with the help of Brill's.
>
> Your help will be much appreciated and a summary will be posted.
>
> Regards,
> Veronika Koller
> Mag.a Veronika Koller
> Department of English/Business English
> Vienna University of Economics and Business Administration
> Augasse 9
> A-1090 Vienna
> Tel.: 43/1/31336-4068
> Fax: 43/1/31336-747
>
>


