Corpora: Question about a Brown Corpus tag

E S Atwell eric at comp.leeds.ac.uk
Thu Aug 17 12:31:40 UTC 2000


David,
I'd like to help, BUT this is my last day before leaving for a conference and
vacation, so I don't have time to investigate in detail, but...

I was involved in the LOB Corpus tagging project in 1981-3. We started from
the Brown corpus, which had originally been tagged using the TAGGIT program
and then manually proofread and corrected.  I don't think we had access to a
proofreader's guide defining the Brown tagset; we just had the "corpus
evidence" of which tags actually appeared with which words.  Assigning
WH-tags was not as clear-cut as, e.g., singular v. plural nouns, and we
decided to change some boundaries/definitions in the new LOB tagset. Some tag
definitions in Brown were clearly decided by what TAGGIT found computable;
I *guess* linguistic inconsistencies in tagging some words may be down to
drawing boundaries on grounds of computational tractability rather than
purely linguistic reasons (or, to be more fair, when two or more
conflicting linguistic criteria were available (e.g. form v. function),
computational tractability was a deciding factor).

We have tried taking some other text samples (teenager conversations, BBC
radio broadcasts, software manuals), re-tagging these with Brown tagset
(and several other tagsets as well), and getting these proofread by
experts in the original tagset.  See
http://www.scs.leeds.ac.uk/amalgam/amalgam/corpus/tagged_prf.html
for links to each of the samples tagged in 8 different tagsets.
A description of the Brown tagset, in terms of which tags actually appear
with which words, is given in
http://www.scs.leeds.ac.uk/amalgam/tagsets/brown.html

I note that the Brown corpus training set included:

WPS   WH-pronoun, nominative
that who whoever whosoever what whatsoever

WDT   WH-determiner
which what whatever whichever whichever-the-hell

Furthermore, "which" does not appear with any tag other than WDT;
"who" appears with WPS, WPO;
"that" appears with WPS, CS, DT.
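
The word-to-tag observations above can be checked mechanically on any
tagged text. A minimal sketch in Python, assuming the text is available as
(word, tag) pairs; the function name tags_by_word and the sample pairs are
hypothetical, built from the examples in this thread rather than read from
the Brown corpus itself:

```python
from collections import defaultdict


def tags_by_word(tagged_tokens):
    """Map each lower-cased word to the set of tags it appears with."""
    seen = defaultdict(set)
    for word, tag in tagged_tokens:
        seen[word.lower()].add(tag)
    return dict(seen)


# Hypothetical sample: word/tag pairs taken from the examples in this thread.
sample = [
    ("Which", "WDT"), ("which", "WDT"),
    ("who", "WPS"), ("who", "WPO"),
    ("that", "WPS"), ("that", "CS"), ("That", "DT"),
]

for word, tags in sorted(tags_by_word(sample).items()):
    print(word, sorted(tags))
```

A word whose set holds a single tag, as "which" does here, was never
disambiguated by the tagset.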

It appears that the designers of the Brown tagset decided not to try to
distinguish between determiner and pronoun functions of "which"; I guess this is
because the type of English constraint grammar rules used in TAGGIT would
not have been able to correctly disambiguate between these 2 tags in
sufficient cases.

So, if you want to be consistent with Brown, you simply tag ALL cases of
"which" as WDT, even when introducing relative clauses.

Eric Atwell.

PS if you want to compare your Brown-tagged corpus with another, feel free
to re-use our multi-tagged corpus!


--
Eric Atwell, Distributed Multimedia Systems MSc Tutor & SOCRATES Tutor
School of Computing, University of Leeds, LEEDS LS2 9JT
TEL: (44)113-2335430  FAX: (44)113-2335468
WWW: http://www.comp.leeds.ac.uk/eric  EMAIL: eric at comp.leeds.ac.uk

On Wed, 16 Aug 2000, David Campbell wrote:

> This is a pretty specific question about POS tagging in the Brown Corpus:
> In the sentences:
>
> Which/WDT child broke the glass?
> I do not know which/WDT way to go.
>
> 'which' is acting as a determiner and takes the 'Wh' determiner tag WDT.
> Similarly in the sentences:
>
> That/DT child broke the glass.
> I want to go that/DT way.
>
> the word 'that' is acting as a determiner and tagged DT.  However, both
> 'which' and 'that' along with 'who' commonly introduce relative clauses
> (other words, or no word at all, can do this too, but this occurs less
> frequently), such as in the sentences:
>
> The child who/WPS broke the glass is in the corner.
> The map that/WPS has the red cover will help.
> The book which/WDT is on the table is mine.
>
> Here's my problem.  'Who' and 'That' are tagged by Brown as 'Wh' pronouns
> (WPS) when introducing relative clauses, but 'which' retains its
> determiner tag WDT.  I am at a loss as to why.  I've looked at
> documentation for the tag sets but found nothing to explain this.  The
> original Penn Treebank had the 'Wh' determiner tagged WDT for all
> instances of 'which' and 'what' as well as instances of 'that' such as
> above.  But this was changed and now 'that' is tagged as a determiner in
> one sense and a pronoun in another.
>
> Can anyone offer a reasonable explanation for this?  I'm currently tagging
> my own corpus and would like to compare it to some text which has
> previously been marked up with the Brown set.  Therefore, I'd like my
> tagging to be consistent with what's been done previously.  But this case
> really bugs me and I was hoping someone might have some insight on why
> things are tagged the way they are here.
>
> Thanks
> David A. Campbell
>
> To make a prairie
>
> To make a prairie it takes a clover
> and one bee,--
> One clover, and a bee,
> And revery.
> The revery alone will do
> If bees are few.
> EMILY DICKINSON
>
>
>


