Corpora: Question about a Brown Corpus tag

David Campbell campbed at flux.cpmc.columbia.edu
Thu Aug 17 03:28:13 UTC 2000


This is a pretty specific question about POS tagging in the Brown Corpus:
In the sentences:

Which/WDT child broke the glass?
I do not know which/WDT way to go.

'which' is acting as a determiner and takes the 'Wh' determiner tag WDT.
Similarly, in the sentences:

That/DT child broke the glass.
I want to go that/DT way.

the word 'that' is acting as a determiner and is tagged DT.  However, both
'which' and 'that', along with 'who', commonly introduce relative clauses
(other words, or no word at all, can do this too, but less frequently),
as in the sentences:

The child who/WPS broke the glass is in the corner.
The map that/WPS has the red cover will help.
The book which/WDT is on the table is mine.
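
To make the pattern easier to see, here is a minimal sketch that tabulates
the tags each word receives.  It assumes the text is stored in plain
"word/TAG" form, as in the examples above; the helper names parse_tagged
and tags_by_word are hypothetical, not part of any existing tool.

```python
from collections import Counter, defaultdict

# The example sentences from this post; only the words of interest are tagged.
SENTENCES = [
    "Which/WDT child broke the glass?",
    "I do not know which/WDT way to go.",
    "That/DT child broke the glass.",
    "I want to go that/DT way.",
    "The child who/WPS broke the glass is in the corner.",
    "The map that/WPS has the red cover will help.",
    "The book which/WDT is on the table is mine.",
]


def parse_tagged(sentence):
    """Return (word, tag) pairs for every token written as word/TAG."""
    pairs = []
    for token in sentence.split():
        word, sep, tag = token.rpartition("/")
        if sep:  # tokens without a slash carry no tag and are skipped
            pairs.append((word.lower(), tag.rstrip(".,;:?!")))
    return pairs


def tags_by_word(sentences):
    """Map each tagged word to a Counter of the tags it was given."""
    table = defaultdict(Counter)
    for sentence in sentences:
        for word, tag in parse_tagged(sentence):
            table[word][tag] += 1
    return table


if __name__ == "__main__":
    for word, counts in sorted(tags_by_word(SENTENCES).items()):
        print(word, dict(counts))
```

On these seven sentences, 'which' only ever receives WDT, while 'that' is
split between DT and WPS.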

Here's my problem.  'Who' and 'that' are tagged by Brown as 'Wh' pronouns
(WPS) when introducing relative clauses, but 'which' retains its
determiner tag WDT.  I am at a loss as to why.  I've looked at the
documentation for the tag sets but found nothing to explain this.  The
original Penn Treebank used the 'Wh' determiner tag WDT for all
instances of 'which' and 'what', as well as for instances of 'that' such
as those above.  But this was changed, and now 'that' is tagged as a
determiner in one sense and a pronoun in another.

Can anyone offer a reasonable explanation for this?  I'm currently tagging
my own corpus and would like to compare it to some text that has
previously been marked up with the Brown set.  Therefore, I'd like my
tagging to be consistent with what's been done previously.  But this case
really bugs me, and I was hoping someone might have some insight into why
things are tagged the way they are here.
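
For the comparison itself, something like the following rough sketch is
what I have in mind.  It assumes both taggings cover the same tokens in
the same order and are already available as (word, tag) pairs; the
function name tag_disagreements and the sample data are made up for
illustration.

```python
from collections import Counter


def tag_disagreements(mine, reference):
    """Count (word, my_tag, reference_tag) triples where the two tags differ.

    Both arguments are lists of (word, tag) pairs over the same tokens.
    """
    if len(mine) != len(reference):
        raise ValueError("the two taggings must cover the same tokens")
    diffs = Counter()
    for (my_word, my_tag), (ref_word, ref_tag) in zip(mine, reference):
        if my_word.lower() != ref_word.lower():
            raise ValueError(f"token mismatch: {my_word!r} vs {ref_word!r}")
        if my_tag != ref_tag:
            diffs[(my_word.lower(), my_tag, ref_tag)] += 1
    return diffs


if __name__ == "__main__":
    # Hypothetical data: my tagging treats relative 'which' as a pronoun,
    # while the reference text keeps the determiner tag.
    my_tags = [("who", "WPS"), ("that", "WPS"), ("which", "WPS")]
    ref_tags = [("who", "WPS"), ("that", "WPS"), ("which", "WDT")]
    for (word, mine, ref), n in tag_disagreements(my_tags, ref_tags).items():
        print(f"{word}: mine={mine} reference={ref} count={n}")
```

Any systematic difference, such as the 'which' case, would then show up
as a single repeated triple rather than as scattered noise.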

Thanks
David A. Campbell

To make a prairie

To make a prairie it takes a clover
and one bee,--
One clover, and a bee,
And revery.
The revery alone will do
If bees are few.
EMILY DICKINSON


