Corpora: Question about a Brown Corpus tag
David Campbell
campbed at flux.cpmc.columbia.edu
Thu Aug 17 03:28:13 UTC 2000
This is a pretty specific question about POS tagging in the Brown Corpus:
In the sentences:
Which/WDT child broke the glass?
I do not know which/WDT way to go.
'which' is acting as a determiner and takes the 'Wh' determiner tag WDT.
Similarly, in the sentences:
That/DT child broke the glass.
I want to go that/DT way.
the word 'that' is acting as a determiner and tagged DT. However, both
'which' and 'that' along with 'who' commonly introduce relative clauses
(other words, or no word at all, can do this too, but this occurs less
frequently), as in these sentences:
The child who/WPS broke the glass is in the corner.
The map that/WPS has the red cover will help.
The book which/WDT is on the table is mine.
Here's my problem. 'Who' and 'That' are tagged by Brown as 'Wh' pronouns
(WPS) when introducing relative clauses, but 'which' retains its
determiner tag WDT. I am at a loss as to why. I've looked at
documentation for the tag sets but found nothing to explain this. The
original Penn Treebank had the 'Wh' determiner tagged WDT for all
instances of 'which' and 'what' as well as instances of 'that' such as
above. But this was changed and now 'that' is tagged as a determiner in
one sense and a pronoun in another.
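One way to see the asymmetry concretely is to tally the tags attached to
each word in the example sentences above. The following is only a rough
sketch in Python: the names EXAMPLES and tags_by_word are made up for
illustration, and it reads nothing but the word/TAG pairs quoted in this
message, not the Brown Corpus itself.

```python
from collections import defaultdict

# Example sentences quoted in this message, in word/TAG form.
# Only the words under discussion carry a tag; the rest are untagged.
EXAMPLES = [
    "Which/WDT child broke the glass?",
    "I do not know which/WDT way to go.",
    "That/DT child broke the glass.",
    "I want to go that/DT way.",
    "The child who/WPS broke the glass is in the corner.",
    "The map that/WPS has the red cover will help.",
    "The book which/WDT is on the table is mine.",
]


def tags_by_word(sentences):
    """Map each tagged word (lowercased) to the set of tags it carries."""
    seen = defaultdict(set)
    for sentence in sentences:
        for token in sentence.split():
            word, sep, tag = token.rpartition("/")
            if sep:  # tokens without a slash are untagged; skip them
                seen[word.lower()].add(tag.strip(".,?!"))
    return dict(seen)


summary = tags_by_word(EXAMPLES)
for word in sorted(summary):
    print(word, sorted(summary[word]))
```

Run on these seven sentences, it lists 'that' under both DT and WPS and
'who' under WPS, while 'which' appears only under WDT.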
Can anyone offer a reasonable explanation for this? I'm currently tagging
my own corpus and would like to compare it to some text which has
previously been marked up with the Brown set. Therefore, I'd like my
tagging to be consistent with what's been done previously. But this case
really bugs me and I was hoping someone might have some insight on why
things are tagged the way they are here.
Thanks
David A. Campbell
To make a prairie
To make a prairie it takes a clover
and one bee,--
One clover, and a bee,
And revery.
The revery alone will do
If bees are few.
EMILY DICKINSON