Corpora: Question about a Brown Corpus tag

Mark Lewellen lewellen at
Fri Sep 15 16:21:17 UTC 2000

In response to:
> An alternative to underspecification of POS information is to develop a
> POS tagger that records multiple POS in ambiguous contexts (ideally with
> probabilities attached to each POS choice)....
> Could anyone point out projects that have developed such POS taggers, or
> submit opinions as to their viability?

Miles Osborne wrote:

> Check out:
> from the abstract:
> >
> We consider what tagging models are most appropriate as front ends for
> probabilistic context-free-grammar parsers. In particular we ask if using
> a tagger that returns more than one tag, a ``multiple tagger,'' improves
> parsing performance. Our conclusion is somewhat surprising: single tag
> Markov-model taggers are quite adequate for the task. First of all,
> parsing accuracy, as measured by the correct assignment of parts of speech
> to words, does not increase significantly when parsers select the tags
> themselves. In addition, the work required to parse a sentence goes up
> with increasing tag ambiguity, though not as much as one might expect.
> Thus, for the moment, single taggers are the best taggers.
> >

I downloaded this article, which argues that a parser should _not_ make use of
the probabilities from a tagger that returns multiple tags with their
probabilities.
This is counter-intuitive to me; however, here is a summary of the argument:
(apologies for generalizing symbols to forms suitable for e-mail)

1) We want to maximize:      p( parse_tree | word_string ).
2) For a context-free grammar, 1) is equivalent to maximizing the product of
     probabilities of the rules used in the parse (i.e., max product p(rules) ).
3) Since we are maximizing p( parse_tree | word_string ), the rules have words as
     their terminal symbols, so some of the rules are 'lexical rules'.
4) The probability of a lexical rule  p( tag->word ) is p( word | tag ).
5) The 'multiple' tagger results in p( tag | word ).  This is not the
    p( word | tag ) that we require.  Using p( tag | word ) here is analogous to
    the problem of using p( tag | word ) instead of p( word | tag ) in some
    HMM taggers.
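
To make the distinction in 5) concrete, here is a small Python sketch.  The
counts are invented for illustration (they are not from the article or from any
corpus), and the function names are my own.  The point is only that the two
conditionals are normalized over different things, so one cannot simply be
substituted for the other.

```python
from collections import Counter

# Hypothetical (word, tag) counts, invented for illustration only.
counts = Counter({
    ("run", "NN"): 20,
    ("run", "VB"): 80,
    ("dog", "NN"): 180,
    ("walk", "VB"): 120,
})

def p_tag_given_word(tag, word):
    """What a 'multiple' tagger reports: normalized over the tags of one word."""
    word_total = sum(c for (w, _), c in counts.items() if w == word)
    return counts[(word, tag)] / word_total

def p_word_given_tag(word, tag):
    """What the lexical rule tag->word needs: normalized over the words of one tag."""
    tag_total = sum(c for (_, t), c in counts.items() if t == tag)
    return counts[(word, tag)] / tag_total

print(p_tag_given_word("NN", "run"))  # 20 / (20 + 80)
print(p_word_given_tag("run", "NN"))  # 20 / (20 + 180)
```

With these toy counts the same (word, tag) pair gets two different values,
because the first divides by the count of the word and the second by the count
of the tag.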

While I fully understand the logic of this argument, it is nonetheless desirable
to exploit the information that a 'multiple' tagger provides.  Perhaps Bayes'
rule could be applied, so that we could use
( p( word ) x p( tag | word ) ) / p( tag )
instead of p( word | tag ).
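
A minimal Python sketch of that inversion follows.  The function name and all
three input numbers are hypothetical; in practice p( word ) and p( tag ) would
have to be estimated separately (e.g., from unigram counts), which the tagger
output alone does not provide.

```python
def lexical_rule_prob(p_tag_given_word, p_word, p_tag):
    """Bayes' rule: p(word | tag) = p(tag | word) * p(word) / p(tag)."""
    if p_tag <= 0.0:
        raise ValueError("p(tag) must be positive")
    return p_tag_given_word * p_word / p_tag

# Hypothetical inputs, chosen only to show the arithmetic.
p_nn_given_run = 0.2   # p( NN | run ), as a 'multiple' tagger might report it
p_run = 0.25           # p( run ), assumed unigram word probability
p_nn = 0.5             # p( NN ), assumed unigram tag probability

print(lexical_rule_prob(p_nn_given_run, p_run, p_nn))
```

Note that for a fixed word string the p( word ) factors are the same for every
candidate parse, so if the goal is only to rank parses, the ratio
p( tag | word ) / p( tag ) would be enough.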

Are there any agreements/disagreements with the above argument, or any other
comments on the application of 'multiple' PoS taggers as front ends to parsers?

Mark Lewellen
