Corpora: Question about a Brown Corpus tag

Frank Henrik Mueller fhm at sfs.nphil.uni-tuebingen.de
Thu Sep 14 10:57:40 UTC 2000


Hello all!

> on 17 Aug 2000 Eric S Atwell wrote:
>
> > Some tag definitions in Brown were clearly
> > decided by what TAGGIT found computable;
> > I *guess* linguistic inconsistencies in tagging
> > some words may be down to drawing boundaries on
> > grounds of computational tractability rather than
> > purely linguistic reasons
>
> on 17 Aug 2000 Andrew Harley wrote:
>
> > This explains how so many taggers can claim 95% or higher success rates!
>
> > I also know taggers that tagged IN as "preposition
> > or conjunction" on the same grounds.
> ------------------------

This is a reasonable decision, because you cannot resolve this ambiguity
from the immediate context alone (which is all most taggers use). It is
therefore better to keep the POS information underspecified and resolve
the ambiguity during parsing. Otherwise, your parser has to work with
unreliable information.

> So what could be the linguistic reasons that Eric was mentioning? For me
> (with a rather limited linguistic background) the "traditional" criteria
> for POS determination look quite arbitrary or let's say heuristic.
>
> I cannot, for instance, see any advantage of separating "until" in:
> * until tomorrow (preposition)
> * until the morning comes (subordinating conjunction)

I agree that you can (or even should) also leave this underspecified
until you do a full parse. However, at some point you have to make a
decision, because you have to annotate clauses and you have to annotate
prepositional phrases. When 'until' is a connector (a subordinating
conjunction), it gives you a good cue for where the clause starts.
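
To make the two stages concrete, here is a minimal Python sketch (my own
illustration, not how TAGGIT or any particular tagger works). The tagger
assigns one underspecified tag, and the tag is refined once the parser
knows whether a noun phrase or a clause follows. The word list, the tag
names and the function names are all assumptions.

```python
# Hypothetical word list and tag names, for illustration only.
AMBIGUOUS = {"until", "before", "after", "since"}


def tag_word(word):
    """Tagging stage: give ambiguous words one underspecified tag."""
    return "PREP_OR_CONJ" if word.lower() in AMBIGUOUS else "OTHER"


def refine(tag, complement):
    """Parsing stage: complement is 'NP' or 'CL', as decided by the parser."""
    if tag != "PREP_OR_CONJ":
        return tag
    return "subordinating conjunction" if complement == "CL" else "preposition"


print(refine(tag_word("until"), "NP"))  # 'until tomorrow'
print(refine(tag_word("until"), "CL"))  # 'until the morning comes'
```

The tagger never has to guess; the decision is postponed to the point
where the information needed for it is available.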

> while not separating "and" in:
> * you and me (coordinating conjunction)
> * I go and see (coordinating conjunction)

As 'and' coordinates constituents of the same kind, you can analyse
sentences like:

'I go and see.' as: [CL [NP [N I]] [VP [V go] [CO and] [V see]]]
(my ad-hoc annotation ;-))

The use of 'and' does not affect the 'global' structure of the clause.
However, this is clearly different for 'until' as it introduces a
prepositional phrase in the one case and a clause in the other.
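
The point about 'and' can be checked mechanically. The following Python
sketch reads a bracketed annotation in the ad-hoc style used above and
compares the clause-level constituents with and without the
coordination, using the quoted example 'I go and see'. The function
names and the tuple representation are assumptions made for this
illustration.

```python
def parse(annotation):
    """Parse '[LABEL child ...]' into nested (label, children) tuples."""
    tokens = annotation.replace("[", " [ ").replace("]", " ] ").split()
    pos = 0

    def node():
        nonlocal pos
        pos += 1                      # skip '['
        label = tokens[pos]
        pos += 1
        children = []
        while tokens[pos] != "]":     # an IndexError here means a missing ']'
            if tokens[pos] == "[":
                children.append(node())
            else:
                children.append(tokens[pos])
                pos += 1
        pos += 1                      # skip ']'
        return (label, children)

    return node()


def top_level(tree):
    """Labels of the immediate constituents of the root node."""
    return [child[0] for child in tree[1]]


plain = parse("[CL [NP [N I]] [VP [V go]]]")
coord = parse("[CL [NP [N I]] [VP [V go] [CO and] [V see]]]")

# The coordination lives inside the VP; the clause is NP + VP either way.
print(top_level(plain), top_level(coord))
```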

Think of the German 'um' which causes the same problem in sentences
like:

1. Er rannte, [CL um den Bus zu kriegen].
(He ran to catch the bus.)

2. Er rannte [PP um den Bus herum].
(He ran around the bus.)

You can leave the decision open until you do a parse, but you have to
make a decision. Here, you could use a heuristic like: if 'um' precedes
a noun phrase, try to find a matching clause and tag 'um' as a
'subordinating conjunction'; otherwise (if there is no clause) attach it
to the noun phrase and tag it as a 'preposition'. You can thus parse
and specify your tags at the same time.
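
A toy Python version of this heuristic, applied to the two 'um'
sentences above, might look as follows. It is only a sketch: the coarse
input tags (PRON, VERB, DET, NOUN, ADV, ZU, UM, PUNCT) are made up for
the example, and "a matching clause" is approximated by finding the
infinitive marker 'zu' before the next punctuation mark, which is my
assumption and not a complete test.

```python
def resolve_um(tokens, i):
    """Return a tag for the 'um' at tokens[i]; tokens are (word, tag) pairs."""
    # Scan to the right of 'um' up to the next punctuation mark.
    for word, tag in tokens[i + 1:]:
        if tag == "PUNCT":
            break
        if tag == "ZU":  # 'um ... zu' + infinitive: treat as a clause
            return "subordinating conjunction"
    # No clause found: attach 'um' to the noun phrase.
    return "preposition"


s1 = [("Er", "PRON"), ("rannte", "VERB"), (",", "PUNCT"), ("um", "UM"),
      ("den", "DET"), ("Bus", "NOUN"), ("zu", "ZU"), ("kriegen", "VERB"),
      (".", "PUNCT")]
s2 = [("Er", "PRON"), ("rannte", "VERB"), ("um", "UM"), ("den", "DET"),
      ("Bus", "NOUN"), ("herum", "ADV"), (".", "PUNCT")]

print(resolve_um(s1, 3))  # sentence 1: 'um den Bus zu kriegen'
print(resolve_um(s2, 2))  # sentence 2: 'um den Bus herum'
```

A real system would make this decision inside the parser, where the
clause boundaries are actually known, instead of scanning for 'zu'.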

> Or why should I call the German "entlang" (along) a PREposition,
> even if it is behind the noun phrase:
> * den Fluss entlang (along the river)

In the STTS (Stuttgart-Tuebingen Tag Set) this is called a postposition
(APPO) in contrast to prepositions (APPR).

See for details:

http://www.sfs.nphil.uni-tuebingen.de/Elwis/stts/stts.html

I hope that helps. Yours, Frank Mueller


--
Frank H. Mueller
Dorfackerstr. 20
72 074 Tuebingen
Tel.: p 07071/980797
      d 07071/29 77 152

fhm at sfs.nphil.uni-tuebingen.de
http://www.sfs.nphil.uni-tuebingen.de/~fhm


