Corpora: Question about a Brown Corpus tag
Frank Henrik Mueller
fhm at sfs.nphil.uni-tuebingen.de
Thu Sep 14 10:57:40 UTC 2000
Hello all!
> on 17 Aug 2000 Eric S Atwell wrote:
>
> > Some tag definitions in Brown were clearly
> > decided by what TAGGIT found computable;
> > I *guess* linguistic inconsistencies in tagging
> > some words may be down to drawing boundaries on
> > grounds of computational tractability rather than
> > purely linguistic reasons
>
> on 17 Aug 2000 Andrew Harley wrote:
>
> > This explains how so many taggers can claim 95% or higher success rates!
>
> > I also know taggers that tagged IN as "preposition
> > or conjunction" on the same grounds.
> ------------------------
This is a reasonable decision, because you cannot resolve this ambiguity
from the immediate context alone (which is all that most taggers use). It
is therefore better to leave the POS information underspecified and
resolve the ambiguity when you do the parse. Otherwise, your parser has to
work with unreliable information.
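To make the idea concrete, here is a minimal sketch of what "underspecified" could mean in practice: the tagger keeps every candidate tag for an ambiguous word, and a later parsing step narrows the set. The tag names (PREP, SCONJ, OTHER), the small word list and both function names are my own assumptions for illustration; they are not taken from Brown, TAGGIT or any existing tagger.

```python
# Hypothetical tag names: PREP = preposition, SCONJ = subordinating conjunction.
# The word list is illustrative only, not a real lexicon.
AMBIGUOUS = {
    "until": frozenset({"PREP", "SCONJ"}),
    "before": frozenset({"PREP", "SCONJ"}),
}


def tag_underspecified(tokens):
    """Return (token, candidate_tags) pairs; ambiguous words keep all candidates."""
    return [(tok, AMBIGUOUS.get(tok.lower(), frozenset({"OTHER"}))) for tok in tokens]


def resolve(tagged, index, clause_follows):
    """Narrow the candidates at `index` once the parser knows what follows."""
    token, candidates = tagged[index]
    choice = "SCONJ" if clause_follows else "PREP"
    if choice not in candidates:
        raise ValueError(f"{token!r} cannot be tagged {choice}")
    resolved = list(tagged)  # copy, so the tagger output stays untouched
    resolved[index] = (token, frozenset({choice}))
    return resolved


tagged = tag_underspecified("until the morning comes".split())
resolved = resolve(tagged, 0, clause_follows=True)
```

The point of the design is that the tagger never commits to a guess it cannot justify; the decision is postponed to the component that has the structural information.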
> So what could be the linguistic reasons that Eric was mentioning? For me
> (with a rather limited linguistic background) the "traditional" criteria
> for POS determination look quite arbitrary or let's say heuristic.
>
> I cannot, for instance, see any advantage of separating "until" in:
> * until tomorrow (preposition)
> * until the morning comes (subordinating conjunction)
I agree that you can (or even should) also leave this underspecified
until you do a full parse. However, at some point you have to make a
decision, because you have to annotate clauses and you have to annotate
prepositional phrases. And 'until', when it is a connector (i.e. a
subordinating conjunction), gives you a good cue for where the clause starts.
> while not separating "and" in:
> * you and me (coordinating conjunction)
> * I go and see (coordinating conjunction)
As 'and' coordinates constituents of the same kind, you can analyse
a sentence like 'I go and see.' as:
[CL [NP [N I]] [VP [V go] [CO and] [V see]]]
(my ad-hoc annotation ;-))
The use of 'and' does not affect the 'global' structure of the clause.
However, this is clearly different for 'until' as it introduces a
prepositional phrase in the one case and a clause in the other.
Think of the German 'um' which causes the same problem in sentences
like:
1. Er rannte, [CL um den Bus zu kriegen].
(He ran to catch the bus.)
2. Er rannte [PP um den Bus herum].
(He ran around the bus.)
You can leave the decision open until you do a parse, but at that point
you have to make it. Here, you could use a heuristic like: 'If "um"
precedes a noun phrase, try to find a matching clause and tag it as a
"subordinating conjunction"; otherwise (if there is no clause), attach it
to the noun phrase and tag it as a "preposition".' You can thus parse
and specify your tags at the same time.
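A rough sketch of this heuristic, under some loud assumptions: the input is already tagged with STTS-style tags except for 'um', "precedes a noun phrase" is approximated by looking at the tag of the next token, and "a matching clause" is approximated by finding 'zu' plus an infinitive before the next punctuation mark. The function name tag_um, the "?" placeholder tag and the NP_START set are hypothetical. A real parser would of course do much more than this.

```python
# Tags that may open a noun phrase (a deliberately small, assumed subset).
NP_START = {"ART", "NN", "NE", "ADJA", "PPOSAT"}
BOUNDARIES = {",", ".", ";", "!", "?"}


def tag_um(tagged):
    """Tag each 'um' as KOUI (conjunction with zu-infinitive) or APPR (preposition)."""
    out = list(tagged)
    for i, (word, _tag) in enumerate(tagged):
        if word.lower() != "um":
            continue
        # Only apply the heuristic if 'um' is directly followed by an NP opener.
        if i + 1 >= len(tagged) or tagged[i + 1][1] not in NP_START:
            continue
        clause_found = False
        # Look to the right, up to the next punctuation mark, for 'zu' + infinitive.
        for j in range(i + 1, len(tagged) - 1):
            if tagged[j][0] in BOUNDARIES:
                break
            if tagged[j][1] == "PTKZU" and tagged[j + 1][1] == "VVINF":
                clause_found = True
                break
        out[i] = (word, "KOUI" if clause_found else "APPR")
    return out


# Example 1: Er rannte, um den Bus zu kriegen.
s1 = [("Er", "PPER"), ("rannte", "VVFIN"), (",", "$,"), ("um", "?"),
      ("den", "ART"), ("Bus", "NN"), ("zu", "PTKZU"), ("kriegen", "VVINF"),
      (".", "$.")]
# Example 2: Er rannte um den Bus herum.
s2 = [("Er", "PPER"), ("rannte", "VVFIN"), ("um", "?"), ("den", "ART"),
      ("Bus", "NN"), ("herum", "APZR"), (".", "$.")]

print(tag_um(s1)[3])  # ('um', 'KOUI')
print(tag_um(s2)[2])  # ('um', 'APPR')
```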
> Or why should I call the German "entlang" (along) a PREposition,
> even if it is behind the noun phrase:
> * den Fluss entlang (along the river)
In the STTS (Stuttgart-Tuebingen Tag Set) this is called a postposition
(APPO) in contrast to prepositions (APPR).
See for details:
http://www.sfs.nphil.uni-tuebingen.de/Elwis/stts/stts.html
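As a small illustration of the APPR/APPO distinction: if you already know where the noun phrase is, the tag follows from the position of the adposition alone. The function name and the half-open span convention are my own assumptions here, and circumpositions are deliberately left out.

```python
def adposition_tag(adp_index, np_start, np_end):
    """Return 'APPR' if the adposition precedes the NP, 'APPO' if it follows.

    The NP covers token positions np_start .. np_end - 1 (half-open span).
    """
    if adp_index < np_start:
        return "APPR"
    if adp_index >= np_end:
        return "APPO"
    raise ValueError("adposition index lies inside the noun phrase")


# den Fluss entlang: NP = tokens 0..1, 'entlang' at position 2
post = adposition_tag(2, 0, 2)
# entlang des Flusses: 'entlang' at position 0, NP = tokens 1..2
pre = adposition_tag(0, 1, 3)
```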
I hope that helps. Yours, Frank Mueller.
--
Frank H. Mueller
Dorfackerstr. 20
72 074 Tuebingen
Tel.: p 07071/980797
d 07071/29 77 152
fhm at sfs.nphil.uni-tuebingen.de
http://www.sfs.nphil.uni-tuebingen.de/~fhm