[Corpora-List] Chinese and English POS

Bruce Anderson bruce306 at rogers.com
Tue Nov 3 15:53:19 UTC 2009


It is worth recalling that POS matters because it helps decode the semantics of the utterance.

In a highly inflected language, the POS information is (usually) all we need to resolve any grammatical ambiguity inherent in the utterance.

In a less-than-highly inflected language, other strategies are needed, e.g. the connections that someone mentioned earlier in this thread.  This issue has generated a lot of discussion because English and Chinese (!) are both "less-than-highly inflected languages", and in such languages POS information by itself is not enough to disambiguate.

I see the challenge not as one of _imposing_ rules but as one of _discovering_ which rules actually work.  Those discovered rules are likely to be similar to the rules that actual speakers use to decode utterances.

Bruce Anderson




________________________________
From: Geoffrey Sampson <grs2 at sussex.ac.uk>
To: Xing Fukun <xingfukun001 at gmail.com>
Cc: CORPORA at uib.no; simon smith <smithsgj at nccu.edu.tw>
Sent: Tue, November 3, 2009 9:00:48 AM
Subject: Re: [Corpora-List] Chinese and English POS

I don't believe it makes sense to look for a theory telling us what PoS a
given word in a given context "really" is, for numerous examples such as
those mentioned by Adam Kilgarriff in this thread.  I just don't see that
there is a "truth of the matter" to which a theory may correspond or fail
to correspond.  What one can do is to try to _impose_ rules that succeed in
determining a unique PoS tagging in as many debatable cases as possible
(and that are consistent with the linguistic consensus in clear cases);
that's what I tried to do for English in my book "English for the
Computer".  But I was always clear that I wasn't claiming to discover facts
about English structure, only imposing a classification scheme on English. 
(It seemed to me that some of the contributors to this thread were not
recognising this distinction, though others perhaps do.)

Geoffrey Sampson

_______________________________________________
Corpora mailing list
Corpora at uib.no
http://mailman.uib.no/listinfo/corpora