[Corpora-List] About Part of Speech in English and Chinese

Eric Atwell csc6ea at leeds.ac.uk
Mon Nov 2 10:03:37 UTC 2009


Xing,

It is certainly true that humans find some PoS-tag ambiguities hard to 
choose between.  A PoS-tagged corpus should come with a Manual defining
the PoS-tag set, and explaining how the proofreaders dealt with
problem cases.  ICAME, who run the CORPORA discussion list, host 
Manuals for the range of English corpora they distribute; see
http://khnt.hit.uib.no/icame/manuals/

I am more familiar with the LOB corpus tagging scheme; 
the Manual http://khnt.hit.uib.no/icame/manuals/lobman/INDEX.HTM 
includes a chapter on Problem areas
http://khnt.hit.uib.no/icame/manuals/lobman/LOB7.HTM
which, for example, has some discussion of the tagging of "chief".

However, even trained proofreaders armed with Manuals can disagree: 
the main English tagged corpora do still have some inconsistent
taggings.
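
One way to look for such inconsistencies is to count the tags each word
receives across a corpus.  The following is only a minimal sketch, assuming
the corpus is already available as (word, tag) pairs; the function names
and the sample data are invented for illustration and are not taken from
the Penn Treebank or LOB.

```python
from collections import Counter, defaultdict


def tag_distribution(tagged_tokens):
    """Map each lowercased word to a Counter of the tags it received."""
    dist = defaultdict(Counter)
    for word, tag in tagged_tokens:
        dist[word.lower()][tag] += 1
    return dist


def ambiguous_words(dist):
    """Return only the words that received more than one distinct tag."""
    return {word: dict(tags) for word, tags in dist.items() if len(tags) > 1}


# Invented sample for illustration, NOT real corpus data.
sample = [
    ("the", "DT"), ("chief", "JJ"), ("executive", "NN"), ("officer", "NN"),
    ("the", "DT"), ("chief", "NN"), ("executive", "NN"), ("officer", "NN"),
    ("the", "DT"), ("chief", "JJ"), ("executive", "JJ"), ("officer", "NN"),
]

dist = tag_distribution(sample)
# Lists every word tagged in more than one way, with its tag counts.
print(ambiguous_words(dist))
```

A word showing up here is not necessarily mis-tagged: the different tags
may be correct in different contexts, so each case still has to be checked
against the Manual.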

Eric Atwell, Leeds University


On Sun, 1 Nov 2009, Fukun Xing wrote:

> 
> Hi everybody,
>    I am puzzled by the part of speech of "chief" in the phrase "the chief
> executive officer". In the Penn Treebank, "chief" in this phrase is sometimes
> tagged as "JJ" and sometimes as "NN". Could you tell me how you would tag it, and
> why? And is it safe to say that there are some PoS ambiguities in English which
> cannot be resolved even by humans? I know that it may be true in Chinese that
> it is sometimes impossible for humans to decide the right PoS of some words. For
> example, "一件 包装/v n 精美 的 礼品" (1. a present with wonderful decoration; 2.
> a present decorated wonderfully). In this sentence "包装" (decorate/decoration) can
> be tagged as noun or verb; both are right, and the choice does not affect the
> understanding of the sentence. If there is such a thing in English, can you give
> some examples?
>  Thanks in advance!
> 
> Xing
> 
> 
>

-- 
Eric Atwell,
  Senior Lecturer, Language research group, School of Computing,
  Faculty of Engineering, UNIVERSITY OF LEEDS, Leeds LS2 9JT, England
  TEL: 0113-3435430  FAX: 0113-3435468  WWW/email: google Eric Atwell