[Corpora-List] About Part of Speech in English and Chinese
Eric Atwell
csc6ea at leeds.ac.uk
Mon Nov 2 10:03:37 UTC 2009
Xing,
It is certainly true that humans find some PoS-tag ambiguities hard to
choose between. A PoS-tagged corpus should come with a Manual defining
the PoS-tag set and explaining how the proofreaders dealt with
problem cases. ICAME, who run the CORPORA discussion list, host
Manuals for the range of English corpora they distribute; see
http://khnt.hit.uib.no/icame/manuals/
For example, I am more familiar with the LOB corpus tagging scheme;
the Manual http://khnt.hit.uib.no/icame/manuals/lobman/INDEX.HTM
includes a chapter on Problem areas
http://khnt.hit.uib.no/icame/manuals/lobman/LOB7.HTM
which includes some discussion of how to tag "chief".
However, even trained proofreaders armed with Manuals can disagree:
the main English tagged corpora do still have some inconsistent
taggings.
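
One rough way to look for such inconsistencies is to collect every word
together with its immediate left and right neighbours and flag any context
that received more than one tag. The Python sketch below is only an
illustration under stated assumptions: the function name
find_inconsistent_tags, the "<s>" and "</s>" boundary markers, and the toy
sentences are all invented here and are not taken from the LOB corpus, the
Penn Treebank, or any tagging Manual.

```python
from collections import Counter, defaultdict


def find_inconsistent_tags(sentences):
    """Map each (previous word, word, next word) context to its tag counts,
    keeping only contexts that were given more than one distinct tag."""
    seen = defaultdict(Counter)
    for sent in sentences:
        words = [word.lower() for word, _ in sent]
        # Hypothetical boundary markers so first/last words have neighbours.
        padded = ["<s>"] + words + ["</s>"]
        for i, (_, tag) in enumerate(sent):
            context = (padded[i], padded[i + 1], padded[i + 2])
            seen[context][tag] += 1
    return {ctx: tags for ctx, tags in seen.items() if len(tags) > 1}


# Toy data, invented for illustration (not real corpus lines).
toy_corpus = [
    [("the", "DT"), ("chief", "JJ"), ("executive", "NN"), ("officer", "NN")],
    [("the", "DT"), ("chief", "NN"), ("executive", "NN"), ("officer", "NN")],
    [("the", "DT"), ("chief", "NN"), ("spoke", "VBD")],
]

conflicts = find_inconsistent_tags(toy_corpus)
for context, tags in conflicts.items():
    print(context, dict(tags))
```

A flagged context is only a candidate for a proofreader to inspect: the same
three words can legitimately carry different tags in different sentences, so
this check cannot decide which tagging is right.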
Eric Atwell, Leeds University
On Sun, 1 Nov 2009, Fukun Xing wrote:
>
> Hi everybody,
> I am puzzled by the part of speech of "chief" in the phrase "the chief
> executive officer". In the Penn Treebank, "chief" in this phrase is sometimes tagged
> as "JJ" and sometimes as "NN". Could you tell me how you would tag it and
> why? And is it safe to say that there are some PoS ambiguities in English which cannot
> be resolved even by humans? I know that it may be true in Chinese that
> sometimes it is impossible for humans to decide the right PoS of some words. For
> example, "一件 包装/v n 精美 的 礼品" (1. a present with wonderful decoration; 2.
> a present decorated wonderfully). In this sentence "包装" (decorate/decoration) can
> be tagged as noun or verb; both are right, and the choice does not affect the correct
> understanding of the sentence. If there is such a thing in English, can you give some
> examples?
> Thanks in advance!
>
> Xing
>
>
>
--
Eric Atwell,
Senior Lecturer, Language research group, School of Computing,
Faculty of Engineering, UNIVERSITY OF LEEDS, Leeds LS2 9JT, England
TEL: 0113-3435430 FAX: 0113-3435468 WWW/email: google Eric Atwell