[Corpora-List] Time costs between manual pos tagging of English and Chinese corpus

Oliver Mason O.Mason at bham.ac.uk
Wed Nov 17 16:54:49 UTC 2010


This of course would support the argument that there are no word
classes, or at least that they are not useful for Chinese. Years ago
John Sinclair mentioned the notion of a 'norb', a word with an
unresolved noun/verb ambiguity, while we were working on a
'context-free' tagger. Maybe you need a tag in your case that can
cover both 'reform' as the nominal concept and 'to reform' as the
verbalisation of that concept. Something along the lines of "process
word", which could be translated into English either as a verb or a
noun.

Oliver


Disclaimer: I know nothing about Chinese. Well, practically nothing.



2010/11/17 Xing Fukun <xingfukun001 at gmail.com>:
> Dear all,
>
> Has anybody made a comparison between the time costs of manual POS
> tagging of English and Chinese corpora?
>
> I haven’t made any such comparison, but I wonder whether there may be some
> differences. The possible reason is that English offers more contextual clues
> (especially formal or syntactic ones) for determining the POS than Chinese
> does. Because Chinese has fewer formal or syntactic clues, the annotator has
> to rely on semantic clues, and sometimes those are not clear enough to rely
> on. For example, “改革很重要” (Reform is very important || To reform is very
> important). In Chinese both verbs and nouns can occupy the subject position,
> so there is no formal clue to the POS of “改革” (reform). Relying on semantics
> is also difficult: “改革” (reform) can be interpreted as either an object or
> an action in this context, so the word is hard to tag. In English it is
> different. If “reform” is a subject without “to”, it is a noun; if it is a
> subject with “to”, it is a verb. There are enough formal clues to determine
> the POS of “reform”. In this sense I think POS tagging of raw text is easier
> for English and perhaps more difficult for Chinese, and the time cost of
> building a Chinese corpus may be higher than for English. This is just my
> guess, without any experiment or investigation. If you know any more, I would
> like to hear about it.
>
> Thank you in advance.
>
>
>
>
> ________________________________
> Xing Fukun
> 2010-11-17
> _______________________________________________
> Corpora mailing list
> Corpora at uib.no
> http://mailman.uib.no/listinfo/corpora
>
>



-- 
Dr Oliver Mason
Technical Director of the Centre for Corpus Research
Head of Postgraduate Studies (Doctoral Research)
School of English, Drama, and ACS
The University of Birmingham
Birmingham B15 2TT

To arrange a meeting time see http://meetwith.me/ojmason
