[Corpora-List] Time costs between manual pos tagging of English and Chinese corpus

delmont at unive.it delmont at unive.it
Wed Nov 17 22:24:28 UTC 2010


Dear Xing Fukun,
I think you should look at the work that has already been done on Chinese
for the Stanford parser. They have a tagged corpus, a tagged lexicon, and a
parser that runs fairly well on Chinese sentences. The Stanford parser also
produces POS tags and is free.
Rodolfo Delmonte

> Dear all,
> Has anybody made a comparison between the time costs of manual POS
> tagging of English and Chinese corpora?
> I haven’t made any such comparison, but I wonder whether there may be
> some differences. The possible reason is that English offers more context
> clues (especially formal or syntactic clues) for determining the POS than
> Chinese does. Because Chinese has fewer formal or syntactic clues, the
> annotator has to rely on semantic clues to determine the POS. But
> sometimes the semantic clues are not clear enough to rely on. For
> example, “改革很重要” (Reform is very important || To reform is very
> important). In Chinese, both verbs and nouns can occupy the subject
> position, so there is no formal clue to determine the POS of
> “改革” (reform). Relying on semantics to determine the POS of “改革” is also
> difficult: “改革” (reform) can be interpreted as either an object or an
> action in this context. So it is difficult to tag the POS of the word. In
> English it is different. If “reform” is the subject without “to”, it is a
> noun. If it is the subject with “to”, it is a verb. There are enough
> formal clues to determine the POS of “reform”. In this sense, I think it
> is easier to POS-tag raw English text and perhaps more difficult to
> POS-tag Chinese text, so the time cost of constructing a Chinese corpus
> may be higher than for an English one. This is just my guess, without any
> experiment or investigation. If you know more, I would like to hear
> about it.
> Thank you in advance.
>
> Xing Fukun
> 2010-11-17
> _______________________________________________
> Corpora mailing list
> Corpora at uib.no
> http://mailman.uib.no/listinfo/corpora
>

