[Corpora-List] Time costs between manual pos tagging of English and Chinese corpus

Wed Nov 17 16:36:14 UTC 2010

Greetings, Xing Fukun.

I don’t have a direct answer to your question, but perhaps I might offer a
few resources that could allow you to determine the answer.

I’ve been engaged in reducing the costs of corpus annotation using
cost-conscious active learning and multiple annotators.  You can learn more
about what my students, collaborators, and I have been up to on our project
page here:

                https://facwiki.cs.byu.edu/nlp/index.php/Projects:ALFA

As part of this effort, we ran a controlled user study measuring the time
costs of English POS tagging with one form of machine assistance:

                http://www.lrec-conf.org/proceedings/lrec2010/summaries/451.html

We employed our web-based tool known as CCASH for the user study.  The tool
is described here:

                http://www.lrec-conf.org/proceedings/lrec2010/summaries/360.html

You could adopt our methodology to study the time cost of annotating
Chinese.  One question to be resolved is which inventory of tags (lexical
grammatical categories) to use for Chinese and whether that inventory is
comparable to tag inventories used in English tagging.

Regards,

--Eric

From: corpora-bounces at uib.no [mailto:corpora-bounces at uib.no] On Behalf Of
Xing Fukun
Sent: Wednesday, November 17, 2010 8:23 AM
To: corpora
Subject: [Corpora-List] Time costs between manual pos tagging of English and
Chinese corpus

Dear all,

Has anybody made a comparison between the time costs of manual POS tagging
of English and Chinese corpora?

I haven’t made any such comparison, but I wonder whether there may be some
differences. A possible reason is that English offers more contextual clues
(especially formal or syntactic clues) for determining the POS than Chinese
does. Because Chinese has fewer formal or syntactic clues, the annotator has
to rely on semantic clues, and sometimes those are not clear enough to rely
on. For example, “改革很重要” (Reform is very important || To reform is very
important). In Chinese both verbs and nouns can occupy the subject position,
so there is no formal clue to determine the POS of “改革” (reform). Relying
on semantics to determine the POS of “改革” is also difficult: “改革” (reform)
can be interpreted as either an object or an action in this context. So it
is difficult to tag the POS of the word. English is different. If “reform”
is a subject without “to”, it is a noun; if it is a subject with “to”, it is
a verb. There are enough formal clues to determine the POS of “reform”. In
this sense I think POS tagging of raw text is easier for English and perhaps
more difficult for Chinese, so the time cost of constructing a Chinese corpus
may be higher than for English. This is just my guess, without any experiment
or investigation. If you know anything more, I would like to hear about it.

Thank you in advance.

Xing Fukun

2010-11-17
