<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<HTML><HEAD>
<META http-equiv=Content-Type content="text/html; charset=gb2312">
<META content="MSHTML 6.00.2900.3698" name=GENERATOR></HEAD>
<BODY style="FONT-SIZE: 10pt; FONT-FAMILY: verdana">
<DIV><FONT size=2>
<P class=MsoNormal
style="MARGIN: 0cm 0cm 0pt; TEXT-ALIGN: left; mso-pagination: widow-orphan"
align=left><SPAN lang=EN-US
style="FONT-SIZE: 10pt; FONT-FAMILY: Verdana; mso-bidi-font-family: 宋体; mso-font-kerning: 0pt">Dear
all,<o:p></o:p></SPAN></P>
<P class=MsoNormal
style="MARGIN: 0cm 0cm 0pt; TEXT-ALIGN: left; mso-pagination: widow-orphan"
align=left><SPAN lang=EN-US
style="FONT-SIZE: 10pt; FONT-FAMILY: Verdana; mso-bidi-font-family: 宋体; mso-font-kerning: 0pt">Has
anybody compared the time costs of manual POS tagging for English and
Chinese corpora?
<o:p></o:p></SPAN></P>
<P class=MsoNormal
style="MARGIN: 0cm 0cm 0pt; TEXT-ALIGN: left; mso-pagination: widow-orphan"
align=left><SPAN lang=EN-US
style="FONT-SIZE: 10pt; FONT-FAMILY: Verdana; mso-bidi-font-family: 宋体; mso-font-kerning: 0pt">I
haven’t made any such comparison myself, but I wonder whether there are some
differences. One possible reason is that English offers more contextual clues
(especially formal or syntactic ones) for determining POS than Chinese does.
Because Chinese has fewer formal or syntactic clues, annotators have to rely on
semantic clues to determine POS, but sometimes the semantic clues are not clear
enough to rely on. For
example, “</SPAN><SPAN
style="FONT-SIZE: 10pt; FONT-FAMILY: 宋体; mso-bidi-font-family: 宋体; mso-font-kerning: 0pt; mso-ascii-font-family: Verdana; mso-hansi-font-family: Verdana">改革很重要</SPAN><SPAN
lang=EN-US
style="FONT-SIZE: 10pt; FONT-FAMILY: Verdana; mso-bidi-font-family: 宋体; mso-font-kerning: 0pt">”
</SPAN><SPAN
style="FONT-SIZE: 10pt; FONT-FAMILY: 宋体; mso-bidi-font-family: 宋体; mso-font-kerning: 0pt; mso-ascii-font-family: Verdana; mso-hansi-font-family: Verdana">(</SPAN><SPAN
lang=EN-US
style="FONT-SIZE: 10pt; FONT-FAMILY: Verdana; mso-bidi-font-family: 宋体; mso-font-kerning: 0pt">“Reform
is very important” or “To reform is very important”</SPAN><SPAN
style="FONT-SIZE: 10pt; FONT-FAMILY: 宋体; mso-bidi-font-family: 宋体; mso-font-kerning: 0pt; mso-ascii-font-family: Verdana; mso-hansi-font-family: Verdana">)</SPAN><SPAN
lang=EN-US
style="FONT-SIZE: 10pt; FONT-FAMILY: Verdana; mso-bidi-font-family: 宋体; mso-font-kerning: 0pt">.
In Chinese, both verbs and nouns can occupy the subject position, so there
is no formal clue to determine the POS of “</SPAN><SPAN
style="FONT-SIZE: 10pt; FONT-FAMILY: 宋体; mso-bidi-font-family: 宋体; mso-font-kerning: 0pt; mso-ascii-font-family: Verdana; mso-hansi-font-family: Verdana">改革(</SPAN><SPAN
lang=EN-US
style="FONT-SIZE: 10pt; FONT-FAMILY: Verdana; mso-bidi-font-family: 宋体; mso-font-kerning: 0pt">reform</SPAN><SPAN
style="FONT-SIZE: 10pt; FONT-FAMILY: 宋体; mso-bidi-font-family: 宋体; mso-font-kerning: 0pt; mso-ascii-font-family: Verdana; mso-hansi-font-family: Verdana">)</SPAN><SPAN
lang=EN-US
style="FONT-SIZE: 10pt; FONT-FAMILY: Verdana; mso-bidi-font-family: 宋体; mso-font-kerning: 0pt">”.
If we rely on semantics to determine the POS of </SPAN><SPAN
style="FONT-SIZE: 10pt; FONT-FAMILY: 宋体; mso-bidi-font-family: 宋体; mso-font-kerning: 0pt; mso-ascii-font-family: Verdana; mso-hansi-font-family: Verdana">改革</SPAN><SPAN
lang=EN-US
style="FONT-SIZE: 10pt; FONT-FAMILY: Verdana; mso-bidi-font-family: 宋体; mso-font-kerning: 0pt">,
it is also difficult. </SPAN><SPAN
style="FONT-SIZE: 10pt; FONT-FAMILY: 宋体; mso-bidi-font-family: 宋体; mso-font-kerning: 0pt; mso-ascii-font-family: Verdana; mso-hansi-font-family: Verdana">“改革”</SPAN><SPAN
lang=EN-US
style="FONT-SIZE: 10pt; FONT-FAMILY: Verdana; mso-bidi-font-family: 宋体; mso-font-kerning: 0pt">(reform)
can be interpreted as either an object (a noun reading) or an action (a verb
reading) in this context, so it is difficult to tag the POS of the word. English
is different: if “reform” is the subject without “to”, it is a noun; if it is
the subject with “to”, it is a verb. There are enough formal clues to determine
the POS of “reform”. In this sense, I think POS tagging raw English text is
easier than tagging raw Chinese text, and the time cost of building a
POS-tagged Chinese corpus may therefore be higher than for English. This is
just my guess, not based on any experiment or investigation. If you know more
about this, I would like to hear about it.
<o:p></o:p></SPAN></P>
<P class=MsoNormal
style="MARGIN: 0cm 0cm 0pt; TEXT-ALIGN: left; mso-pagination: widow-orphan"
align=left><SPAN lang=EN-US
style="FONT-SIZE: 10pt; FONT-FAMILY: Verdana; mso-bidi-font-family: 宋体; mso-font-kerning: 0pt">Thank
you in advance.<o:p></o:p></SPAN></P>
<P class=MsoNormal style="MARGIN: 0cm 0cm 0pt"><SPAN lang=EN-US><o:p><FONT
face="Times New Roman" size=3> </FONT></o:p></SPAN></P></FONT></DIV>
<DIV><FONT size=2></FONT> </DIV>
<DIV align=left><FONT size=2>
<HR style="WIDTH: 122px; HEIGHT: 2px" SIZE=2>
</FONT></DIV>
<DIV><FONT color=#c0c0c0><FONT size=2><SPAN>Xing Fukun</SPAN></FONT></DIV>
<DIV><FONT size=2>2010-11-17</FONT></FONT></DIV></BODY></HTML>