<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">


<HTML><HEAD>


<META http-equiv=Content-Type content="text/html; charset=gb2312">


<META content="MSHTML 6.00.2900.3698" name=GENERATOR><LINK 


href="BLOCKQUOTE{margin-Top: 0px; margin-Bottom: 0px; margin-Left: 2em}" 


rel=stylesheet></HEAD>


<BODY style="FONT-SIZE: 10pt; FONT-FAMILY: verdana">


<DIV><FONT size=2>


<P class=MsoNormal 


style="MARGIN: 0cm 0cm 0pt; TEXT-ALIGN: left; mso-pagination: widow-orphan" 


align=left><SPAN lang=EN-US 


style="FONT-SIZE: 10pt; FONT-FAMILY: Verdana; mso-bidi-font-family: 宋体; mso-font-kerning: 0pt">Dear 


all,<?xml:namespace prefix = o ns = "urn:schemas-microsoft-com:office:office" 


/><o:p></o:p></SPAN></P>


<P class=MsoNormal 


style="MARGIN: 0cm 0cm 0pt; TEXT-ALIGN: left; mso-pagination: widow-orphan" 


align=left><SPAN lang=EN-US 


style="FONT-SIZE: 10pt; FONT-FAMILY: Verdana; mso-bidi-font-family: 宋体; mso-font-kerning: 0pt">Has 
anybody made a comparison between the time costs of manual 
POS tagging for English and Chinese corpora? 


<o:p></o:p></SPAN></P>


<P class=MsoNormal 


style="MARGIN: 0cm 0cm 0pt; TEXT-ALIGN: left; mso-pagination: widow-orphan" 


align=left><SPAN lang=EN-US 


style="FONT-SIZE: 10pt; FONT-FAMILY: Verdana; mso-bidi-font-family: 宋体; mso-font-kerning: 0pt">I 
haven’t made any such comparison, but I wonder whether there may be some 
differences. A possible reason is that English offers more contextual clues 
(especially formal or syntactic clues) for determining the POS than 
Chinese does. Because there are fewer formal or syntactic clues in Chinese, 
the annotator has to rely on semantic clues to determine the 
POS. But sometimes the semantic clues are not clear enough to rely on. For 
example, “</SPAN><SPAN 


style="FONT-SIZE: 10pt; FONT-FAMILY: 宋体; mso-bidi-font-family: 宋体; mso-font-kerning: 0pt; mso-ascii-font-family: Verdana; mso-hansi-font-family: Verdana">改革很重要</SPAN><SPAN 


lang=EN-US 


style="FONT-SIZE: 10pt; FONT-FAMILY: Verdana; mso-bidi-font-family: 宋体; mso-font-kerning: 0pt">” 


</SPAN><SPAN 


style="FONT-SIZE: 10pt; FONT-FAMILY: 宋体; mso-bidi-font-family: 宋体; mso-font-kerning: 0pt; mso-ascii-font-family: Verdana; mso-hansi-font-family: Verdana">（</SPAN><SPAN 


lang=EN-US 


style="FONT-SIZE: 10pt; FONT-FAMILY: Verdana; mso-bidi-font-family: 宋体; mso-font-kerning: 0pt">Reform 


is very important || To reform is very important</SPAN><SPAN 


style="FONT-SIZE: 10pt; FONT-FAMILY: 宋体; mso-bidi-font-family: 宋体; mso-font-kerning: 0pt; mso-ascii-font-family: Verdana; mso-hansi-font-family: Verdana">）</SPAN><SPAN 


lang=EN-US 


style="FONT-SIZE: 10pt; FONT-FAMILY: Verdana; mso-bidi-font-family: 宋体; mso-font-kerning: 0pt">. 
In Chinese, both verbs and nouns can occupy the subject position, so there 
is no formal clue to determine the POS of “</SPAN><SPAN 


style="FONT-SIZE: 10pt; FONT-FAMILY: 宋体; mso-bidi-font-family: 宋体; mso-font-kerning: 0pt; mso-ascii-font-family: Verdana; mso-hansi-font-family: Verdana">改革（</SPAN><SPAN 


lang=EN-US 


style="FONT-SIZE: 10pt; FONT-FAMILY: Verdana; mso-bidi-font-family: 宋体; mso-font-kerning: 0pt">reform</SPAN><SPAN 


style="FONT-SIZE: 10pt; FONT-FAMILY: 宋体; mso-bidi-font-family: 宋体; mso-font-kerning: 0pt; mso-ascii-font-family: Verdana; mso-hansi-font-family: Verdana">）</SPAN><SPAN 


lang=EN-US 


style="FONT-SIZE: 10pt; FONT-FAMILY: Verdana; mso-bidi-font-family: 宋体; mso-font-kerning: 0pt">”. 
If we rely on semantics to determine the POS of </SPAN><SPAN 


style="FONT-SIZE: 10pt; FONT-FAMILY: 宋体; mso-bidi-font-family: 宋体; mso-font-kerning: 0pt; mso-ascii-font-family: Verdana; mso-hansi-font-family: Verdana">改革</SPAN><SPAN 


lang=EN-US 


style="FONT-SIZE: 10pt; FONT-FAMILY: Verdana; mso-bidi-font-family: 宋体; mso-font-kerning: 0pt">, 
it is also difficult. </SPAN><SPAN 


style="FONT-SIZE: 10pt; FONT-FAMILY: 宋体; mso-bidi-font-family: 宋体; mso-font-kerning: 0pt; mso-ascii-font-family: Verdana; mso-hansi-font-family: Verdana">“改革”</SPAN><SPAN 


lang=EN-US 


style="FONT-SIZE: 10pt; FONT-FAMILY: Verdana; mso-bidi-font-family: 宋体; mso-font-kerning: 0pt">(reform) 
can be interpreted as either an object or an action in this context, so it is 
difficult to tag the POS of the word. In English it is different. If “reform” 
is a subject without “to”, it is a noun. If it is a subject with “to”, it is a 
verb. There are enough formal clues to determine the POS of “reform”. In this 
sense, I think POS tagging of raw text is easier for English and may be more 
difficult for Chinese, so the time cost of building a Chinese corpus may be 
higher than for an English one. This is just my guess, without any experiment 
or investigation. If you know anything more about this, I would like to hear it. 


<o:p></o:p></SPAN></P>


<P class=MsoNormal 


style="MARGIN: 0cm 0cm 0pt; TEXT-ALIGN: left; mso-pagination: widow-orphan" 


align=left><SPAN lang=EN-US 


style="FONT-SIZE: 10pt; FONT-FAMILY: Verdana; mso-bidi-font-family: 宋体; mso-font-kerning: 0pt">Thank 


you in advance.<o:p></o:p></SPAN></P>


<P class=MsoNormal style="MARGIN: 0cm 0cm 0pt"><SPAN lang=EN-US><o:p><FONT 


face="Times New Roman" size=3> </FONT></o:p></SPAN></P></FONT></DIV>


<DIV><FONT size=2></FONT> </DIV>


<DIV align=left><FONT size=2>


<HR style="WIDTH: 122px; HEIGHT: 2px" SIZE=2>


</FONT></DIV>


<DIV><FONT color=#c0c0c0><FONT size=2><SPAN>Xing Fukun</SPAN></FONT></DIV>


<DIV><FONT size=2>2010-11-17</FONT></FONT></DIV></BODY></HTML>