<html xmlns:v="urn:schemas-microsoft-com:vml" xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:w="urn:schemas-microsoft-com:office:word" xmlns:m="http://schemas.microsoft.com/office/2004/12/omml" xmlns="http://www.w3.org/TR/REC-html40">
<head>
<meta http-equiv=Content-Type content="text/html; charset=us-ascii">
<meta name=Generator content="Microsoft Word 12 (filtered medium)">
<style>
<!--
/* Font Definitions */
@font-face
{font-family:"Cambria Math";
panose-1:2 4 5 3 5 4 6 3 2 4;}
@font-face
{font-family:Calibri;
panose-1:2 15 5 2 2 2 4 3 2 4;}
@font-face
{font-family:Tahoma;
panose-1:2 11 6 4 3 5 4 4 2 4;}
/* Style Definitions */
p.MsoNormal, li.MsoNormal, div.MsoNormal
{margin:0in;
margin-bottom:.0001pt;
font-size:12.0pt;
font-family:"Times New Roman","serif";}
a:link, span.MsoHyperlink
{mso-style-priority:99;
color:blue;
text-decoration:underline;}
a:visited, span.MsoHyperlinkFollowed
{mso-style-priority:99;
color:purple;
text-decoration:underline;}
span.EmailStyle17
{mso-style-type:personal-reply;
font-family:"Calibri","sans-serif";
color:#1F497D;}
.MsoChpDefault
{mso-style-type:export-only;}
@page Section1
{size:8.5in 11.0in;
margin:1.0in 1.0in 1.0in 1.0in;}
div.Section1
{page:Section1;}
-->
</style>
</head>
<body lang=EN-US link=blue vlink=purple>
<div class=Section1>
<p class=MsoNormal><span style='font-size:11.0pt;font-family:"Calibri","sans-serif";
color:#1F497D'>Dear Adam,<o:p></o:p></span></p>
<p class=MsoNormal><span style='font-size:11.0pt;font-family:"Calibri","sans-serif";
color:#1F497D'>We did a systematic study of how various variables (the
technical decisions that one has to make when implementing a POS tagger) affect
POS tagging accuracy.<o:p></o:p></span></p>
<p class=MsoNormal><span style='font-size:11.0pt;font-family:"Calibri","sans-serif";
color:#1F497D'>The report may provide more detailed information on possible
error sources and the respective loss or gain in accuracy, and it addresses the
difficulties of doing an error analysis with systematic rigor.<o:p></o:p></span></p>
<p class=MsoNormal><span style='font-size:11.0pt;font-family:"Calibri","sans-serif";
color:#1F497D'>URL for the report:
http://reports-archive.adm.cs.cmu.edu/anon/isr2008/CMU-ISR-08-131R.pdf<o:p></o:p></span></p>
<p class=MsoNormal><span style='font-size:11.0pt;font-family:"Calibri","sans-serif";
color:#1F497D'>Best regards, Jana<o:p></o:p></span></p>
<p class=MsoNormal><span style='font-size:11.0pt;font-family:"Calibri","sans-serif";
color:#1F497D'><o:p> </o:p></span></p>
<p class=MsoNormal><span style='font-size:11.0pt;font-family:"Calibri","sans-serif";
color:#1F497D'>Jana Diesner<o:p></o:p></span></p>
<p class=MsoNormal><span style='font-size:11.0pt;font-family:"Calibri","sans-serif";
color:#1F497D'>Carnegie Mellon University<o:p></o:p></span></p>
<p class=MsoNormal><span style='font-size:11.0pt;font-family:"Calibri","sans-serif";
color:#1F497D'>School of Computer Science<o:p></o:p></span></p>
<p class=MsoNormal><span style='font-size:11.0pt;font-family:"Calibri","sans-serif";
color:#1F497D'>Center for Computational Analysis of Social and Organizational
Systems<o:p></o:p></span></p>
<p class=MsoNormal><span style='font-size:11.0pt;font-family:"Calibri","sans-serif";
color:#1F497D'>Web: <a href="http://www.andrew.cmu.edu/user/jdiesner/">http://www.andrew.cmu.edu/user/jdiesner/</a><o:p></o:p></span></p>
<p class=MsoNormal><span style='font-size:11.0pt;font-family:"Calibri","sans-serif";
color:#1F497D'><o:p> </o:p></span></p>
<div style='border:none;border-top:solid #B5C4DF 1.0pt;padding:3.0pt 0in 0in 0in'>
<p class=MsoNormal><b><span style='font-size:10.0pt;font-family:"Tahoma","sans-serif"'>From:</span></b><span
style='font-size:10.0pt;font-family:"Tahoma","sans-serif"'>
corpora-bounces@uib.no [mailto:corpora-bounces@uib.no] <b>On Behalf Of </b>Adam
Kilgarriff<br>
<b>Sent:</b> Wednesday, February 25, 2009 6:16 AM<br>
<b>To:</b> Corpora List<br>
<b>Cc:</b> Sue Atkins; Valerie GRUNDY; Patrick Hanks<br>
<b>Subject:</b> [Corpora-List] POS-tagger maintenance and improvement<o:p></o:p></span></p>
</div>
<p class=MsoNormal><o:p> </o:p></p>
<div>
<p class=MsoNormal>All,<o:p></o:p></p>
</div>
<div>
<p class=MsoNormal> <o:p></o:p></p>
</div>
<div>
<p class=MsoNormal>My lexicography colleagues and I use POS-tagged
corpora all the time, every day, and very frequently spot systematic
errors. (This is for a range of languages, but particularly
English.) We would dearly like to be in a dialogue with the
developers of the POS-tagger and/or the relevant language models so
the tagger+model could be improved in response to our feedback. (We have
been using standard models rather than training our own.) However,
it seems that for the taggers and language models we use (mainly
TreeTagger, also CLAWS), and also for other market leaders, all of which seem to
come from universities, the developers have little motivation to keep
improving their taggers, since incremental improvements do not
make for good research papers. As a result, there is nowhere for our feedback to go,
nor any real prospect of these taggers/models improving.<o:p></o:p></p>
</div>
<div>
<p class=MsoNormal> <o:p></o:p></p>
</div>
<div>
<p class=MsoNormal>Am I too pessimistic? Are there ways of improving
language models other than developing bigger and better training corpora, which is
not an exercise we have the resources to invest in? Are there commercial
taggers I should be considering (as, in the commercial world, there is
motivation for incremental improvements and responding to customer feedback)?<br
clear=all>
<o:p></o:p></p>
</div>
<div>
<p class=MsoNormal>Responses and ideas most welcome.<o:p></o:p></p>
</div>
<div>
<p class=MsoNormal> <o:p></o:p></p>
</div>
<div>
<p class=MsoNormal>Adam Kilgarriff<br>
-- <br>
================================================<br>
Adam Kilgarriff
<a
href="http://www.kilgarriff.co.uk">http://www.kilgarriff.co.uk</a>
<br>
Lexical Computing Ltd
<a href="http://www.sketchengine.co.uk">http://www.sketchengine.co.uk</a><br>
Lexicography MasterClass Ltd <a
href="http://www.lexmasterclass.com">http://www.lexmasterclass.com</a><br>
Universities of Leeds and Sussex <a
href="mailto:adam@lexmasterclass.com">adam@lexmasterclass.com</a><br>
================================================<o:p></o:p></p>
</div>
</div>
</body>
</html>