If you want to solve this kind of problem, you could write a spelling corrector using a language model that considers subparts of each word. The pattern "you -> u" will emerge. Alternatively, if you have a constrained vocabulary, you could use the <a href="http://en.wikipedia.org/wiki/Damerau%E2%80%93Levenshtein_distance">Damerau-Levenshtein distance</a> between words.<div>
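<div><br></div><div>To make the second suggestion concrete, here is a minimal Python sketch, not a definitive implementation. It computes the restricted Damerau-Levenshtein distance (the optimal string alignment variant, in which no substring is edited more than once) and picks the nearest entry from a small vocabulary. The function names <code>osa_distance</code> and <code>closest_word</code> and the three-word vocabulary are made up for illustration.</div>

```python
def osa_distance(a: str, b: str) -> int:
    """Restricted Damerau-Levenshtein (optimal string alignment) distance."""
    # d[i][j] holds the number of edits needed to turn a[:i] into b[:j].
    d = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) + 1):
        d[i][0] = i  # delete every character of a[:i]
    for j in range(len(b) + 1):
        d[0][j] = j  # insert every character of b[:j]
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution or match
            )
            # Transposition of two adjacent characters.
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[len(a)][len(b)]


def closest_word(word: str, vocabulary: list[str]) -> str:
    """Return the vocabulary entry with the smallest distance to `word`."""
    return min(vocabulary, key=lambda candidate: osa_distance(word, candidate))


# Hypothetical constrained vocabulary, for illustration only.
vocabulary = ["computer", "commuter", "composer"]
print(closest_word("compyouter", vocabulary))
```

<div>With this toy vocabulary, "compyouter" is two deletions away from "computer", which is closer than any other entry. A real system would need a much larger word list and probably a frequency-based tie-breaker.</div>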
<br></div><div>Bye,</div><div>Michele Filannino.<br><br><font color="#666666">CDT PhD student in Computer Science<br>Room IT301 - IT Building<br>The University of Manchester<br><a href="mailto:filannim@cs.manchester.ac.uk" target="_blank">filannim@cs.manchester.ac.uk</a></font><br>
<br><div class="gmail_quote">On Wed, May 9, 2012 at 9:04 AM, Renaud Richardet <span dir="ltr">&lt;<a href="mailto:renaud.richardet@epfl.ch" target="_blank">renaud.richardet@epfl.ch</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
Dear Imad,<div><br></div><div>You can ask Nicolas Hernandez (see <a href="http://www.mail-archive.com/opennlp-users@incubator.apache.org/msg00564.html" target="_blank">http://www.mail-archive.com/opennlp-users@incubator.apache.org/msg00564.html</a>) about POS taggers for French.</div>
<div><br></div><div>Regarding "compyouter", that might be more difficult to map…</div><div><br></div><div>All the best, Renaud</div><div><br></div><div><br></div><div>-- <br>Renaud Richardet<br>Blue Brain Project PhD candidate<br>
EPFL Station 15<br>CH-1015 Lausanne<br></div><div><br></div><div><br><div class="gmail_quote"><div><div class="h5">On Wed, May 9, 2012 at 4:35 AM, imad eddin Jerbi <span dir="ltr">&lt;<a href="mailto:jerbi.imad.eddin@gmail.com" target="_blank">jerbi.imad.eddin@gmail.com</a>&gt;</span> wrote:<br>
</div></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div><div class="h5"><div><div><b>Dear Corpora Subscribers,</b></div><div><br></div><div>My name is Imad Eddin Jerbi, and I am doing my master's thesis at the Faculty of Economics and Management of Sfax, Tunisia. </div>
<div>I am working on the construction and morphosyntactic annotation of a Tunisian dialect corpus.</div>
<div>I need a free, open-source (Java) part-of-speech tagging system for French and English. </div><div>The system has to perform a linguistic correction first, because the input could be a misspelled word. </div><div><i>Example:</i></div>
<div>Arabic dialect: “كَمْبْيُوتَرْ”. This word is originally English; I converted it to Latin characters using SAMPA for Arabic: “compyouter”.</div><div>So the system has to correct the input word “compyouter” to “computer” and then output the possible morphosyntactic annotations.</div>
<div>I would be very grateful if you could give me a list of the best available systems.</div><div>Thank you in advance.</div><div>Email: <a href="mailto:jerbi.imad.eddin@gmail.com" target="_blank">jerbi.imad.eddin@gmail.com</a></div>
<div><br></div><div><b>Best regards, </b></div><div><br></div>-- <br><div dir="ltr"><p><span>Imad Eddin JERBI</span></p><p><font color="#000000">Student at </font><span>Faculty of Economics and Management of Sfax</span></p>
<p><a href="http://www.fsegs.rnu.tn/" target="_blank"><font color="#3366ff">http://www.fsegs.rnu.tn/</font></a></p><p style="text-align:left"></p><div><font color="#000000"></font> </div><div><font color="#000000">ANLP Research Group</font></div>
<div><a href="http://sites.google.com/site/anlprg" style="color:rgb(17,85,204)" target="_blank"><font color="#3366ff">http://sites.google.com/site/anlprg</font></a></div><div><font color="#000000"></font> </div><div><font color="#000000">MIRACL Laboratory</font></div>
<div><a href="http://www.miracl.rnu.tn/" style="color:rgb(17,85,204)" target="_blank"><font color="#3366ff">www.miracl.rnu.tn</font></a></div><div><font color="#000000"><font></font><br></font> </div><div><font color="#000000">Page Web: </font><a href="https://sites.google.com/site/jerbiimadeddinanlp/" style="color:rgb(17,85,204)" target="_blank"><font color="#3366ff">https://sites.google.com/site/jerbiimadeddinanlp/</font></a></div>
<div><font color="#000000">Email: </font><a href="mailto:jerbi.imad.eddin@gmail.com" style="color:rgb(17,85,204)" target="_blank"><font color="#3366ff">jerbi.imad.eddin@gmail.com</font></a><br></div><div><font color="#000000">Address: El Wahheb, Chebba : 5170 - Mahdia - TUNISIE.</font></div>
<div><font color="#000000">Gsm: <a href="tel:%2B216%2055688555" value="+21655688555" target="_blank">+216 55688555</a></font></div><p></p></div><br>
</div>
<br></div></div>_______________________________________________<br>
UNSUBSCRIBE from this page: <a href="http://mailman.uib.no/options/corpora" target="_blank">http://mailman.uib.no/options/corpora</a><br>
Corpora mailing list<br>
<a href="mailto:Corpora@uib.no" target="_blank">Corpora@uib.no</a><br>
<a href="http://mailman.uib.no/listinfo/corpora" target="_blank">http://mailman.uib.no/listinfo/corpora</a><br>
<br></blockquote></div><br><br clear="all"><div><br></div>
</div>
</blockquote></div>
</div>