<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
  <head>
    <meta content="text/html; charset=ISO-8859-1"
      http-equiv="Content-Type">
  </head>
  <body bgcolor="#ffffff" text="#000000">
    Yes, it looks like script detection rather than language
    detection. I also wonder whether the notion of Arabic here includes
    languages that use scripts based on the Arabic one (say, Urdu,
    Persian, and some others).<br>
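    <br>
    A minimal sketch of the distinction, assuming that the first word of
    a character's Unicode name identifies its script (a heuristic, not a
    real script property lookup). The function name dominant_script and
    the sample strings are illustrative only and are not taken from the
    paper or from any of the tools it compares:<br>
<pre>
```python
import unicodedata
from collections import Counter


def dominant_script(text):
    """Return the most common script among the letters of text, or None."""
    counts = Counter()
    for ch in text:
        if ch.isalpha():
            # Heuristic (an assumption of this sketch): the first word of the
            # Unicode character name is the script, e.g. "ARABIC LETTER BEH"
            # or "CYRILLIC SMALL LETTER PE".
            name = unicodedata.name(ch, "")
            if name:
                counts[name.split()[0]] += 1
    return counts.most_common(1)[0][0] if counts else None


# Illustrative greetings; several languages share a single script.
samples = {
    "Arabic": "مرحبا بالعالم",
    "Persian": "سلام دنیا",
    "Urdu": "ہیلو دنیا",
    "Russian": "привет мир",
    "Bulgarian": "здравей свят",
    "Hindi": "नमस्ते दुनिया",
}

for language, text in samples.items():
    print(language, dominant_script(text))
```
</pre>
    With these samples, the Arabic, Persian, and Urdu strings all come
    back as ARABIC, and the Russian and Bulgarian strings both come back
    as CYRILLIC, so a script label alone cannot tell those languages
    apart.<br>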
    <br>
    Taras<br>
    <br>
    <br>
    On 19/06/12 16:39, Hristo Tanev wrote:
    <blockquote
      cite="mid:1340120371.7651.YahooMailNeo@web28902.mail.ir2.yahoo.com"
      type="cite">
      <div style="color: rgb(0, 0, 0); background-color: rgb(255, 255,
        255); font-family: times new roman,new york,times,serif;
        font-size: 12pt;">
        <div>....only that Cyrillic is not a language.</div>
        <div><br>
        </div>
        <div>Hristo Tanev</div>
        <div><br>
        </div>
        <div style="font-size: 12pt; font-family: 'times new roman','new
          york',times,serif;">
          <div style="font-size: 12pt; font-family: 'times new
            roman','new york',times,serif;">
            <div dir="ltr"> <font face="Arial" size="2">
                <hr size="1"> <b><span style="font-weight: bold;">From:</span></b>
                Benjamin Van Durme <a class="moz-txt-link-rfc2396E" href="mailto:vandurme@cs.jhu.edu">&lt;vandurme@cs.jhu.edu&gt;</a><br>
                <b><span style="font-weight: bold;">To:</span></b>
                Christine Amling <a class="moz-txt-link-rfc2396E" href="mailto:chamling@students.uni-mainz.de">&lt;chamling@students.uni-mainz.de&gt;</a>
                <br>
                <b><span style="font-weight: bold;">Cc:</span></b>
                <a class="moz-txt-link-abbreviated" href="mailto:corpora@uib.no">corpora@uib.no</a> <br>
                <b><span style="font-weight: bold;">Sent:</span></b>
                Tuesday, 19 June 2012, 16:05<br>
                <b><span style="font-weight: bold;">Subject:</span></b>
                Re: [Corpora-List] Need help with Twitter Corpus<br>
              </font> </div>
            <br>
            The following paper presents a new language identification
            (LID) method and includes a comparison<br>
            against a number of tools on Twitter data.<br>
            <br>
            Language Identification for Creating Language-Specific
            Twitter Collections<br>
            Shane Bergsma, Paul McNamee, Mossaab Bagdouri, Clayton Fink,
            Theresa Wilson<br>
            <a moz-do-not-send="true"
              href="http://aclweb.org/anthology-new/W/W12/W12-2108.pdf"
              target="_blank">http://aclweb.org/anthology-new/W/W12/W12-2108.pdf</a><br>
            <br>
            Accuracy numbers (most of the other systems were run as
            black boxes, without<br>
            adaptation, so take these conservatively):<br>
            <br>
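            <table border="1" cellpadding="4">
              <tr><th>System</th><th>Arabic</th><th>Devanagari</th><th>Cyrillic</th></tr>
              <tr><td>TextCat</td><td>96.3</td><td>89.1</td><td>90.3</td></tr>
              <tr><td>Google CLD</td><td>90.5</td><td>NA</td><td>91.4</td></tr>
              <tr><td>Lui/Baldwin</td><td>91.4</td><td>78.4</td><td>88.8</td></tr>
              <tr><td>PPM - (new)</td><td>97.6</td><td>97.1</td><td>95.8</td></tr>
            </table>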
            <br>
            _______________________________________________<br>
            UNSUBSCRIBE from this page: <a moz-do-not-send="true"
              href="http://mailman.uib.no/options/corpora"
              target="_blank">http://mailman.uib.no/options/corpora</a><br>
            Corpora mailing list<br>
            <a moz-do-not-send="true" ymailto="mailto:Corpora@uib.no"
              href="mailto:Corpora@uib.no">Corpora@uib.no</a><br>
            <a moz-do-not-send="true"
              href="http://mailman.uib.no/listinfo/corpora"
              target="_blank">http://mailman.uib.no/listinfo/corpora</a><br>
            <br>
            <br>
          </div>
        </div>
      </div>
    </blockquote>
    <br>
  </body>
</html>