[Corpora-List] Real language detection

Roman Klinger roman.klinger at scai.fraunhofer.de
Tue Jul 17 15:41:02 UTC 2012


Hi Adam,

thanks a lot, this really looks perfect for our use case! Thanks to 
Craig as well!

Best,
  Roman


On 17.07.12 17:25, Adam Kilgarriff wrote:
> Dear Roman,
>
> Jan Pomikalek's thesis is substantially on this topic.  The code,
> justext, is on Google Code; demo http://nlp.fi.muni.cz/projekty/justext/
>
> Adam
>
> On 17 July 2012 15:43, Roman Klinger <roman.klinger at scai.fraunhofer.de> wrote:
>
>     Hi,
>
>     we have huge text streams in which parts are not really language but
>     symbols, ids, numbers etc.
>
>     Does anyone know of an existing (and available) system which
>     can distinguish between 'garbage' and 'real sentences'?
>
>     Probably this is easily done with a dictionary lookup (e.g. using
>     Google n-grams), but maybe somebody has already put more effort in.
>
>     Or do you know of any papers in this context?
>
>     Thanks,
>       Roman
>
>
>     --
>     Dr. Roman Klinger
>     Fraunhofer-Institute for Algorithms and Scientific Computing (SCAI)
>     Schloss Birlinghoven
>     D-53754 Sankt Augustin
>     Tel.: +49-2241-14-2360
>     Fax.: +49-2241-14-4-2360
>     email: roman.klinger at scai.fraunhofer.de
>     http://www.scai.fraunhofer.de/klinger.html
>
> --
> ========================================
> Adam Kilgarriff <http://www.kilgarriff.co.uk/> adam at lexmasterclass.com
> Director Lexical Computing Ltd <http://www.sketchengine.co.uk/>
> Visiting Research Fellow University of Leeds <http://leeds.ac.uk>
> /Corpora for all/ with the Sketch Engine <http://www.sketchengine.co.uk>
> /DANTE: a lexical database for English <http://www.webdante.com>/
> ========================================
>
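
The dictionary-lookup idea raised in the original question can be sketched roughly as follows. This is only a minimal illustration, not the method used by justext or any other tool mentioned in this thread. The tiny VOCAB set, the function names and the threshold values are all assumptions made for the example; a real setup would load a large word list (for instance unigrams from an n-gram corpus) and tune the thresholds on sample data.

```python
import re

# Hypothetical stand-in vocabulary. In practice this would be loaded from a
# large word list, e.g. the unigrams of an n-gram corpus.
VOCAB = {
    "the", "a", "an", "is", "are", "was", "in", "of", "to", "and", "we",
    "have", "this", "that", "text", "which", "not", "really", "language",
    "but", "huge", "streams", "parts", "symbols", "numbers",
}

# A token is any run of non-whitespace characters.
TOKEN_RE = re.compile(r"\S+")

# Punctuation stripped from both ends of a token before the lookup.
PUNCT = ".,;:!?\"'()"


def known_word_ratio(line, vocab=VOCAB):
    """Return the fraction of tokens in `line` that are vocabulary words."""
    tokens = TOKEN_RE.findall(line)
    if not tokens:
        return 0.0
    known = sum(1 for t in tokens if t.strip(PUNCT).lower() in vocab)
    return known / len(tokens)


def looks_like_sentence(line, threshold=0.5, min_tokens=3, vocab=VOCAB):
    """Heuristic: enough tokens, and enough of them are known words.

    `threshold` and `min_tokens` are illustrative defaults, not tuned values.
    """
    tokens = TOKEN_RE.findall(line)
    if len(tokens) < min_tokens:
        return False
    return known_word_ratio(line, vocab) >= threshold


if __name__ == "__main__":
    for sample in (
        "we have huge text streams in which parts are not really language",
        "ID 48213 x9f3 ## 0042 //",
    ):
        print(looks_like_sentence(sample), sample)
```

A pure lookup like this will misjudge lines full of names or domain terms missing from the word list, which is one reason a dedicated tool may be the better choice.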


-- 
Dr. Roman Klinger
Fraunhofer-Institute for Algorithms and Scientific Computing (SCAI)
Schloss Birlinghoven
D-53754 Sankt Augustin
Tel.: +49-2241-14-2360
Fax.: +49-2241-14-4-2360
email: roman.klinger at scai.fraunhofer.de
http://www.scai.fraunhofer.de/klinger.html



_______________________________________________
UNSUBSCRIBE from this page: http://mailman.uib.no/options/corpora
Corpora mailing list
Corpora at uib.no
http://mailman.uib.no/listinfo/corpora


