[Corpora-List] Real language detection
Roman Klinger
roman.klinger at scai.fraunhofer.de
Tue Jul 17 15:41:02 UTC 2012
Hi Adam,
thanks a lot, this really looks perfect for our use case! Thanks to
Craig as well!
Best,
Roman
On 17.07.12 17:25, Adam Kilgarriff wrote:
> Dear Roman,
>
> Jan Pomikalek's thesis is substantially on this topic. The code,
> justext, is on Google Code; demo http://nlp.fi.muni.cz/projekty/justext/
>
> Adam
>
> On 17 July 2012 15:43, Roman Klinger <roman.klinger at scai.fraunhofer.de> wrote:
>
> Hi,
>
> We have huge text streams, parts of which are not really language but
> symbols, IDs, numbers, etc.
>
> Does anyone know of an existing (and available) system that
> can distinguish between 'garbage' and 'real sentences'?
>
> This could probably be done with a dictionary lookup (e.g. using the
> Google n-grams), but maybe somebody has already put more effort into it.
>
> Or do you know any papers in this context?
>
> Thanks,
> Roman
>
>
> --
> Dr. Roman Klinger
> Fraunhofer-Institute for Algorithms and Scientific Computing (SCAI)
> Schloss Birlinghoven
> D-53754 Sankt Augustin
> Tel.: +49-2241-14-2360
> Fax.: +49-2241-14-4-2360
> email: roman.klinger at scai.fraunhofer.de
> http://www.scai.fraunhofer.de/klinger.html
>
>
> _________________________________________________
> UNSUBSCRIBE from this page: http://mailman.uib.no/options/corpora
> Corpora mailing list
> Corpora at uib.no
> http://mailman.uib.no/listinfo/corpora
>
>
>
>
> --
> ========================================
> Adam Kilgarriff <http://www.kilgarriff.co.uk/> adam at lexmasterclass.com
> Director Lexical Computing Ltd <http://www.sketchengine.co.uk/>
> Visiting Research Fellow University of Leeds <http://leeds.ac.uk>
> /Corpora for all/ with the Sketch Engine <http://www.sketchengine.co.uk>
> /DANTE: a lexical database for English <http://www.webdante.com>/
> ========================================
>
--
Dr. Roman Klinger
Fraunhofer-Institute for Algorithms and Scientific Computing (SCAI)
Schloss Birlinghoven
D-53754 Sankt Augustin
Tel.: +49-2241-14-2360
Fax.: +49-2241-14-4-2360
email: roman.klinger at scai.fraunhofer.de
http://www.scai.fraunhofer.de/klinger.html
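
The dictionary-lookup idea from the original question can be sketched roughly as follows. This is only a minimal illustration, not the method used by justext or any other existing tool: the names known_word_ratio and looks_like_language, the tiny COMMON_WORDS list, and the threshold values are all assumptions made for the example. A real system would load a full dictionary or a frequency list derived from an n-gram corpus.

```python
import re

# Tiny stand-in vocabulary (assumption): replace with a real dictionary or
# a word list derived from an n-gram corpus.
COMMON_WORDS = frozenset("""
a an and are as at be but by can do for from has have he in is it its not of on
or she that the their there they this to was we were which will with you
""".split())

TOKEN_RE = re.compile(r"\S+")                       # whitespace-separated tokens
WORD_RE = re.compile(r"[A-Za-z]+(?:'[A-Za-z]+)?")   # purely alphabetic words


def known_word_ratio(line, vocabulary=COMMON_WORDS):
    """Return the fraction of tokens in `line` that are known words."""
    tokens = TOKEN_RE.findall(line)
    if not tokens:
        return 0.0
    known = 0
    for token in tokens:
        # Drop surrounding punctuation before the lookup.
        core = token.strip(".,;:!?\"'()[]")
        if WORD_RE.fullmatch(core) and core.lower() in vocabulary:
            known += 1
    return known / len(tokens)


def looks_like_language(line, threshold=0.3, min_tokens=3):
    """Heuristic: enough tokens, and enough of them are known words.

    `threshold` and `min_tokens` are illustrative defaults, not tuned values.
    """
    tokens = TOKEN_RE.findall(line)
    return len(tokens) >= min_tokens and known_word_ratio(line) >= threshold


if __name__ == "__main__":
    for sample in (
        "We have huge text streams in which parts are not really language.",
        "ID-4471 0x3F9A ##### 2012-07-17 15:41:02",
    ):
        print(looks_like_language(sample), "|", sample)
```

With such a small word list the threshold has to stay low. With a full dictionary it could be raised, and the ratio could be combined with other cues such as line length or the share of digits and symbols.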