[Corpora-List] Corpora for language identification training?
Vlado Keselj
vlado at cs.dal.ca
Thu Apr 19 14:23:59 UTC 2007
Hi,
You can find several links relevant to written language identification at:
http://users.cs.dal.ca/~vlado/nlp/#nlp/tc/langid
Here is the URL list as well:
cat:nlp/tc/langid
name:Language identification tools, by Gertjan van Noord (TextCat)
URL:http://odur.let.rug.nl/~vannoord/TextCat/competitors.html
cat:nlp/tc/langid
name:On-line tool by Steve Huffman
URL:http://complingone.georgetown.edu/~langid/
cat:nlp/tc/langid
URL:http://cslu.cse.ogi.edu/HLTsurvey/ch8node9.html
name:Chapter on Automatic Language Identification
description: in Survey of the State of the Art in Human Language
Technology (http://cslu.cse.ogi.edu/HLTsurvey/HLTsurvey.html), by
several editors
cat:nlp/tc/langid
URL:http://www.faganfinder.com/translate/identify.php
name:A Language identification tool at Fagan finder
cat:nlp/tc/langid
URL:http://www.translation-guide.com/language_identification.htm
name:Another language identification tool
cat:nlp/tc/langid
URL:http://www.xrce.xerox.com/people/beesley/langid.html
name:Language identifier by Ken Beesley
cat:nlp/tc/langid
URL:http://dis.tpd.tno.nl/druid/lid/lid_index.html
name:DRUID, a language identification tool
cat:nlp/tc/langid
URL:http://www.w3.org/TR/2004/REC-xml-20040204/#sec-lang-tag
name:Specifying language excerpts in XML
cat:nlp/tc/langid
URL:http://www-rali.iro.umontreal.ca/ProjetSILC.en.html
name:SILC project at RALI
cat:nlp/tc/langid
URL:http://veristage.com/demo/test3.php
name:Language Identification tool
description: by Veristage; minimum 40 characters
cat:nlp/tc/langid
URL:http://www.sil.org/silewp/2000/001/SILEWP2000-001.html
name:Language identification and IT: Addressing problems of linguistic
diversity on a global scale
description: by Peter Constable and Gary Simons, SIL International;
about language tagging
cat:nlp/tc/langid
URL:http://www.usdoj.gov/crt/cor/Pubs/ISpeakCards.pdf
name:Language identification flashcard
description:by US Dept. of Commerce
cat:nlp/tc/langid
URL:http://www.research.microsoft.com/~joshuago/physicslongcomment.ps
name:Comment by J. Goodman on a Physics paper about Language Trees and
Zipping, which got a lot of press coverage in 2001
cat:nlp/tc/langid
URL:http://www.unhchr.ch/udhr/navigate/alpha.htm
name:Universal Declaration of Human Rights
description:UN, in 363 languages (17 Jun 2004)
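The entries above use a simple key:value layout: each record starts with a cat: line, followed by name:, URL: and optionally description: in any order, and long values wrap onto the next line. In case it is useful for pulling the URLs out automatically, here is a minimal Python sketch for reading such a list. The function name parse_records and the continuation-line rule are my own assumptions for illustration, not a documented format.

```python
# Minimal sketch: parse "cat:/name:/URL:/description:" records.
# Assumption: a line that does not start with a known key continues
# the previous field of the current record.
KEYS = ("cat", "name", "URL", "description")

def parse_records(text):
    records = []
    current = None   # record being filled in
    last_key = None  # last field seen, used for wrapped lines
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        key, sep, value = line.partition(":")
        if sep and key in KEYS:
            if key == "cat":
                # A cat: line opens a new record.
                current = {}
                records.append(current)
            if current is None:
                continue  # field before any cat: line; ignored
            current[key] = value.strip()
            last_key = key
        elif current is not None and last_key is not None:
            # Wrapped line: append to the previous field.
            current[last_key] += " " + line
    return records

sample = """\
cat:nlp/tc/langid
URL:http://www.sil.org/silewp/2000/001/SILEWP2000-001.html
name:Language identification and IT: Addressing problems of linguistic
diversity on a global scale
description: by Peter Constable and Gary Simons, SIL International;
about language tagging
cat:nlp/tc/langid
URL:http://veristage.com/demo/test3.php
name:Language Identification tool
description: by Veristage; minimum 40 characters
"""

records = parse_records(sample)
for r in records:
    print(r["name"], "->", r["URL"])
```

Splitting only on the first colon keeps "http://" URLs and names containing a colon intact; a wrapped line that happens to begin with one of the four keys would be misread as a new field.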
--Vlado
On Thu, 19 Apr 2007, Adam Funk wrote:
> [19/04/07 13:35] Dean Jones wrote:
>
> > Sorry, I wasn't clear. Personally I'm interested in language ID for
> > "written" texts - specifically, email, although others on the list may
> > be interested in spoken language ID, so I wouldn't want to discourage
> > responses about that.
>
> Here's a tool you might be interested in:
>
> http://www.let.rug.nl/~vannoord/TextCat/
>
>
> along with a list of others:
>
> http://www.let.rug.nl/~vannoord/TextCat/competitors.html
>