[Corpora-List] Turkic dictionaries

Christian Chiarcos christian.chiarcos at web.de
Wed Jun 25 17:54:04 UTC 2014


Dear all,

I would like to thank everyone who responded to my request and who helped
me in personal conversation, in particular Emily Bender, Jost Gippert,
Max Ionov, Irina Nevskaya, Monika Rind-Pawlowski, Vit Suchomel, Francis
Tyers, and Mardan Wushouer. Please find below a summary with URLs, brief
descriptions and licensing information (in no particular order):


(A) Dictionaries/Wordlists in machine-readable formats

(A.1) Gilles Sérasset's DBnary
http://kaiko.getalp.org/about-dbnary/
machine-readable (RDF) dictionaries generated from Wiktionary, incl.  
Turkish
CC-BY-SA

(A.2) Mardan Wushouer's wordlists
http://www.ai.soc.i.kyoto-u.ac.jp/~mardan/resource/Chinese_Uyghur_Bilingual_Dictionary_v1.zip
http://www.ai.soc.i.kyoto-u.ac.jp/~mardan/resource/Chinese_Kazakh_Bilingual_Dictionary_v1.zip
http://www.ai.soc.i.kyoto-u.ac.jp/~mardan/resource/Uyghur_Kazakh_Bilingual_Dictionary_v1.zip
plain word lists for Chinese-Uyghur, Chinese-Kazakh, Uyghur-Kazakh
CC-BY-NC

(A.3) Altaic etymological dictionary
http://starling.rinet.ru/cgi-bin/bdescr.cgi?root=config&morpho=0&basename=\data\alt\turcet
includes 26 Turkic languages, available online and as a dBase dump
copyright restricted

(A.4) Freelang
http://freelang.net
English (and partially, French) word lists for 28 Turkic languages (mostly  
small), proprietary list format
freeware (i.e., modification not permitted)

(A.5) Apertium Turkic
http://wiki.apertium.org/wiki/Turkic_languages#Pairs
word lists for Turkish-Azeri and Kazakh-Tatar; 12 more pairs of Turkic
languages under development
open source (hosted at Sourceforge)

(A.6) RELISH
http://tla.mpi.nl/relish/
lexicons for Chalkan and Tuva, provided by the RELISH project
available online, XML
licensing to be clarified

(A.7) PanLex
http://panlex.org
huge collection of word lists in a unified representation (SQL, RDF)
incl. Azeri, Gagauz, Kazakh, Kyrgyz, Turkish, Turkmen, Uzbek, etc.
different (mostly open) licenses depending on the original source

(A.8) Intercontinental Dictionary Series
http://lingweb.eva.mpg.de/ids/, http://datahub.io/de/dataset/ids
word lists of minimal core vocabulary
Azeri, Kumyk, Nogai, Terekeme (Azerbaijani dialect)
plain text or RDF
CC-BY-NC-ND
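
For the plain word lists mentioned above (e.g., A.2 and A.8), loading a
list into a lookup table can be sketched roughly as follows. This is only
a minimal illustration under assumptions: it supposes one entry per line
with source and target word separated by a tab, which may not match the
actual files, and the function name load_wordlist and the sample entries
are hypothetical placeholders, not real data.

```python
import io
from collections import defaultdict


def load_wordlist(lines, sep="\t"):
    """Map each source word to its target words, in order of appearance.

    Assumes one 'source<sep>target' pair per line (a hypothetical layout).
    """
    entries = defaultdict(list)
    for line in lines:
        line = line.strip()
        # Skip blank lines and comment lines.
        if not line or line.startswith("#"):
            continue
        parts = line.split(sep)
        # Skip malformed lines that lack a target column.
        if len(parts) < 2:
            continue
        source, target = parts[0].strip(), parts[1].strip()
        # Keep each translation only once per source word.
        if target not in entries[source]:
            entries[source].append(target)
    return dict(entries)


# Placeholder entries standing in for a real word list file.
sample = io.StringIO("word_a\ttrans_1\nword_a\ttrans_2\n\nword_b\ttrans_3\n")
wordlist = load_wordlist(sample)
```

With a real file, the StringIO object would be replaced by an open file
handle, and the separator adjusted to whatever the file actually uses.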


(B) Human-readable dictionaries/wordlists that can be easily converted  
into machine-readable formats

(B.1) Wiktionary, various languages (see A.1)
http://wiktionary.org
incl. Azeri, Kazakh, Kyrgyz, Tatar, Turkish, Turkmen
CC-BY-SA

(B.2) Chalkan dictionary
http://sprachen.sprachsignale.de/tschalkanisch/tschalkanisch.html
German
available for academic use, with attribution, non-commercial

(B.3) Shorica
http://shoriya.ngpi.rdtc.ru/
Shor dictionary and corpus
copyright to be clarified, currently offline (last accessed mid-May 2014)

(B.4) Karachay-Balkar dictionary
http://www.elbrusoid.org/dictionary/
Karachay-Balkar - Russian dictionary
copyright restricted

(B.5) Tatar dictionary
http://tatar.com.ru/dict/dict.php
Tatar-Russian dictionary
copyright restricted

(B.6) Khakas dictionary
http://khakas.altaica.ru/dictionary/
Khakas - English and Khakas - Russian
copyright restricted



(C) other resources

(C.1) Altaica
http://altaica.narod.ru/e_v-turks.htm
link and resource collection, includes machine-readable and human-readable  
dictionaries for 17 Turkic languages (not replicated above)

(C.2) Pre-Islamic Old Turkic Texts (VATEC)
http://vatec2.fkidg1.uni-frankfurt.de/
glossed corpus (XML) from which a German-Old Turkic word list can be  
compiled
copyright restricted

(C.3) Glosbe
http://glosbe.com
online access to word lists and translation memories
Azeri, Karachay-Balkar, Kazakh, Tatar, Turkish, Turkmen, Uzbek, etc.
free online API (with severe capacity limits)
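
As an illustration of how a word list might be compiled from a glossed
XML corpus such as the one in (C.2), here is a minimal sketch. The XML
layout is an assumption made purely for this example: tokens are taken to
be <w> elements with hypothetical "form" and "gloss" attributes, and the
sample content is placeholder text. The actual VATEC schema is likely to
differ, so element and attribute names would need to be adapted.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Placeholder corpus in a hypothetical layout (not the real VATEC schema).
SAMPLE = """<corpus>
  <s>
    <w form="form_a" gloss="gloss_1"/>
    <w form="form_b" gloss="gloss_2"/>
  </s>
  <s>
    <w form="form_c" gloss="gloss_1"/>
    <w form="form_a" gloss="gloss_1"/>
  </s>
</corpus>"""


def gloss_wordlist(xml_text):
    """Collect, for every gloss, the sorted set of word forms carrying it."""
    root = ET.fromstring(xml_text)
    index = defaultdict(set)
    for token in root.iter("w"):
        form = token.get("form")
        gloss = token.get("gloss")
        # Ignore tokens that lack either a form or a gloss.
        if form and gloss:
            index[gloss].add(form)
    # Sort glosses and forms so the resulting list is reproducible.
    return {gloss: sorted(forms) for gloss, forms in sorted(index.items())}


wordlist = gloss_wordlist(SAMPLE)
```

Indexing by gloss gives a list in the direction gloss language to corpus
language; swapping the two roles in the loop would give the reverse.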


Certainly, this list is not exhaustive, so if you feel something important  
is missing or incorrect, please let me know ;)

All the best,
Christian


On Thu, 24 Apr 2014 23:51:59 +0200, Christian Chiarcos  
<christian.chiarcos at web.de> wrote:

> Dear all,
>
> as a kind of follow-up question to the one quoted below, is anyone aware  
> of such word lists or machine-readable multilingual dictionaries for  
> other Turkic languages, e.g., Azeri, Kazakh, Kyrgyz, Uzbek, or Uyghur?
>
> So far, I only found http://www.freelang.net -- an impressive range of  
> languages, but very small word lists only (usually <1000 words).
>
> For our experiment we would need word lists from any Turkic language to  
> Russian, English, German or Turkish.
>
> Thanks a lot,
> Christian
-- 
Christian Chiarcos
Applied Computational Linguistics
Johann Wolfgang Goethe Universität Frankfurt a. M.
60054 Frankfurt am Main, Germany

office: Robert-Mayer-Str. 10, #401b
mail: chiarcos at informatik.uni-frankfurt.de
web: http://acoli.cs.uni-frankfurt.de
tel: +49-(0)69-798-22463
fax: +49-(0)69-798-28931
