27.4652, FYI: Lexical Datasets for Turkish Available Online

The LINGUIST List via LINGUIST linguist at listserv.linguistlist.org
Tue Nov 15 16:23:07 UTC 2016


LINGUIST List: Vol-27-4652. Tue Nov 15 2016. ISSN: 1069 - 4875.

Subject: 27.4652, FYI: Lexical Datasets for Turkish Available Online

Moderators: linguist at linguistlist.org (Damir Cavar, Malgorzata E. Cavar)
Reviews: reviews at linguistlist.org (Helen Aristar-Dry, Robert Coté,
                                   Michael Czerniakowski)
Homepage: http://linguistlist.org

*****************    LINGUIST List Support    *****************
                       Fund Drive 2016
                   25 years of LINGUIST List!
Please support the LL editors and operation with a donation at:
           http://funddrive.linguistlist.org/donate/

Editor for this issue: Yue Chen <yue at linguistlist.org>
================================================================


Date: Tue, 15 Nov 2016 11:22:42
From: Orhan Bilgin [orhan at zargan.com]
Subject: Lexical Datasets for Turkish Available Online

A new collection of lexical datasets for Turkish has been made available
under a Creative Commons license at the following URL:

http://st2.zargan.com/duyuru/Zargan_Linguistic_Resources_for_Turkish.html

The downloadable electronic collection includes:

- the corpus frequencies of roots, complex word-forms, suffixes, suffix
sequences, letter n-grams and suffix n-grams
- derivational families
- exhaustive lists of 2-, 3-, 4- and 5-letter nonwords and their orthographic
neighbors
- a GML (Graph Modelling Language) file containing all attested suffix
sequences and their frequencies on a single tree
- an attempt to quantify the properties of:
-- noun stems using nine variables such as corpus frequencies of bare,
inflectional, derivational and compound forms, number of syllables and mean
letter-bigram and letter-trigram frequencies
-- suffix sequences using 18 variables such as total frequency, the number and
corpus frequencies of parents, children and siblings on the so-called "suffix
tree", and suffix-bigram and suffix-trigram frequencies
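
As a rough illustration of two of the measures listed above, letter n-gram
counts and orthographic neighbors might be computed as sketched below. This is
a minimal sketch, not the procedure used to build the datasets: the
announcement does not define "orthographic neighbor", so the common
substitution-based definition (same length, exactly one differing letter) is
assumed, and the word list and function names are hypothetical.

```python
from collections import Counter


def letter_ngrams(word, n):
    """Return all contiguous letter n-grams of a word, left to right."""
    return [word[i:i + n] for i in range(len(word) - n + 1)]


def ngram_counts(words, n):
    """Count letter n-grams over a list of word types (not corpus tokens)."""
    counts = Counter()
    for word in words:
        counts.update(letter_ngrams(word, n))
    return counts


def orthographic_neighbors(target, lexicon):
    """Assumed definition: same length as target, exactly one letter differs."""
    return sorted(
        word
        for word in lexicon
        if len(word) == len(target)
        and sum(a != b for a, b in zip(word, target)) == 1
    )


# Hypothetical toy lexicon, for illustration only.
lexicon = ["kal", "kol", "kul", "bal", "kar", "ev"]

bigrams = ngram_counts(lexicon, 2)
neighbors = orthographic_neighbors("kel", lexicon)
print(bigrams["ka"], bigrams["al"])
print(neighbors)
```

Counting over word types, as here, ignores corpus frequency; weighting each
n-gram by the frequency of the word it comes from would be a natural variant.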

The datasets were developed as part of a master's thesis in the Cognitive
Science Program at Bogazici University, Istanbul, Turkey. They are intended to
be useful to theoretical linguists, computational linguists, corpus linguists,
and psycholinguists who study the description and/or processing of Turkish in
particular and of agglutinating languages in general.

Linguistic Field(s): Language Documentation
                     Morphology
                     Text/Corpus Linguistics

Subject Language(s): Turkish (tur)



------------------------------------------------------------------------------


----------------------------------------------------------
LINGUIST List: Vol-27-4652	
----------------------------------------------------------
Visit LL's Multitree project for over 1000 trees dynamically generated
from scholarly hypotheses about language relationships:
          http://multitree.org/