[Corpora-List] A New Turkish Corpus From TS Corpus: TS Corpus Wikipedia -Beta-

Taner Sezer tanersezerr at gmail.com
Thu Aug 29 21:34:33 UTC 2013


Dear Members,
TS Wikipedia Corpus -Beta- is now available. It is freely accessible
online.
TS Wikipedia Corpus -Beta- is a POS-tagged, morphologically tagged Turkish
corpus. It consists of 45,245,304 POS-tagged tokens.
TS Wikipedia Corpus -Beta- is the first Turkish corpus based on Turkish
Wikipedia pages.


TS Wikipedia Corpus -Beta- features:
        POS tagging
        Morphological tagging
        Lemma forms of the tokens
        Keyword in context (KWIC) view
        Word & lemma search
        Frequency search
        Regular expression search
        CQP query search
        Case-sensitive search
        Building frequency lists
        Saving the results in different formats
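
For readers unfamiliar with these features, the following is a minimal
Python sketch of what a lemma frequency list and a KWIC view compute. It
is not the TS Corpus implementation or its data format: the sample
tokens, the tag names, the (word, POS, lemma) layout, and the function
names lemma_frequencies and kwic are all assumptions made for
illustration.

```python
from collections import Counter

# Hypothetical (word, POS tag, lemma) triples. The tag names and this
# layout are illustrative only, not the actual TS Corpus tagset or format.
tokens = [
    ("Evler", "Noun", "ev"),
    ("çok", "Adv", "çok"),
    ("güzel", "Adj", "güzel"),
    ("ve", "Conj", "ve"),
    ("ev", "Noun", "ev"),
    ("büyük", "Adj", "büyük"),
]


def lemma_frequencies(tagged):
    """Count how often each lemma occurs in the tagged token list."""
    return Counter(lemma for _word, _pos, lemma in tagged)


def kwic(tagged, lemma, window=2):
    """Return (left context, keyword, right context) for each hit of a lemma."""
    words = [word for word, _pos, _lemma in tagged]
    lines = []
    for i, (_word, _pos, token_lemma) in enumerate(tagged):
        if token_lemma == lemma:
            left = " ".join(words[max(0, i - window):i])
            right = " ".join(words[i + 1:i + 1 + window])
            lines.append((left, words[i], right))
    return lines


if __name__ == "__main__":
    print(lemma_frequencies(tokens).most_common(3))
    for left, keyword, right in kwic(tokens, "ev"):
        print(f"{left:>20} [{keyword}] {right}")
```

Searching by lemma rather than by word form is what lets a single
query match both "Evler" and "ev" in this toy example.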

This version is called beta because the corpus is still under development.
The main release is planned to support queries restricted by Wikipedia
category.

Further information can be found on the corpus web page at
http://tscorpus.com and documentation at http://tscorpus.com/

Best regards,
-- 
Taner Sezer
http://tscorpus.com
http://tanersezer.com

