[Corpora-List] A New Turkish Corpus From TS Corpus: TS Corpus Wikipedia -Beta-
Taner Sezer
tanersezerr at gmail.com
Thu Aug 29 21:34:33 UTC 2013
Dear Members,
TS Wikipedia Corpus -Beta- is now available. It is freely accessible
online.
TS Wikipedia Corpus -Beta- is a POS-tagged and morphologically tagged
Turkish corpus. The corpus consists of 45,245,304 POS-tagged tokens.
TS Wikipedia Corpus -Beta- is the first Turkish corpus based on Turkish
Wikipedia pages.
TS Wikipedia Corpus -Beta- features:
POS tagging
Morphological tagging
Lemma forms of the tokens
Keyword in context (KWIC) view
Word & lemma search
Frequency search
Regular expression search
Search with CQP queries
Case-sensitive search
Building frequency lists
Saving the results in different formats
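To make some of the listed search features more concrete, here is a minimal Python sketch of a KWIC view with word, lemma, regular-expression and case-sensitive matching, plus a lemma frequency list, over an in-memory list of tagged tokens. This is not how TS Corpus is implemented and not its query interface (the corpus itself is searched through its web page, including CQP queries). The Token layout, the tag names ("Noun", "Adj", ...) and the toy sentences are hypothetical and chosen only for illustration.

```python
import re
from collections import Counter
from typing import List, NamedTuple, Tuple


class Token(NamedTuple):
    word: str   # surface form as it appears in the text
    lemma: str  # base form of the word
    pos: str    # part-of-speech tag (tag names here are hypothetical)


def kwic(tokens: List[Token], pattern: str, field: str = "word",
         width: int = 3, ignore_case: bool = True) -> List[str]:
    """Return keyword-in-context lines for tokens whose `field`
    ("word", "lemma" or "pos") fully matches the regex `pattern`."""
    # Note: Python's re.IGNORECASE is not Turkish-aware (İ/i, I/ı).
    flags = re.IGNORECASE if ignore_case else 0
    regex = re.compile(pattern, flags)
    lines = []
    for i, tok in enumerate(tokens):
        if regex.fullmatch(getattr(tok, field)):
            left = " ".join(t.word for t in tokens[max(0, i - width):i])
            right = " ".join(t.word for t in tokens[i + 1:i + 1 + width])
            lines.append(f"{left} [{tok.word}] {right}".strip())
    return lines


def frequency_list(tokens: List[Token], field: str = "lemma") -> List[Tuple[str, int]]:
    """Count values of `field` and return them from most to least frequent."""
    return Counter(getattr(t, field) for t in tokens).most_common()


# Hypothetical toy data, not taken from the corpus.
tokens = [
    Token("Ankara", "Ankara", "Noun"),
    Token("Türkiye'nin", "Türkiye", "Noun"),
    Token("başkentidir", "başkent", "Noun"),
    Token(".", ".", "Punc"),
    Token("İstanbul", "İstanbul", "Noun"),
    Token("Türkiye'nin", "Türkiye", "Noun"),
    Token("en", "en", "Adv"),
    Token("kalabalık", "kalabalık", "Adj"),
    Token("şehridir", "şehir", "Noun"),
    Token(".", ".", "Punc"),
]

if __name__ == "__main__":
    for line in kwic(tokens, "Türkiye", field="lemma", width=2):
        print(line)
    print(frequency_list(tokens)[:3])
```

Searching by lemma finds both inflected occurrences of "Türkiye'nin", which is the main reason a lemmatized corpus is more convenient to query than raw text.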
This version is labelled beta because the corpus is still under
development. The main release is planned to support queries restricted
by Wikipedia category.
Further information and documentation can be found on the corpus web
page at http://tscorpus.com
Best regards,
--
Taner Sezer
http://tscorpus.com
http://tanersezer.com
_______________________________________________
Corpora mailing list
Corpora at uib.no
http://mailman.uib.no/listinfo/corpora