[Corpora-List] Turkish Corpus - TS Corpus -
Taner Sezer
tanersezerr at gmail.com
Thu Aug 30 14:53:59 UTC 2012
Dear Members,
TS Corpus is a Turkish corpus project that is freely available online.
TS Corpus is a general-purpose Turkish corpus containing 491 million
POS-tagged tokens. TS Corpus was built and is maintained by Taner
Sezer. The corpus is based on CWB.
Today the second version of TS Corpus has been released.
The corpus can be reached at:
http://tscorpus.com
TS Corpus offers the following features:
* TS Corpus is POS-tagged
* TS Corpus has morphological annotation
* TS Corpus includes the lemma form of each token
* Key word in context view (KWIC)
* Word & Lemma search
* Frequency search
* Regular expression search
* Search with CQP Query
* Case sensitive search
* Building frequency lists
* Saving the results in different formats
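
As a rough illustration of the CQP query search listed above, here is a
minimal Python sketch that assembles a query string in the general CQP
syntax used by CWB. The helper names (cqp_token, cqp_sequence), the
attribute names (word, lemma, pos) and the tag value "Adj" are assumptions
made for this example. The attribute names and tag set actually used by
TS Corpus may differ, so please check the documentation before relying
on them.

```python
def cqp_token(**constraints):
    """Build one CQP token expression, e.g. [lemma="ev" & pos="Noun"].

    Each keyword is a positional attribute name (assumed here: word,
    lemma, pos) and each value is a regular expression over that attribute.
    """
    if not constraints:
        return "[]"  # in CQP, [] matches any single token
    parts = []
    for attr, pattern in constraints.items():
        if '"' in pattern:
            # Keep the sketch simple: no quote escaping is attempted.
            raise ValueError("double quotes are not supported in this sketch")
        parts.append(f'{attr}="{pattern}"')
    return "[" + " & ".join(parts) + "]"


def cqp_sequence(*tokens):
    """Join token expressions into one query terminated by a semicolon."""
    return " ".join(tokens) + ";"


if __name__ == "__main__":
    # An adjective followed by any word form starting with "ev".
    print(cqp_sequence(cqp_token(pos="Adj"), cqp_token(word="ev.*")))
```

The resulting string could then be pasted into the CQP query box. The
sketch only builds the text of a query and does not contact the corpus.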
New Features of the Second Version
* Queries based on Morphological Annotation
* Restricted query
* Simplified POS tag set and disambiguation
* POS tags displayed on the KWIC screen and morphological annotation in the
context view
* Distribution of hit sets based on metadata restrictions
* Hit sets can now be categorised
* Users can create subcorpora
Further information can be found on the corpus web page at
http://tscorpus.com and documentation at http://tscorpus.com/wiki
Best Regards
--
Taner Sezer
http://tscorpus.com
http://tanersezer.com
_______________________________________________
UNSUBSCRIBE from this page: http://mailman.uib.no/options/corpora
Corpora mailing list
Corpora at uib.no
http://mailman.uib.no/listinfo/corpora