[Corpora-List] Turkish Corpus - TS Corpus -

Taner Sezer tanersezerr at gmail.com
Thu Aug 30 14:53:59 UTC 2012


Dear Members,
TS Corpus is a Turkish corpus project that is freely available online.
TS Corpus is a general-purpose Turkish corpus containing 491 million
POS-tagged tokens. TS Corpus was built and is maintained by Taner
Sezer. The corpus is based on CWB.
The second version of TS Corpus has been released today.
The corpus can be reached at:
http://tscorpus.com


        TS Corpus offers the following features:

  * TS Corpus is POS-tagged
  * TS Corpus has morphological annotation
  * TS Corpus includes the lemma forms of the tokens
  * Keyword-in-context (KWIC) view
  * Word and lemma search
  * Frequency search
  * Regular expression search
  * Search with CQP queries
  * Case-sensitive search
  * Building frequency lists
  * Saving the results in different formats


        New Features of the Second Version

  * Queries based on morphological annotation
  * Restricted queries
  * Simplified POS tag set and disambiguation
  * Displaying POS tags on the KWIC screen and morphological annotation
    in the context view
  * Distribution of hit sets based on metadata restrictions
  * Hit sets can now be categorised
  * Users can create subcorpora

Further information can be found on the corpus web page at
http://tscorpus.com and in the documentation at http://tscorpus.com/wiki

Best Regards

-- 
Taner Sezer
http://tscorpus.com
http://tanersezer.com
