[Corpora-List] CMU ARK Twitter Part-of-Speech Tagger -- v0.3 released

Brendan O'Connor brenocon at cmu.edu
Fri Sep 21 13:55:56 UTC 2012


We're pleased to announce a new release of the CMU ARK Twitter Part-of-Speech
Tagger, version 0.3.

* The new version is much faster (40x) and more accurate (tagging accuracy
  89.2% -> 92.8%) than before.

* We have also released new POS-annotated data, including a dataset of one
  tweet for each of 547 days.

* We have made available large-scale word clusters from unlabeled Twitter data
  (217k words, 56m tweets, 847m tokens).

Tools, data, and a new technical report describing the release are available at:
http://www.ark.cs.cmu.edu/TweetNLP/

http://www.ark.cs.cmu.edu/TweetNLP/paths/0100100.html
a
http://www.ark.cs.cmu.edu/TweetNLP/paths/1111100101110.html
http://www.ark.cs.cmu.edu/TweetNLP/paths/111100000011.html ,
Brendan O'Connor

--
PhD Student, Machine Learning Department
School of Computer Science, Carnegie Mellon University
http://brenocon.com

_______________________________________________
UNSUBSCRIBE from this page: http://mailman.uib.no/options/corpora
Corpora mailing list
Corpora at uib.no
http://mailman.uib.no/listinfo/corpora


