[Corpora-List] CMU ARK Twitter Part-of-Speech Tagger -- v0.3 released
Brendan O'Connor
brenocon at cmu.edu
Fri Sep 21 13:55:56 UTC 2012
We're pleased to announce a new release of the CMU ARK Twitter Part-of-Speech
Tagger, version 0.3.
* The new version is much faster (40x) and more accurate (tagging accuracy
89.2% -> 92.8%) than the previous release.
* We have also released new POS-annotated data, including a dataset of one
tweet for each of 547 days.
* We have made available large-scale word clusters from unlabeled Twitter data
(217k words, 56m tweets, 847m tokens).
Tools, data, and a new technical report describing the release are available at:
http://www.ark.cs.cmu.edu/TweetNLP/
http://www.ark.cs.cmu.edu/TweetNLP/paths/0100100.html
http://www.ark.cs.cmu.edu/TweetNLP/paths/1111100101110.html
http://www.ark.cs.cmu.edu/TweetNLP/paths/111100000011.html
Brendan O'Connor
--
PhD Student, Machine Learning Department
School of Computer Science, Carnegie Mellon University
http://brenocon.com