[Corpora-List] TASA corpus
Rykov V.V. (Moscow)
rykov at narod.ru
Fri Apr 30 12:23:24 UTC 2004
Dear list members, does anybody know about the so-called "TASA corpus"?
It contains 10 million words of UNMARKED high-school level English text on
Language arts, Health, Home economics, Industrial arts, Science, Social studies, and Business.
It is divided into 37,600 text samples, also called contexts or "documents"
(an average of 166 words/document).
If the corpus is commercial, who owns it, and what are the terms for obtaining it?
The references I know of:
http://www.rni.org/kanerva/cogsci2k-poster.txt
http://lsa.colorado.edu/spaces.html
--
Regards Vladimir Rykov
PhD in Computational Linguistics
Personal web-site: rykov.narod.ru
mailto: rykov2000 at mail.ru
Si etiam omnes - ego non
English version: www.blkbox.com/~gigawatt/rykov.html