[Corpora-List] TASA corpus

Rykov V.V. (Moscow) rykov at narod.ru
Fri Apr 30 12:23:24 UTC 2004


   Dear list members, does anybody know about the so-called "TASA corpus"?


   It contains 10 million words of UNMARKED (unannotated) high-school-level
   English text on language arts, health, home economics, industrial arts,
   science, social studies, and business.

   It is divided into 37,600 text samples (also called contexts or
   "documents"), with an average of 166 words per document.

   If the corpus is commercial, who owns it and what are the terms for
   obtaining it?

  The references I know of:

http://www.rni.org/kanerva/cogsci2k-poster.txt
http://lsa.colorado.edu/spaces.html

--
  Regards Vladimir Rykov

PhD in Computational Linguistics
Personal web-site: rykov.narod.ru
mailto: rykov2000 at mail.ru
Si etiam omnes - ego non (Even if all others do, not I)
English version:   www.blkbox.com/~gigawatt/rykov.html
