[Corpora-List] word frequencies on the web
radev at umich.edu
Fri Dec 8 16:51:25 UTC 2006
Have you seen this release from Google? From the LDC catalog entry:
http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC2006T13
Introduction
This data set, contributed by Google Inc., contains English word
n-grams and their observed frequency counts. The length of the n-grams
ranges from unigrams (single words) to five-grams. We expect this data
will be useful for statistical language modeling, e.g., for machine
translation or speech recognition, as well as for other uses.
Source Data
The n-gram counts were generated from approximately 1 trillion word
tokens of text from publicly accessible Web pages.
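
If the goal is simply a relative frequency for a given word, the unigram counts may be enough. Below is a minimal sketch in Python, assuming the counts are available as plain-text lines of the form "token<TAB>count"; please check the release documentation for the actual file layout. The function names and the sample counts are made up for illustration and are not taken from the data set.

```python
import io


def load_unigram_counts(lines):
    """Parse 'token<TAB>count' lines into a dict, skipping malformed lines."""
    counts = {}
    for line in lines:
        # Split on the last tab so the token itself is kept intact.
        token, sep, count = line.rstrip("\n").rpartition("\t")
        if not sep or not count.isdigit():
            continue
        counts[token] = counts.get(token, 0) + int(count)
    return counts


def relative_frequency(counts, word):
    """Return the count of `word` divided by the total token count."""
    total = sum(counts.values())
    return counts.get(word, 0) / total if total else 0.0


# Made-up sample data, NOT real counts from the release.
sample = io.StringIO("the\t60\nof\t30\ncorpus\t10\n")
counts = load_unigram_counts(sample)
print(relative_frequency(counts, "corpus"))
```

For the real files one would pass an open file handle instead of the in-memory sample. Note that the total here is the sum over whatever lines were loaded, so a partial file gives frequencies relative to that subset only.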
>
> Dear all, does anyone know of ways to estimate the frequency of words
> on the web, or if there're search engines that supply this info (as
> Altavista used to do)?
>
> thank you!
> tony
> www2.lael.pucsp.br/~tony
--
Dragomir R. Radev Associate Professor
SI, CSE, Ling U. Michigan, Ann Arbor
http://www.eecs.umich.edu/~radev radev at umich.edu