[Corpora-List] Google 1T set

Miles Osborne miles at inf.ed.ac.uk
Fri Oct 26 13:30:55 UTC 2007


As a fun exercise, we are going to encode the entire Google release in a
Bloom filter and see how that goes.  We were about to publish a web
front-end for it, but given the licensing terms, that doesn't look like a
viable option.
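The post doesn't say how the filter will be built. For readers unfamiliar with the data structure, here is a minimal Python sketch of a plain Bloom filter holding n-grams as strings. The class name, the SHA-256 double-hashing scheme and the sizing parameters are illustrative assumptions. This is not the scheme from the papers cited below.

```python
import hashlib
import math


class BloomFilter:
    """Minimal Bloom filter (illustrative sketch): a bit array plus k hash positions.

    Lookups may return false positives but never false negatives.
    """

    def __init__(self, expected_items: int, fp_rate: float) -> None:
        # Standard sizing: m = -n ln(p) / (ln 2)^2 bits, k = (m / n) ln 2 hashes.
        self.num_bits = max(1, math.ceil(-expected_items * math.log(fp_rate) / (math.log(2) ** 2)))
        self.num_hashes = max(1, round(self.num_bits / expected_items * math.log(2)))
        self.bits = bytearray((self.num_bits + 7) // 8)

    def _positions(self, item: str):
        # Assumed hashing scheme: derive k bit positions from one SHA-256 digest
        # by double hashing, position_i = (h1 + i * h2) mod m.
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.num_bits

    def add(self, item: str) -> None:
        # Set every bit the item hashes to.
        for pos in self._positions(item):
            self.bits[pos >> 3] |= 1 << (pos & 7)

    def __contains__(self, item: str) -> bool:
        # Present only if all k bits are set; an unseen item can collide by chance.
        return all(self.bits[pos >> 3] & (1 << (pos & 7)) for pos in self._positions(item))


bf = BloomFilter(expected_items=1000, fp_rate=0.01)
for ngram in ["the cat sat", "cat sat on", "sat on the"]:
    bf.add(ngram)
print("cat sat on" in bf)  # True: stored items are always found
```

Sizing the filter from the expected item count and a target false-positive rate p works out to about 1.44·log2(1/p) bits per stored n-gram, roughly 9.6 bits at p = 0.01. The trade-off is that a lookup can wrongly report an unseen n-gram as present, but never the reverse.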

For those interested, we had a pair of papers on this kind of thing at ACL
and EMNLP this year:

David Talbot; Miles Osborne
Randomised Language Modelling for Statistical Machine Translation
http://acl.ldc.upenn.edu/P/P07/P07-1065.pdf

David Talbot; Miles Osborne
Smoothed Bloom Filter Language Models: Tera-Scale LMs on the Cheap
http://acl.ldc.upenn.edu/D/D07/D07-1049.pdf

Miles
_______________________________________________
Corpora mailing list
Corpora at uib.no
http://mailman.uib.no/listinfo/corpora

