[Corpora-List] Google 1T set
Miles Osborne
miles at inf.ed.ac.uk
Fri Oct 26 13:30:55 UTC 2007
As a fun exercise, we are going to encode all of the Google release in a
Bloom Filter and see how that goes. We were about to publish a web
front-end to this, but given the licensing, that doesn't look like a viable
option.
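For readers unfamiliar with the idea, here is a minimal sketch of storing n-grams in a Bloom filter for membership queries. It is illustrative only and is not the implementation from the papers cited below. The class name NgramBloomFilter, the bit-array size, the number of hashes and the SHA-256 double-hashing scheme are all assumptions made for the example.

```python
import hashlib


class NgramBloomFilter:
    """Hypothetical minimal Bloom filter for n-gram membership (illustrative only)."""

    def __init__(self, num_bits: int, num_hashes: int) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        # One bit per slot, packed eight to a byte.
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, ngram: str):
        # Double hashing (an assumed scheme): derive all probe positions
        # from two 64-bit halves of a single SHA-256 digest.
        digest = hashlib.sha256(ngram.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # force an odd step
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.num_bits

    def add(self, ngram: str) -> None:
        # Set every probe bit for this n-gram.
        for pos in self._positions(ngram):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, ngram: str) -> bool:
        # True if all probe bits are set; this may be a false positive,
        # but an n-gram that was added is never reported as missing.
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(ngram)
        )


bf = NgramBloomFilter(num_bits=1 << 16, num_hashes=4)
bf.add("the quick brown")
found = "the quick brown" in bf  # an added n-gram is always found
```

The trade-off is space against error. The filter never stores the n-grams themselves, so it is compact, but a query for an unseen n-gram can occasionally return true.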
For the interested, we had a pair of papers on this kind of thing at ACL and
EMNLP this year:
David Talbot; Miles Osborne
Randomised Language Modelling for Statistical Machine Translation
http://acl.ldc.upenn.edu/P/P07/P07-1065.pdf

David Talbot; Miles Osborne
Smoothed Bloom Filter Language Models: Tera-Scale LMs on the Cheap
http://acl.ldc.upenn.edu/D/D07/D07-1049.pdf

Miles
_______________________________________________
Corpora mailing list
Corpora at uib.no
http://mailman.uib.no/listinfo/corpora