[Corpora-List] RandLM

Miles Osborne miles at inf.ed.ac.uk
Tue Nov 4 08:26:30 UTC 2008


What is it?

RandLM ("randomised language modelling") is yet another language model
for the open source translation system Moses.  However, it is designed
to be very space-efficient: depending upon settings, it can represent
an SRILM language model in about 1/10 of the space. The code can either
estimate LMs from raw text (similar to SRILM's "ngram-count") or load
pre-built ARPA files.  The best compression results are obtained when
building LMs from raw text.
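
To give a rough feel for where the space saving can come from: the
EMNLP paper cited below is about Bloom filter language models. The
Python sketch that follows is only a toy illustration of Bloom filter
membership over n-grams. It is not RandLM code, and the class name,
the function names, the hash choice (SHA-256) and the sizes are all
assumptions made for this example. It stores set membership only,
with no counts or smoothing; see the papers for how the real models
work.

```python
import hashlib


class NgramBloomFilter:
    """Toy Bloom filter over n-grams (illustration only, not RandLM code)."""

    def __init__(self, num_bits, num_hashes):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        # One bit per slot, packed into bytes.
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, ngram):
        # Derive num_hashes bit positions by salting one hash function.
        key = " ".join(ngram).encode("utf-8")
        for i in range(self.num_hashes):
            digest = hashlib.sha256(bytes([i]) + key).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, ngram):
        for pos in self._positions(ngram):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, ngram):
        # True for every n-gram that was added; may also be True for
        # one that was not (a false positive), never the reverse.
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(ngram)
        )


def ngrams(tokens, n):
    """Return all contiguous n-grams of a token list as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


bf = NgramBloomFilter(num_bits=1024, num_hashes=3)
for gram in ngrams("the cat sat on the mat".split(), 2):
    bf.add(gram)
```

The n-grams themselves are never stored, only the bits they hash to,
so memory use is fixed by num_bits rather than by the vocabulary. The
price is a small chance that an unseen n-gram is reported as present.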

You can get the code here:

http://sourceforge.net/projects/randlm

(This is the first public release, and there are sure to be bugs.)

Read the files:

BUILDING_WITH_MOSES.txt

for Moses integration and:

README

for general information on building the release.

Note that Moses can support SRILM and RandLM LMs at the same time; just use

./configure --with-randlm=/path/to/randlm --with-srilm=/path/to/srilm

If you want to read more about this, then look at our ACL and EMNLP papers:

David Talbot and Miles Osborne. Smoothed Bloom filter language
models: Tera-Scale LMs on the Cheap. EMNLP, Prague, Czech Republic,
2007.
http://www.iccs.informatics.ed.ac.uk/~osborne/papers/emnlp07.pdf

David Talbot and Miles Osborne. Randomised Language Modelling for
Statistical Machine Translation. ACL, Prague, Czech Republic, 2007.
http://www.iccs.informatics.ed.ac.uk/~osborne/papers/acl07.pdf

Miles


