If Java is not required, the well-known SRI toolkit (SRILM) is well
suited to this task and to further processing (n-gram LM estimation,
back-off, interpolation, ...):
http://www.speech.sri.com/projects/srilm/
Regards,
--
Alexandre Allauzen
Univ Paris XI, LIMSI-CNRS
Tel : 01.69.85.80.64 (80.88)
Bur : 114 LIMSI Bat. 508
allauzen at limsi.fr