[Corpora-List] Query about the (dual) language of web pages

Mike Maxwell maxwell at umiacs.umd.edu
Tue Oct 9 19:45:56 UTC 2007


P Resnik wrote:
> That's correct, Mike -- unfortunately I didn't have the resources to
> create an ongoing Web bitext mining operation,

To what extent could such an effort run by itself once set up?  That
is, is it a candidate for some cloud computing effort, perhaps assisted
by volunteers who log in to verify the language ID and to confirm that
the documents are indeed translations of each other?

> ...There are some folks trying to get Web-scale computing off the
> ground for the language research community (e.g.
> http://wacky.sslmit.unibo.it/doku.php)

I'm pretty sure Marco Baroni of that project is on this mailing
list--Marco, any thoughts?

-- 
	Mike Maxwell
	maxwell at umiacs.umd.edu
	"Theorists...have merely to lock themselves in a room
	with a blackboard and coffee maker to conduct their business."
	--Bruce A. Schumm, Deep Down Things

_______________________________________________
Corpora mailing list
Corpora at uib.no
http://mailman.uib.no/listinfo/corpora


