[Corpora-List] High-performance Computing and NLP

Dominic Widdows widdows at google.com
Thu Mar 18 15:11:30 UTC 2010


OK, I'll bite a little bit as well ...

I can't agree with "move the data to the computation", unless you mean
in general "get the computation and the data together in the same
place". The main thing is that once you are parallelizing /
distributing work that involves tons and tons of data, network
bandwidth becomes a key resource, so reducing transportation needs is
every bit as important as making individual computations fast and
effective.

The astronomical and biomedical communities have been more or less on
top of this for years: there are plenty of stories of researchers
sharing data by mailing CD-ROMs around, or even just sending hard
drives through the mail. But if the algorithm you're coding up
is orders of magnitude smaller than the data it will be run on, you
move the code to the data, not the data to the code. This is
increasingly normal, which is why we talk about datacenters, not
codecenters.
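
To put rough numbers on that, here is a back-of-the-envelope sketch in
Python. The corpus size, program size, and link speed are hypothetical
figures picked purely for illustration, and the calculation ignores
latency, protocol overhead, and contention on the link.

```python
def transfer_seconds(size_bytes: float, bandwidth_bits_per_sec: float) -> float:
    """Idealized time to push size_bytes over a link (no latency or overhead)."""
    return size_bytes * 8 / bandwidth_bits_per_sec


# Hypothetical figures, assumed only for the sake of the comparison.
CORPUS_BYTES = 1e12  # a 1 TB corpus
CODE_BYTES = 1e6     # a 1 MB program
LINK_BPS = 1e9       # a 1 Gbit/s network link

data_time = transfer_seconds(CORPUS_BYTES, LINK_BPS)
code_time = transfer_seconds(CODE_BYTES, LINK_BPS)

print(f"ship the data: {data_time:.0f} s (about {data_time / 3600:.1f} hours)")
print(f"ship the code: {code_time:.3f} s")
print(f"ratio: {data_time / code_time:.0f}x")
```

Under these assumptions the data takes hours to move and the code takes
milliseconds, so the ratio of transfer times is simply the ratio of the
two sizes.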

Best wishes,
Dominic

On Thu, Mar 18, 2010 at 10:52 AM, Alexandre Rafalovitch
<arafalov at gmail.com> wrote:
> Mahout may be of some interest here, though may not be directly relevant:
> http://lucene.apache.org/mahout/
>
> Regards,
>   Alex.
> Personal blog: http://blog.outerthoughts.com/
> Research group: http://www.clt.mq.edu.au/Research/
> - I think age is a very high price to pay for maturity (Tom Stoppard)
>
> On Mon, Mar 15, 2010 at 11:52 AM, Sean Igo <samwibatt at gmail.com> wrote:
>> Good day,
>>
>> My research group is investigating the use of high-performance
>> computing facilities in NLP. By this we mostly mean clustered
>> environments, in which many (usually identical) computers are
>> networked in a single location, and used as a single computing entity
>> through libraries like MPI / OpenMP, MapReduce, etc. and/or using UIMA
>> or other frameworks in environments like that. Grid methods are less
>> of interest to us but I'd also like to hear about them. Pure machine
>> learning research that might be applied to NLP would also be welcome.
>>
>> If you're doing or aware of work like this, please let me know.
>
> _______________________________________________
> Corpora mailing list
> Corpora at uib.no
> http://mailman.uib.no/listinfo/corpora
>
