[Corpora-List] High-performance Computing and NLP

P Resnik psresnik at gmail.com
Thu Mar 18 13:40:18 UTC 2010


I agree entirely with Miles, although I think the big win is in the
parallelism, along with moving the data to the computation.   I'd say you
are likely to get more bang for your buck by parallelizing your code for a
cluster than by porting it to a more efficient programming language, all
other things being equal.  It's also worth mentioning that many NLP tasks
are quite amenable to very simple coarse-grained parallelism, not requiring
a lot of fancy algorithmic re-thinking.
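Here is a minimal sketch of what "very simple coarse-grained parallelism" can look like. It makes a few assumptions. The corpus is already split into independent documents. The work is token counting. Python's standard multiprocessing module spreads documents over worker processes on one machine; a cluster scheduler would play the same role across machines. The function names are hypothetical.

```python
from collections import Counter
from multiprocessing import Pool


def count_tokens(document: str) -> Counter:
    # Each worker handles one whole document on its own: no shared state
    # and no communication until the final merge.
    return Counter(document.lower().split())


def parallel_token_counts(documents: list[str], workers: int = 4) -> Counter:
    # Hand out whole documents (coarse-grained units of work) to a pool of
    # worker processes, then merge the per-document counts in the parent.
    with Pool(processes=workers) as pool:
        partial_counts = pool.map(count_tokens, documents)
    total = Counter()
    for counts in partial_counts:
        total.update(counts)
    return total


if __name__ == "__main__":
    corpus = ["the cat sat", "the dog sat", "a cat and a dog"]
    print(parallel_token_counts(corpus, workers=2).most_common(3))
```

The per-document logic in count_tokens is exactly what a serial version would use. Only the way the work is handed out changes, which is why this kind of task needs no algorithmic re-thinking.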

This discussion is also a nice opportunity to mention an upcoming book by
Jimmy Lin and Chris Dyer, entitled "Data-Intensive Text Processing with
MapReduce".  It's slated for publication by Morgan & Claypool in mid-2010.
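For readers new to the model, the canonical MapReduce example is word counting. The sketch below simulates the three phases (map, shuffle/sort, reduce) in a single Python process. It is not Hadoop code, and all names in it are hypothetical. The mapper and reducer exchange tab-separated "key<TAB>value" lines, the convention Hadoop Streaming uses for its stdin/stdout scripts, so similar functions could in principle be adapted into streaming scripts.

```python
from itertools import groupby
from typing import Iterable, Iterator


def mapper(lines: Iterable[str]) -> Iterator[str]:
    # Map: emit one "word<TAB>1" record per token.
    for line in lines:
        for word in line.lower().split():
            yield f"{word}\t1"


def reducer(sorted_records: Iterable[str]) -> Iterator[str]:
    # Reduce: records arrive sorted by key, so all counts for one word are adjacent.
    parsed = (record.split("\t", 1) for record in sorted_records)
    for word, group in groupby(parsed, key=lambda kv: kv[0]):
        yield f"{word}\t{sum(int(count) for _, count in group)}"


def run_word_count(lines: Iterable[str]) -> dict[str, int]:
    # Shuffle/sort: a MapReduce framework sorts mapper output by key between
    # the two phases; here a plain sorted() stands in for that step.
    shuffled = sorted(mapper(lines))
    result = {}
    for record in reducer(shuffled):
        word, count = record.split("\t")
        result[word] = int(count)
    return result


if __name__ == "__main__":
    print(run_word_count(["the cat sat", "the dog sat"]))
```

The mapper never needs to see more than one line at a time, and each reducer call sees only one key's records. That independence is what lets a framework run many copies of each phase on different machines, close to where the data is stored.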

  Philip


On Thu, Mar 18, 2010 at 7:17 AM, Miles Osborne <miles at inf.ed.ac.uk> wrote:

> ok, i will bite.
>
> the key insights behind high-performance computing for (certain kinds of)
> NLP are that you need to move the data to the computation and do it
> over a cluster of machines.  you also need to write code in languages
> such as C++ or C.
>
> here in Edinburgh for our most demanding jobs, we use Hadoop:
>
> http://hadoop.apache.org/
>
> we are not alone here;  i think you will find that serious groups do
> likewise.
>
> Miles
>
> --
> The University of Edinburgh is a charitable body, registered in
> Scotland, with registration number SC005336.
>
> _______________________________________________
> Corpora mailing list
> Corpora at uib.no
> http://mailman.uib.no/listinfo/corpora
>
>