[Corpora-List] High-performance Computing and NLP

Miles Osborne miles at inf.ed.ac.uk
Thu Mar 18 13:55:49 UTC 2010


let me add to that plug. it is a great book and one i'd highly
recommend to anyone who wants to get to grips with Big Data and the
like.

Miles

On 18 March 2010 13:40, P Resnik <psresnik at gmail.com> wrote:
> I agree entirely with Miles, although I think the big win is in the
> parallelism, along with moving the data to the computation.   I'd say you
> are likely to get more bang for your buck by parallelizing your code for a
> cluster than by porting it to a more efficient programming language, all
> other things being equal.  It's also worth mentioning that many NLP tasks
> are quite amenable to very simple coarse-grained parallelism, not requiring
> a lot of fancy algorithmic re-thinking.
>
> This discussion is also a nice opportunity to mention an upcoming book by
> Jimmy Lin and Chris Dyer, entitled "Data-Intensive Text Processing with
> MapReduce".  It's slated for publication by Morgan & Claypool in mid-2010.
>
>   Philip
>
>
> On Thu, Mar 18, 2010 at 7:17 AM, Miles Osborne <miles at inf.ed.ac.uk> wrote:
>>
>> ok, i will bite.
>>
>> the key insights behind high-performance computing and (certain kinds)
>> of NLP are that you need to move the data to the computation and do it
>> over a cluster of machines.  you also need to write code in languages
>> such as C++ or C.
>>
>> here in Edinburgh for our most demanding jobs, we use Hadoop:
>>
>> http://hadoop.apache.org/
>>
>> we are not alone here;  i think you will find that serious groups do
>> likewise.
>>
>> Miles
>>
>> --
>> The University of Edinburgh is a charitable body, registered in
>> Scotland, with registration number SC005336.
>>
>> _______________________________________________
>> Corpora mailing list
>> Corpora at uib.no
>> http://mailman.uib.no/listinfo/corpora
>>
>
>

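to make the coarse-grained parallelism point above concrete, here is a
minimal sketch (an illustration only, not code from anyone in this thread).
it counts tokens per shard independently and then merges the partial counts,
which is the same map-then-reduce shape that Hadoop jobs use.  the names
count_tokens, merge_counts and word_count are made up for this example, and a
local multiprocessing.Pool is assumed as a stand-in for a real cluster, where
Hadoop would schedule the shards across machines.

```python
from collections import Counter
from functools import reduce
from multiprocessing import Pool


def count_tokens(shard):
    """Map step: count whitespace-separated tokens in one shard (a list of lines)."""
    counts = Counter()
    for line in shard:
        counts.update(line.lower().split())
    return counts


def merge_counts(left, right):
    """Reduce step: fold one partial count into another."""
    left.update(right)  # Counter.update adds counts rather than replacing them
    return left


def word_count(shards, workers=1):
    """Count tokens over all shards, treating each shard as one independent task."""
    if workers <= 1:
        # Serial fallback: same map and reduce steps, no worker processes.
        partials = map(count_tokens, shards)
        return reduce(merge_counts, partials, Counter())
    # Coarse-grained parallelism: each worker process handles whole shards.
    with Pool(processes=workers) as pool:
        partials = pool.map(count_tokens, shards)
    return reduce(merge_counts, partials, Counter())


if __name__ == "__main__":
    # Hypothetical toy data; real shards would be files or file splits.
    shards = [
        ["the cat sat on the mat", "the dog barked"],
        ["a cat and a dog"],
    ]
    print(word_count(shards, workers=2).most_common(3))
```

nothing in count_tokens depends on any other shard, so no algorithmic
re-thinking is needed to parallelise it; only the final merge sees all the
partial results.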




