[Corpora-List] High-performance Computing and NLP

DJamé Seddah djame.seddah at free.fr
Wed Mar 17 01:56:20 UTC 2010


Hi,
from what I learned through heavy use of the facilities offered by
www.ichec.ie (admittedly a long time ago), if you want to, let's say,
parse very large amounts of data, the best approach is to use a
distributed-memory cluster and to use a set of nodes as slaves in a
task-farming setup.

Such tools are usually easy to write in C using either MPICH2 or
Open MPI.
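
To make the task-farming idea concrete, here is a minimal sketch. It is
in Python rather than C so that it runs on a single machine without an
MPI installation, and it is only an illustration under assumptions:
parse_sentence, worker and farm are hypothetical names, the "parser"
just counts tokens, and local processes with queues stand in for cluster
nodes. On a real cluster the queues would be replaced by MPI send and
receive calls between the master and the slave nodes.

```python
import multiprocessing as mp


def parse_sentence(sentence):
    # Placeholder for a real parser (assumption): a "parse" here is
    # simply the number of whitespace-separated tokens.
    return len(sentence.split())


def worker(task_queue, result_queue):
    # Each worker pulls the next task as soon as it is free, so fast
    # workers naturally end up handling more tasks than slow ones.
    while True:
        item = task_queue.get()
        if item is None:  # sentinel: no more work for this worker
            break
        index, sentence = item
        result_queue.put((index, parse_sentence(sentence)))


def farm(sentences, n_workers=2):
    # "fork" is Unix-only; it is chosen here (assumption) so the sketch
    # runs without an `if __name__ == "__main__"` guard.
    ctx = mp.get_context("fork")
    task_queue, result_queue = ctx.Queue(), ctx.Queue()
    workers = [
        ctx.Process(target=worker, args=(task_queue, result_queue))
        for _ in range(n_workers)
    ]
    for w in workers:
        w.start()

    # The master hands out (index, sentence) tasks, then one sentinel
    # per worker so that every worker eventually stops.
    for item in enumerate(sentences):
        task_queue.put(item)
    for _ in workers:
        task_queue.put(None)

    # Collect results in whatever order they arrive and restore the
    # input order using the index carried with each task.
    results = [None] * len(sentences)
    for _ in sentences:
        index, value = result_queue.get()
        results[index] = value

    for w in workers:
        w.join()
    return results
```

For example, farm(["the cat sat", "dogs bark"], n_workers=2) returns
[3, 2]. The point of the design is that no worker is assigned a fixed
share of the data in advance: the work is balanced dynamically, which
matters when parsing time varies a lot from one sentence to the next.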

If you need your programs to access a common memory space distributed
among different nodes, you need a network with an insane amount of
bandwidth; otherwise your programs will spend their time waiting for
data to process.
See the README and FAQ files of PETSc
(http://www.mcs.anl.gov/petsc/petsc-as) about that.


By the way, there's this IBM toolbox (not specifically related to NLP,
but it does parallel machine learning):
http://www.alphaworks.ibm.com/tech/pml?open&S_TACT=105AGX59&S_CMP=GRsite-lnxw07&ca=dgr-lnxw07awpml
which was mentioned on OSNews:
http://www.osnews.com/story/20631/Parallel_Machine_Learning_Toolbox_for_Linux


I think many of us are waiting for a general "parsing at home" grid
framework to emerge.

I'd also like to know whether NVIDIA's CUDA compiler is used in the NLP
community.



Best,
Djamé





On 15 March 2010, at 16:52, Sean Igo wrote:

> Good day,
>
> My research group is investigating the use of high-performance
> computing facilities in NLP. By this we mostly mean clustered
> environments, in which many (usually identical) computers are
> networked in a single location, and used as a single computing entity
> through libraries like MPI / OpenMP, MapReduce, etc. and/or using UIMA
> or other frameworks in environments like that. Grid methods are less
> of interest to us but I'd also like to hear about them. Pure machine
> learning research that might be applied to NLP would also be welcome.
>
> If you're doing or aware of work like this, please let me know.
>
> Many thanks,
> Sean Igo
> University of Utah
> Center for High Performance Computing / Biomedical Informatics Dept.
>
> _______________________________________________
> Corpora mailing list
> Corpora at uib.no
> http://mailman.uib.no/listinfo/corpora

