Book: Lin & Dyer, Data-Intensive Text Processing with MapReduce

Thierry Hamon thierry.hamon at UNIV-PARIS13.FR
Tue Jun 22 19:49:52 UTC 2010

Date: Mon, 21 Jun 2010 17:22:32 -0400
From: Graeme Hirst <gh at>
Message-Id: <46047507-7DD8-4279-8115-DB7404DC9759 at>


Data-Intensive Text Processing with MapReduce

Jimmy Lin and Chris Dyer
(University of Maryland)

Synthesis Lectures on Human Language Technologies #7 (Morgan &
Claypool Publishers), 2010, 177 pages


Our world is being revolutionized by data-driven methods: access to
large amounts of data has generated new insights and opened exciting
new opportunities in commerce, science, and computing
applications. Processing the enormous quantities of data necessary for
these advances requires large clusters, making distributed computing
paradigms more crucial than ever. MapReduce is a programming model for
expressing distributed computations on massive datasets and an
execution framework for large-scale data processing on clusters of
commodity servers. The programming model provides an
easy-to-understand abstraction for designing scalable algorithms,
while the execution framework transparently handles many system-level
details, ranging from scheduling to synchronization to fault
tolerance. This book focuses on MapReduce algorithm design, with an
emphasis on text processing algorithms common in natural language
processing, information retrieval, and machine learning. We introduce
the notion of MapReduce design patterns, which represent general
reusable solutions to commonly occurring problems across a variety of
problem domains. This book not only aims to help the reader "think
in MapReduce" but also discusses limitations of the programming model.
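
As a rough illustration of the programming model described above (a
minimal sketch, not taken from the book), the word-count example below
simulates the map, shuffle, and reduce phases in a single Python
process. The names run_mapreduce, map_word_count, and
reduce_word_count are hypothetical and chosen for this sketch only; a
real execution framework would distribute these phases across a
cluster and handle scheduling and fault tolerance.

```python
from collections import defaultdict


def map_word_count(doc_id, text):
    """Mapper: emit (word, 1) for every whitespace-separated token."""
    for word in text.lower().split():
        yield word, 1


def reduce_word_count(word, counts):
    """Reducer: sum all partial counts emitted for one word."""
    yield word, sum(counts)


def run_mapreduce(records, mapper, reducer):
    """Hypothetical in-process driver standing in for the framework."""
    # Map phase plus shuffle: group every emitted value by its key.
    groups = defaultdict(list)
    for key, value in records:
        for out_key, out_value in mapper(key, value):
            groups[out_key].append(out_value)

    # Reduce phase: process keys in sorted order, one group at a time.
    output = {}
    for key in sorted(groups):
        for out_key, out_value in reducer(key, groups[key]):
            output[out_key] = out_value
    return output


docs = [
    ("d1", "the quick brown fox"),
    ("d2", "the lazy dog and the fox"),
]
counts = run_mapreduce(docs, map_word_count, reduce_word_count)
```

The mapper and reducer never see the grouping step, which is the part
the execution framework is said to handle transparently; only the two
small functions are specific to the word-count task.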

Table of Contents: Introduction / MapReduce Basics / MapReduce
Algorithm Design / Inverted Indexing for Text Retrieval / Graph
Algorithms / EM Algorithms for Text Processing / Closing Remarks

This title is available online without charge to members of
institutions that have licensed the Synthesis Digital Library of
Engineering and Computer Science.  Members of licensing institutions
have unlimited access to download, save, and print the PDF without
restriction; use of the book as a course text is encouraged.  To find
out whether your institution is a subscriber, visit
<>, or just click on the
book's URL above from an institutional IP address and attempt to
download the PDF.  Others may purchase the book from this URL as a PDF
download for US$30 or in print for US$40.  Printed copies are also
available from Amazon and from booksellers worldwide at approximately
US$40 or local currency equivalent.

Message distributed by the Langage Naturel list <LN at>

The LN list is sponsored by ATALA (Association pour le Traitement
Automatique des Langues)
