Book: Søgaard, Semi-Supervised Learning and Domain Adaptation in NLP

Thierry Hamon thierry.hamon at UNIV-PARIS13.FR
Sun Jun 30 17:22:52 UTC 2013


Date: Wed, 26 Jun 2013 12:10:14 -0400
From: Graeme Hirst <gh at cs.toronto.edu>
Message-Id: <E684D9BC-B9D1-4654-87BA-6A453FB490EA at cs.toronto.edu>
X-url: http://www.morganclaypool.com/doi/abs/10.2200/S00497ED1V01Y201304HLT021


BOOK ANNOUNCEMENT

Semi-Supervised Learning and Domain Adaptation in Natural Language
Processing

by Anders Søgaard, University of Copenhagen

Synthesis Lectures on Human Language Technologies #21 (Morgan & Claypool
Publishers), 2013, x+93 pages

Abstract

This book introduces basic supervised learning algorithms applicable to
natural language processing (NLP) and shows how the performance of these
algorithms can often be improved by exploiting the marginal distribution
of large amounts of unlabeled data. One reason for this is data
sparsity, i.e., the limited amount of labeled data available in
NLP. However, in most real-world NLP applications our labeled data is
also heavily biased. This book introduces extensions of supervised
learning algorithms to cope with data sparsity and different kinds of
sampling bias.
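
As a taste of the semi-supervised techniques covered, here is a minimal
self-training sketch. It illustrates one classic approach rather than
code from the book; the scikit-learn classifier and the 0.95 confidence
threshold are my own assumptions. The idea: train on the labeled data,
then repeatedly promote the unlabeled examples the model is most
confident about into the training set.

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    def self_train(X_lab, y_lab, X_unlab, threshold=0.95, max_rounds=10):
        """Grow the labeled set with high-confidence pseudo-labels."""
        clf = LogisticRegression(max_iter=1000)
        for _ in range(max_rounds):
            clf.fit(X_lab, y_lab)
            if len(X_unlab) == 0:
                break
            proba = clf.predict_proba(X_unlab)
            pseudo = clf.classes_[proba.argmax(axis=1)]  # pseudo-labels
            keep = proba.max(axis=1) >= threshold  # keep confident ones only
            if not keep.any():
                break
            X_lab = np.vstack([X_lab, X_unlab[keep]])
            y_lab = np.concatenate([y_lab, pseudo[keep]])
            X_unlab = X_unlab[~keep]
        return clf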

This book is intended to be both readable by first-year students and
interesting to the expert audience. My intention was to introduce what
is necessary to appreciate the major challenges we face in contemporary
NLP related to data sparsity and sampling bias, without wasting too much
time on details about supervised learning algorithms or particular NLP
applications. I use text classification, part-of-speech tagging, and
dependency parsing as running examples, and limit myself to a small set
of cardinal learning algorithms. I have worried less about theoretical
guarantees ("this algorithm never does too badly") than about useful
rules of thumb ("in this case this algorithm may perform really
well"). In NLP, data is so noisy, biased, and non-stationary that few
theoretical guarantees can be established and we are typically left with
our gut feelings and a catalogue of crazy ideas. I hope this book will
provide its readers with both. Throughout the book, I include snippets
of Python code and empirical evaluations where relevant.
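
In the same spirit, here is a minimal sketch of importance weighting for
learning under sampling bias (the covariate-shift setting). This is my
own illustration rather than code from the book; the discriminative
density-ratio recipe and the scikit-learn usage are assumptions. The
idea: train a domain classifier to separate source from target data,
then reweight source examples by how target-like they look.

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    def covariate_shift_weights(X_source, X_target, eps=1e-6):
        """Estimate w(x) ~ p_target(x) / p_source(x) via a discriminator."""
        X = np.vstack([X_source, X_target])
        d = np.concatenate([np.zeros(len(X_source)), np.ones(len(X_target))])
        disc = LogisticRegression(max_iter=1000).fit(X, d)
        p_target = disc.predict_proba(X_source)[:, 1]  # P(target | x)
        return p_target / np.clip(1.0 - p_target, eps, None)

    def train_weighted(X_source, y_source, X_target):
        weights = covariate_shift_weights(X_source, X_target)
        clf = LogisticRegression(max_iter=1000)
        clf.fit(X_source, y_source, sample_weight=weights)
        return clf

When the discriminator cannot separate the two domains, all weights are
close to 1 and this reduces to ordinary supervised training.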

Table of Contents: Introduction / Supervised and Unsupervised Prediction
/ Semi-Supervised Learning / Learning under Bias / Learning under
Unknown Bias / Evaluating under Bias

http://www.morganclaypool.com/doi/abs/10.2200/S00497ED1V01Y201304HLT021


This title is available online without charge to members of institutions
that have licensed the Synthesis Digital Library of Engineering and
Computer Science.  Members of licensing institutions have unlimited
access to download, save, and print the PDF without restriction; use of
the book as a course text is encouraged.  To find out whether your
institution is a subscriber, visit
http://www.morganclaypool.com/page/licensed, or just click on the book's
URL above from an institutional IP address and attempt to download the
PDF.  Others may purchase the book from this URL as a PDF download for
US$30 or in print for US$40.  Printed copies are also available from
Amazon and from booksellers worldwide at approximately US$45 or local
currency equivalent.

-------------------------------------------------------------------------
Message distributed via the Langage Naturel list <LN at cines.fr>
Information, subscription: http://www.atala.org/article.php3?id_article=48
English version: 
Archives: http://listserv.linguistlist.org/archives/ln.html
          http://liste.cines.fr/info/ln

The LN list is sponsored by ATALA (Association pour le Traitement
Automatique des Langues)
Information and membership: http://www.atala.org/
-------------------------------------------------------------------------


