Seminar: Statistical Topic Modeling of Large Text Corpora, April 19, 2013, Paris

Thierry Hamon thierry.hamon at UNIV-PARIS13.FR
Tue Apr 16 19:48:12 UTC 2013

Date: Tue, 16 Apr 2013 16:05:53 +0200

Statistical Topic Modeling of Large Text Corpora

Abstract: Statistical topic models (also known as latent Dirichlet
allocation models) provide a flexible framework for extracting
interpretable descriptions of large corpora of text documents. This talk
will begin by reviewing the basic principles of topic models and
discussing how these models relate to other approaches such as latent
semantic analysis, matrix factorization techniques, and document
clustering. We
will illustrate how topic models can be used to address problems such as
generating high-level summaries of document collections and
automatically uncovering thematic trends in a corpus over time. The
talk will also discuss recent extensions of topic modeling techniques
such as using topic models for document classification and scalable
algorithms for large corpora. Time permitting, we will also discuss how
these types of models can be applied to data with relational
information, such as social network data involving text content. A
number of different text data sets will be used during the talk as
illustrative examples, including news articles, historical newspaper
records, scientific publications, and collections of email data.

Speaker: Padhraic Smyth, Department of Computer Science, University of
California, Irvine

Place: Institut des Systèmes Complexes - Paris Île-de-France - 57-59 rue
Lhomond, 75005 Paris

Date: April 19, 2013, 14:30.
