Seminar: INFOLINGU, Cristian Martinez, 5 December 2011, LIGM, Université Paris-Est

Wed Nov 30 15:07:42 UTC 2011

Date: Tue, 29 Nov 2011 10:17:32 +0100
From: Myriam RAKHO <rakho.myriam at gmail.com>
Message-ID: <CAOY-MwuGzeKk+1AGZttSE1uScSKQCM=VtWaX+YgQGyd1P6aW=w at mail.gmail.com>

************************************************************************
    *INFOLINGU*
    The seminar of the Computational Linguistics team
    of the Laboratoire d'Informatique Gaspard Monge (LIGM)
    Université Paris-Est Marne-la-Vallée
************************************************************************

   Date: *Monday, 5 December 2011, at 10:30 a.m.*

   Venue: Université Paris-Est Marne-la-Vallée
          Bâtiment Copernic, 4th floor, seminar room 4B08R

    *Anyone interested is welcome.*

************************************************************************

------------------------------------------------------------------------
  Speaker:
------------------------------------------------------------------------

   *Cristian MARTINEZ* (ESIEE/IFRIS)

------------------------------------------------------------------------
   Presentation title:
------------------------------------------------------------------------

   *Perspectives for Modeling and Automatically Processing Multi-source
    Textual Information Derived from Scientific and Technical Databases*

------------------------------------------------------------------------
   Abstract:
------------------------------------------------------------------------

How efficient is modeling and automatically processing multi-source
scientific and technical information mediated by a large set of
documents? Scientific and technical text analysis has been receiving
rising attention within the social sciences through an increasing amount
of text in electronic format and the explosion of digital
databases/libraries. This textual data may principally come from
articles and patents, but also from specialized databases such as
financial and scientific projects databases, economics news, surveys,
and far more from bibliographic websites or the blogosphere.  In order
to allow efficient access and use of this information, several
challenges must be overcome: at an organizational level it is necessary
to constitute work teams, policies and agreements, and to facilitate the
access to information collected and produced. At a technical level, the
approach to processing heterogeneous textual data should be
discussed, along with other aspects such as the treatment of
large-scale corpora, noise reduction, possible duplication,
multilingualism, and several further computer/user processing tasks. But
where should we start? The automatic processing of multi-source
scientific and technical information involves various computer science
disciplines: data & knowledge engineering, text mining, natural language
processing, information retrieval and visualization, and software
ergonomics. To begin with, it is necessary to propose a sort of ‘meeting
point’: a framework in which to bring these disciplines together in a
focused way. Unfortunately, the heterogeneous and dynamic nature of the
information does not make that task easier. In this talk, we will
present an approach to gathering, modeling, and preserving large-scale
textual information, by linking bits of information, normalizing them
and enriching this data.  We are going to talk about an open source
modular framework (in pre-alpha development), called Scilmarin,
designed to allow the automatic processing of large-scale multi-source
textual information derived from scientific and technical
databases. Also, we will present XML, a draft specification to model
scientific and technical data. Finally, we will explore whether
Scilmarin can take on tasks involving automatic language processing,
using other software tools such as Unitex.

------------------------------------------------------------------------

-------------------------------------------------------------------------
Message distributed by the Langage Naturel list <LN at cines.fr>
Information, subscription: http://www.atala.org/article.php3?id_article=48
English version       : 
Archives                 : http://listserv.linguistlist.org/archives/ln.html
                                http://liste.cines.fr/info/ln

The LN list is sponsored by ATALA (Association pour le Traitement
Automatique des Langues)
Information and membership: http://www.atala.org/
-------------------------------------------------------------------------