18.2213, Diss: Comp Ling: Schaefer: 'Integrating Deep and Shallow Natural La...'

LINGUIST Network linguist at LINGUISTLIST.ORG
Mon Jul 23 17:44:42 UTC 2007


LINGUIST List: Vol-18-2213. Mon Jul 23 2007. ISSN: 1068 - 4875.

Subject: 18.2213, Diss: Comp Ling: Schaefer: 'Integrating Deep and Shallow Natural La...'

Moderators: Anthony Aristar, Eastern Michigan U <aristar at linguistlist.org>
            Helen Aristar-Dry, Eastern Michigan U <hdry at linguistlist.org>
 
Reviews: Randall Eggert, U of Utah  
         <reviews at linguistlist.org> 

Homepage: http://linguistlist.org/

The LINGUIST List is funded by Eastern Michigan University, 
and donations from subscribers and publishers.

Editor for this issue: Hunter Lockwood <hunter at linguistlist.org>
================================================================  

To post to LINGUIST, use our convenient web form at
http://linguistlist.org/LL/posttolinguist.html.

===========================Directory==============================  

1)
Date: 23-Jul-2007
From: Ulrich Schaefer < ulrich.schaefer at dfki.de >
Subject: Integrating Deep and Shallow Natural Language Processing Components: Representations and hybrid architectures

 

	
-------------------------Message 1 ---------------------------------- 
Date: Mon, 23 Jul 2007 13:42:40
From: Ulrich Schaefer [ulrich.schaefer at dfki.de]
Subject: Integrating Deep and Shallow Natural Language Processing Components: Representations and hybrid architectures


Institution: Saarland University 
Program: Department of Computer Science 
Dissertation Status: Completed 
Degree Date: 2006 

Author: Ulrich Schaefer

Dissertation Title: Integrating Deep and Shallow Natural Language Processing
Components: Representations and hybrid architectures 

Linguistic Field(s): Computational Linguistics


Dissertation Director(s):
Hans Uszkoreit
Wolfgang Wahlster

Dissertation Abstract:

We describe basic concepts and software architectures for the integration
of shallow and deep (linguistics-based, semantics-oriented) natural
language processing (NLP) components. The main goal of this novel hybrid
integration paradigm is to improve the robustness of deep processing.

After an introduction to constraint-based natural language parsing, we give
an overview of typical shallow processing tasks. We introduce XML standoff
markup as an additional abstraction layer that eases the integration of NLP
components, and propose XSLT as a standardized and efficient transformation
language for online NLP integration.
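
Standoff markup keeps each component's annotations in a separate layer
that points into the unchanged input text, typically by character offsets,
so that layers produced by different tools can be combined afterwards. The
following minimal Python sketch illustrates that idea under assumptions of
our own: the element names (layer, w, ne), the start/end offset attributes
and the example sentence are hypothetical and do not reproduce the formats
used in the dissertation. It relies only on the standard-library module
xml.etree.ElementTree and does not show the XSLT transformations proposed
above.

```python
import xml.etree.ElementTree as ET

# Hypothetical input text; the standoff layers below never modify it.
TEXT = "Marie Curie worked in Paris."

# Hypothetical standoff layer from a part-of-speech tagger:
# each token refers to the text only by character offsets [start, end).
POS_LAYER = """<layer name="pos">
  <w start="0" end="5" tag="NNP"/>
  <w start="6" end="11" tag="NNP"/>
  <w start="12" end="18" tag="VBD"/>
  <w start="19" end="21" tag="IN"/>
  <w start="22" end="27" tag="NNP"/>
  <w start="27" end="28" tag="."/>
</layer>"""

# Hypothetical standoff layer from a named entity recognizer,
# produced independently of the POS layer.
NER_LAYER = """<layer name="ner">
  <ne type="person" start="0" end="11"/>
  <ne type="location" start="22" end="27"/>
</layer>"""


def load(layer_xml):
    """Parse one standoff layer and return its annotation elements."""
    return list(ET.fromstring(layer_xml))


def offsets(element):
    """Return the (start, end) character offsets of an annotation."""
    return int(element.get("start")), int(element.get("end"))


def covered_tokens(entity, tokens):
    """Return the tokens whose span lies entirely inside the entity's span."""
    s, e = offsets(entity)
    return [t for t in tokens if offsets(t)[0] >= s and offsets(t)[1] <= e]


def align(text, pos_xml, ner_xml):
    """Combine two independent layers via their shared character offsets."""
    tokens = load(pos_xml)
    result = []
    for ne in load(ner_xml):
        s, e = offsets(ne)
        tags = [t.get("tag") for t in covered_tokens(ne, tokens)]
        result.append((ne.get("type"), text[s:e], tags))
    return result


if __name__ == "__main__":
    for row in align(TEXT, POS_LAYER, NER_LAYER):
        print(row)
```

Because neither layer embeds the other, a further component (for example
a chunker) could add its own layer without touching the existing ones; the
offsets alone are enough to relate annotations across layers.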

In the main part of the thesis, we describe our contributions to three
hybrid architecture frameworks that build on these fundamentals. SProUT
is a shallow system that uses elements of deep constraint-based processing,
namely a type hierarchy and typed feature structures. Whiteboard is the
first hybrid architecture to integrate not only part-of-speech tagging but
also named entity recognition and topological parsing with deep parsing.
Finally, we present Heart of Gold, a middleware architecture that
generalizes Whiteboard along several dimensions, such as configurability,
multilinguality and flexible processing strategies.

We describe various applications implemented with the hybrid frameworks,
such as structured named entity recognition, information extraction,
creative document authoring support and deep question analysis, as well as
evaluations. In Whiteboard, for example, we were able to show that shallow
pre-processing increases both the coverage and the efficiency of deep
parsing by a factor of more than two.

Heart of Gold not only forms the basis for applications that use
semantics-oriented natural language analysis, but also constitutes a
complex research instrument for experimenting with novel processing
strategies that combine deep and shallow methods, and it facilitates the
replication and comparison of results.





-----------------------------------------------------------
LINGUIST List: Vol-18-2213	

	


