[Corpora-List] statistical named entity recognition

Mari Olsen molsen at microsoft.com
Thu Jan 2 17:53:47 UTC 2003


I am chairing a workshop on 12 July 2003, after ACL 2003 (Sapporo), intended to address questions related to multilingual NE recognition and the reusability of statistical and symbolic methods across languages.  I encourage you (and others) to submit a paper to the workshop and/or to attend it.  Here's the relevant info (the website should be up by 10 January, and an official CFP will go out via the customary channels; the tentative submission deadline is 7 March 2003). (Note: Microsoft is providing some travel funds to help defray expenses for students.)

Mari Broman Olsen
Natural Language Group
**************************************************************
Title and description:
Multilingual and Mixed-language Named Entity Recognition: Combining Statistical and Symbolic Models

Organizing Committee: 
Kevin Humphreys, Mari Broman Olsen, Joseph Pentheroudakis, Robert Stumberger, Hajime Wada

Description:
Named Entity (NE) Recognition systems vary widely, from high-speed bulk methods optimized for indexing, to deep semantic parsers tuned for specific domains.  Optimal ways to combine statistical and symbolic models also vary, depending on applications and tasks.  Is it possible to 
	-maximize use of knowledge-rich resources (e.g. lexicons, NE grammars, parsing) while permitting corpus-based training for domain or language?
	-acquire and share resources (including lexicons and grammars) across languages?
	-balance performance speed with reasonable accuracy?
	-use specific language patterns while permitting rapid transfer to another language?
	-minimize variability in results across language types?

We welcome research on combined models that resolve these tradeoffs in particular ways.  We hope that the workshop will bring together work on robust and deep multilingual and mixed-language NE recognition from different perspectives. Possible topics include
	-the role of the lexicon vs. dynamic processing information
	-grammars and lexicons shared (or ported) across languages
	-acquisition of multilingual resources (e.g. from corpora)
	-translating NEs across multiple languages
	-domain tuning

Papers may cover one or more of these (or related) areas. Demonstrations of implemented NE systems are also welcome.

-------------
Program committee
Roberto Basili (University of Roma Tor Vergata)
Robert Gaizauskas (Sheffield)
Ralph Grishman (New York University)
Lauri Karttunen (Parc, Inc.)
Kevin Knight (ISI)
Gary Geunbae Lee (Pohang University of Science and Technology)
Dekang Lin (University of Alberta)
Boyan Onyshkevich (Department of Defense)
John Prager (IBM)
Jeff Reynar (Microsoft)
Mila Ramos-Santacruz (SRA)
Ellen Riloff (University of Utah)
Beth Sundheim (NCCOSC, San Diego)
Janine Toole (Gavagai Technology)
Benjamin Tsou (City Univ. of Hong Kong)
Marc Vilain (MITRE)
Sornlertlamvanich Virach (National Electronics and Computer Technology Center, Thailand)


-----Original Message-----
From: Åsne Thea Fraser Haaland [mailto:a.t.haaland at ilf.uio.no] 
Sent: Thursday, January 02, 2003 3:45 AM
To: corpora at hd.uib.no
Subject: [Corpora-List] statistical named entity recognition



Hello list members,
My Ph.D. thesis is to be on named entity recognition for Norwegian. I want 
to use existing programming tools implementing different statistical 
methods. Most of my reading has been on maximum entropy modelling. Do any 
of you have any experience with existing tools that can be used for named 
entity recognition? Ideally I would like to be able to experiment with the 
kind of information provided to the system, so I want open source code that 
can be modified. In the case of maximum entropy modelling I would 
appreciate the possibility of trying different algorithms. It would be an 
extra bonus if I could try out the frequency redistribution algorithm 
advocated by Mikheev.
I intend to post a summary of the comments received. I appreciate your help. Best, Åsne Haaland


Åsne Haaland, doctoral research fellow
Tekstlaboratoriet, Inst. for lingvistiske fag (http://www.hf.uio.no/tekstlab) P.O. Box 1102 Blindern, 0317 Oslo; visiting address: room 523, Henrik Wergelands hus
Tel.: 22 85 67 87, fax: 22 85 69 19
E-mail: a.t.haaland at ilf.uio.no


