[Corpora-List] statistical named entity recognition

Jose Maria Gomez Hidalgo jmgomez at dinar.esi.uem.es
Tue Jan 7 13:24:03 UTC 2003


At 12:45 02/01/2003 +0100, you wrote:

>Hello list members,
>My Ph.D. thesis is to be on named entity recognition for Norwegian. I want 
>to use existing programming tools implementing different statistical 
>methods. Most of my reading has been on maximum entropy modelling. Do any 
>of you have any experience with existing tools that can be used for named 
>entity recognition?

I have no hands-on experience, but here are a couple of references:

[Chieu02] Chieu, Hai Leong, & Ng, Hwee Tou (2002). Named Entity Recognition:
A Maximum Entropy Approach Using Global Information. Proceedings of
the 19th International Conference on Computational Linguistics (COLING
2002). (pp. 190-196). Taipei, Taiwan.

[Chieu02b] Chieu, Hai Leong, & Ng, Hwee Tou (2002). Teaching a Weaker
Classifier: Named Entity Recognition on Upper Case Text. Proceedings
of the 40th Annual Meeting of the Association for Computational
Linguistics (ACL-02). (pp. 481-488). Philadelphia, Pennsylvania, USA.

Also, there may be relevant papers from the CoNLL-2002 workshop. Its shared
task was Language-Independent Named Entity Recognition, and the web page with
papers, results, and training and test data for Spanish and Dutch is:

http://cnts.uia.ac.be/conll2002/ner/

>Ideally I would like to be able to experiment with the kind of information 
>provided to the system, so I want open source code that can be modified. 
>In the case of maximum entropy modelling I would appreciate the 
>possibility of trying different algorithms.

The package used by Chieu et al. is maxent 
(http://maxent.sourceforge.net/), part of the OpenNLP project 
(http://opennlp.sourceforge.net/). It is open source, written in Java, and 
has been used to build several classifiers in the Grok package 
(http://grok.sourceforge.net/), including a POS tagger and a name finder 
for English.
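
To make the model family concrete, here is a minimal sketch in Python of a
maximum entropy (multinomial logistic) token classifier. It is not the API of
the maxent package and not the method of Chieu and Ng: the feature templates,
the two labels NAME and O, the toy sentences, and the training loop (plain
stochastic gradient ascent rather than GIS or IIS) are all assumptions made
for illustration.

```python
import math
from collections import defaultdict

LABELS = ["NAME", "O"]  # hypothetical two-label scheme, chosen for this sketch


def features(tokens, i):
    """Hypothetical binary features for the token at position i."""
    word = tokens[i]
    prev = tokens[i - 1].lower() if i > 0 else "<s>"
    return [
        "word=" + word.lower(),
        "prev=" + prev,
        "cap=" + str(word[0].isupper()),
        "first=" + str(i == 0),
    ]


def probabilities(weights, labels, feats):
    """p(label | feats) is proportional to exp(sum of active weights)."""
    scores = [sum(weights[(f, y)] for f in feats) for y in labels]
    top = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def train(examples, labels, epochs=200, rate=0.5):
    """Stochastic gradient ascent on the conditional log-likelihood."""
    weights = defaultdict(float)
    for _ in range(epochs):
        for feats, gold in examples:
            probs = probabilities(weights, labels, feats)
            for y, p in zip(labels, probs):
                # Gradient per active feature: observed count minus expected.
                delta = rate * ((1.0 if y == gold else 0.0) - p)
                for f in feats:
                    weights[(f, y)] += delta
    return weights


def predict(weights, labels, feats):
    """Return the most probable label for one feature list."""
    probs = probabilities(weights, labels, feats)
    return labels[probs.index(max(probs))]


# Toy training data, invented for this sketch.
SENTENCES = [
    (["Oslo", "is", "in", "Norway"], ["NAME", "O", "O", "NAME"]),
    (["She", "lives", "in", "Bergen"], ["O", "O", "O", "NAME"]),
    (["The", "city", "is", "large"], ["O", "O", "O", "O"]),
]
examples = [
    (features(tokens, i), tag)
    for tokens, tags in SENTENCES
    for i, tag in enumerate(tags)
]
weights = train(examples, LABELS)

test_tokens = ["He", "is", "in", "Norway"]
print([predict(weights, LABELS, features(test_tokens, i))
       for i in range(len(test_tokens))])
```

Editing features() is the simplest way to experiment with the kind of
information given to such a model; a real system would also need
regularisation or feature cutoffs and far more training data.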

>It would be an extra bonus if I could try out the frequency redistribution 
>algorithm advocated by Mikheev.
>I intend to post a summary of the comments received. I appreciate your help.
>Best, Åsne Haaland
>
>
>Åsne Haaland, stipendiat
>Tekstlaboratoriet, Inst. for lingvistiske fag (http://www.hf.uio.no/tekstlab)
>Pb. 1102 Blindern, 0317 Oslo; besøksadr.: rom 523 Henrik Wergelands hus
>Tlf.: 22 85 67 87, faks: 22 85 69 1
>E-post: a.t.haaland at ilf.uio.no
>
>



_______________________________________________________________________________

Jose Maria Gomez Hidalgo
Departamento de Inteligencia Artificial
Universidad Europea de Madrid
28670 - Villaviciosa de Odon - MADRID
(+34) 912115670
jmgomez at dinar.esi.uem.es
http://www.esi.uem.es/~jmgomez/
_______________________________________________________________________________



