[Corpora-List] statistical named entity recognition
Jose Maria Gomez Hidalgo
jmgomez at dinar.esi.uem.es
Tue Jan 7 13:24:03 UTC 2003
At 12:45 02/01/2003 +0100, you wrote:
>Hello list members,
>My Ph.D. thesis is to be on named entity recognition for Norwegian. I want
>to use existing programming tools implementing different statistical
>methods. Most of my reading has been on maximum entropy modelling. Do any
>of you have any experience with existing tools that can be used for named
>entity recognition?
I have no direct experience, but here are a couple of references:
[Chieu02] Chieu, Hai Leong, & Ng, Hwee Tou (2002). Named Entity Recognition:
A Maximum Entropy Approach Using Global Information. Proceedings of
the 19th International Conference on Computational Linguistics (COLING
2002). (pp. 190-196). Taipei, Taiwan.
[Chieu02b] Chieu, Hai Leong, & Ng, Hwee Tou (2002). Teaching a Weaker
Classifier: Named Entity Recognition on Upper Case Text. Proceedings
of the 40th Annual Meeting of the Association for Computational
Linguistics (ACL-02). (pp. 481-488). Philadelphia, Pennsylvania, USA.
Also, there may be some relevant papers from the CoNLL-2002 workshop. Its
shared task focused on Language-Independent Named Entity Recognition, and the
web page with papers, results, and training and test data for Spanish and
Dutch is:
http://cnts.uia.ac.be/conll2002/ner/
>Ideally I would like to be able to experiment with the kind of information
>provided to the system, so I want open source code that can be modified.
>In the case of maximum entropy modelling I would appreciate the
>possibility of trying different algorithms.
The package used by Chieu et al. is maxent
(http://maxent.sourceforge.net/), a part of the OpenNLP project
(http://opennlp.sourceforge.net/). It is open source, written in Java, and
has been used to develop several classifiers in the Grok package
(http://grok.sourceforge.net/), including a POS tagger and a Name Finder
for English.
>It would be an extra bonus if I could try out the frequency redistribution
>algorithm advocated by Mikheev.
>I intend to post a summary of the comments received. I appreciate your help.
>Best, Åsne Haaland
>
>
>Åsne Haaland, research fellow
>Tekstlaboratoriet, Inst. for lingvistiske fag (http://www.hf.uio.no/tekstlab)
>P.O. Box 1102 Blindern, 0317 Oslo; visiting address: room 523, Henrik Wergelands hus
>Tel.: 22 85 67 87, fax: 22 85 69 1
>E-mail: a.t.haaland at ilf.uio.no
>
>
_______________________________________________________________________________
Jose Maria Gomez Hidalgo
Departamento de Inteligencia Artificial
Universidad Europea de Madrid
28670 - Villaviciosa de Odon - MADRID
(+34) 912115670
jmgomez at dinar.esi.uem.es
http://www.esi.uem.es/~jmgomez/
_______________________________________________________________________________
Spanish law guarantees privacy in electronic communications. This
electronic transmission is strictly confidential and intended solely for
the addressee. If you are not the intended addressee, you are kindly
requested not to disclose nor to copy this transmission and to notify us as
soon as possible.