[Corpora-List] statistical named entity recognition

>Hello list members,
>My Ph.D. thesis is to be on named entity recognition for Norwegian. I want 
>to use existing programming tools implementing different statistical 
>methods. Most of my reading has been on maximum entropy modelling. Do any 
>of you have any experience with existing tools that can be used for named 
>entity recognition?

No experience, but here are a couple of references:

[Chieu02] Chieu, Hai Leong, & Ng, Hwee Tou (2002). Named Entity Recognition:
A Maximum Entropy Approach Using Global Information. Proceedings of
the 19th International Conference on Computational Linguistics (COLING
2002). (pp. 190-196). Taipei, Taiwan.

[Chieu02b] Chieu, Hai Leong, & Ng, Hwee Tou (2002). Teaching a Weaker
Classifier: Named Entity Recognition on Upper Case Text. Proceedings
of the 40th Annual Meeting of the Association for Computational
Linguistics (ACL-02). (pp. 481-488). Philadelphia, Pennsylvania, USA.

There may also be relevant papers from the CoNLL-2002 workshop. Its shared task
was on language-independent named entity recognition, and the web page with the
papers, results, and training and test data for Spanish and Dutch is:


>Ideally I would like to be able to experiment with the kind of information 
>provided to the system, so I want open source code that can be modified. 
>In the case of maximum entropy modelling I would appreciate the 
>possibility of trying different algorithms.

The package used by Chieu and Ng is maxent
(http://maxent.sourceforge.net/), part of the OpenNLP project
(http://opennlp.sourceforge.net/). It is open source, written in Java, and
has been used to build several classifiers in the Grok package
(http://grok.sourceforge.net/), including a POS tagger and a Name Finder
for English.
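For a rough idea of what such a package does under the hood, here is a small sketch of a conditional maximum entropy classifier trained with Generalized Iterative Scaling (GIS). It is written in Python for brevity and is NOT the maxent package's API. It assumes binary contextual predicates paired with labels as features; the predicate strings and the toy Norwegian training events are invented for illustration.

```python
import math
from collections import defaultdict


def predict(weights, labels, context):
    """Return p(label | context) for every label under a log-linear model."""
    scores = {
        label: sum(weights.get((pred, label), 0.0) for pred in context)
        for label in labels
    }
    top = max(scores.values())  # subtract the max score for numerical stability
    exps = {label: math.exp(s - top) for label, s in scores.items()}
    z = sum(exps.values())
    return {label: e / z for label, e in exps.items()}


def train_gis(events, iterations=100):
    """Fit feature weights with Generalized Iterative Scaling (GIS).

    events: list of (context, label) pairs, where context is a list of string
    predicates; each feature is a (predicate, label) pair seen in training.
    """
    labels = sorted({label for _, label in events})
    n = len(events)
    # Empirical expectation of each (predicate, label) feature.
    empirical = defaultdict(float)
    for context, label in events:
        for pred in context:
            empirical[(pred, label)] += 1.0 / n
    # GIS constant: the largest number of active predicates in any event.
    # The usual correction feature is omitted: it takes the same value for
    # every label, so it cancels out of p(label | context).
    c = max(len(context) for context, _ in events)
    weights = {feat: 0.0 for feat in empirical}
    for _ in range(iterations):
        # Model expectation of each feature under the current weights.
        expected = defaultdict(float)
        for context, _ in events:
            for label, p in predict(weights, labels, context).items():
                for pred in context:
                    if (pred, label) in weights:
                        expected[(pred, label)] += p / n
        # Move each weight by (1/C) * log(empirical / model expectation).
        for feat in weights:
            weights[feat] += math.log(empirical[feat] / expected[feat]) / c
    return weights, labels


# Toy training events ("Åsne bor i Oslo"); predicate names are hypothetical.
events = [
    (["word=Oslo", "cap", "prev=i"], "LOC"),
    (["word=Bergen", "cap", "prev=i"], "LOC"),
    (["word=Åsne", "cap", "prev=<s>"], "PER"),
    (["word=bor", "lower", "prev=Åsne"], "O"),
    (["word=i", "lower", "prev=bor"], "O"),
]
weights, labels = train_gis(events)
print(predict(weights, labels, ["word=Trondheim", "cap", "prev=i"]))
```

Swapping in a different training algorithm (e.g. improved iterative scaling or a gradient-based optimiser) only changes `train_gis`; the model and `predict` stay the same, which is roughly the kind of experimentation asked about above.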

>It would be an extra bonus if I could try out the frequency redistribution
>algorithm advocated by Mikheev.
>I intend to post a summary of the comments received. I appreciate your help.
>Best, Åsne Haaland
>Åsne Haaland, research fellow (stipendiat)
>Tekstlaboratoriet (Text Laboratory), Inst. for lingvistiske fag (Department of Linguistics) (http://www.hf.uio.no/tekstlab)
>P.O. Box 1102 Blindern, 0317 Oslo; visiting address: room 523, Henrik Wergelands hus
>Tel.: 22 85 67 87, fax: 22 85 69 1
>E-mail: a.t.haaland at ilf.uio.no


