[Corpora-List] Any research on long named-entities

Alexandre Rafalovitch arafalov at gmail.com
Tue Dec 9 02:26:21 UTC 2008


Hello,

I am looking for any research on recognizing long named entities
(mostly organisational bodies). When I say long, I mean 10-20 tokens
in length, rather than the more frequently discussed 5-7. A short-ish
example of such a name would be "the United Nations Educational,
Scientific and Cultural Organization". Yes, these are names with commas,
conjunctions and other tokens that are normally excluded.

I suspect legal and biological domains would be closest in their need,
but so far I have failed to find an especially relevant paper.

I have found some interesting papers in the Automatic Term Recognition
domain. Unfortunately, it does not look like ATR assumptions work well
when parts of a named entity also occur as parts of other named
entities, or when full names get shortened in non-trivial ways
(e.g. "the Advisory Committee on Administrative and Budgetary
Questions" => "the Committee").

Any pointer would be appreciated. I will summarise the responses.

Regards,
    Alex.
Personal blog: http://blog.outerthoughts.com/
Research group: http://www.clt.mq.edu.au/Research/
