[Corpora-List] Named Entity Corpora in Dutch

Martin Reynaert reynaert at uvt.nl
Wed Nov 7 20:28:58 UTC 2012


Dear Ivelina,

For Dutch we now have the SoNaR-500 corpus (currently about 540 million 
word tokens of contemporary written Dutch, automatically annotated) and 
the SoNaR-1 corpus (about 1 million word tokens of contemporary written 
Dutch, largely manually annotated for semantics).

For Named Entity Recognition, the Support Vector Machine tool used to 
automatically label SoNaR-500 (called 'NERD' for 'Named Entity 
Recognition for Dutch', developed at LT3, Ghent University, by Bart 
Desmet) was trained on the NEs manually labeled in SoNaR-1.

To acquire the corpus, please enquire at the Dutch HLT Agency:

http://www.inl.nl/tst-centrale/

The complete corpus may not be available yet, but it should be soon, 
and you can at least sort out the licensing at this stage. In fact, I 
am currently still curating parts of its metadata.

Best,

Martin




On 11/07/2012 06:23 PM, Ivelina Nikolova wrote:
> On 11/07/2012 05:49 PM, Alberto Lavelli wrote:
>> The CoNLL 2002 shared task concerned Named Entity Recognition for
>> Spanish and Dutch.
>> You can find information about the CoNLL series here:
>>
>> http://ifarm.nl/signll/conll/
>>
>> Hope this helps
>
> Thanks Alberto!
> I got several references to this task corpus especially. It seems to 
> be the most used one.
>
> Best,
> Ivelina
>
>
>>
>>     alberto
>>
>>
>> On Wed, Nov 07, 2012 at 04:13:07PM +0200, Ivelina Nikolova wrote:
>>> Dear Corpora Members,
>>>
>>> I am searching for corpora in Dutch with Named Entity annotations.
>>> I'm interested in Person, Location, Organization and Event mentions.
>>> Do you have any suggestions on that?
>>>
>>> Thank you very much!
>>> Ivelina
>>>
>>> -- 
>>> Ivelina Nikolova
>>> PhD student in Computer Science
>>> Linguistic Modelling Department
>>> Institute of Information and Communication Technologies
>>> Bulgarian Academy of Sciences
>>>
>>>
>>> _______________________________________________
>>> UNSUBSCRIBE from this page: http://mailman.uib.no/options/corpora
>>> Corpora mailing list
>>> Corpora at uib.no
>>> http://mailman.uib.no/listinfo/corpora
>
>




