[Corpora-List] Named Entity Corpora in Dutch

Ivelina Nikolova iva at lml.bas.bg
Thu Nov 8 09:46:21 UTC 2012


Thanks Martin and Mikhail!
I'll be checking out your references.

Ivelina


On 8.11.2012 at 01:28, Mikhail Kozhevnikov wrote:
> Dear Martin,
>
> To my knowledge, even the parts already annotated are not available yet,
> as the data has not been officially released. I tried to obtain the
> SRL annotations described in this paper
> <http://lt3.hogent.be/media/uploads/publications/2012/FinalSRL.pdf> at
> the end of September and got the following reply:
>
>     The SRL annotations are not part of the second release of the
>     intermediate SoNaR results. The final release will comprise SRL
>     annotations: a 500K corpus that has been automatically labeled and
>     a 500K corpus that has been completely manually verified.
>     We do not know when the final release will be available, since the
>     project is still not officially closed: an evaluation has shown
>     that some alterations need to be made and documentation needs to
>     be added. We cannot start distribution before the official ending
>     of the project.
>
>
> I too would be very interested in any new information concerning the 
> release date or (partial) availability of the data.
>
> Regards,
> Mikhail
>
> On Wed, Nov 7, 2012 at 9:28 PM, Martin Reynaert <reynaert at uvt.nl> wrote:
>
>     Dear Ivelina,
>
>     For Dutch we now have the SoNaR-500 corpus (currently about 540
>     million word tokens of contemporary written Dutch, automatically
>     annotated) and the SoNaR-1 corpus (about 1 million word tokens of
>     contemporary written Dutch, largely manually annotated for semantics).
>
>     For Named Entity Recognition, the Support Vector Machine tool
>     (called 'NERD' for 'Named Entity Recognition for Dutch', developed
>     at LT3, Ghent University, by Bart Desmet) used to automatically
>     label SoNaR-500 was trained on the NEs manually labeled in SoNaR-1.
>
>     To acquire the corpus, please enquire at the Dutch HLT Agency:
>
>     http://www.inl.nl/tst-centrale/
>
>     The full corpus itself may not be fully available yet, but should
>     be soon, and you can at least sort out the licensing part at this
>     stage. In fact, I am currently curating parts of its metadata.
>
>     Best,
>
>     Martin
>
>
>
>
>
>     On 11/07/2012 06:23 PM, Ivelina Nikolova wrote:
>
>         On 11/07/2012 05:49 PM, Alberto Lavelli wrote:
>
>             The CoNLL 2002 shared task concerned Named Entity
>             Recognition for
>             Spanish and Dutch.
>             You can find information about the CoNLL series here:
>
>             http://ifarm.nl/signll/conll/
>
>             Hope this helps
>
>
>         Thanks Alberto!
>         I received several references to this shared-task corpus in
>         particular. It seems to be the most widely used one.
>
>         Best,
>         Ivelina
>
>
>
>                 alberto
>
>
>             On Wed, Nov 07, 2012 at 04:13:07PM +0200, Ivelina Nikolova
>             wrote:
>
>                 Dear Corpora Members,
>
>                 I am searching for corpora in Dutch with Named Entity
>                 annotations.
>                 I'm interested in Person, Location, Organization and
>                 Event mentions.
>                 Do you have any suggestions on that?
>
>                 Thank you very much!
>                 Ivelina
>
>                 -- 
>                 Ivelina Nikolova
>                 PhD student in Computer Science
>                 Linguistic Modelling Department
>                 Institute of Information and Communication Technologies
>                 Bulgarian Academy of Sciences
>
>
>                 _______________________________________________
>                 UNSUBSCRIBE from this page:
>                 http://mailman.uib.no/options/corpora
>                 Corpora mailing list
>                 Corpora at uib.no
>                 http://mailman.uib.no/listinfo/corpora
>
