Dear Martin,<br><br>To my knowledge, even the parts already annotated are not available yet, as the data has not been officially released. I tried to obtain the SRL annotations described in <a href="http://lt3.hogent.be/media/uploads/publications/2012/FinalSRL.pdf" target="_blank">this paper</a> at the end of September and got the following reply:<br>
<br><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">The SRL annotations are not part of the second release of the intermediate SoNaR results. The final release will comprise SRL annotations: a 500K corpus that has been automatically labeled and a 500K corpus that has been completely manually verified.<br>
We do not know when the final release will be available, since the project is still not officially closed: an evaluation has shown that some alterations need to be made and documentation needs to be added. We can not start distribution before the official ending of the project.</blockquote>
<div><br></div><div>I too would be very interested in any new information concerning the release date or (partial) availability of the data.<br><br>Regards,<br>Mikhail</div><div> </div><div class="gmail_extra"><br><div class="gmail_quote">
On Wed, Nov 7, 2012 at 9:28 PM, Martin Reynaert <span dir="ltr"><<a href="mailto:reynaert@uvt.nl" target="_blank">reynaert@uvt.nl</a>></span> wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Dear Ivelina,<br>
<br>
For Dutch we now have the SoNaR-500 corpus (currently about 540 million word tokens of contemporary written Dutch, automatically annotated) and the SoNaR-1 corpus (about 1 million word tokens of contemporary written Dutch, largely manually annotated for semantics).<br>
<br>
For Named Entity Recognition, the Support Vector Machine tool used to automatically label SoNaR-500 (called 'NERD', for 'Named Entity Recognition for Dutch', developed at LT3, Ghent University, by Bart Desmet) was trained on the NEs manually labeled in SoNaR-1.<br>
<br>
To acquire the corpus, please enquire at the Dutch HLT Agency:<br>
<br>
<a href="http://www.inl.nl/tst-centrale/" target="_blank">http://www.inl.nl/tst-centrale/</a><br>
<br>
The full corpus may not be available yet, but it should be soon, and you can at least sort out the licensing at this stage. In fact, I am currently curating parts of its metadata.<br>
<br>
Best,<br>
<br>
Martin<div><div><br>
<br>
<br>
<br>
<br>
On 11/07/2012 06:23 PM, Ivelina Nikolova wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
On 11/07/2012 05:49 PM, Alberto Lavelli wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
The CoNLL 2002 shared task concerned Named Entity Recognition for<br>
Spanish and Dutch.<br>
You can find information about the CoNLL series here:<br>
<br>
<a href="http://ifarm.nl/signll/conll/" target="_blank">http://ifarm.nl/signll/conll/</a><br>
<br>
Hope this helps<br>
</blockquote>
<br>
Thanks Alberto!<br>
I received several references to this shared-task corpus in particular. It seems to be the most widely used one.<br>
<br>
Best,<br>
Ivelina<br>
<br>
<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<br>
alberto<br>
<br>
<br>
On Wed, Nov 07, 2012 at 04:13:07PM +0200, Ivelina Nikolova wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
Dear Corpora Members,<br>
<br>
I am searching for Dutch corpora with Named Entity annotations.<br>
I'm interested in Person, Location, Organization and Event mentions.<br>
Do you have any suggestions on that?<br>
<br>
Thank you very much!<br>
Ivelina<br>
<br>
-- <br>
Ivelina Nikolova<br>
PhD student in Computer Science<br>
Linguistic Modelling Department<br>
Institute of Information and Communication Technologies<br>
Bulgarian Academy of Sciences<br>
<br>
<br>
_______________________________________________<br>
UNSUBSCRIBE from this page: <a href="http://mailman.uib.no/options/corpora" target="_blank">http://mailman.uib.no/options/corpora</a><br>
Corpora mailing list<br>
<a href="mailto:Corpora@uib.no" target="_blank">Corpora@uib.no</a><br>
<a href="http://mailman.uib.no/listinfo/corpora" target="_blank">http://mailman.uib.no/listinfo/corpora</a><br>
</blockquote></blockquote>
<br>
<br>
</blockquote>
<br>
<br>
</div></div></blockquote></div><br></div>