Thanks, I think Steven's remark about the chunks in the original version is what I was looking for. I'll just have to find that version of the treebank.<div><br></div><div>Best,</div><div>Alex<br><br><div class="gmail_quote">


On 14 August 2012 00:38, Alexander Yeh <span dir="ltr">&lt;<a href="mailto:asy@mitre.org" target="_blank">asy@mitre.org</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">


<a href="http://www.clips.ua.ac.be/pages/mbsp-tags" target="_blank">http://www.clips.ua.ac.be/pages/mbsp-tags</a><br>

 - Describes a set of chunk tags and possibly some chunk-finding programs<br>

<br>

<a href="http://www.cnts.ua.ac.be/conll2000/chunking/" target="_blank">http://www.cnts.ua.ac.be/conll2000/chunking/</a><br>

 - Describes a past CoNLL evaluation on noun and verb chunking.<br>

   It has some links to data sets based on WSJ as well as a script for<br>

   generating the data sets from WSJ.<br>

<br>
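For readers who pick up the CoNLL-2000 data sets mentioned above, here is a minimal Python sketch of grouping the files' one-token-per-line layout ("word POS chunk-tag", blank line between sentences) back into chunks. The function name `read_conll_chunks` and the sample lines are made up for illustration; they are not part of the CoNLL distribution or of any library.

```python
def read_conll_chunks(lines):
    """Yield each sentence as a list of (chunk_type, [(word, pos), ...]).

    Tokens tagged "O" (outside any chunk) are emitted as single-token
    entries whose chunk_type is "O".
    """
    sentence = []
    for line in lines:
        line = line.strip()
        if not line:
            # A blank line ends the current sentence.
            if sentence:
                yield sentence
                sentence = []
            continue
        word, pos, tag = line.split()
        if tag == "O":
            sentence.append(("O", [(word, pos)]))
            continue
        prefix, _, ctype = tag.partition("-")
        # An I- tag extends the previous chunk only if the types match.
        if prefix == "I" and sentence and sentence[-1][0] == ctype:
            sentence[-1][1].append((word, pos))
        else:
            sentence.append((ctype, [(word, pos)]))
    if sentence:
        yield sentence


# Hypothetical sample in the CoNLL-2000 column layout (not copied from the data).
sample_lines = """
the DT B-NP
board NN I-NP
will MD B-VP
join VB I-VP
. . O
""".splitlines()

for sent in read_conll_chunks(sample_lines):
    for ctype, tokens in sent:
        print(ctype, " ".join(word for word, _ in tokens))
```

The same function can be fed an open file handle instead of a list, since it only iterates over lines.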

Thanks<br>

-Alex Yeh<div class="HOEnZb"><div class="h5"><br>

<br>

<br>

<br>

Steven Bird wrote:<br>

<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

Aleksandar,<br>

<br>

The "tagged" section of Penn Treebank has chunks marked with brackets, e.g.:<br>

<br>

[ Pierre/NNP Vinken/NNP ]<br>

,/,<br>

[ 61/CD years/NNS ]<br>

old/JJ ,/, will/MD join/VB<br>

[ the/DT board/NN ]<br>

as/IN<br>

[ a/DT nonexecutive/JJ director/NN Nov./NNP 29/CD ]<br>

./.<br>

<br>
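As a rough illustration of how the bracketed layout above could be read, here is a minimal Python sketch. It assumes whitespace-separated tokens, standalone "[" and "]" markers, and word/TAG pairs split on the last slash; the function name `parse_tagged_chunks` is hypothetical and this is not an NLTK interface.

```python
def parse_tagged_chunks(text):
    """Parse bracketed word/TAG text into chunks and loose tokens.

    Returns a list whose items are either a (word, tag) tuple for a token
    outside any bracket, or a list of (word, tag) tuples for one chunk.
    """
    items = []
    chunk = None  # None while outside brackets, a list while inside
    for tok in text.split():
        if tok == "[":
            chunk = []
        elif tok == "]":
            if chunk is not None:
                items.append(chunk)
            chunk = None
        else:
            # Split on the last slash so a slash inside the word is kept.
            word, _, tag = tok.rpartition("/")
            if chunk is None:
                items.append((word, tag))
            else:
                chunk.append((word, tag))
    return items


# The example sentence quoted above.
sample = """
[ Pierre/NNP Vinken/NNP ]
,/,
[ 61/CD years/NNS ]
old/JJ ,/, will/MD join/VB
[ the/DT board/NN ]
as/IN
[ a/DT nonexecutive/JJ director/NN Nov./NNP 29/CD ]
./.
"""

for item in parse_tagged_chunks(sample):
    if isinstance(item, list):
        print("CHUNK:", " ".join(word for word, _ in item))
    else:
        print("TOKEN:", item[0])
```

Sentence boundaries are not handled here; the tagged files would need an additional splitting step that this sketch leaves out.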

The NLTK corpus readers give access to some chunked corpora:<br>

<a href="http://nltk.googlecode.com/svn/trunk/doc/howto/corpus.html#chunked-corpora" target="_blank">http://nltk.googlecode.com/svn/trunk/doc/howto/corpus.html#chunked-corpora</a><br>

<br>

NLTK doesn't give an interface to the chunked version of the treebank<br>

data, but it could be added if there was interest in this.<br>

<br>

-Steven Bird<br>

<br>

On 13 August 2012 22:52, Aleksandar Savkov &lt;<a href="mailto:cytehuop@gmail.com" target="_blank">cytehuop@gmail.com</a>&gt; wrote:<br>

<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

Hello everybody,<br>

<br>

I'm looking for a chunk-annotated version of the Penn Treebank. It seems to<br>

be the most popular resource for training and testing chunking software, but<br>

I haven't been able to find a chunked version or an algorithm for extracting<br>

chunks in a deterministic way. Is there a standard resource that everybody<br>

uses or does everybody just extract the chunks from the parsed data<br>

themselves?<br>

<br>

Best,<br>

Aleksandar Savkov<br>

<br>

_______________________________________________<br>

UNSUBSCRIBE from this page: <a href="http://mailman.uib.no/options/corpora" target="_blank">http://mailman.uib.no/options/corpora</a><br>

Corpora mailing list<br>

<a href="mailto:Corpora@uib.no" target="_blank">Corpora@uib.no</a><br>

<a href="http://mailman.uib.no/listinfo/corpora" target="_blank">http://mailman.uib.no/listinfo/corpora</a><br>

<br>

</blockquote>

<br>


</blockquote>

<br>

<br>

</div></div></blockquote></div><br></div>