[Corpora-List] Penn Treebank annotated with chunks

Aleksandar Savkov cytehuop at gmail.com
Tue Aug 14 08:33:45 UTC 2012


Thanks, I think Steven's remark about the chunks in the original version is
what I was looking for. I'll just have to find that version of the treebank.

Best,
Alex

On 14 August 2012 00:38, Alexander Yeh <asy at mitre.org> wrote:

> http://www.clips.ua.ac.be/pages/mbsp-tags
>  - Describes a set of chunk tags and possibly some chunk finding
>    programs
>
> http://www.cnts.ua.ac.be/conll2000/chunking/
>  - Describes a past CoNLL evaluation on noun and verb chunking.
>    It has some links to data sets based on WSJ as well as a script for
>    generating the data sets from WSJ.
>
> Thanks
> -Alex Yeh
>
>
>
>
> Steven Bird wrote:
>
>> Aleksandar,
>>
>> The "tagged" section of Penn Treebank has chunks marked with brackets,
>> e.g.:
>>
>> [ Pierre/NNP Vinken/NNP ]
>> ,/,
>> [ 61/CD years/NNS ]
>> old/JJ ,/, will/MD join/VB
>> [ the/DT board/NN ]
>> as/IN
>> [ a/DT nonexecutive/JJ director/NN Nov./NNP 29/CD ]
>> ./.
>>
>> The NLTK corpus readers give access to some chunked corpora:
>> http://nltk.googlecode.com/svn/trunk/doc/howto/corpus.html#chunked-corpora
>>
>> NLTK doesn't give an interface to the chunked version of the treebank
>> data, but it could be added if there was interest in this.
>>
>> -Steven Bird
>>
>> On 13 August 2012 22:52, Aleksandar Savkov <cytehuop at gmail.com> wrote:
>>
>>> Hello everybody,
>>>
>>> I'm looking for a chunk-annotated version of the Penn Treebank. It seems to
>>> be the most popular resource for training and testing chunking software, but
>>> I haven't been able to find a chunked version or an algorithm for extracting
>>> chunks in a deterministic way. Is there a standard resource that everybody
>>> uses or does everybody just extract the chunks from the parsed data
>>> themselves?
>>>
>>> Best,
>>> Aleksandar Savkov
>>>
>>> _______________________________________________
>>> UNSUBSCRIBE from this page: http://mailman.uib.no/options/corpora
>>> Corpora mailing list
>>> Corpora at uib.no
>>> http://mailman.uib.no/listinfo/corpora
>>>
>>>
>>
>>
>
>
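
For anyone who wants to read the bracketed format from Steven's example
quoted above, here is a minimal sketch in plain Python. It is not NLTK's
reader and not an official Treebank tool. The function names
(parse_bracketed, to_iob) are hypothetical, and the "NP" label is an
assumption: the brackets themselves carry no label. The IOB output only
imitates the B-/I-/O style of the CoNLL-2000 data and is not the CoNLL
conversion script.

```python
SAMPLE = """
[ Pierre/NNP Vinken/NNP ]
,/,
[ 61/CD years/NNS ]
old/JJ ,/, will/MD join/VB
[ the/DT board/NN ]
as/IN
[ a/DT nonexecutive/JJ director/NN Nov./NNP 29/CD ]
./.
"""


def parse_bracketed(text):
    """Parse bracketed word/TAG text into a flat list of items.

    A token outside brackets becomes a (word, tag) tuple.
    A bracketed chunk becomes a list of (word, tag) tuples.
    """
    items = []
    chunk = None  # the chunk being filled while inside "[ ... ]"
    for token in text.split():
        if token == "[":
            if chunk is not None:
                raise ValueError("nested '[' is not handled by this sketch")
            chunk = []
        elif token == "]":
            if chunk is None:
                raise ValueError("']' without a matching '['")
            items.append(chunk)
            chunk = None
        else:
            # Split on the last slash, so ",/," gives word "," and tag ",".
            word, _, tag = token.rpartition("/")
            if chunk is None:
                items.append((word, tag))
            else:
                chunk.append((word, tag))
    if chunk is not None:
        raise ValueError("unclosed '['")
    return items


def to_iob(items, label="NP"):
    """Flatten parsed items into (word, tag, chunk_tag) rows.

    The label is assumed, since the brackets do not name the chunk type.
    """
    rows = []
    for item in items:
        if isinstance(item, list):  # a bracketed chunk
            for i, (word, tag) in enumerate(item):
                prefix = "B-" if i == 0 else "I-"
                rows.append((word, tag, prefix + label))
        else:  # a single token outside any chunk
            word, tag = item
            rows.append((word, tag, "O"))
    return rows


if __name__ == "__main__":
    for word, tag, chunk_tag in to_iob(parse_bracketed(SAMPLE)):
        print(word, tag, chunk_tag)
```

Running it prints one token per line with its POS tag and chunk tag.
Splitting on whitespace is enough for this sample because brackets and
tokens are space-separated; real files may contain other separators or
header lines that this sketch does not handle.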