[Corpora-List] Penn Treebank annotated with chunks

Steven Bird sb at csse.unimelb.edu.au
Mon Aug 13 21:58:26 UTC 2012


Aleksandar,

The "tagged" section of Penn Treebank has chunks marked with brackets, e.g.:

[ Pierre/NNP Vinken/NNP ]
,/,
[ 61/CD years/NNS ]
old/JJ ,/, will/MD join/VB
[ the/DT board/NN ]
as/IN
[ a/DT nonexecutive/JJ director/NN Nov./NNP 29/CD ]
./.
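
A minimal sketch of how this bracketed format could be read in plain Python.
The function name parse_chunked and the output structure are illustrative
assumptions, not part of NLTK or of the treebank distribution. It also
assumes that brackets are whitespace-separated, never nested, and that the
input contains nothing but brackets and word/tag tokens.

```python
def parse_chunked(text, sep="/"):
    """Parse bracketed word/tag text into a flat list of items.

    Each item is either a (word, tag) tuple for a token outside any
    bracket, or a list of (word, tag) tuples for one bracketed chunk.
    (Hypothetical helper, written for illustration only.)
    """
    items = []
    chunk = None  # None while outside brackets, a list while inside one
    for tok in text.split():
        if tok == "[":
            if chunk is not None:
                raise ValueError("nested '[' is not supported")
            chunk = []
        elif tok == "]":
            if chunk is None:
                raise ValueError("']' without a matching '['")
            items.append(chunk)
            chunk = None
        else:
            # Split on the LAST separator so a word may itself contain one.
            word, found, tag = tok.rpartition(sep)
            if not found:
                word, tag = tok, None  # token carries no tag
            target = chunk if chunk is not None else items
            target.append((word, tag))
    if chunk is not None:
        raise ValueError("unclosed '['")
    return items


SAMPLE = """
[ Pierre/NNP Vinken/NNP ]
,/,
[ 61/CD years/NNS ]
old/JJ ,/, will/MD join/VB
[ the/DT board/NN ]
as/IN
[ a/DT nonexecutive/JJ director/NN Nov./NNP 29/CD ]
./.
"""

parsed = parse_chunked(SAMPLE)
# Keep only the bracketed chunks, dropping tokens outside brackets.
chunks = [item for item in parsed if isinstance(item, list)]
for chunk in chunks:
    print(" ".join(word for word, _ in chunk))
```

Splitting on the last "/" rather than the first keeps tokens such as
"Nov./NNP" and ",/," intact, since the tag is always the final field.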

The NLTK corpus readers give access to some chunked corpora:
http://nltk.googlecode.com/svn/trunk/doc/howto/corpus.html#chunked-corpora

NLTK doesn't give an interface to the chunked version of the treebank
data, but one could be added if there were interest in this.

-Steven Bird

On 13 August 2012 22:52, Aleksandar Savkov <cytehuop at gmail.com> wrote:
> Hello everybody,
>
> I'm looking for a chunk-annotated version of the Penn Treebank. It seems to
> be the most popular resource for training and testing chunking software, but
> I haven't been able to find a chunked version or an algorithm for extracting
> chunks in a deterministic way. Is there a standard resource that everybody
> uses or does everybody just extract the chunks from the parsed data
> themselves?
>
> Best,
> Aleksandar Savkov
>
> _______________________________________________
> UNSUBSCRIBE from this page: http://mailman.uib.no/options/corpora
> Corpora mailing list
> Corpora at uib.no
> http://mailman.uib.no/listinfo/corpora
>
