[Corpora-List] English treebank

Ulrik Sandborg-Petersen ulrikp at hum.aau.dk
Thu Apr 3 09:11:37 UTC 2008


Hi Rich,

A subset of the Penn Treebank is freely available under a Creative 
Commons license as part of the NLTK corpus set:

http://nltk.org/index.php/Corpora
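
For what it's worth, here is a minimal sketch of loading that sample from
Python. It assumes NLTK is installed and that the sample has already been
fetched with nltk.download("treebank"); the function name is my own and only
for illustration.

```python
def count_sample_sentences():
    """Count parsed sentences in NLTK's Penn Treebank sample.

    Returns None if NLTK is not installed or the sample data has not
    been downloaded (assumed step: nltk.download("treebank")).
    """
    try:
        from nltk.corpus import treebank
        # parsed_sents() yields one nltk.Tree per sentence in the sample
        return len(treebank.parsed_sents())
    except (ImportError, LookupError):
        # ImportError: NLTK missing; LookupError: corpus data not found
        return None


if __name__ == "__main__":
    n = count_sample_sentences()
    if n is None:
        print("NLTK or the treebank sample is not available")
    else:
        print("parsed sentences in sample:", n)
```

Each item returned by parsed_sents() is a tree object, so it should be enough
for small experiments before deciding whether to buy the full corpus.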


You might also consider purchasing the BLLIP corpus from the LDC. It is 
cheaper than the Penn Treebank and consists of automatic parses of the 
1987-1989 Wall Street Journal stories, the same source text used for the 
Penn Treebank. It might suit your needs.

http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC2000T43




Ulrik Sandborg-Petersen




Rich Cooper Elk wrote:
>
> Hi Linguisticians,
>
> I'm looking for a free English tree bank to perform some small 
> experiments on. The Penn Treebank looks like a great one, but it costs 
> $1,000 from the LDC. I’m just experimenting, so I don’t want to fork 
> over that much cash just yet.
>
> Does anyone know of a free annotated Treebank of English text derived 
> from an edited journal or equivalent?
>
> -Rich
>
> ------------------------------------------------------------------------
>
> _______________________________________________
> Corpora mailing list
> Corpora at uib.no
> http://mailman.uib.no/listinfo/corpora
>   

