[Corpora-List] Treebank 2 and 3

Ann Bies bies at ldc.upenn.edu
Thu Oct 19 00:16:28 UTC 2006


Dear Don,

As I recall, the parsed portions of Brown that were included in the
Treebank 2 release were not in the Treebank II/PTB-2 annotation style.
They were the data and annotation in the older style, repeated from the
earlier Treebank 1 release with some technical errors fixed but not
reparsed in the newer annotation style.

The portions of Brown that were included in the Treebank 3 release were
newly annotated/parsed in the newer, more detailed Treebank II
annotation style. However, time and budget constraints prevented
re-annotation of the entire Brown corpus in the Treebank II style, so
only the portion of Brown that had been re-annotated in the new style
was released in Treebank 3.

Please do not hesitate to contact me if you have any further questions.

Thanks,

Ann

Ann Bies
Linguistic Data Consortium
bies at ldc.upenn.edu


Donald E Hardy wrote:
> 
> I'm doing some work with Treebank 3, especially with the parsed
> Switchboard and the parsed portions of Brown that are included in
> Treebank 3.  I noticed today that Treebank 2 has all of the Brown parsed
> texts while Treebank 3 has only some of the Brown parsed texts.  Does
> anyone know why Treebank 3 includes only some of the parsed Brown texts
> while Treebank 2 includes them all?
> 
> Many thanks,
> 
> Don
> 
> Donald E. Hardy
>    Professor
> Department of English/098
> University of Nevada, Reno
> Reno, Nevada 89557
> DonHardy at unr.edu
> http://textant.engl.unr.edu


