[Corpora-List] Tagset mapping (Negra -> Penn Treebank)

Kevin Duh duh at ee.washington.edu
Tue Dec 27 18:30:40 UTC 2005


Dear Corpora members,

I am interested in comparing the part-of-speech tag distributions of 
English vs. German. Currently, I'm looking into using the WSJ Penn 
Treebank for English, and the Negra and TIGER corpora for German. 
However, the WSJ and Negra/TIGER tagsets differ, so I am 
wondering whether anyone has a mapping that converts the Negra/TIGER 
tagset to the WSJ tagset.

In other words, is there a document that specifies, on a best-effort 
basis, which tags in the Negra tagset 
(http://www.coli.uni-saarland.de/projects/sfb378/negra-corpus/stts.asc) 
correspond to which tags in the WSJ tagset 
(http://www.ldc.upenn.edu/Catalog/docs/treebank2/cl93.html)?
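
To make the request concrete, here is a rough sketch of the kind of 
mapping I have in mind and how it might be used to compare tag 
distributions. The handful of correspondences shown are only 
illustrative guesses at coarse equivalents, not an established or 
complete mapping, and the names STTS_TO_PTB, map_tags and 
tag_distribution are hypothetical.

```python
from collections import Counter

# Illustrative, partial STTS -> Penn Treebank mapping (an assumption,
# not an official resource). A real mapping would be many-to-one and
# lossy in places, e.g. STTS "NN" does not mark number, while the
# Penn Treebank separates "NN" from "NNS".
STTS_TO_PTB = {
    "NN": "NN",     # common noun
    "NE": "NNP",    # proper noun
    "ADJA": "JJ",   # attributive adjective
    "ADV": "RB",    # adverb
    "ART": "DT",    # article / determiner
    "APPR": "IN",   # preposition
    "KON": "CC",    # coordinating conjunction
    "CARD": "CD",   # cardinal number
}


def map_tags(stts_tags, mapping=STTS_TO_PTB, unknown="UNMAPPED"):
    """Convert a sequence of STTS tags, marking tags with no entry."""
    return [mapping.get(tag, unknown) for tag in stts_tags]


def tag_distribution(tags):
    """Return the relative frequency of each tag in the sequence."""
    counts = Counter(tags)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {tag: n / total for tag, n in counts.items()}
```

With such a table, the German distribution could be computed as 
tag_distribution(map_tags(german_tags)) and compared directly with 
tag_distribution(english_tags), where both inputs are flat lists of 
tag strings read from the respective corpora. Keeping unmapped tags 
under an explicit label shows how much of the corpus the mapping 
fails to cover.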

Thanks in advance!
Kevin Duh

-------------------------------------------------
Kevin Duh
Dept. of Electrical Engineering
University of Washington
http://ssli.ee.washington.edu/people/duh/


