[Corpora-List] Tag-set conversion

Timothy Baldwin tbaldwin at csli.Stanford.EDU
Fri Jan 31 01:32:42 UTC 2003


> Does anybody know of an existing tool to translate between the BNC C5
> tag-set and the Penn Tree Bank tag-set?

Assuming you are running Solaris or Linux, you could use the tools supplied
with cass, as developed by Marc Light and Steve Abney:

http://whorf.sfs.nphil.uni-tuebingen.de/~abney/scol1e.tar.gz

Their use is documented in the cass manual supplied in the tarball, but for
the record, you run:

 bncsents BNCFILE | tagfixes -f bnc.fxc

where BNCFILE is a BNC source file.
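
If you only need a rough conversion and do not want to install cass, the basic idea can be sketched as a simple lookup table. The Python below is an illustration only: the table is a small hand-written subset of the more obvious C5-to-Penn correspondences, not the rules contained in bnc.fxc, and the names C5_TO_PENN, convert_tag and convert_sentence are made up for this example. It assumes that BNC ambiguity tags such as VVD-VVN list the preferred tag first, and it marks anything not in the table as UNK rather than guessing.

```python
# Illustrative sketch only: a partial, hand-written C5 -> Penn mapping.
# This is NOT the rule set shipped in bnc.fxc; several C5 tags have no
# one-to-one Penn equivalent and would need context to convert properly.
C5_TO_PENN = {
    "AT0": "DT",   # article
    "AJ0": "JJ",   # adjective, general
    "AJC": "JJR",  # adjective, comparative
    "AJS": "JJS",  # adjective, superlative
    "CJC": "CC",   # coordinating conjunction
    "CRD": "CD",   # cardinal number
    "NN1": "NN",   # singular common noun
    "NN2": "NNS",  # plural common noun
    "NP0": "NNP",  # proper noun (C5 does not mark number here)
    "VVB": "VBP",  # lexical verb, finite base form
    "VVD": "VBD",  # lexical verb, past tense
    "VVG": "VBG",  # lexical verb, -ing form
    "VVI": "VB",   # lexical verb, infinitive
    "VVN": "VBN",  # lexical verb, past participle
    "VVZ": "VBZ",  # lexical verb, -s form
}


def convert_tag(c5_tag, unknown="UNK"):
    """Map one C5 tag to a Penn tag, or return `unknown` if unmapped."""
    # Assumption: in an ambiguity tag like "VVD-VVN" the first member
    # is the preferred reading, so only that one is looked up.
    first = c5_tag.split("-")[0]
    return C5_TO_PENN.get(first, unknown)


def convert_sentence(tagged):
    """Convert a list of (word, C5 tag) pairs to (word, Penn tag) pairs."""
    return [(word, convert_tag(tag)) for word, tag in tagged]
```

For example, convert_sentence([("the", "AT0"), ("dogs", "NN2"), ("barked", "VVD")]) gives the pairs tagged DT, NNS and VBD. A flat table like this cannot handle the cases where the two tag-sets draw their distinctions differently, which is where a rule-based tool earns its keep.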

Alternatively, you could simply retag the BNC using a Penn-style tagger,
given that the BNC data was for the most part automatically tagged anyway.


Tim


