[Corpora-List] Perl reader for Treebank parse trees?

Steven Bird sb at csse.unimelb.edu.au
Fri Apr 14 21:45:05 UTC 2006

On 4/15/06, Philip Resnik <resnik at umiacs.umd.edu> wrote:
> Does anyone have a convenient perl subroutine or module that will
> convert Treebank parse trees into internal perl data structures?

Note that NLTK provides this functionality for Python programmers. 
Here's how easy it is to use (for the treebank sample):

  >>> from nltk_lite.corpora import treebank, extract
  >>> print extract(0, treebank.parsed())
  (S:
    (NP-SBJ:
      (NP: (NNP: 'Pierre') (NNP: 'Vinken'))
      (,: ',')
      (ADJP: (NP: (CD: '61') (NNS: 'years')) (JJ: 'old'))
      (,: ','))
    (VP:
      (MD: 'will')
      (VP:
        (VB: 'join')
        (NP: (DT: 'the') (NN: 'board'))
        (PP-CLR:
          (IN: 'as')
          (NP: (DT: 'a') (JJ: 'nonexecutive') (NN: 'director')))
        (NP-TMP: (NNP: 'Nov.') (CD: '29'))))
    (.: '.'))
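
If you'd rather see what such a reader involves, here is a minimal sketch in
plain Python 3 (so print is a function here). It is not NLTK's implementation;
the names read_tree and leaves are made up for illustration, and it assumes
the input is a single well-balanced bracketed tree as found in the Treebank
.mrg files.

```python
import re

# Hypothetical helper names; this is an illustrative sketch, not NLTK code.

def read_tree(text):
    """Parse one bracketed tree into nested lists: [label, child, ...]."""
    # A token is '(', ')', or any run of characters that is neither
    # whitespace nor a parenthesis (labels and words, including ',' and '.').
    tokens = re.findall(r"\(|\)|[^\s()]+", text)
    pos = 0

    def parse():
        nonlocal pos
        if tokens[pos] != "(":
            word = tokens[pos]        # a bare token is a leaf (a word)
            pos += 1
            return word
        pos += 1                      # consume '('
        # Treebank files wrap each sentence in an extra unlabeled pair of
        # parentheses, so the label may be missing; use '' in that case.
        if tokens[pos] not in ("(", ")"):
            node = [tokens[pos]]
            pos += 1
        else:
            node = [""]
        while tokens[pos] != ")":
            node.append(parse())
        pos += 1                      # consume ')'
        return node

    tree = parse()
    if pos != len(tokens):
        raise ValueError("unexpected tokens after the end of the tree")
    return tree


def leaves(tree):
    """Return the words of a tree in left-to-right order."""
    if isinstance(tree, str):
        return [tree]
    return [word for child in tree[1:] for word in leaves(child)]


sample = "(NP (DT the) (NN board))"
print(read_tree(sample))   # ['NP', ['DT', 'the'], ['NN', 'board']]
print(leaves(read_tree(sample)))   # ['the', 'board']
```

Unbalanced input makes this sketch fail with an IndexError rather than a
helpful message; a real reader would want better error handling.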

Get NLTK from http://nltk.sourceforge.net/

For those still wedded to Perl for NLP, consider the following Perl
program to find all words in a text ending in "ing".  Note the
'magic': bits of syntax such as <>, (split), my, $, and =~.

  while (<>) {
      foreach my $word (split) {
          if ($word =~ /ing$/) {
              print "$word\n";
          }
      }
  }

Here's the Python version, which contains far less magic:

  import sys
  for line in sys.stdin.readlines():
      for word in line.split():
          if word.endswith('ing'):
              print word
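
The same logic can also be packaged as a reusable function. The following is
only a sketch under a couple of assumptions: it targets Python 3 (print is a
function), and the name words_ending_with is invented for illustration.

```python
# Hypothetical helper name; an illustrative Python 3 sketch.

def words_ending_with(lines, suffix="ing"):
    """Yield each whitespace-separated word that ends with the given suffix."""
    for line in lines:
        for word in line.split():
            if word.endswith(suffix):
                yield word


text = ["Parsing is fun", "sing a song"]
print(list(words_ending_with(text)))   # ['Parsing', 'sing']
```

Any iterable of lines will do as input, so passing sys.stdin (after import
sys) processes the input one line at a time instead of loading it all with
readlines().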

-Steven Bird
