[Corpora-List] Perl reader for Treebank parse trees?

Steven Bird sb at csse.unimelb.edu.au
Fri Apr 14 21:45:05 UTC 2006


On 4/15/06, Philip Resnik <resnik at umiacs.umd.edu> wrote:
>
> Does anyone have a convenient perl subroutine or module that will
> convert Treebank parse trees into internal perl data structures?

Note that NLTK provides this functionality for Python programmers. 
Here's how easy it is to use (for the treebank sample in
NLTK-Corpora).

--snip--
  >>> from nltk_lite.corpora import treebank, extract
  >>> print extract(0, treebank.parsed())
  (S:
    (NP-SBJ:
      (NP: (NNP: 'Pierre') (NNP: 'Vinken'))
      (,: ',')
      (ADJP: (NP: (CD: '61') (NNS: 'years')) (JJ: 'old'))
      (,: ','))
    (VP:
      (MD: 'will')
      (VP:
        (VB: 'join')
        (NP: (DT: 'the') (NN: 'board'))
        (PP-CLR:
          (IN: 'as')
          (NP: (DT: 'a') (JJ: 'nonexecutive') (NN: 'director')))
        (NP-TMP: (NNP: 'Nov.') (CD: '29'))))
    (.: '.'))
--snip--

Get NLTK from http://nltk.sourceforge.net/
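The original question asked for a plain reader rather than a whole toolkit, so here is a minimal sketch of what such a reader does. It is written for Python 3 and assumes well-formed Penn Treebank bracketing in which literal parentheses in the text are escaped as -LRB-/-RRB-, so that brackets only ever mark structure. The names parse_tree and leaves, and the (label, children) tuple representation, are illustrative choices. They are not NLTK's API.

```python
import re

# A token is an opening bracket, a closing bracket, or any run of
# characters that is neither whitespace nor a bracket (a label or a word).
TOKEN_RE = re.compile(r"\(|\)|[^\s()]+")


def parse_tree(text):
    """Parse one bracketed Treebank tree into nested (label, children) tuples.

    Hypothetical helper. Leaves are plain strings. Assumes well-formed
    input: an outer unlabeled bracket, as in "( (S ...) )", gets label "".
    """
    tokens = TOKEN_RE.findall(text)
    pos = 0

    def parse_node():
        nonlocal pos
        if tokens[pos] != "(":
            raise ValueError("expected '(' at token %d" % pos)
        pos += 1
        label = ""
        # A non-bracket token right after "(" is the node label.
        if tokens[pos] not in ("(", ")"):
            label = tokens[pos]
            pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(parse_node())  # nested constituent
            else:
                children.append(tokens[pos])   # terminal word
                pos += 1
        pos += 1  # consume the closing ")"
        return (label, children)

    tree = parse_node()
    if pos != len(tokens):
        raise ValueError("trailing tokens after tree")
    return tree


def leaves(tree):
    """Return the words of a (label, children) tree, left to right."""
    _label, children = tree
    out = []
    for child in children:
        if isinstance(child, str):
            out.append(child)
        else:
            out.extend(leaves(child))
    return out
```

For example, parse_tree("(S (NP (NNP Pierre)) (VP (MD will)))") gives ("S", [("NP", [("NNP", ["Pierre"])]), ("VP", [("MD", ["will"])])]), and leaves() on that result gives ["Pierre", "will"]. A recursive-descent reader like this is about the smallest thing that works. The same few lines translate directly into a Perl sub that builds nested array references.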

For those still wedded to Perl for NLP, consider the following Perl
program, which finds all words in a text ending in "ing".  Note the
'magic', the bits of syntax such as <>, (split), my, $ and =~, which
reduce readability:

  while (<>) {
      foreach my $word (split) {
          if ($word =~ /ing$/) {
              print "$word\n";
          }
      }
  }

Here's the Python version, which contains far less magic:

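  import sys
  # Iterate over stdin line by line (like Perl's <>) rather than
  # reading the whole input into memory with readlines().
  for line in sys.stdin:
      for word in line.split():
          if word.endswith('ing'):
              print(word)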

-Steven Bird