[Corpora-List] Perl reader for Treebank parse trees?

Nitin Madnani nmadnani at gmail.com
Fri Apr 14 23:50:42 UTC 2006


Steven beat me to it! I was just about to post that I have been
using NLTK for a while now, and it has the functionality Philip needs.
Maybe it's finally time to switch to Python, Philip? :)
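
For anyone curious what such a reader actually does, here is a minimal sketch in plain Python with no NLTK dependency. It is not NLTK's implementation. It tokenizes one bracketed Penn Treebank tree and builds nested lists of the form [label, child, ...], with words as plain strings. The name `read_tree` and the nested-list representation are hypothetical choices made for this example. It also assumes well-formed input in which literal brackets inside the text are escaped as -LRB-/-RRB-, as the Treebank does.

```python
import re

# Hypothetical helper for illustration (not NLTK's API).
# A token is "(", ")", or any run of characters that are neither whitespace nor brackets.
TOKEN_RE = re.compile(r"\(|\)|[^\s()]+")


def read_tree(text):
    """Parse one bracketed Penn Treebank tree into nested lists.

    Each constituent becomes [label, child, ...]; each word is a plain string.
    The unlabeled outer wrapper used in WSJ files, "( (S ...) )", gets label "".
    Malformed input raises ValueError or IndexError.
    """
    tokens = TOKEN_RE.findall(text)
    pos = 0

    def parse():
        nonlocal pos
        if tokens[pos] != "(":
            raise ValueError(f"expected '(' at token {pos}, got {tokens[pos]!r}")
        pos += 1
        label = ""
        if tokens[pos] not in ("(", ")"):
            label = tokens[pos]
            pos += 1
        node = [label]
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                node.append(parse())  # nested constituent
            else:
                node.append(tokens[pos])  # word (leaf)
                pos += 1
        pos += 1  # consume the closing ")"
        return node

    tree = parse()
    if pos != len(tokens):
        raise ValueError("unexpected tokens after the end of the tree")
    return tree
```

Under these assumptions, `read_tree("(S (NP (NNP Pierre) (NNP Vinken)) (. .))")` returns `['S', ['NP', ['NNP', 'Pierre'], ['NNP', 'Vinken']], ['.', '.']]`. The same recursive-descent approach ports directly to a Perl subroutine using array references.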

Nitin

On 4/14/06, Steven Bird <sb at csse.unimelb.edu.au> wrote:
> On 4/15/06, Philip Resnik <resnik at umiacs.umd.edu> wrote:
> >
> > Does anyone have a convenient perl subroutine or module that will
> > convert Treebank parse trees into internal perl data structures?
>
> Note that NLTK provides this functionality for Python programmers.
> Here's how easy it is to use (for the treebank sample in
> NLTK-Corpora).
>
> --snip--
>   >>> from nltk_lite.corpora import treebank, extract
>   >>> print extract(0, treebank.parsed())
>   (S:
>     (NP-SBJ:
>       (NP: (NNP: 'Pierre') (NNP: 'Vinken'))
>       (,: ',')
>       (ADJP: (NP: (CD: '61') (NNS: 'years')) (JJ: 'old'))
>       (,: ','))
>     (VP:
>       (MD: 'will')
>       (VP:
>         (VB: 'join')
>         (NP: (DT: 'the') (NN: 'board'))
>         (PP-CLR:
>           (IN: 'as')
>           (NP: (DT: 'a') (JJ: 'nonexecutive') (NN: 'director')))
>         (NP-TMP: (NNP: 'Nov.') (CD: '29'))))
>     (.: '.'))
> --snip--
>
> Get NLTK from http://nltk.sourceforge.net/
>
> For those still wedded to Perl for NLP, consider the following Perl
> program, which finds all words in a text ending in "ing".  Note the
> 'magic': bits of syntax like <>, (split), my, $, and =~, which reduce
> readability:
>
>   while (<>) {
>       foreach my $word (split) {
>           if ($word =~ /ing$/) {
>               print "$word\n";
>           }
>       }
>   }
>
> Here's the Python version, which contains far less magic:
>
>   import sys
>   for line in sys.stdin:
>       for word in line.split():
>           if word.endswith('ing'):
>               print(word)
>
> -Steven Bird
>
>
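
Once a tree is in memory as a data structure, the usual next step is a recursive walk over it. The sketch below is minimal and makes one assumption: the tree is held as nested Python lists [label, child, ...], with words as strings. That representation is chosen here for illustration; NLTK uses its own Tree class. `tagged_words` is a hypothetical helper that collects (word, tag) pairs from the preterminal nodes.

```python
# Assumption for this sketch: a tree is a nested list [label, child, ...],
# where each child is either another such list or a word string.


def tagged_words(node):
    """Return the (word, tag) pairs of a tree, in left-to-right order."""
    label, children = node[0], node[1:]
    # A preterminal such as ["NNP", "Pierre"] has exactly one child, a word.
    if len(children) == 1 and isinstance(children[0], str):
        return [(children[0], label)]
    pairs = []
    for child in children:
        pairs.extend(tagged_words(child))
    return pairs


# Part of the Pierre Vinken sentence shown above, typed in by hand.
tree = ["S",
        ["NP-SBJ", ["NNP", "Pierre"], ["NNP", "Vinken"]],
        ["VP", ["MD", "will"],
               ["VP", ["VB", "join"], ["NP", ["DT", "the"], ["NN", "board"]]]],
        [".", "."]]

pairs = tagged_words(tree)
verbs = [word for word, tag in pairs if tag.startswith("VB")]  # ['join']
```

With parsed input you can filter on tags instead of spelling. For example, `tag == "VBG"` selects gerunds and present participles, whereas `word.endswith("ing")` also matches nouns such as "thing".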


--
Got Blog?
http://greenideas.blogspot.com
