[Corpora-List] Perl reader for Treebank parse trees?
Nitin Madnani
nmadnani at gmail.com
Fri Apr 14 23:50:42 UTC 2006
Steven beat me to it! I was just about to post that I have been
using NLTK for a while now and it has the functionality Philip needs.
Maybe it's finally time to switch to Python, Philip? :)
Nitin
On 4/14/06, Steven Bird <sb at csse.unimelb.edu.au> wrote:
> On 4/15/06, Philip Resnik <resnik at umiacs.umd.edu> wrote:
> >
> > Does anyone have a convenient perl subroutine or module that will
> > convert Treebank parse trees into internal perl data structures?
>
> Note that NLTK provides this functionality for Python programmers.
> Here's how easy it is to use (for the treebank sample in
> NLTK-Corpora).
>
> --snip--
> >>> from nltk_lite.corpora import treebank, extract
> >>> print extract(0, treebank.parsed())
> (S:
>   (NP-SBJ:
>     (NP: (NNP: 'Pierre') (NNP: 'Vinken'))
>     (,: ',')
>     (ADJP: (NP: (CD: '61') (NNS: 'years')) (JJ: 'old'))
>     (,: ','))
>   (VP:
>     (MD: 'will')
>     (VP:
>       (VB: 'join')
>       (NP: (DT: 'the') (NN: 'board'))
>       (PP-CLR:
>         (IN: 'as')
>         (NP: (DT: 'a') (JJ: 'nonexecutive') (NN: 'director')))
>       (NP-TMP: (NNP: 'Nov.') (CD: '29'))))
>   (.: '.'))
> --snip--
>
> Get NLTK from http://nltk.sourceforge.net/
>
> For those still wedded to Perl for NLP, consider the following Perl
> program to find all words in a text ending in "ing". Note the
> 'magic': bits of syntax like <>, (split), my, $, and =~, which reduce
> readability:
>
> while (<>) {
>     foreach my $word (split) {
>         if ($word =~ /ing$/) {
>             print "$word\n";
>         }
>     }
> }
>
> Here's the Python version, which contains far less magic:
>
> import sys
> for line in sys.stdin.readlines():
>     for word in line.split():
>         if word.endswith('ing'):
>             print word
>
> -Steven Bird
>
>
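For anyone who wants to see what "converting a parse tree into an internal
data structure" amounts to without installing anything, here is a minimal
sketch in plain Python 3. It is not NLTK's reader: the names read_tree and
leaves are made up for illustration, and the sample string is a shortened
version of the sentence above, written in the bare bracket notation used in
Treebank files (an assumption about the input format).

```python
import re

# A token is "(", ")", or any run of characters that is neither a
# parenthesis nor whitespace (node labels and words).
TOKEN = re.compile(r"\(|\)|[^\s()]+")


def read_tree(text):
    """Parse one bracketed tree into nested (label, children) tuples.

    Leaves are plain strings. Hypothetical helper, not an NLTK function.
    """
    tokens = TOKEN.findall(text)
    pos = 0

    def parse():
        nonlocal pos
        if tokens[pos] != "(":
            raise ValueError("expected '(' at token %d" % pos)
        pos += 1
        # The label is optional: some files wrap each tree in an
        # extra pair of parentheses with no label.
        label = ""
        if tokens[pos] not in ("(", ")"):
            label = tokens[pos]
            pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(parse())      # nested constituent
            else:
                children.append(tokens[pos])  # a word (leaf)
                pos += 1
        pos += 1  # consume the closing ")"
        return (label, children)

    tree = parse()
    if pos != len(tokens):
        raise ValueError("trailing tokens after tree")
    return tree


def leaves(tree):
    """Return the words of a parsed tree, left to right."""
    if isinstance(tree, str):
        return [tree]
    _, children = tree
    words = []
    for child in children:
        words.extend(leaves(child))
    return words


sample = ("(S (NP-SBJ (NNP Pierre) (NNP Vinken)) "
          "(VP (MD will) (VB join) (NP (DT the) (NN board))) (. .))")
tree = read_tree(sample)
print(tree[0])
print(" ".join(leaves(tree)))
```

The same recursive structure carries over to Perl with nested array
references, so this may also serve as a starting point for Philip's
original question. Unbalanced input is not handled gracefully here; it
simply raises an IndexError.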
--
Got Blog?
http://greenideas.blogspot.com