Soft: Tgrep2

alexis nasr alexis.nasr at
Wed May 30 14:43:53 UTC 2001

The readers of this list may be interested in a new tool, tgrep2, that I
have developed for searching parsed corpora such as those included in
the Penn Treebank.

As the name might suggest, tgrep2 is based on tgrep and is largely
backward compatible.  However, tgrep2 adds a number of new features,
including the following major enhancements:

 * Rather than simply having a set of required relationships and a set of
   prohibited relationships, nodes can have full boolean expressions of
   relationships to other nodes.
 * Nodes can be given unique labels and may then be referred to by those
   labels in the pattern specification or in selecting trees for printing.
 * Patterns are no longer restricted to simple tree architectures. The use
   of node labels and segmented patterns allows links in a pattern to form
   back-edges as well, permitting cycles of links.
 * Customizable output formats allow a variety of information to be
   reported in a flexible manner.
 * Multiple search patterns may be specified and one can retrieve the
   first subtree matching any pattern, the first subtree matching each
   pattern, or all subtrees matching all patterns.
 * Subtrees can be reported using a code rather than by printing the
   whole structure. The trees themselves can later be retrieved using the
   code.
 * A variety of new links have been added and the immediately-precedes
   link now has a more conventional meaning.
 * Tgrep2 corpus files are substantially smaller than tgrep corpora.
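
To make the first enhancement concrete, here is a minimal Python sketch of
what "full boolean expressions of relationships" means when searching a
bracketed tree. It is only an illustration of the idea and is not tgrep2's
implementation or its pattern syntax; every name in it (Node, parse,
has_child, all_of, any_of, negate, search, words) is hypothetical and was
made up for this example.

```python
# Illustrative sketch only: NOT tgrep2's code or pattern language.
# All names below are hypothetical helpers invented for this example.
from dataclasses import dataclass, field
from typing import Callable, Iterator, List


@dataclass
class Node:
    label: str
    children: List["Node"] = field(default_factory=list)


def parse(text: str) -> Node:
    """Parse one bracketed tree such as (NP (DT the) (NN dog))."""
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def read() -> Node:
        nonlocal pos
        if tokens[pos] == "(":
            pos += 1                     # consume "("
            node = Node(tokens[pos])     # category label follows "("
            pos += 1
            while tokens[pos] != ")":
                node.children.append(read())
            pos += 1                     # consume ")"
            return node
        node = Node(tokens[pos])         # bare token: a terminal (word)
        pos += 1
        return node

    return read()


def subtrees(node: Node) -> Iterator[Node]:
    """Yield the node itself and every descendant, in pre-order."""
    yield node
    for child in node.children:
        yield from subtrees(child)


Pred = Callable[[Node], bool]


def has_child(label: str) -> Pred:
    """Relationship: the node has an immediate child with this label."""
    return lambda n: any(c.label == label for c in n.children)


def all_of(*preds: Pred) -> Pred:
    return lambda n: all(p(n) for p in preds)


def any_of(*preds: Pred) -> Pred:
    return lambda n: any(p(n) for p in preds)


def negate(pred: Pred) -> Pred:
    return lambda n: not pred(n)


def search(tree: Node, label: str, cond: Pred) -> List[Node]:
    """Return every subtree with the given label that satisfies cond."""
    return [n for n in subtrees(tree) if n.label == label and cond(n)]


def words(node: Node) -> str:
    """Join the terminal words under a node."""
    return " ".join(n.label for n in subtrees(node) if not n.children)


tree = parse("(S (NP (DT the) (NN dog)) (VP (VBD chased) (NP (NNS cats))))")

# An NP that has an NN or NNS child AND does not have a DT child.
cond = all_of(any_of(has_child("NN"), has_child("NNS")),
              negate(has_child("DT")))

for hit in search(tree, "NP", cond):
    print(words(hit))                    # prints: cats
```

With only a flat set of required and a flat set of prohibited relationships,
the "NN or NNS" alternative above could not be expressed in a single
condition; nesting and, or, and not is what removes that limit.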

More information and the tgrep2 software can be found at the following

Doug Rohde
Carnegie Mellon University
Message distributed by the Langage Naturel list <LN at>

The LN list is sponsored by ATALA (Association pour le Traitement
Automatique des Langues)