Soft: GATE 3.1 released

Thierry Hamon thierry.hamon at LIPN.UNIV-PARIS13.FR
Thu Apr 13 19:56:24 UTC 2006


Date: Wed, 12 Apr 2006 15:27:40 +0100
From: Hamish Cunningham <hamish at dcs.shef.ac.uk>


                     GATE Version 3.1 release (April 2006)

1. Major new features

  1.1. Support for UIMA

   UIMA (http://www.research.ibm.com/UIMA/) is a language processing
   framework developed by IBM. UIMA and GATE share some functionality
   but are complementary in most respects.  GATE now provides an
   interoperability layer to allow UIMA applications to include GATE
   components in their processing and vice versa. For full
   information, see chapter 14 of the User Guide.

  1.2. New Ontology API

   The ontology layer has been rewritten in order to provide an
   abstraction layer between the model representation and the tools
   used for input and output of the various representation formats. An
   implementation that uses Jena 2
   (http://jena.sourceforge.net/ontology) for reading and writing OWL
   and RDF(S) is provided.

  1.3. Ontotext Japec Compiler

   Japec is a compiler for JAPE grammars developed by Ontotext Lab. It
   has some limitations compared to the standard JAPE transducer
   implementation, but can run JAPE grammars up to five times as
   fast. By default, GATE still uses the stable JAPE implementation,
   but if you want to experiment with Japec, see section 9.27 of the
   User Guide.

2. Other new features and improvements

     * Addition of a new JAPE matching style "all". This is similar to
       Brill, but once all rules from a given start point have
       matched, matching continues from the offset immediately after
       that start point, rather than from the position in the document
       where the longest match finishes. More details can be found in
       Section 7.2.

     * Limited support for loading PDF and Microsoft Word document
       formats.  Only the text is extracted from the documents; no
       formatting information is preserved.

     * The Buchart parser has been deprecated and replaced by a new
       plugin called SUPPLE - the Sheffield University Prolog Parser
       for Language Engineering.  Full details, including information
       on how to move your application from Buchart to SUPPLE, are in
       section 9.12.

     * The Hepple POS Tagger is now open-source. The source code has
       been included in the GATE distribution, under
       src/hepple/postag. More information about the POS Tagger can be
       found in Section 8.4.

     * Minipar is now supported on Windows.  minipar-windows.exe, a
       modified version of pdemo.cpp, has been added under the
       gate/plugins/minipar directory to allow users to run Minipar on
       the Windows platform.  When using Minipar on Windows, this
       binary should be supplied as the value of the miniparBinary
       parameter. For full information on Minipar in GATE, see section
       9.10 of the User Guide.

     * The XmlGateFormat writer (Save As Xml from the GATE GUI,
       gate.Document.toXml() from the GATE API) and reader have been
       modified to write and read GATE annotation IDs. For backward
       compatibility, the old reader has been kept. This change fixes
       a bug that appeared in the following situation: if a GATE
       document had annotations carrying features whose values were
       numbers representing other GATE annotation IDs, then after
       saving the document to XML and reloading it, those feature
       values could become invalid by pointing to different
       annotations. Saving and restoring the GATE annotation IDs
       preserves the consistency of the document. For more
       information, see Section 6.5.2 of the User Guide.

     * The NP chunker and chemistry tagger plugins have been updated.
       Mark Greenwood has relicenced them under the LGPL, so their
       source code has been moved into the GATE distribution. See
       sections 9.3 and 9.15 for details.

     * The Tree Tagger wrapper has been updated with an option to be
       less strict when characters that cannot be represented in the
       tagger's encoding are encountered in the document. Details are
       in section 9.7.

     * JAPE Transducers can now be serialized into binary files. A
       serialized transducer can be loaded through the new init-time
       parameter binaryGrammarURL, which can be used as an alternative
       to the grammarURL parameter. More information can be found in
       Section 7.7.

     * On Mac OS, GATE now behaves more naturally. The application
       menu items and keyboard shortcuts for About and Preferences now
       do what you would expect, and exiting GATE with command-Q or
       the Quit menu item properly saves your options and current
       session.

     * Updated versions of Weka (3.4.6) and Maxent (2.4.0).

     * Optimisation in gate.creole.ml: the conversion of AnnotationSet
       into ML examples is now faster.

     * It is now possible to create your own implementation of
       Annotation, and have GATE use this instead of the default
       implementation. See AnnotationFactory and AnnotationSetImpl in
       the gate.annotation package for details.
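
     The difference between the Brill and "all" matching styles
     described above can be illustrated with a toy sketch. This is not
     JAPE or GATE code: the MatchingStyleSketch class, its Rule type
     and the token-sequence "rules" are hypothetical stand-ins, and
     only the way the start point advances follows the description in
     these notes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; none of these names exist in GATE.
final class MatchingStyleSketch {

    /** Toy "rule": fires when the tokens at the start point equal its pattern. */
    static final class Rule {
        final String name;
        final List<String> pattern;

        Rule(String name, String... pattern) {
            this.name = name;
            this.pattern = Arrays.asList(pattern);
        }

        boolean matchesAt(List<String> tokens, int start) {
            int end = start + pattern.size();
            return end <= tokens.size() && tokens.subList(start, end).equals(pattern);
        }
    }

    /** Returns "ruleName@startOffset" for every rule that fires. */
    static List<String> run(List<String> tokens, List<Rule> rules, boolean allStyle) {
        List<String> fired = new ArrayList<>();
        int start = 0;
        while (start < tokens.size()) {
            int longest = 0;
            // In both styles, every rule matching at this start point fires.
            for (Rule rule : rules) {
                if (rule.matchesAt(tokens, start)) {
                    fired.add(rule.name + "@" + start);
                    longest = Math.max(longest, rule.pattern.size());
                }
            }
            if (allStyle || longest == 0) {
                start += 1;        // "all" (or no match): resume at the next offset
            } else {
                start += longest;  // Brill: resume where the longest match ends
            }
        }
        return fired;
    }

    /** Runs three overlapping toy rules over the tokens a, b, c. */
    static List<String> demo(boolean allStyle) {
        List<String> tokens = Arrays.asList("a", "b", "c");
        List<Rule> rules = Arrays.asList(
                new Rule("AB", "a", "b"),
                new Rule("ABC", "a", "b", "c"),
                new Rule("BC", "b", "c"));
        return run(tokens, rules, allStyle);
    }

    public static void main(String[] args) {
        System.out.println("Brill: " + demo(false)); // [AB@0, ABC@0]
        System.out.println("all:   " + demo(true));  // [AB@0, ABC@0, BC@1]
    }
}
```

     In this toy run, Brill-style matching reports AB and ABC at
     offset 0 and then jumps past the longest match, so BC is never
     tried at offset 1. The "all" style also reports BC at offset 1.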

3. Bug fixes

     * The Tree Tagger wrapper has been updated in order to run under
       Windows. See section 9.7.

     * The SUPPLE parser has been made more user-friendly.  It now
       produces more helpful error messages if things go wrong. Note
       that you will need to update any saved applications that
       include SUPPLE to work with this version - see section 9.12 of
       the User Guide for details.

     * Miscellaneous fixes in the Ontotext Japec compiler.

     * Optimisation: the creation of a Document is much faster.

     * Google plugin: The optional pagesToExclude parameter was
       causing a NullPointerException when left empty at run
       time. Full details about the plugin functionality can be found
       in section 9.20.

     * Minipar, SUPPLE, TreeTagger: These plugins, which call external
       processes, have been fixed to cope better with path names that
       contain spaces.  Note that some of the external tools
       themselves still have problems handling spaces in file names,
       but these are beyond our control to fix.  If you want to use
       any of these plugins, be sure to read the documentation to see
       if they have any such restrictions.

     * When using a non-default location for GATE configuration files,
       the configuration data is saved back to the correct location
       when GATE exits. Previously the default locations were always
       used.

     * JAPE Debugger: The debugger was generating a
       ConcurrentModificationException during an attempt to run
       ANNIE. There was no exception when running without the
       debugger enabled. As part of the fix, one unnecessary and
       incorrect callback to the debugger was removed from the
       SinglePhaseTransducer class.

     * Plus many other small bugfixes...
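
     The path-name fix above concerns plugins that launch external
     processes. As a general illustration only (this is not the code
     used in the Minipar, SUPPLE or TreeTagger plugins), the sketch
     below shows why passing a command as a list of arguments is safer
     than splitting a single string on spaces. The class name and both
     paths are hypothetical.

```java
import java.io.File;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; not taken from any GATE plugin.
final class ExternalToolCommand {

    /** Keeps each path as one argument, whatever characters it contains. */
    static List<String> asArguments(File binary, File input) {
        return Arrays.asList(binary.getPath(), input.getPath());
    }

    /** Naive alternative: join into one string, then split on spaces. */
    static List<String> naiveSplit(File binary, File input) {
        String commandLine = binary.getPath() + " " + input.getPath();
        return Arrays.asList(commandLine.split(" "));
    }

    public static void main(String[] args) {
        // Made-up paths that contain spaces.
        File binary = new File("Program Files/tool/parser");
        File input = new File("My Documents/text 1.txt");

        // ProcessBuilder stores the argument list as given; no process is started here.
        ProcessBuilder builder = new ProcessBuilder(asArguments(binary, input));
        System.out.println(builder.command().size());          // 2
        System.out.println(naiveSplit(binary, input).size());  // 5
    }
}
```

     With the argument list, the external tool receives exactly two
     arguments. The naive split breaks the same two paths into five
     fragments, which is the kind of failure that spaces in path names
     tend to cause.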


