Soft: GATE 3.1 released
Thierry Hamon
thierry.hamon at LIPN.UNIV-PARIS13.FR
Thu Apr 13 19:56:24 UTC 2006
Date: Wed, 12 Apr 2006 15:27:40 +0100
From: Hamish Cunningham <hamish at dcs.shef.ac.uk>
Message-ID: <443D0E5C.2000202 at dcs.shef.ac.uk>
GATE Version 3.1 release (April 2006)
1. Major new features
1.1. Support for UIMA
UIMA (http://www.research.ibm.com/UIMA/) is a language processing
framework developed by IBM. UIMA and GATE share some functionality
but are complementary in most respects. GATE now provides an
interoperability layer to allow UIMA applications to include GATE
components in their processing and vice versa. For full
information, see chapter 14 of the User Guide.
1.2. New Ontology API
The ontology layer has been rewritten in order to provide an
abstraction layer between the model representation and the tools
used for input and output of the various representation formats. An
implementation that uses Jena 2
(http://jena.sourceforge.net/ontology) for reading and writing OWL
and RDF(S) is provided.
1.3. Ontotext Japec Compiler
Japec is a compiler for JAPE grammars developed by Ontotext Lab. It
has some limitations compared to the standard JAPE transducer
implementation, but can run JAPE grammars up to five times as
fast. By default, GATE still uses the stable JAPE implementation,
but if you want to experiment with Japec, see section 9.27 of the
User Guide.
2. Other new features and improvements
* Addition of a new JAPE matching style "all". This is similar to
Brill, but once all rules from a given start point have
matched, matching continues from the offset immediately after
that start point, rather than from the position in the document
where the longest match finishes. More details can be found in
Section 7.2.
* Limited support for loading PDF and Microsoft Word document
formats. Only the text is extracted from the documents, no
formatting information is preserved.
* The Buchart parser has been deprecated and replaced by a new
plugin called SUPPLE - the Sheffield University Prolog Parser
for Language Engineering. Full details, including information
on how to move your application from Buchart to SUPPLE, are in
section 9.12.
* The Hepple POS Tagger is now open-source. The source code has
been included in the GATE distribution, under
src/hepple/postag. More information about the POS Tagger can be
found in Section 8.4.
* Minipar is now supported on Windows. minipar-windows.exe, a
modified version of pdemo.cpp, has been added under the
gate/plugins/minipar directory so that Minipar can be run on
Windows. When using Minipar on Windows, supply this binary as
the value of the miniparBinary parameter. For
full information on Minipar in GATE, see section 9.10 of the
User Guide.
* The XmlGateFormat writer (Save As Xml in the GATE GUI,
gate.Document.toXml() in the GATE API) and reader have been
modified to write and read GATE annotation IDs. For backward
compatibility, the old reader has been kept. This change
fixes a bug that arose when a GATE document had annotations
whose feature values were numbers referring to the IDs of other
GATE annotations: after the document was saved to XML and
reloaded, those values could become invalid and point to
different annotations. Saving and restoring the GATE annotation
IDs keeps the document consistent.
For more information, see Section 6.5.2 of the User Guide.
* The NP chunker and chemistry tagger plugins have been updated.
Mark Greenwood has relicenced them under the LGPL, so their
source code has been moved into the GATE distribution. See
sections 9.3 and 9.15 for details.
* The Tree Tagger wrapper has been updated with an option to be
less strict when characters that cannot be represented in the
tagger's encoding are encountered in the document. Details are
in section 9.7.
* JAPE Transducers can be serialized into binary files. A
serialized JAPE Transducer can be loaded through a new
init-time parameter, binaryGrammarURL, which can be used as an
alternative to the grammarURL parameter. More
information can be found in Section 7.7.
* On Mac OS, GATE now behaves more naturally. The application
menu items and keyboard shortcuts for About and Preferences now
do what you would expect, and exiting GATE with command-Q or
the Quit menu item properly saves your options and current
session.
* Updated versions of Weka (3.4.6) and Maxent (2.4.0).
* Optimisation in gate.creole.ml: the conversion of AnnotationSet
into ML examples is now faster.
* It is now possible to create your own implementation of
Annotation, and have GATE use this instead of the default
implementation. See AnnotationFactory and AnnotationSetImpl in
the gate.annotation package for details.
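
The difference between the Brill and "all" matching styles described
in the list above comes down to where matching resumes after the rules
at a start point have fired. The following is a minimal sketch of that
resume policy only, written as a toy model over a list of tokens. It
does not use the GATE or JAPE API: the Rule class, the run method and
the sample rules are all hypothetical names introduced for
illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Toy model (not GATE code) of where matching resumes under "brill" and "all". */
class MatchingStyleSketch {

    /** Hypothetical stand-in for a JAPE rule: a name plus a fixed token sequence. */
    static final class Rule {
        final String name;
        final List<String> pattern;

        Rule(String name, List<String> pattern) {
            this.name = name;
            this.pattern = pattern;
        }
    }

    /** Returns the number of tokens matched by the rule at 'start', or -1 if it does not match. */
    static int matchLength(Rule rule, List<String> tokens, int start) {
        int end = start + rule.pattern.size();
        if (end > tokens.size()) {
            return -1;
        }
        return tokens.subList(start, end).equals(rule.pattern) ? rule.pattern.size() : -1;
    }

    /**
     * Fires every rule that matches at each start point. With allStyle == false
     * (Brill-like) matching resumes where the longest match ended; with
     * allStyle == true it resumes at the offset right after the start point.
     */
    static List<String> run(List<Rule> rules, List<String> tokens, boolean allStyle) {
        List<String> fired = new ArrayList<>();
        int pos = 0;
        while (pos < tokens.size()) {
            int longest = 0;
            for (Rule rule : rules) {
                int len = matchLength(rule, tokens, pos);
                if (len > 0) {
                    fired.add(rule.name + "@" + pos);
                    longest = Math.max(longest, len);
                }
            }
            // If nothing matched, both styles simply move on by one token.
            pos += (allStyle || longest == 0) ? 1 : longest;
        }
        return fired;
    }

    public static void main(String[] args) {
        List<String> tokens = Arrays.asList("a", "b", "c");
        List<Rule> rules = Arrays.asList(
                new Rule("AB", Arrays.asList("a", "b")),
                new Rule("B", Arrays.asList("b")),
                new Rule("C", Arrays.asList("c")));
        System.out.println("brill: " + run(rules, tokens, false)); // brill: [AB@0, C@2]
        System.out.println("all:   " + run(rules, tokens, true));  // all:   [AB@0, B@1, C@2]
    }
}
```

In this toy input, the Brill-like run skips the token "b" because the
longest match from offset 0 already covers it, whereas the "all" run
also tries offset 1 and so fires rule B there.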
3. Bug fixes
* The Tree Tagger wrapper has been updated in order to run under
Windows. See section 9.7.
* The SUPPLE parser has been made more user-friendly. It now
produces more helpful error messages if things go wrong. Note
that you will need to update any saved applications that
include SUPPLE to work with this version - see section 9.12 of
the User Guide for details.
* Miscellaneous fixes in the Ontotext Japec compiler.
* Optimisation: the creation of a Document is much faster.
* Google plugin: The optional pagesToExclude parameter was
causing a NullPointerException when left empty at run
time. Full details about the plugin functionality can be found
in section 9.20.
* Minipar, SUPPLE, TreeTagger: These plugins, which call external
processes, have been fixed to cope better with path names that
contain spaces. Note that some of the external tools
themselves still have problems handling spaces in file names,
but these are beyond our control to fix. If you want to use
any of these plugins, be sure to read the documentation to see
if they have any such restrictions.
* When using a non-default location for GATE configuration files,
the configuration data is saved back to the correct location
when GATE exits. Previously the default locations were always
used.
* JAPE Debugger: The JAPE debugger was throwing a
ConcurrentModificationException during an attempt to run
ANNIE. No exception occurred when running without the debugger
enabled. As part of the fix, one unnecessary and incorrect
callback to the debugger was removed from the
SinglePhaseTransducer class.
* Plus many other small bugfixes...
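
The note above about path names containing spaces reflects a general
pitfall when launching external tools from Java. The sketch below
shows one common way to avoid it: passing each path as its own
argument to ProcessBuilder instead of joining everything into a single
command string that is later split on whitespace. This is an
illustration of the general technique, not the change actually made in
the GATE plugins; the class name, method names and file paths are
hypothetical.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

/** Sketch (not GATE code): keeping paths with spaces intact when calling a tool. */
class ExternalToolCommandSketch {

    /** Builds an argument list in which each path remains a single argument. */
    static List<String> buildCommand(File binary, File input) {
        List<String> command = new ArrayList<>();
        command.add(binary.getPath());
        command.add(input.getPath());
        return command;
    }

    /** Prepares (but does not start) a process from the argument list. */
    static ProcessBuilder prepare(File binary, File input) {
        return new ProcessBuilder(buildCommand(binary, input));
    }

    /** What happens when one command string is split on whitespace instead. */
    static String[] naiveTokens(File binary, File input) {
        String commandLine = binary.getPath() + " " + input.getPath();
        return commandLine.trim().split("\\s+");
    }

    public static void main(String[] args) {
        // Hypothetical locations, chosen only because they contain spaces.
        File binary = new File("/opt/my tools/minipar");
        File input = new File("/data/my docs/input.txt");

        System.out.println("argument list size: " + prepare(binary, input).command().size());
        System.out.println("naive token count:  " + naiveTokens(binary, input).length);
    }
}
```

With the argument list, the tool receives exactly two arguments. With
the naive split, each space inside a path produces an extra, broken
token. As the note above says, the external tool itself may still
mishandle such paths even when they are passed correctly.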