[Corpora-List] Re: Looking for Automatic POS Tagging Software - a summary of responses

Lam Yuen Wing, Peter ywlam at kcrc.com
Sat Feb 18 06:30:09 UTC 2006


Dear all,

About six weeks ago, I asked for pointers on user-friendly POS taggers
that run under Windows and are able to tag and subcategorise words, e.g.
to tag adjectives and subcategorise them as predicative, attributive,
superlative, participial, etc. I am grateful to the following members,
who took the time to send me valuable advice. The following is a
summary of their responses:

Ted Pedersen tpederse at d.umn.edu
Ted suggested trying GATE (http://gate.ac.uk/), which includes a POS
tagger and "is fairly easy to install and use (it is
Java based and runs on Windows, Linux, etc...)".

Alex Fang acfang at cityu.edu.hk
Alex recommended AUTASYS, which runs under Windows. For more
information, please visit
http://www.phon.ucl.ac.uk/home/alex/project/tagging/tagging.htm.
AUTASYS provides subcategorisation and offers a choice of two tag
sets: ICE and LOB. It also has a lemmatisation module. It is
available for academic purposes only, at a one-off payment of 500
pounds sterling for a single-user licence, or 1,000 pounds for a
one-year site licence. AUTASYS tags 1.8 million words per minute, with
an estimated accuracy of 95%. Output can be in horizontal (passage
style) or vertical format.
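To illustrate the difference between the two layouts, here is a minimal
Python sketch. It is not AUTASYS's actual file format, which the response
does not describe: the underscore separator, the tab-separated columns,
the tags in the sample and the two function names are all assumptions
made for illustration only.

```python
def horizontal_to_vertical(text, sep="_"):
    """Turn running 'word_TAG word_TAG' text into one 'word<TAB>TAG' line per token."""
    lines = []
    for token in text.split():
        # rpartition splits on the last separator, so a word that itself
        # contains the separator still keeps its tag intact.
        word, _, tag = token.rpartition(sep)
        lines.append(f"{word}\t{tag}")
    return "\n".join(lines)


def vertical_to_horizontal(text, sep="_"):
    """Inverse: join 'word<TAB>TAG' lines back into a single running line."""
    tokens = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines between sentences
        word, tag = line.split("\t")
        tokens.append(f"{word}{sep}{tag}")
    return " ".join(tokens)


# Hypothetical sample; the tags are placeholders, not real ICE or LOB output.
sample = "The_AT cat_NN sat_VBD"
vertical = horizontal_to_vertical(sample)
print(vertical)
print(vertical_to_horizontal(vertical))
```

The horizontal layout keeps the passage readable as text, while the
vertical layout (one token per line) is usually easier to load into
spreadsheets or column-based corpus tools.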

Neil Millar kansaineil at hotmail.com
Neil suggested trying Brill's Tagger, which is available for free at
http://www.cs.jhu.edu/~brill/RBT1_14.tar.Z. The tagger runs on Windows
and is "easy to use".

Eric Atwell eric at comp.leeds.ac.uk
Eric said the CLAWS system can be used via the WWW by accessing the UCREL
website http://www.comp.lancs.ac.uk/computing/research/ucrel/claws/,
which means users do not necessarily need to run it on UNIX.
There is a free trial service offering access to the latest version of
the tagger, CLAWS4:
http://www.comp.lancs.ac.uk/computing/research/ucrel/claws/trial.html

Paul Rayson rayson at exchange.lancs.ac.uk
Paul advised that there are beta versions of CLAWS for Windows and Linux,
with one for Mac OS X to follow shortly. Trials may be available on
request.

Oliver Mason o.mason at bham.ac.uk
Oliver suggested trying Qtag
(http://www.english.bham.ac.uk/staff/omason/software/qtag.html), which
is written in Java and thus runs on Windows.

SVMTool team jgimenez at lsi.upc.edu
The SVMTool team said that at the TALP Research Center (Barcelona) they
have developed a general sequential tagger and applied it to the problem
of PoS tagging. It may be freely downloaded at:
http://www.lsi.upc.edu/~nlp/SVMTool/.

Models for English, Spanish and Catalan are available. Given
annotated data, it may also be trained for any language and any sequential
tagging problem (PoS tagging, NERC, chunking, etc.). The C++ version
exhibits a tagging speed of 10,000 words per second.

Atanas Chanev artanisz at mail.bg
Atanas suggested trying the T'n'T tagger (by Thorsten Brants), which is
freely available upon registration at
http://www.coli.uni-saarland.de/~thorsten/tnt/. Atanas said: "There
is a version for Windows and it has the most user friendly interface
among the taggers I have used. It is one of the currently most accurate
taggers".

A package of taggers that work under Linux can be found at
http://acopost.sourceforge.net/ (follow the sourceforge link). Most of
the Linux applications should work under Cygwin, a Linux emulation
environment for Windows, which can be downloaded from the internet.

Another tagger is SVMTool (Jesús Giménez and Lluís
Màrquez). Its accuracy is similar to that
of T'n'T for small amounts of training data. There are C++ and
Perl versions, and Perl can be downloaded for free from
www.activestate.com.

Svetlana Sheremetyeva linklana at yahoo.com
Svetlana pointed to her FLAT (Flexible Language Acquisition Tool), which is
"extremely user friendly and can be tuned to any features". A description
of it can be found at http://lanaconsult.com.

Gerard Peregrin GerardPer at aol.com
Gerard recommended trying the software at
http://www-nlp.stanford.edu/software/lex-parser.shtml, which is
written in Java.

Vlad Gojol gojol at rnc.ro
Vlad suggested GojolParser, which is "a deep structure morpho-syntactic
analyzer".

Best,
Peter Lam
PhD Student
The Hong Kong Polytechnic University