[Corpora-List] [software] JWPL - Java Wikipedia Library
Torsten Zesch
zesch at tk.informatik.tu-darmstadt.de
Tue Apr 29 14:29:04 UTC 2008
JWPL - Java Wikipedia Library
version 0.44 beta is now available at
http://www.ukp.tu-darmstadt.de/software/JWPL
JWPL now contains a MediaWiki markup parser that can be used to
analyze the contents of a Wikipedia page. The parser can
also be used stand-alone to analyze other web pages that use
MediaWiki markup.
INTRODUCTION
Lately, Wikipedia has been recognized as a promising lexical
semantic resource. We present JWPL, a free Java-based Wikipedia
application programming interface that enables the use of
Wikipedia as an NLP resource by providing efficient programmatic
access to the knowledge therein.
FUNCTIONALITY
Efficient access to:
* article text
* categories
* redirects
* links between articles (ingoing and outgoing)
* sections, paragraphs, link context, language links, etc.
Discrimination between
* article pages
* disambiguation pages
* redirect pages.
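
To make the page-type distinction and the notion of outgoing links
concrete, here is a minimal sketch that works directly on raw MediaWiki
markup. It is not the JWPL API and not the JWPL parser: the class
MarkupSketch, its methods, and the rule that a disambiguation page is
marked by a {{disambig...}} template are assumptions made for this
illustration only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative only: a hypothetical helper, NOT part of JWPL.
class MarkupSketch {

    enum PageType { ARTICLE, DISAMBIGUATION, REDIRECT }

    // Matches [[Target]], [[Target|label]] and [[Target#Section|label]];
    // group 1 captures the link target without section or label.
    private static final Pattern LINK =
            Pattern.compile("\\[\\[([^\\]|#]+)(?:#[^\\]|]*)?(?:\\|[^\\]]*)?\\]\\]");

    // Classifies a page from its raw markup.
    static PageType classify(String markup) {
        String text = markup.trim().toLowerCase(Locale.ROOT);
        if (text.startsWith("#redirect")) {
            return PageType.REDIRECT;
        }
        // Assumption: disambiguation pages carry a template whose name
        // starts with "disambig", e.g. {{disambiguation}}.
        if (text.contains("{{disambig")) {
            return PageType.DISAMBIGUATION;
        }
        return PageType.ARTICLE;
    }

    // Returns the targets of all internal links, in order of appearance.
    // Category and language links (e.g. [[Category:X]]) are not filtered out.
    static List<String> outgoingLinks(String markup) {
        List<String> targets = new ArrayList<>();
        Matcher m = LINK.matcher(markup);
        while (m.find()) {
            targets.add(m.group(1).trim());
        }
        return targets;
    }

    public static void main(String[] args) {
        String markup = "See [[Darmstadt]] and [[Germany|German]] cities.";
        System.out.println(classify(markup));
        System.out.println(outgoingLinks(markup));
    }
}
```

A real parser has to handle nested templates, tables, and
language-specific redirect keywords, which this sketch ignores.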
DOWNLOAD
JWPL is free for non-profit and non-commercial use.
http://www.ukp.tu-darmstadt.de/software/JWPL
MAIN IMPROVEMENTS (since JWPL v0.3)
* JWPL now contains a MediaWiki markup parser that can be used to
analyze the contents of a Wikipedia page. The parser can
also be used stand-alone to analyze other web pages that use MediaWiki
markup.
* The performance of iterating through pages and categories has been
significantly improved (>10 times faster than before).
* Added support for all currently used Wikipedia languages. Data files
will be available soon.
* The method getDescendants() now returns a buffered Iterable instead
of a Set. This significantly reduces memory usage.
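
The memory saving from a buffered Iterable can be illustrated with a
short, self-contained sketch. This is not the JWPL implementation: the
class BufferedIterable, the (offset, limit) fetch function, and the
helper fetchFrom are hypothetical names chosen for this example, with an
in-memory list standing in for a database query. The idea shown is
simply that only one buffer of results is held at a time, whereas a Set
would materialize all results up front.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.function.BiFunction;

// Illustrative only: a hypothetical class, NOT the JWPL implementation.
class BufferedIterable<T> implements Iterable<T> {

    // Assumed data source: (offset, limit) -> at most `limit` items.
    private final BiFunction<Integer, Integer, List<T>> fetchChunk;
    private final int bufferSize;

    BufferedIterable(BiFunction<Integer, Integer, List<T>> fetchChunk, int bufferSize) {
        if (bufferSize <= 0) {
            throw new IllegalArgumentException("bufferSize must be positive");
        }
        this.fetchChunk = fetchChunk;
        this.bufferSize = bufferSize;
    }

    @Override
    public Iterator<T> iterator() {
        return new Iterator<T>() {
            private List<T> buffer = new ArrayList<>();
            private int posInBuffer = 0;
            private int offset = 0;
            private boolean exhausted = false;

            // Loads the next chunk once the current buffer is used up.
            private void refillIfNeeded() {
                if (posInBuffer < buffer.size() || exhausted) {
                    return;
                }
                buffer = fetchChunk.apply(offset, bufferSize);
                offset += buffer.size();
                posInBuffer = 0;
                // A short (or empty) chunk means the source has no more items.
                if (buffer.size() < bufferSize) {
                    exhausted = true;
                }
            }

            @Override
            public boolean hasNext() {
                refillIfNeeded();
                return posInBuffer < buffer.size();
            }

            @Override
            public T next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                return buffer.get(posInBuffer++);
            }
        };
    }

    // Hypothetical stand-in for a paged database query.
    static <T> List<T> fetchFrom(List<T> all, int offset, int limit) {
        int end = Math.min(all.size(), offset + limit);
        return offset >= end ? new ArrayList<T>() : new ArrayList<T>(all.subList(offset, end));
    }

    public static void main(String[] args) {
        List<String> categories = Arrays.asList("Science", "Physics", "Optics", "Lasers", "Biology");
        Iterable<String> descendants =
                new BufferedIterable<String>((off, lim) -> fetchFrom(categories, off, lim), 2);
        for (String name : descendants) {
            System.out.println(name);
        }
    }
}
```

Callers that need set semantics can still copy the results into a Set
themselves; the trade-off is that an Iterable alone offers no
constant-time membership test and no size.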
ABOUT
JWPL was developed by the Ubiquitous Knowledge Processing Lab
at Technische Universitaet Darmstadt.
http://www.ukp.tu-darmstadt.de
REFERENCE
Zesch, T.; Müller, C. & Gurevych, I. Extracting Lexical Semantic Knowledge
from Wikipedia and Wiktionary. In Proceedings of the Conference on
Language Resources and Evaluation (LREC), 2008
_______________________________________________
Corpora mailing list
Corpora at uib.no
http://mailman.uib.no/listinfo/corpora