[Corpora-List] [software] JWPL - Java Wikipedia Library

Torsten Zesch zesch at tk.informatik.tu-darmstadt.de
Tue Apr 29 14:29:04 UTC 2008


JWPL - Java Wikipedia Library

version 0.44 beta is now available at

http://www.ukp.tu-darmstadt.de/software/JWPL

JWPL now contains a MediaWiki markup parser that can be used to
analyze the contents of a Wikipedia page. The parser can
also be used stand-alone to analyze other web pages that use
MediaWiki markup.
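
To give a rough idea of what analyzing MediaWiki markup involves, here is
a minimal sketch that pulls internal link targets out of a markup string.
It does not use the JWPL parser or its API: the class WikiLinkSketch and
the method linkTargets are hypothetical names made up for illustration,
and a single regular expression is no substitute for a real parser (it
ignores nested links, templates, and namespaces).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration only; NOT part of JWPL.
class WikiLinkSketch {
    // Matches [[Target]] and [[Target|label]]; group 1 captures the target.
    private static final Pattern LINK =
            Pattern.compile("\\[\\[([^\\]|]+)(?:\\|[^\\]]*)?\\]\\]");

    // Returns the targets of all internal links, in order of appearance.
    // Category and file links are not filtered out in this sketch.
    static List<String> linkTargets(String markup) {
        List<String> targets = new ArrayList<>();
        Matcher m = LINK.matcher(markup);
        while (m.find()) {
            targets.add(m.group(1).trim());
        }
        return targets;
    }

    public static void main(String[] args) {
        String markup = "'''Darmstadt''' is a city in [[Hesse]], [[Germany|German]]y.";
        System.out.println(linkTargets(markup));
    }
}
```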


INTRODUCTION

Lately, Wikipedia has been recognized as a promising lexical
semantic resource. We present JWPL, a free Java-based Wikipedia
application programming interface that enables the use of
Wikipedia as an NLP resource by providing efficient programmatic
access to the knowledge therein.


FUNCTIONALITY

Efficient access to:
* article text
* categories
* redirects
* links between articles (ingoing and outgoing)
* sections, paragraphs, link context, language links, etc.

Discrimination between
* article pages
* disambiguation pages
* redirect pages.
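
The distinction between the three page types can be illustrated directly
on the markup level. The sketch below is an assumption-laden heuristic,
not the way JWPL makes this decision: it relies on the MediaWiki
convention that redirect pages start with "#REDIRECT" and assumes that
disambiguation pages carry a template whose name starts with "disambig",
which varies between language editions. PageTypeSketch and classify are
hypothetical names.

```java
import java.util.Locale;

// Hypothetical heuristic for illustration only; NOT JWPL's implementation.
class PageTypeSketch {
    enum PageType { ARTICLE, DISAMBIGUATION, REDIRECT }

    static PageType classify(String markup) {
        String text = markup.trim().toLowerCase(Locale.ROOT);
        // MediaWiki redirect pages begin with "#REDIRECT [[Target]]".
        if (text.startsWith("#redirect")) {
            return PageType.REDIRECT;
        }
        // Assumption: disambiguation pages include a {{disambig...}} template.
        if (text.contains("{{disambig")) {
            return PageType.DISAMBIGUATION;
        }
        // Everything else is treated as a regular article.
        return PageType.ARTICLE;
    }

    public static void main(String[] args) {
        System.out.println(classify("#REDIRECT [[Darmstadt]]"));
        System.out.println(classify("'''Darmstadt''' is a city in [[Hesse]]."));
    }
}
```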


DOWNLOAD

JWPL is free for non-profit and non-commercial use.

http://www.ukp.tu-darmstadt.de/software/JWPL


MAIN IMPROVEMENTS (since JWPL v0.3)

* JWPL now contains a MediaWiki markup parser that can be used to
  analyze the contents of a Wikipedia page. The parser can
  also be used stand-alone to analyze other web pages that use MediaWiki
  markup.
* The performance of iterating through pages and categories has been 
  significantly improved (>10 times faster than before).
* Added support for all currently used Wikipedia languages. Data files
  will be available soon.
* The method getDescendants() now returns a buffered Iterable instead
  of a Set, which significantly reduces memory usage.
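
To illustrate why a buffered Iterable needs less memory than a Set, here
is a minimal sketch of the general technique. It is not the JWPL
implementation: BufferedIterableSketch, BatchSource, and fromList are
hypothetical names, and the paged data source is an assumption (think of
a database query with offset and limit). Only one batch of elements is
held in memory at a time, whereas a Set must hold all of them.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Hypothetical sketch of a buffered Iterable; NOT JWPL's implementation.
class BufferedIterableSketch<T> implements Iterable<T> {

    // Assumed paged data source, e.g. a database query with OFFSET/LIMIT.
    interface BatchSource<T> {
        List<T> fetch(int offset, int limit);
    }

    private final BatchSource<T> source;
    private final int bufferSize;

    BufferedIterableSketch(BatchSource<T> source, int bufferSize) {
        if (bufferSize <= 0) {
            throw new IllegalArgumentException("bufferSize must be positive");
        }
        this.source = source;
        this.bufferSize = bufferSize;
    }

    @Override
    public Iterator<T> iterator() {
        return new Iterator<T>() {
            private final Deque<T> buffer = new ArrayDeque<>();
            private int offset = 0;
            private boolean exhausted = false;

            // Loads the next batch only when the buffer has run empty.
            private void refill() {
                if (exhausted || !buffer.isEmpty()) {
                    return;
                }
                List<T> batch = source.fetch(offset, bufferSize);
                offset += batch.size();
                buffer.addAll(batch);
                // A short batch means the source has no more elements.
                if (batch.size() < bufferSize) {
                    exhausted = true;
                }
            }

            @Override
            public boolean hasNext() {
                refill();
                return !buffer.isEmpty();
            }

            @Override
            public T next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                return buffer.removeFirst();
            }
        };
    }

    // Demo source backed by an in-memory list, standing in for a database.
    static <T> BatchSource<T> fromList(List<T> all) {
        return (offset, limit) -> all.subList(
                Math.min(offset, all.size()),
                Math.min(offset + limit, all.size()));
    }
}
```

A caller iterates with an ordinary for-each loop; the trade-off is that
the result can no longer be queried like a Set (no contains(), no size())
without consuming it first.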


ABOUT

JWPL was developed by the Ubiquitous Knowledge Processing Lab
at Technische Universitaet Darmstadt.

http://www.ukp.tu-darmstadt.de


REFERENCE

Zesch, T.; Müller, C. & Gurevych, I. Extracting Lexical Semantic Knowledge
  from Wikipedia and Wiktionary. In Proceedings of the Conference on
  Language Resources and Evaluation (LREC), 2008

