[Corpora-List] Natural Language Processing for MediaWiki: First major release of the Semantic Assistants Wiki-NLP Integration
René Witte
witte at semanticsoftware.info
Fri Dec 21 14:25:38 UTC 2012
Natural Language Processing for MediaWiki:
First major release of the Semantic Assistants Wiki-NLP Integration
[for an online version including pictures and references, please go to:
http://www.semanticsoftware.info/first-open-source-release-semantic-assistants-wiki-nlp]
We are happy to announce the first major release of our Semantic Assistants
Wiki-NLP integration. This is the first comprehensive open source solution
for bringing Natural Language Processing (NLP) to wiki users, in particular
for wikis based on the well-known MediaWiki engine and its Semantic MediaWiki
(SMW) extension. It can run any NLP pipeline deployed in the General
Architecture for Text Engineering (GATE), brokered as web services through
the Semantic Assistants server. This allows you to bring novel text mining
assistants to wiki users, e.g., for automatically structuring wiki pages,
answering questions in natural language, quality assurance, entity detection,
or summarization. The results of the NLP analysis are written back
to the wiki, allowing humans and AI to work collaboratively on wiki content.
Additionally, semantic markup understood by the SMW extension can be
automatically generated from NLP output, providing semantic search and query
functionalities.
Features
========
The current release includes the following features:
o Light-weight MediaWiki Extension:
The Wiki-NLP integration is added to an existing MediaWiki installation by
installing a light-weight extension. Without requiring any modifications to
the wiki engine itself, the extension adds a link to the wiki's toolbox menu
through which users can load the dynamically generated Wiki-NLP interface.
Using this interface, users can then discover and invoke NLP services
directly within the wiki environment, so no context switching is needed to
use the NLP services.
o NLP Pipeline Independent Architecture:
The Wiki-NLP integration is backed by the Semantic Assistants server, which
provides a service-oriented solution for offering NLP capabilities within a
wiki system. As a result, any NLP service available in a given Semantic
Assistants server can be invoked on a wiki's content through the Wiki-NLP
integration.
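To give a rough idea of what this service-oriented setup looks like from a
client's perspective, the following Python sketch queries a Semantic
Assistants server for its available NLP services. Note that the server URL,
endpoint path, and JSON response format are purely illustrative assumptions,
not the actual Semantic Assistants interface:

  # Illustrative sketch only: URL, endpoint, and response format are
  # assumptions, not the actual Semantic Assistants API.
  import requests

  SERVER_URL = "http://localhost:8879/services"  # hypothetical endpoint

  def list_services():
      """Ask the server which NLP services it offers (hypothetical)."""
      response = requests.get(SERVER_URL)
      response.raise_for_status()
      return response.json()

  for service in list_services():
      print(service["name"], "-", service["description"])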
o Flexible Wiki Input Handling:
At times, the information a user needs is scattered across multiple pages of
the wiki. To address this, the Wiki-NLP integration allows wiki users to
gather one or more wiki pages into a so-called "collection" and run an NLP
service on all collected pages at once. This feature supports batch
processing of wiki pages, as well as gathering multiple input pages for
pipelines that analyze multiple documents.
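As a concrete (if simplified) picture of what gathering such a collection
involves, the following sketch fetches the wikitext of several pages through
MediaWiki's standard api.php; the integration's own collection mechanism may
of course differ in its details:

  # Minimal sketch: gather a "collection" of pages via the standard
  # MediaWiki API (the integration's internal mechanism may differ).
  import requests

  API = "http://example.org/wiki/api.php"  # your wiki's API endpoint

  def fetch_pages(titles):
      """Return {title: wikitext} for the given pages."""
      params = {
          "action": "query",
          "prop": "revisions",
          "rvprop": "content",
          "rvslots": "main",
          "titles": "|".join(titles),
          "format": "json",
          "formatversion": "2",
      }
      data = requests.get(API, params=params).json()
      return {page["title"]: page["revisions"][0]["slots"]["main"]["content"]
              for page in data["query"]["pages"]}

  collection = fetch_pages(["Page_One", "Page_Two"])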
o Flexible NLP Result Handling:
The Wiki-NLP integration is also flexible in terms of where the NLP
pipelines' output is written. Upon a user's request, the pipeline results
can be appended to an existing page body or its associated discussion page,
written to a newly created page, or even written to a page in an external
wiki, provided that wiki is supported by the Wiki-NLP integration
architecture. Based on the type of results generated by the NLP pipeline,
e.g., annotations or new files, the Wiki-NLP integration offers a simple
template-based visualization capability that can be easily customized. Upon
each successful NLP service execution, the Wiki-NLP integration
automatically updates the existing results on the specified wiki page, where
applicable.
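To illustrate one of these write-back targets, the sketch below appends text
to an existing page (or its discussion page) using the standard MediaWiki
API; authentication is omitted for brevity, and the integration's internal
write-back path may work differently:

  # Minimal sketch: append NLP results to a page via action=edit.
  # Assumes the session is already logged in to the wiki.
  import requests

  API = "http://example.org/wiki/api.php"
  session = requests.Session()

  def append_to_page(title, text):
      """Append text to a wiki page using a fresh CSRF token."""
      token = session.get(API, params={
          "action": "query", "meta": "tokens", "format": "json",
      }).json()["query"]["tokens"]["csrftoken"]
      session.post(API, data={
          "action": "edit",
          "title": title,
          "appendtext": "\n" + text,
          "token": token,
          "format": "json",
      })

  append_to_page("Talk:Some_Page", "== NLP results ==\nMary won the first prize.")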
o Semantic Markup Generation:
Where semantic metadata is generated by an NLP pipeline, the Wiki-NLP
integration takes care of representing it in a formal language using Semantic
MediaWiki's special markup. For each piece of generated metadata, the
Wiki-NLP integration enriches the text with its equivalent markup and
persists it in the wiki repository. Therefore, both a user-friendly and a
machine-processable representation of each result are made available in the
page. This markup is, in turn, transformed into RDF triples by the Semantic
MediaWiki parsing engine, making the results available for querying as well
as for export to other applications. For example, when the sentence "Mary
won the first prize." is contained in a wiki page and processed by a named
entity detection pipeline, the Semantic Assistants server returns an XML
document to the Wiki-NLP integration that identifies "Mary" as an entity of
type "Person". Our integration then processes this XML document and
transforms it into formal Semantic MediaWiki markup: in this example, the
markup [[hasType::Person|Mary]] is generated and written into the wiki page.
The generated markup can then be queried using Semantic MediaWiki's inline
queries; for example, a simple query like {{#ask: [[hasType::Person]]}}
retrieves all entities of type "Person" in the wiki content.
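To make the transformation step itself concrete, here is a minimal sketch of
the idea in Python; the XML below is a simplified stand-in for the server's
actual response format:

  # Simplified stand-in for the server's XML response format.
  import xml.etree.ElementTree as ET

  RESPONSE = """<annotations>
    <annotation type="Person" content="Mary"/>
  </annotations>"""

  def to_smw_markup(xml_text):
      """Turn each entity annotation into [[hasType::<Type>|<text>]] markup."""
      root = ET.fromstring(xml_text)
      return [f'[[hasType::{a.get("type")}|{a.get("content")}]]'
              for a in root.findall("annotation")]

  print(to_smw_markup(RESPONSE))  # ['[[hasType::Person|Mary]]']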
o Wiki-independent Architecture:
The Wiki-NLP integration was developed from the ground up with extensibility
in mind. Although the provided examples show how the Wiki-NLP integration can
be used within a MediaWiki instance, its extensible architecture allows
support for other wiki engines to be added with a reasonable amount of
effort. Both the Semantic Assistants server and the Wiki-NLP integration
follow a semantics-based architecture that allows new services and wiki
engines to be added without major modifications to their code base.
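Conceptually, the extension point for a new wiki engine can be pictured as a
small adapter interface that the integration programs against. The sketch
below illustrates this design idea only; the class and method names are not
the ones used in the actual code base:

  # Conceptual sketch of the wiki-engine extension point; names are
  # illustrative, not taken from the actual code base.
  from abc import ABC, abstractmethod

  class WikiEngineAdapter(ABC):
      """The operations the integration needs from any wiki engine."""

      @abstractmethod
      def read_page(self, title: str) -> str:
          """Return the raw wikitext of a page."""

      @abstractmethod
      def write_page(self, title: str, content: str) -> None:
          """Create or update a page with the given content."""

      @abstractmethod
      def supports_semantic_markup(self) -> bool:
          """Whether the engine can store semantic markup (e.g., SMW)."""

  # A MediaWiki adapter would implement these operations on top of
  # api.php; supporting another engine means adding one more adapter.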
Application: NLP Wikis in Use
=============================
Our open source Wiki-NLP solution is the result of more than 5 years of
research on the technical and social aspects of combining natural language
processing with collaborative wiki systems. We developed a number of
real-world wiki-based solutions that demonstrate how text mining assistants
can effectively collaborate with humans on wiki content. As part of this
research, we investigated (i) the software engineering aspects of Wiki-NLP
integrations, (ii) the usability for wiki users with different backgrounds,
in particular those unfamiliar with NLP, and (iii) its effectiveness for
helping users to develop and improve wiki content in a number of domains and
tasks. To help you build your own Wiki-NLP solution, we documented a number
of successful Wiki-NLP patterns in our Semantic Assistants Wiki-NLP Showcase.
o In the DurmWiki project, we investigate the application of NLP to wikis in
cultural heritage data management, helping wiki users find relevant
information through NLP pipelines for automatic index generation, question
answering, and summarization on wiki content. Our Wiki-NLP approach makes it
possible to transform historical documents into a semantic knowledge base
that can be queried through state-of-the-art semantic technologies.
o For biomedical research, IntelliGenWiki is our solution for helping
curators deal with the large number of publications in this area. Text
mining assistants can aid humans in deciding which papers to curate (triage
task) and in extracting entities (database curation task) through biomedical
entity recognition, e.g., for organisms or mutations. Experiments measuring
the time for manual vs. NLP/wiki-supported curation in a real-world project
demonstrate the effectiveness of this idea.
o With ReqWiki, we developed the first semantic open source platform for
collaborative software requirements engineering. Here, semantic assistants
provide users with tools for entity extraction on domain documents and
quality assurance services for improving the content of a software
requirements specification (SRS). User studies confirmed its usability for
software engineers unfamiliar with NLP and its effectiveness for improving
requirements documents.
More Information
================
For further technical information, please see our Wiki-NLP Integration page:
http://www.semanticsoftware.info/semantic-assistants-wiki-nlp
For a number of application examples, check out our Semantic Assistants
Wiki-NLP Showcase:
http://www.semanticsoftware.info/semantic-assistants-wiki-nlp-showcase
Happy Holidays from the Semantic Software Lab team,
--René