[Corpora-List] Natural Language Processing for MediaWiki: First major release of the Semantic Assistants Wiki-NLP Integration

René Witte witte at semanticsoftware.info
Fri Dec 21 14:25:38 UTC 2012


Natural Language Processing for MediaWiki: 
First major release of the Semantic Assistants Wiki-NLP Integration

[for an online version including pictures and references, please go to: 
http://www.semanticsoftware.info/first-open-source-release-semantic-assistants-wiki-nlp]

We are happy to announce the first major release of our Semantic Assistants 
Wiki-NLP integration. This is the first comprehensive open source solution 
for bringing Natural Language Processing (NLP) to wiki users, in particular 
for wikis based on the well-known MediaWiki engine and its Semantic MediaWiki 
(SMW) extension. It can run any NLP pipeline deployed in the General 
Architecture for Text Engineering (GATE), brokered as web services through 
the Semantic Assistants server. This allows you to bring novel text mining 
assistants to wiki users, e.g., for automatically structuring wiki pages, 
answering questions in natural language, assuring quality, detecting 
entities, or summarizing content. The results of the NLP analysis are 
written back to the wiki, allowing humans and AI to work collaboratively on 
wiki content. 
Additionally, semantic markup understood by the SMW extension can be 
automatically generated from NLP output, providing semantic search and query 
functionalities.


Features
=======
The current release includes the following features:

o Light-weight MediaWiki Extension:
The Wiki-NLP integration is added to an existing MediaWiki installation 
through a light-weight extension. Without requiring any modifications to the 
wiki engine itself, the extension adds a link to the wiki's toolbox menu 
through which users can load the dynamically generated Wiki-NLP interface. 
Using this interface, users can then inquire about and invoke NLP services 
from within the wiki environment, so no context switching is needed to use 
the NLP services.

o NLP Pipeline-Independent Architecture:
The Wiki-NLP integration is backed by the Semantic Assistants server, which 
provides a service-oriented solution to offer NLP capabilities in a wiki 
system. Therefore, any NLP service available in a given Semantic Assistants 
server can be invoked through the Wiki-NLP integration on a wiki's content.

o Flexible Wiki Input Handling:
At times, a user's information need is scattered across multiple pages in the 
wiki. To address this, our Wiki-NLP integration allows wiki users to gather 
one or more wiki pages into a so-called "collection" and run an NLP service 
on all collected pages at once. This feature enables batch-processing of wiki 
pages, as well as providing multiple input pages to pipelines that analyze 
multiple documents, such as multi-document summarizers.

o Flexible NLP Result Handling: 
The Wiki-NLP integration is also flexible in terms of where the NLP 
pipelines' output is written. Upon a user's request, the pipeline results can 
be appended to an existing page or its associated discussion page, written to 
a newly created page, or even written to a page in an external wiki, provided 
that wiki is supported by the Wiki-NLP integration architecture. Depending on 
the type of results generated by the NLP pipeline, e.g., annotations or new 
files, the Wiki-NLP integration offers a simple template-based visualization 
capability that can be easily customized (see the sketch below). Upon each 
successful NLP service execution, the Wiki-NLP integration automatically 
updates the existing results on the specified wiki page, where applicable.
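
As a minimal sketch of this template-based visualization, assuming a 
hypothetical result template named "NLPResult" (the actual template name and 
parameters depend on your setup), an appended result section in a page's 
source might look like:

  == Semantic Assistants Results ==
  {{NLPResult <!-- hypothetical template; rendering is defined in the wiki -->
   | service = Summarizer   <!-- NLP pipeline that produced the result -->
   | date    = 2012-12-21   <!-- time of the service execution -->
   | content = The summary text generated by the pipeline ...
  }}

Since rendering is driven by an ordinary wiki template, administrators can 
change how results are displayed by editing the template definition, without 
touching the integration itself.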

o Semantic Markup Generation:
Where semantic metadata is generated by an NLP pipeline, the Wiki-NLP 
integration takes care of representing it in a formal language, using 
Semantic MediaWiki's special markup. For each piece of generated metadata, 
the Wiki-NLP integration enriches the text with its equivalent markup and 
persists it in the wiki repository. Thus, for each generated result, both a 
user-friendly and a machine-processable representation are made available in 
the page. This markup is, in turn, transformed into RDF triples by the 
Semantic MediaWiki parsing engine, making the results available for querying, 
as well as for export to other applications. For example, when the sentence 
"Mary won the first prize." is contained in a wiki page and processed by a 
named entity detection pipeline, the Semantic Assistants server generates and 
returns an XML document to the Wiki-NLP integration, indicating "Mary" as an 
entity of type "Person". Our integration processes this XML document and 
transforms it into formal Semantic MediaWiki markup: in our example, the 
markup [[hasType::Person|Mary]] is generated and written into the wiki page. 
The generated markup can then be queried using Semantic MediaWiki's inline 
queries; for example, a simple query like {{#ask: [[hasType::Person]]}} can 
be used to retrieve all entities of type "Person" in the wiki content.
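
To make this example concrete, the enriched page source, which still renders 
as "Mary won the first prize." for human readers, and a formatted variant of 
the inline query look as follows (format=table and the ?hasType printout 
request are standard Semantic MediaWiki query options):

  [[hasType::Person|Mary]] won the first prize.

  {{#ask: [[hasType::Person]]
   | ?hasType
   | format=table
  }}

This query returns a table of all pages carrying the generated hasType 
annotation with the value "Person", together with that property value for 
each result.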
 
o Wiki-Independent Architecture:
The Wiki-NLP integration was developed from the ground up with extensibility 
in mind. Although the provided examples show how the Wiki-NLP integration can 
be used within a MediaWiki instance, it has an extensible architecture to 
which support for other wiki engines can be added with a reasonable amount of 
effort. Both the Semantic Assistants server and the Wiki-NLP integration have 
a semantics-based architecture that allows adding new services and wiki 
engines without major modifications to their code base.


Application: NLP Wikis in Use
======================
Our open source Wiki-NLP solution is the result of more than five years of 
research on the technical and social aspects of combining natural language 
processing with collaborative wiki systems. We developed a number of 
real-world wiki-based solutions that demonstrate how text mining assistants 
can effectively collaborate with humans on wiki content. As part of this 
research, we investigated (i) the software engineering aspects of Wiki-NLP 
integration; (ii) its usability for wiki users with different backgrounds, 
in particular those unfamiliar with NLP; and (iii) its effectiveness in 
helping users develop and improve wiki content in a number of domains and 
tasks. To help you build your own Wiki-NLP solution, we documented a number 
of successful Wiki-NLP patterns in our Semantic Assistants Wiki-NLP Showcase.

o In the DurmWiki project, we investigate the application of NLP to wikis in 
cultural heritage data management, helping wiki users find relevant 
information through NLP pipelines for automatic index generation, 
question answering, and summarization of wiki content. Our Wiki-NLP approach 
makes it possible to transform historical documents into a semantic knowledge 
base that can be queried through state-of-the-art semantic technologies.

o For biomedical research, IntelliGenWiki is our solution for helping 
curators deal with the large number of publications in this area. Text mining 
assistants can aid humans in deciding which papers to curate (triage task) 
and in extracting entities (database curation task) through biomedical entity 
recognition, e.g., for organisms or mutations. Experiments measuring the time 
for manual vs. NLP/wiki-supported curation in a real-world project 
demonstrate the effectiveness of this idea. 

o With ReqWiki, we developed the first semantic open source platform for 
collaborative software requirements engineering. Here, semantic assistants 
provide users with tools for entity extraction on domain documents and 
quality assurance services for improving the content of a software 
requirements specification (SRS). User studies confirmed its usability for 
software engineers unfamiliar with NLP and its effectiveness in improving 
requirements documents.


More Information
=============
For further technical information, please see our Wiki-NLP Integration page:
http://www.semanticsoftware.info/semantic-assistants-wiki-nlp

For a number of application examples, check out our Semantic Assistants 
Wiki-NLP Showcase:
http://www.semanticsoftware.info/semantic-assistants-wiki-nlp-showcase


Happy Holidays from the Semantic Software Lab team,

--René
