[Corpora-List] UIMA and NIF, was Re: [NLP2RDF] Announcement: NLP Interchange Format(NIF)
Sebastian Hellmann
hellmann at informatik.uni-leipzig.de
Sat Dec 10 18:00:42 UTC 2011
Hello Siddhartha,
I assume you are asking this question because we all know that there is no such thing as a golden hammer. Neither RDBMS nor UIMA nor RDF nor NIF is the right tool for every job.
There is a prototype for UIMA that serialises POS tags as NIF, and we are also working on a generic output (you are welcome to join). The models should be compatible: just give each annotated entity a NIF URI.
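To illustrate what "give each annotated entity a NIF URI" could look like, here is a minimal sketch in Python. It is not the prototype's code. The offset-based fragment (`#offset_<begin>_<end>`), the `VOCAB` namespace and the property names `anchorOf` and `posTag` are assumptions chosen for illustration; the NIF specification defines the actual URI recipes and vocabulary.

```python
# Hypothetical namespace for illustration only; not the official NIF vocabulary.
VOCAB = "http://example.org/vocab#"


def nif_uri(doc_uri: str, begin: int, end: int) -> str:
    """Build an offset-based URI for the text span [begin, end) of a document."""
    if not 0 <= begin <= end:
        raise ValueError("offsets must satisfy 0 <= begin <= end")
    return f"{doc_uri}#offset_{begin}_{end}"


def pos_triples(doc_uri, text, annotations):
    """Turn (begin, end, tag) annotations into (subject, predicate, object) triples.

    Each annotated span gets its own URI, so that other tools can refer to it.
    """
    triples = []
    for begin, end, tag in annotations:
        subject = nif_uri(doc_uri, begin, end)
        # The covered text, so the span can be checked against the document.
        triples.append((subject, VOCAB + "anchorOf", text[begin:end]))
        # The POS tag assigned by the (assumed) upstream tagger.
        triples.append((subject, VOCAB + "posTag", tag))
    return triples


if __name__ == "__main__":
    text = "Hello world"
    for triple in pos_triples("http://example.org/doc1", text,
                              [(0, 5, "UH"), (6, 11, "NN")]):
        print(triple)
```

In a UIMA setting, the (begin, end, tag) tuples would come from the begin and end offsets of the annotations in the CAS.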
The link to the implementation can be found on the web demo or on the involved people page.
If you encounter any problems, please tell us. NIF and UIMA should play well together; there is just no mature implementation yet.
My idea for answering your question would be to serialize the relevant part of the current UIMA memory state as NIF, send it to an OWL reasoner, query for the inferences you want, add them back to the UIMA in-memory model, and continue.
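The round trip could be sketched roughly as follows. This is only an illustration in plain Python: the list of dictionaries stands in for the UIMA in-memory model (it is not the UIMA API), and the small subclass closure stands in for a real OWL reasoner, which would do far more. The names `infer_types`, `round_trip` and the tag hierarchy are hypothetical.

```python
def infer_types(type_assertions, subclass_of):
    """Add every superclass of each asserted type (transitive subclass closure).

    A toy stand-in for the OWL reasoner step.
    """
    inferred = set(type_assertions)
    changed = True
    while changed:
        changed = False
        for subject, cls in list(inferred):
            for parent in subclass_of.get(cls, ()):
                if (subject, parent) not in inferred:
                    inferred.add((subject, parent))
                    changed = True
    return inferred


def round_trip(annotations, subclass_of):
    """Serialise, reason, query for new facts, and write them back."""
    # 1. Serialise the relevant part of the in-memory state as type assertions.
    asserted = {(a["uri"], a["type"]) for a in annotations}
    # 2. Hand the assertions to the reasoner together with background knowledge.
    closed = infer_types(asserted, subclass_of)
    # 3. Query for what is new, and 4. add it back to the in-memory model.
    new_facts = closed - asserted
    for a in annotations:
        a["inferred_types"] = sorted(
            cls for uri, cls in new_facts if uri == a["uri"]
        )
    return annotations


if __name__ == "__main__":
    # Hypothetical tag hierarchy, as an ontology of POS tags might provide.
    hierarchy = {"ProperNoun": ["Noun"], "Noun": ["Word"]}
    cas = [{"uri": "http://example.org/doc1#offset_0_6", "type": "ProperNoun"}]
    print(round_trip(cas, hierarchy))
```

The point of the design is that the annotation pipeline itself does not need to know about the ontology; only the serialised slice passes through the reasoner, and the pipeline continues with the enriched annotations.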
Of course, it would be better to implement inference directly in UIMA. Maybe something already exists?
One of the design choices behind using OWL for NIF was the availability of mature (open-source) tools, so dozens of existing implementations and ontologies can be (re-)used as background knowledge.
The best way, in my opinion, would be to feed the ontologies used in NIF (e.g. OLiA or NERD) into the UIMA type system and then somehow do the reasoning directly in UIMA.
I am currently travelling and only have my mobile with me, so sorry if my answers are not so speedy. That will improve in a week.
Sebastian
--
Sent with my mobile phone, please excuse my brevity, Sebastian
Siddhartha Jonnalagadda <sid.kgp at gmail.com> wrote:
Hey Rich,
RDBMS is an industry standard that works well for some things, such as storing the extracted metadata, but might not be optimal for performing reasoning over it. That might be one reason some people use other representations such as RDF/SPARQL for higher-level tasks. In general, storing everything in the Common Analysis Structure defined by UIMA's type system works for me, and where needed I could write it into a database. What is the optimal way to represent the metadata for reasoning tasks? How could I transfer my UIMA CAS into that "thing"?
Sincerely,
Siddhartha Jonnalagadda, Ph.D.
sjonnalagadda.wordpress.com
On Fri, Dec 9, 2011 at 11:56 AM, Rich Cooper <rich at englishlogickernel.com> wrote:
Dear Siddhartha,
Could you please provide more detail about what you need in the way of “more computer-interpretable than RDBMS”? I use the RDBMS columns with unstructured text, analyze the text in software, and populate new columns to store the analyzed NLP information. By iteratively aggregating RDBMS columns, I am able to process NLP quite well using the RDBMS capabilities plus software functionality for interpretation.
More information would be useful,
-Rich
Sincerely,
Rich Cooper
EnglishLogicKernel.com
Rich AT EnglishLogicKernel DOT com
9 4 9 \ 5 2 5 - 5 7 1 2
_____________________________________________
From: corpora-bounces at uib.no [mailto:corpora-bounces at uib.no] On Behalf Of Siddhartha Jonnalagadda
Sent: Friday, December 09, 2011 9:07 AM
To: nlp2rdf at lists.informatik.uni-leipzig.de; CORPORA List
Cc: Jens Lehmann
Subject: Re: [Corpora-List] [NLP2RDF] Announcement: NLP Interchange Format(NIF)
Somewhat related issue:
Since UIMA is seeing increasing use within the NLP community (both in Information Extraction and in other areas such as Question Answering), I wonder why we need another standard, as opposed to an interface between the UIMA type system and one of the many existing standards. In other words, is there some work on representing the information we extract in a way more computer-interpretable than RDBMS?
Sincerely,
Siddhartha Jonnalagadda, Ph.D.
sjonnalagadda.wordpress.com
On Fri, Dec 9, 2011 at 10:39 AM, John F. Sowa <sowa at bestweb.net> wrote:
Before making a firm commitment to any notation as a standard for NLP,
I suggest that you poll computational linguists and ask them what they
would prefer for their work. Among the questions you could ask is to
look at those five serializations and check which one(s) they prefer.
Corpora List is a good place to start such a poll.