[Corpora-List] [NLP2RDF] Announcement: NLP Interchange Format (NIF)
hellmann at informatik.uni-leipzig.de
Sat Dec 17 13:30:49 UTC 2011
Dear John, dear lists,
There has been a detailed discussion on the ontolog forum.
I have read most of these emails, but the discussion became quite long
and might bore any outsider. Here is an overview:
http://ontolog.cim3.net/forum/ontolog-forum/2011-12/threads.html
@John:
Sadly, it seems that there are no concrete improvements we can make,
based on the comments you made. You did not provide a feasible
alternative. Your suggestions seem to require years to be put into
practice.
In particular, your basic assumption that NIF was created for
computational linguists is slightly off. NIF was originally developed
for LOD2 ( http://lod2.eu ) and the Semantic Web community. We have
undergone an initial community review process and consulted personally
with over 40 people. This is also the reason why NIF is firmly built
on W3C standards such as URIs, RDF, OWL and others. Now, after the 1.0
release, we are hoping for feedback from a broader audience and invite
anybody to comment. We are, however, unable to leave this consistent
framework, because we would lose one major feature of NIF:
interoperability with Linked Data and the RDF world. The basic goal of
NIF is to unlock NLP for the RDF world and connect the Web of Data,
the Web of Documents and NLP.
Although my colleagues and I are also big fans of JSON (btw., as
others also mentioned, RDF and JSON do not contradict each other), we
will not define our own semantics outside of RDF and OWL, because we
would lose more than we would gain. Personally, I also think that the
time of non-reusable island solutions should come to an end and that
we should start to build on open standards, as we have done with NIF.
I was actually wondering why you got stuck on the serialization of
NIF. It is one of the smallest and least interesting aspects, in my
opinion. There is a plethora of open parser and serializer
implementations available, so developers are relieved of a lot of
boilerplate. Many other aspects of NIF are more interesting and could
even be reused outside the RDF world. The design of the URIs for
example is quite universal. Also the mappings and the knowledge
contained in the provided ontologies can be converted easily to other
formats.
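
To illustrate why the URI design is usable outside RDF, here is a
minimal Python sketch of an offset-based string URI, where the
fragment encodes the begin and end character positions of an
annotated span. The function names, the "offset_<begin>_<end>"
fragment layout and the example.org document URI are assumptions made
for this illustration; please check the NIF 1.0 specification for the
normative URI recipes.

```python
def make_offset_uri(doc_uri: str, begin: int, end: int) -> str:
    """Build a URI whose fragment addresses text[begin:end] of a document."""
    if not 0 <= begin <= end:
        raise ValueError("need 0 <= begin <= end")
    # Assumed fragment layout: offset_<begin>_<end>
    return f"{doc_uri}#offset_{begin}_{end}"


def resolve_offset_uri(uri: str, text: str) -> str:
    """Return the substring of `text` that an offset URI points to."""
    fragment = uri.rsplit("#", 1)[1]
    _, begin, end = fragment.split("_")
    return text[int(begin):int(end)]


text = "NIF connects the Web of Data and NLP."
phrase = "Web of Data"
begin = text.index(phrase)

# http://example.org/doc1 is a hypothetical document URI.
uri = make_offset_uri("http://example.org/doc1", begin, begin + len(phrase))
print(uri, "->", resolve_offset_uri(uri, text))
```

Nothing in this addressing scheme depends on RDF itself: any tool that
can split a string and slice text can resolve such a URI.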
Of course, we are aware that the flexibility and reusability NIF
provides add some performance overhead. Personally, I would recommend
using technologies like UIMA and an RDBMS for performance-critical
tasks. One use case of NIF is to be able to easily replace one web
service (e.g. Zemanta or OpenCalais) with another one, as the
interface is standardized. Note that these services are available as
web services already, so the extra parsing overhead might be
negligible. It would be interesting to have some measurements of
whether parsing speed has any relevance compared to network speed and
latency in the real world.
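
Such a measurement could be set up roughly as follows. This is only a
sketch under stated assumptions: json.loads from the Python standard
library stands in for an RDF parser, the payload is invented, and the
round-trip time is an assumed constant, not a measured value. For a
real comparison one would plug in an actual RDF parser and time real
requests against the services.

```python
import json
import time
from typing import Callable


def mean_parse_seconds(parse: Callable[[str], object], payload: str,
                       repeats: int = 1000) -> float:
    """Average wall-clock time of one parse call over `repeats` runs."""
    start = time.perf_counter()
    for _ in range(repeats):
        parse(payload)
    return (time.perf_counter() - start) / repeats


def parsing_share(parse_seconds: float, network_seconds: float) -> float:
    """Fraction of the total request time that is spent on parsing."""
    total = parse_seconds + network_seconds
    return parse_seconds / total if total > 0 else 0.0


# Hypothetical payload: 200 annotations as (begin, end, type) records.
payload = json.dumps(
    [{"begin": i, "end": i + 5, "type": "Token"} for i in range(200)]
)

ASSUMED_ROUND_TRIP_SECONDS = 0.1  # assumption, not a measured value

# json.loads is a stand-in; pass any parser callable here instead.
parse_s = mean_parse_seconds(json.loads, payload)
share = parsing_share(parse_s, ASSUMED_ROUND_TRIP_SECONDS)
print(f"parse: {parse_s * 1e6:.1f} us, share of total: {share:.4%}")
```

The harness takes the parser as a parameter, so the same code can
compare several serializations against the same network figure.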
We are not aware of any other data model that we could have used.
Topic Maps might have been an option, but the tool support is not as
rich. JSON and XML are not really data models; I would rather count
them as serialization formats for data models.
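
The distinction can be made concrete with a small Python sketch: one
set of triples (the data model) is written to JSON and to XML and read
back, and both round trips give the same set. The JSON and XML layouts
and the example.org URIs are ad hoc assumptions for this illustration;
they are not standard RDF serializations such as RDF/XML.

```python
import json
import xml.etree.ElementTree as ET

# The data model: a set of (subject, predicate, object) triples.
# All URIs below are hypothetical.
triples = {
    ("http://example.org/doc1#offset_17_28",
     "http://example.org/vocab#refersTo",
     "http://example.org/resource/Web_of_Data"),
    ("http://example.org/doc1#offset_0_3",
     "http://example.org/vocab#refersTo",
     "http://example.org/resource/NIF"),
}


def to_json(ts):
    """Serialize the triples as a JSON array of 3-element arrays."""
    return json.dumps(sorted(ts))


def from_json(doc):
    return {tuple(t) for t in json.loads(doc)}


def to_xml(ts):
    """Serialize the triples as <triple> elements with three attributes."""
    root = ET.Element("triples")
    for s, p, o in sorted(ts):
        ET.SubElement(root, "triple", subject=s, predicate=p, object=o)
    return ET.tostring(root, encoding="unicode")


def from_xml(doc):
    root = ET.fromstring(doc)
    return {(t.get("subject"), t.get("predicate"), t.get("object"))
            for t in root.iter("triple")}


# Two different serializations, one and the same model.
same = from_json(to_json(triples)) == from_xml(to_xml(triples)) == triples
print(same)
```

Each triple here also links a span of text to a resource, which is the
sense in which an annotation can be read as a link.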
Anyhow, I will have a look at the whole Ontolog discussion again and
see if there is something concrete we can exploit. I think the main
difficulty for the adoption of RDF was that people tried to use it for
tasks that it is not suited for (e.g. replacing relational databases).
For linking and data integration, however, it seems to be working
quite well, hence we used it for NIF.
Annotations are a form of linking, right?
All the best,
Sebastian
Quoting "John F. Sowa" <sowa at bestweb.net>:
> Dear Michael and Jens,
>
> JFS
>>> I sent a note to Ontolog Forum (copy below), which addresses
>>> many of the points raised in this thread.
>
> MB
>> Which would have been a better place for you to start the thread.
>
> The talk by R. V. Guha, who was the original designer of RDF, was
> sponsored by Ontolog Forum last week. It started a thread on that
> list. When NLP2RDF was announced on Corpora List, I thought it was
> appropriate to alert the developers and potential users of NIF about
> that talk and its implications.
>
> MB
>> schema.org is part of RDF: http://schema.org/docs/datamodel.html
>>
>> "The data model used is very generic and derived from RDF Schema"
>
> That quotation is taken out of context. See the full statement:
>
> schema.org
>> The data model used is very generic and derived from RDF Schema.
>> (which in turn was derived from CycL, which in turn ...).
>
> CycL is the very rich logic of the Cyc system, which Guha had helped
> design and implement while he was an associate director of Cyc. The
> three dots refer to the many developments in AI, logic, comp. sci.,
> linguistics, and NLP that influenced Cyc. In designing RDF, Guha
> tried to design a very limited, simple notation based on just binary
> relations (which C. S. Peirce introduced in 1870). He hoped that
> could be a starting point, which would evolve into the much richer
> logic that was necessary for AI, NLP, comp. sci., and linguistics.
>
> But as he said in his talk, "Somehow RDF never caught on." He did not
> mean that nobody uses it, but that it failed to achieve the widespread
> use that the W3C had hoped for. In response to a question about using
> LISP (which I asked), Guha said "I wish we could have done that."
>
> Most of the other people who had any experience in AI also wished
> that they could have used LISP. That includes Ora Lassila, who wrote
> a proposal in 1997 for a LISP-like version, and Pat Hayes, who defined
> the LBase semantics with Guha. Pat was also a coauthor of another web
> page you cited: http://www.w3.org/TR/rdf-mt/ Hayes & Menzel extended
> LBase for the semantics of ISO standard 24707 for Common Logic (CL).
>
> MS
>> Nobody said that RDF is bound to RDFs and OWL/DL. If you think that
>> many people would sacrifice decidability and low computational
>> complexity for more expressional power, just define your own semantic
>> extension. You can have unrestricted first order logic - LBase
>> is just that.
>
> The WHERE-clause of SQL has the full expressive power of first-order
> logic for expressing queries and constraints. And that version of logic
> runs the world economy. One of the major reasons why "RDF never caught
> on" for commercial web sites is that nearly all of them are built around
> a relational database. The limited expressive power of RDF and OWL is
> one of the major deterrents to using it for commercial web sites.
>
> As for NLP, every major notation for syntax or semantics requires at
> least full FOL for its definition and/or for interchanging the results
> of analyzing and interpreting NL sentences. If you have any questions
> about decidability, I recommend the following article:
>
> http://www.jfsowa.com/pubs/fflogic.pdf
> Fads and Fallacies about Logic
>
> JL
>> RDFa is used for embedding RDF in HTML pages. Hence, it is quite obvious
>> that it is a better choice for schema.org than other RDF syntaxes. There
>> are, of course, other scenarios in which you just want to exchange
>> information (without HTML), in which one of the other RDF serialisations
>> is more appropriate.
>
> After schema.org was introduced, the RDF community responded with its
> own web site that recommended ways of using RDF in conjunction with it.
> See http://schema.rdfs.org
>
> The first page of that web site presents a serialization of the
> hierarchy of terms and definitions from schema.org. It has links
> to five different representations: JSON (which Google and other
> participants in schema.org recommend), CSV (Comma Separated Values),
> and three serializations for RDF: RDF/Turtle, RDF/XML, and RDF/N3.
>
> Before making a firm commitment to any notation as a standard for NLP,
> I suggest that you poll computational linguists and ask them what they
> would prefer for their work. Among the questions you could ask is to
> look at those five serializations and check which one(s) they prefer.
>
> Corpora List is a good place to start such a poll.
>
> John