[Corpora-List] WordNet vs Ontology
Yannick Versley
versley at cl.uni-heidelberg.de
Wed Aug 13 09:25:15 UTC 2014
Dear Sebastian,
First, let me say that throughout my work I've used these
dictionary-and-ontology things (let's call them lexicalized ontologies),
sometimes more than one of them at a time, and it's easy to see why
- a common representation (in terms of a data model such as triples)
- a common namespace (such as URIs for senses)
...make a lot of sense.
In that sense, Cyc/CycL and its microtheories, recent work to provide a
uniform wrapper around multiple wordnets (UBY-LMF), and RDF representations
of wordnets are all instances of one good idea which, if/when it fully
works, would get us a huge step further.
"If it works", in this case, means: certain kinds of inference (such as:
is this element linked to that element via multiple ISA steps, or what
is/are the least common ancestor(s)?) can be supported very effectively
by in-memory data structures with the right kind of indexing. This is very
hard to do with an SQL database (because its data model is far more
general), but might be efficiently supported by an RDF database that
understands the subclass/superclass hierarchy. (With the caveat that
WordNet's hyponymy/hypernymy is not modeled as subclass/superclass, for
reasons that make sense ontologically but may be frustrating in practice.)
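For intuition, the least-common-ancestor query mentioned above can be served
from a plain in-memory map. A minimal sketch (the hypernym graph below is a
made-up toy, not real WordNet data):

```python
from collections import deque

# Toy hypernym (ISA) graph; the synsets are invented for illustration.
HYPERNYMS = {
    "dog": ["canine"],
    "cat": ["feline"],
    "canine": ["carnivore"],
    "feline": ["carnivore"],
    "carnivore": ["animal"],
    "animal": [],
}

def ancestors_with_depth(synset):
    """BFS upward along ISA links; returns {ancestor: distance}."""
    depths = {synset: 0}
    queue = deque([synset])
    while queue:
        node = queue.popleft()
        for parent in HYPERNYMS.get(node, []):
            if parent not in depths:
                depths[parent] = depths[node] + 1
                queue.append(parent)
    return depths

def least_common_ancestors(a, b):
    """Common ancestors minimizing the summed path length to a and b."""
    da, db = ancestors_with_depth(a), ancestors_with_depth(b)
    common = da.keys() & db.keys()
    if not common:
        return set()
    best = min(da[c] + db[c] for c in common)
    return {c for c in common if da[c] + db[c] == best}

print(least_common_ancestors("dog", "cat"))  # {'carnivore'}
```

The same query in a generic SQL schema needs recursive joins, which is
exactly why a dedicated index, or an RDF store that understands the
hierarchy, pays off.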
> In the RDF world, we believe in URI senses. I agree that RDF does not add
> anything to the content of the original data. However, I would argue that
> it restructures the resource and makes the modeling explicit, transparent
> and re-usable. Also, discoverability of the data is increased.
>
I fully agree that RDF's way of adding namespaces through URIs is the way
to go. And Turtle's way of declaring prefixes means that you can use URIs
with a lot less verbosity, i.e.
wn:n0001234 lemon:broader wn:n0002345
instead of having the full URIs in there.
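Mechanically, a prefixed name is just the declared namespace URI
concatenated with the local part. A small sketch of that expansion (the
namespace URIs here are hypothetical placeholders, not the actual WordNet
or lemon namespaces):

```python
# Hypothetical prefix declarations, as a Turtle parser would record them.
PREFIXES = {
    "wn": "http://example.org/wordnet/",
    "lemon": "http://example.org/lemon#",
}

def expand(curie):
    """Expand a prefixed name like 'wn:n0001234' into a full URI."""
    prefix, local = curie.split(":", 1)
    return PREFIXES[prefix] + local

print(expand("wn:n0001234"))   # http://example.org/wordnet/n0001234
print(expand("lemon:broader")) # http://example.org/lemon#broader
```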
> RDF is a Framework to Describe Resources such as WordNet. At the end of the
> day, it should be easier to answer the question which part of WordNet is
> an ontology (and can be used as such) and which part is merely a dictionary.
>
Seeing the full lemon spec, it's not always clear to me where the ontology
stops and "the lemon creators shoehorning things such as syntax trees into
an RDF notation" starts. In a sense, this is an artefact of lemon's
creators trying both to capture commonsensical constructs that everyone
will use and to do exciting new things at the same time.
I find Miller's original intuition of "let's add an ontological component
to a dictionary so you can look things up by meaning" more useful here if
we want an intuition about WordNet. Other resources (such as VerbNet or
FrameNet) share the idea of joining a dictionary with some conceptual data
model, but focus on information that is closer to linguistic properties
than to the sense relations of wordnets.
> I believe that my community is essentially trying to understand what is
> currently going on and then model this in OWL, which is similar to UML or
> ER diagrams.
>
... which is, again, commonsensical. If you want to model one resource in
another framework, having a meta-model of the data helps. ("Meta-model"
sounds odd here because usually the model is some SQL schema or class
diagram, and the meta-model is the one that describes the structure of UML
or ER models, again in a description formalism that could itself be
translated into UML or ER.)
> Once you have it in OWL, you can mix and merge and transform it into more
> efficient structures like SQL (as John mentioned, Cyc also provides these
> mappings, and it is easy to go down from rich knowledge). The quest
> therefore is to encode human knowledge into the data on a meta-level, i.e.
> describing the data/resources, not the world.
>
I fully agree. (And, just to be extra cautious, I will point to the
difference between metadata and meta-models for linguistic models.)
We are in dire need of expert input, however. Hence my attempt to
> cross-post.
>
For my two cents: I think that many of the practical problems we're facing
(as in: after we've got our data into RDF, can we query it efficiently,
and do we get the answers we'd intuitively want?) are instances of general
problems in knowledge-based AI that have been around for ages; but now we
have specific instances of the problem that can tell us more about what
kind of solution is workable in practice. We'll certainly see a pattern
of early adopters (e.g. the WSD community, because having all your data in
one place is great, and you need some kind of graph but no reasoning in
a more extensive sense) and late adopters (e.g. people who make extensive
use of relatedness measures, or of techniques that are complicated and
performance-sensitive enough that it makes little sense to run them in
a generic database).
Best wishes,
Yannick
>
> All the best,
> Sebastian
>
>
>
>
> On 08.08.2014 09:52, yversley at gmail.com wrote:
>
> Dear Sebastian,
>
> let me start out by saying that I’m not sure whether broadening an
> already diffuse discussion by adding more people to it is helpful in the
> sense of achieving a better signal-to-noise ratio.
> Corpora-List is (in)famous for occasionally having discussions between
> people with very different background assumptions (e.g. Ramesh’s insistence
> that language is best seen as behaviour vs. the point that language is a
> tool to get meanings across). This can be both good and bad, and lots of
> people who are only interested in factual information did or will hit
> the “Mute thread” button (or moral equivalent) in the process.
>
> Your whole post seems to boil down to the claim that only RDF-encoded data
> should count as ontology. This seems a bit near-sighted to me, as the
> lemon RDF encoding of WordNet is just that: an encoding, which is very
> convenient but which adds nothing to the existing semantics.
>
> I completely agree that using a powerful database (be it RDF or SQL or
> anything else) is better than using the 90s infrastructure that was once
> designed for WordNet, and that linking datasets together is much easier
> with a common format that reduces the m:n problem to a 1:n problem.
>
> We already established earlier that WordNet is a combination of a
> dictionary and an ontological component, which is exactly why it’s more
> useful for NLP than the ontologies that were part of the original
> conception of the Semantic Web. Fortunately for us though, people woke up
> to that idea, and resources such as DBpedia now also include dictionary
> entries that mediate between natural-language strings and the concepts of
> the respective ontology.
>
> Saying that some people think that “the ontology is already in the text”
> is unnecessarily putting up a strawman. No one claimed this, and you’d do
> better by understanding the actual arguments put forward - for example,
> that in the absence of a central authority, as with marriage or taxonomies
> in Biology, ontologies are conceptualizations that are intersubjective
> rather than purely objective. E.g. Kafka may be a German writer in one
> ontology and a Czech writer in another, yet either of these ontologies
> would be useful and intuitively plausible. (This creates a
> tension/incompatibility between the perspective that ontologies are logical
> things and that you should be able to reason with them, and the view that
> you should be able to freely combine ontologies on related things.)
>
> Your discussion of layers is absolutely orthogonal to that - modeling
> text, annotations, metadata, and ontology in one database is surely
> convenient if you can make it work in a sense that's practically relevant
> but it doesn’t add anything to the discussion we’re having here.
>
> Best wishes,
> Yannick
>
> *Von:* Sebastian Hellmann <hellmann at informatik.uni-leipzig.de>
> *Gesendet:* Freitag, 8. August 2014 09:35
> *An:* John F Sowa <sowa at bestweb.net>, corpora <corpora at uib.no>, A list
> for those interested in open data in linguistics.
> <open-linguistics at lists.okfn.org>, nlp2rdf
> <nlp2rdf at lists.informatik.uni-leipzig.de>
>
> Dear all,
> (I included some more lists to ping them, discussion started here:
> http://mailman.uib.no/public/corpora/2014-August/020939.html)
>
> I see that there are many viewpoints on this issue in this thread.
> So let me add my personal biased view.
>
> In the broadest sense, we start to create an ontology by stating facts:
>
> married (a, b) .
>
> Imho we have an ontology solely for the reason that we start to relate a
> to b with "married". Even if there is no explicit ontology defining
> "married", it is still used in an "ontological" way, just not explicitly.
> There are other aspects missing, which have been discussed throughout the
> literature (e.g. Gruber's requirement that an ontology be "shared"), but
> in the broadest sense, it qualifies.
>
> Regarding language technology and this discussion, I would say that we
> should be careful not to mix levels. Lexical-semantic resources such as
> WordNet do mix them, but we could separate them again.
>
> In my view, we have these different layers:
>
> 1. the content, i.e. the characters (HTML, plain text), e.g. in Unicode
> 2. the container of the content, i.e. the document or tweet
> 3. annotations on the content
> 4. metadata on the container, e.g. the tweeter or author, for context
> 5. collections of content (with or without annotations), i.e. corpora
> 6. ontologies and data describing language, i.e. lexica, dictionaries,
> terminologies, etc., such as WordNet
> 7. factual databases including their taxonomies, i.e. the DBpedia knowledge
> graph http://dbpedia.org
>
> (@John: I hope you are noticing that I am trying to keep all of it as
> underspecified as possible)
>
> Then in addition, there are ontologies on a meta-level that try to capture
> all seven layers. Some examples (more below): NIF, lemon, ITS, NERD [1]
> which we are trying to combine in the http://nlp2rdf.org and
> http://lider-project.eu
>
> We can model WordNet using the lemon ontology:
> http://datahub.io/dataset/lemonwordnet
> However, for certain purposes it makes sense to transform WordNet into a
> taxonomy, as YAGO does:
>
> https://www.mpi-inf.mpg.de/departments/databases-and-information-systems/research/yago-naga/yago/
>
> I am not fixed upon any of the definitions I gave above, as I am aware
> that you can (and should!) transform one into the other (with some effort,
> e.g. corpora to dictionary, fact extraction, language generation).
>
> If we are talking about extracting ontologies from text, there might be
> philosophically minded people who want to argue that the ontology is
> already in the text. The discussion can be endless if you take the wrong
> linguistic turn.
>
> If we are focusing on engineering of information machines, then things are
> much clearer.
>
> All the best,
> Sebastian
>
>
>
> [1] related to the different layers:
> 1. NIF: http://persistence.uni-leipzig.org/nlp2rdf/ontologies/nif-core#
> 2. (there is a gap here; Dublin Core or FOAF are not enough imho)
> 3 a) MARL: http://www.gi2mo.org/marl/0.1/ns.html
> b) ITS: Docu: http://www.w3.org/TR/its20/ , RDF:
> http://www.w3.org/2005/11/its/rdf#
> c) OLIA: http://purl.org/olia/
> 4. a) Dublin Core: http://dublincore.org/documents/dcmi-terms/
> b) Prov-O: http://www.w3.org/TR/prov-o/
> 5. also NIF:
> http://persistence.uni-leipzig.org/nlp2rdf/ontologies/nif-core#
> 6. lemon: http://lemon-model.net/
> 7. a) DCAT and DataId: http://wiki.dbpedia.org/coop/DataIDUnit
> b) NERD: http://nerd.eurecom.fr/ontology
>
>
>
> On 08.08.2014 06:11, John F Sowa wrote:
>
> On 8/7/2014 10:57 PM, Ken Litkowski wrote:
>
> It would seem to me that our goal should be a classification
> of all existing things (not to exclude the narrower types).
>
>
> Yes, but note the slides I suggested in my first note:
>
> http://www.jfsowa.com/talks/kdptut.pdf
>
> Slides 7 to 9: Cyc project. 30 years of work (since 1984).
> After the first 25 years, 100 million dollars and 1000 person-years
> of work (one person-millennium!), 600,000 concepts, defined by
> 5,000,000 axioms, organized in 6,000 microtheories -- and counting.
>
> Slide 10: 2300 years of universal ontology schemes -- and counting.
>
> The Brandeis Shallow Ontology attempts to do this, and incidentally
> is being used to characterize arguments of verbs in Patrick Hanks'
> corpus pattern analysis, i.e., in the imperfect world of language.
>
>
> I strongly believe in shallow, underspecified ontologies -- especially
> when they're supplemented with lots of lexical information about verbs
> and their characteristic patterns.
>
> But I also believe that the key to having an open-ended variety of
> specialized ontologies is to make the computers do what people do:
> extend their ontologies automatically by reading books.
>
> Lenat made the mistake of assuming that you need to hand-code
> a huge amount of knowledge before a system can start to read
> by itself. But that's wrong. You need to design a system that
> can automatically augment its ontology every step of the way.
>
> John
>
> _______________________________________________
> UNSUBSCRIBE from this page: http://mailman.uib.no/options/corpora
> Corpora mailing list
> Corpora at uib.no
> http://mailman.uib.no/listinfo/corpora
>
>
>
> --
> Sebastian Hellmann
> AKSW/NLP2RDF research group
> Institute for Applied Informatics (InfAI) and DBpedia Association
> Events:
> * *Sept. 1-5, 2014* Conference Week in Leipzig, including
> ** *Sept 2nd*, MLODE 2014 <http://mlode2014.nlp2rdf.org/>
> ** *Sept 3rd*, 2nd DBpedia Community Meeting
> <http://wiki.dbpedia.org/meetings/Leipzig2014>
> ** *Sept 4th-5th*, SEMANTiCS (formerly i-SEMANTICS) <http://semantics.cc/>
> Come to Germany as a PhD: http://bis.informatik.uni-leipzig.de/csf
> Projects: http://dbpedia.org, http://nlp2rdf.org,
> http://linguistics.okfn.org, https://www.w3.org/community/ld4lt
> <http://www.w3.org/community/ld4lt>
> Homepage: http://aksw.org/SebastianHellmann
> Research Group: http://aksw.org
> Thesis:
> http://tinyurl.com/sh-thesis-summary
> http://tinyurl.com/sh-thesis
>
>
>