<div dir="ltr">Hi, All,<div><br></div><div>Just a comment. I'm not sure if it's a straw man, a white elephant or a Trojan horse (or even irrelevant), but I hope it would be taken as an attempt to clarify at least a subset of the notions discussed during this thread. I take it as referring to lexical aspects more than anything else, including MWFs. The gist is that, in my probably undereducated view, nearly all polysemy (in the usual sense of 'all the definitions under the same headword') originates in metaphor. Now, this brings up the problem that we don't seem to understand precisely what metaphor is, or perhaps we just don't agree on what it is; but I believe once we have some sort of handle on metaphor and its role in the semantics/pragmatics of language use, many of the Kilgarriff et al. complaints about 'word senses' will be found to be at least partially resolvable. Then again, I'm an optimist.</div><div><br></div><div>Jim</div></div><div class="gmail_extra"><br clear="all"><div>James L. Fidelholtz<br>Posgrado en Ciencias del Lenguaje<br>Instituto de Ciencias Sociales y Humanidades<br>Benemérita Universidad Autónoma de Puebla, MÉXICO</div>

<br><div class="gmail_quote">On Wed, Aug 13, 2014 at 4:25 AM, Yannick Versley <span dir="ltr"><<a href="mailto:versley@cl.uni-heidelberg.de" target="_blank">versley@cl.uni-heidelberg.de</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr">Dear Sebastian,<br><div class="gmail_extra"><br></div><div class="gmail_extra">First, let me say that throughout my work, we've used these dictionary-and-ontology things (let's call them lexicalized ontologies),</div>

<div class="gmail_extra">sometimes more than one of them, and it's easy to see why</div><div class="gmail_extra">- a common representation (in terms of a data model such as triples)</div><div class="gmail_extra">- a common name space (such as URI senses)</div>

<div class="gmail_extra">..makes a lot of sense.</div><div class="gmail_extra">In that sense, Cyc/CycL and microtheories, recent work to provide a uniform wrapper around multiple wordnets (UBY-LMF) or RDF representations</div>

<div class="gmail_extra">of wordnets are all instances of one good idea which, if/when it fully works, would get us a huge step farther.</div><div class="gmail_extra">"If it works", in this case, means:</div><div class="gmail_extra">

- certain kinds of inference (such as: is this element linked to that element via multiple ISA steps, or, what is/are the least common ancestor(s)) can be supported very effectively</div><div class="gmail_extra">  by in-memory data structures with the right kind of indexing. It's very hard to do with an SQL database (because their data model is far more general) but might be efficiently</div>

<div class="gmail_extra">  supported by an RDF database that understands the subclass/superclass hierarchy. (With the caveat that Wordnet's hyponymy/hypernymy is not modeled as subclass/superclass</div><div class="gmail_extra">

  for reasons that make sense ontologically, but may be frustrating in practice)</div><div class="gmail_extra"><br><div class="gmail_quote"><span class=""><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

<div bgcolor="#FFFFFF" text="#000000"><div>In the RDF world, we believe in URI senses. I agree, that RDF does

      not add anything to the content of the original data. However, I

      would argue, that it restructures the resource and makes the

      modeling explicit, transparent and re-usable. Also discoverability

      of data is increased.</div></div></blockquote></span><div>I fully agree that RDF's way of adding namespaces through URIs is the way to go. And TurtleRDF's way of declaring prefixes</div><div>means that you can use URIs with a lot less verbosity, i.e.</div>

<div>wn:n0001234 lemon:broader wn:n0002345</div><div>instead of having the full URIs in there.</div><span class=""><div><br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div bgcolor="#FFFFFF" text="#000000">

<div> RDF is a Framework to Describe Resources

      such as WordNet. At the end of the day, it should be easier to

      answer the question, which part of WordNet is an ontology (and can

      be used as such) and what part is merely a dictionary.<br></div></div></blockquote></span><div>Seeing the full lemon spec, it's not always clear to me where the ontology stops and "the lemon creators shoehorning things such as syntax trees into an RDF notation" starts.</div>

<div>In a sense, this is an artefact of lemon's creators both trying to capture commonsensical constructs that everyone will use and doing new exciting things at the same time.</div><div>I find Miller's original intuition of "let's add an ontological component to a dictionary so you can look up things by meaning" more useful here if we want an intuition</div>

<div>on WordNet. Other resources (such as VerbNet or FrameNet) share the idea of having a dictionary joined with some conceptual data model, but have a focus on information</div><div>that is closer to linguistic properties than the sense relations of wordnets.</div><span class="">

<div><br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div bgcolor="#FFFFFF" text="#000000"><div>

      I believe, that essentially my community is trying to understand,

      what is currently going on and then model this in OWL, which is

      similar to UML or ER-diagrams.</div></div></blockquote></span><div>... which is, again, commonsensical. If you want to model one resource in another framework, having a meta-model of the data helps. (meta-model sounds weird here because</div>

<div>usually the model is some SQL or class diagram, and the meta-model is the one that describes the structure of UML or ER models, again in a description formalism that could</div><div>be translated into UML or ER.)</div><span class="">

<div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div bgcolor="#FFFFFF" text="#000000"><div> Once you have it in OWL, you can

      mix and merge and transform it into more efficient structures like

      SQL (As John mentioned, Cyc is also providing these mappings and

      it is easy to go down from rich knowledge). The quest therefore is

      to encode human-knowledge into the data on a meta-level, i.e.

      describing the data/resources not the world.</div></div></blockquote></span><div>I fully agree. (And, just to be extra-cautious, will point to the difference between  meta-data and meta-models for linguistic models).</div><span class="">

<div><br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div bgcolor="#FFFFFF" text="#000000"><div>

      We are in dire need of expert input, however. Hence my attempt to

      cross-post.<br></div></div></blockquote></span><div>For my two cents: I think that for many of the practical problems we're facing (as in: after we've got our data in RDF, can we query the data efficiently</div><div>and do we get the answers that we'd intuitively want), the general problems in knowledge-based AI have been around for ages, but now we have</div>

<div>specific instances of the problem that can tell us more about what kind of solution can be workable in practice, and where we'll certainly see a pattern</div><div>of early adopters (e.g. the WSD community, because having  all your data in one place is great and you need some kind of graph but no reasoning in</div>

<div>a more extensive sense) and late adopters (e.g. people who make extensive use of relatedness measures or techniques that are both complicated and</div><div>performance-sensitive enough that it makes little sense to include them in a generic database).</div>
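<div><br></div><div>To make the earlier point about ISA chains and least common ancestors concrete, here is a minimal Python sketch of the in-memory approach. The toy hierarchy and the function names (ancestors, is_a, least_common_ancestors) are made up for illustration and are not taken from WordNet or any library; a real implementation over a full wordnet would presumably precompute or index the ancestor sets rather than re-walk the graph on every call.</div>
<pre>
```python
from collections import deque

# Hypothetical toy hypernym (ISA) graph: child -> list of direct parents.
# A node may have several parents, so this is a DAG, not a tree.
ISA = {
    "poodle": ["dog"],
    "dog": ["canine", "pet"],
    "cat": ["feline", "pet"],
    "canine": ["carnivore"],
    "feline": ["carnivore"],
    "carnivore": ["animal"],
    "pet": ["animal"],
    "animal": [],
}


def ancestors(node, isa=ISA):
    """Return every node reachable from `node` via one or more ISA steps."""
    seen = set()
    queue = deque(isa.get(node, []))
    while queue:
        current = queue.popleft()
        if current not in seen:
            seen.add(current)
            queue.extend(isa.get(current, []))
    return seen


def is_a(node, candidate, isa=ISA):
    """True if `candidate` is reachable from `node` via ISA steps."""
    return candidate in ancestors(node, isa)


def least_common_ancestors(a, b, isa=ISA):
    """Common ancestors of `a` and `b` that have no other common ancestor
    below them. Each node counts as its own ancestor here."""
    up_a = ancestors(a, isa).union({a})
    up_b = ancestors(b, isa).union({b})
    common = up_a.intersection(up_b)
    # Drop any common ancestor that is a proper ancestor of another one.
    return {
        c for c in common
        if not any(c in ancestors(other, isa) for other in common)
    }
```
</pre>
<div>With this toy graph, least_common_ancestors("poodle", "cat") gives both "pet" and "carnivore", which is why the plural in "least common ancestor(s)" matters once a node can have more than one parent.</div>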

<div><br></div><div>Best wishes,</div><div>Yannick</div><div><div class="h5"><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div bgcolor="#FFFFFF" text="#000000"><div>

      <br>

      All the best,<br>

      Sebastian<div><div><br>

      <br>

      <br>

      <br>

      On 08.08.2014 09:52, <a href="mailto:yversley@gmail.com" target="_blank">yversley@gmail.com</a> wrote:<br>

    </div></div></div><div><div>

    <blockquote type="cite">

      <div dir="ltr" style="font-family:'Calibri','Segoe UI','Meiryo','Microsoft YaHei UI','Microsoft JhengHei UI','Malgun Gothic','sans-serif';font-size:12pt">

        <div>Dear Sebastian,</div>

        <div><br>

        </div>

        <div>let me start out by saying that I’m not sure

          if broadening an already diffuse discussion by adding more

          people to it is helpful in the sense of achieving a better

          signal-to-noise ratio. Corpora-List is (in)famous for

          occasionally having discussions between people with very

          different background assumptions (e.g. Ramesh’s insistence

          that language is best seen as behaviour vs. the point that

          language is a tool to get meanings across). This can be both

          good and bad, and lots of people who are only interested in

          factual information did or will hit the “Mute thread” button

          (or moral equivalent) in the process.</div>

        <div><br>

        </div>

        <div>Your whole post seems to boil down to a claim that only

          RDF-encoded data should count as ontology. This seems to be a

          bit near-sighted to me, as LemonRDF’s encoding of WordNet is

          just that, an encoding which is very convenient but which adds

          nothing to the existing semantics.</div>

        <div><br>

        </div>

        <div>I completely agree that using a powerful database (be it

          RDF or SQL or anything else) is better than using the 90s

          infrastructure that was once designed for Wordnet, and that

          linking datasets together is much easier with a common format

          that reduces the m:n problem to a 1:n problem.</div>

        <div><br>

        </div>

        <div>We already established earlier that WordNet is a

          combination of a dictionary and an ontological component,

          which is exactly why it’s more useful for NLP than the

          ontologies that were part of the original conception of the

          Semantic Web. Fortunately for us though, people woke up to

          that idea and resources such as DBPedia now also include

          dictionary entries that mediate between natural-language

          strings and the concepts of the respective ontology.</div>

        <div><br>

        </div>

        <div>Saying that some people think that “the ontology is already

          in the text” is unnecessarily putting up a strawman. No one

          claimed this, and you’d do better by understanding the actual

          arguments put forward - for example, that in the absence of a

          central authority, as with marriage or taxonomies in Biology,

          ontologies are conceptualizations that are intersubjective

          rather than purely objective. E.g. Kafka may be a German

          writer in one ontology and a Czech writer in another, yet

          either of these ontologies would be useful and intuitively

          plausible. (This creates a tension/incompatibility between the

          perspective that ontologies are logical things and that you

          should be able to reason with them, and the view that you

          should be able to freely combine ontologies on related

          things.)</div>
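        <div><br></div>
        <div>A small Python sketch of that tension, under the assumption (made only for illustration) that nationality is treated as a functional property, i.e. one value per person. The triples, the predicate names and the conflicts helper are hypothetical and not drawn from any actual ontology:</div>
<pre>
```python
# Two hypothetical ontologies, each a set of (subject, predicate, object) triples.
ONTOLOGY_A = {("Kafka", "nationality", "German"), ("Kafka", "occupation", "writer")}
ONTOLOGY_B = {("Kafka", "nationality", "Czech"), ("Kafka", "occupation", "writer")}

# Assumption for illustration: "nationality" is functional,
# i.e. each subject may have at most one value for it.
FUNCTIONAL = {"nationality"}


def conflicts(triples, functional=FUNCTIONAL):
    """Return {(subject, predicate): values} for every functional predicate
    that ends up with more than one distinct value."""
    values = {}
    for subj, pred, obj in triples:
        if pred in functional:
            values.setdefault((subj, pred), set()).add(obj)
    # Each collected set holds at least one value, so "not exactly one"
    # means "two or more".
    return {key: vals for key, vals in values.items() if len(vals) != 1}
```
</pre>
        <div>Each ontology is consistent on its own (conflicts returns an empty dict for either one), but the plain union of the two reports a clash on ("Kafka", "nationality"). Dropping the functional constraint makes the merge trivially fine again, at the cost of weaker reasoning.</div>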

        <div><br>

        </div>

        <div>Your discussion of layers is absolutely orthogonal to that

          - modeling text, annotations, metadata, and ontology in one

          database is surely convenient if you can make it work in a

          sense that's practically relevant but it doesn’t add anything

          to the discussion we’re having here.</div>

        <div><br>

        </div>

        <div>Best wishes,</div>

        <div>Yannick</div>

        <div><br>

        </div>

        <div style="padding-top:5px;border-top-color:rgb(229,229,229);border-top-width:1px;border-top-style:solid">

          <div><font face=" 'Calibri', 'Segoe UI', 'Meiryo', 'Microsoft YaHei

              UI', 'Microsoft JhengHei UI', 'Malgun Gothic',

              'sans-serif'"><b>From:</b> <a href="mailto:hellmann@informatik.uni-leipzig.de" target="_blank">Sebastian Hellmann</a><br>

              <b>Sent:</b> Friday, 8 August 2014 09:35<br>

              <b>To:</b> <a href="mailto:sowa@bestweb.net" target="_blank">John F

                Sowa</a>, <a href="mailto:corpora@uib.no" target="_blank">corpora</a>,

              <a href="mailto:open-linguistics@lists.okfn.org" target="_blank">A list for those interested in open

                data in linguistics.</a>, <a href="mailto:nlp2rdf@lists.informatik.uni-leipzig.de" target="_blank">nlp2rdf</a></font></div>

        </div>

        <div><br>

        </div>

        <div dir="">

          <div>Dear all,<br>

            (I included some more lists to ping them, discussion started

            here: <a href="http://mailman.uib.no/public/corpora/2014-August/020939.html" target="_blank">http://mailman.uib.no/public/corpora/2014-August/020939.html</a>)<br>

            <br>

            I see that there are many viewpoints on this issue in this

            thread.<br>

            So let me add my personal biased view.<br>

            <br>

            In the broadest sense, we start to create an ontology by

            stating facts:<br>

            <br>

            married (a, b) . <br>

            <br>

            Imho we have an ontology, solely for the reason, that we

            start to relate a to b with "married" . Even if there is not

            an explicit ontology defining "married", it is still used in

            an "ontological" way, just not explicit. There are other

            aspects missing, which have been discussed throughout the

            literature (i.e. the fact that it must be "shared" by

            Gruber), but in the broadest sense, it qualifies. <br>
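            <br>
            As a rough Python sketch of what "implicit" means here (the tuple layout and the helper names are assumptions for illustration only): the vocabulary can be read off the stated facts even though no schema defines "married".<br>
<pre>
```python
# Facts as (predicate, subject, object) tuples; the names are illustrative only.
facts = {("married", "a", "b")}


def predicates_in_use(facts):
    """The implicit vocabulary: every relation name used in at least one fact."""
    return {pred for pred, _, _ in facts}


def related(facts, pred, subj):
    """Objects that `subj` is related to via `pred`, exactly as stated."""
    return {o for p, s, o in facts if p == pred and s == subj}
```
</pre>
            Note that related(facts, "married", "b") comes back empty: nothing states that "married" is symmetric, which is the kind of knowledge an explicit ontology would add.<br>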

            <br>

            Regarding language technology and this discussion, I would

            say that we should be careful not to mix levels. This is

            done by lexical-semantic resources, i.e. WordNet, but we

            could separate it again. <br>

            <br>

            In my view, we have these different layers:<br>

            <br>

            1. the content, i.e. the characters (html, plaintext), e.g.

            in Unicode.<br>

            2. the container of the content, i.e. document or tweet<br>

            3. annotations on the content<br>

            4. metadata on the container, e.g. the tweeter or author for

            context<br>

            5. collection of content (with or without annotations) i.e.

            the corpora<br>

            6. ontologies and data describing language, i.e. lexica,

            dictionaries, terminologies, etc. such as WordNet<br>

            7. factual databases including their taxonomies, i.e. the

            DBpedia knowledge graph <a href="http://dbpedia.org" target="_blank">http://dbpedia.org</a><br>

            <br>

            (@John: I hope you are noticing that I am trying to keep

            all of it as underspecified as possible)<br>

            <br>

            Then in addition, there are ontologies on a meta-level that

            try to capture all seven layers. Some examples (more below):

            NIF, lemon, ITS, NERD [1]<br>

            which we are trying to combine in the <a href="http://nlp2rdf.org" target="_blank">http://nlp2rdf.org</a>

            and <a href="http://lider-project.eu" target="_blank">http://lider-project.eu</a>

            <br>

            <br>

            We can model WordNet using the lemon ontology: <a href="http://datahub.io/dataset/lemonwordnet" target="_blank">http://datahub.io/dataset/lemonwordnet</a><br>

            However for certain purposes, it makes sense to transform

            WordNet to become a taxonomy as YAGO is doing:<br>

            <a href="https://www.mpi-inf.mpg.de/departments/databases-and-information-systems/research/yago-naga/yago/" target="_blank">https://www.mpi-inf.mpg.de/departments/databases-and-information-systems/research/yago-naga/yago/</a><br>

            <br>

            I am not fixed upon any of the definitions I gave above, as

            I am aware that you can (and should!) transform one into the

            other (with some effort, e.g. corpora to dictionary, fact

            extraction, language generation).<br>

            <br>

            If we are talking about extracting ontologies from text,

            there might be philosophical people who might want to argue

            that the ontology is already in the text. Discussion can be

            endless, if you take the wrong linguistic turn.<br>

            <br>

            If we are focusing on engineering of information machines,

            then things are much clearer. <br>

            <br>

            All the best, <br>

            Sebastian<br>

            <br>

            <br>

            <br>

            [1] related to the different layers:<br>

            1. NIF: <a href="http://persistence.uni-leipzig.org/nlp2rdf/ontologies/nif-core#" target="_blank">http://persistence.uni-leipzig.org/nlp2rdf/ontologies/nif-core#</a><br>

            2. (there is a gap here, Dublin Core or Foaf are not enough

            imho)<br>

            3 a) MARL: <a href="http://www.gi2mo.org/marl/0.1/ns.html" target="_blank">http://www.gi2mo.org/marl/0.1/ns.html</a><br>

               b) ITS: Docu: <a href="http://www.w3.org/TR/its20/" target="_blank">http://www.w3.org/TR/its20/</a>

            , RDF: <a href="http://www.w3.org/2005/11/its/rdf#" target="_blank">http://www.w3.org/2005/11/its/rdf#</a><br>

               c) OLIA: <a href="http://purl.org/olia/" target="_blank">http://purl.org/olia/</a><br>

            4. a) Dublin Core: <a href="http://dublincore.org/documents/dcmi-terms/" target="_blank">http://dublincore.org/documents/dcmi-terms/</a><br>

                b) Prov-O: <a href="http://www.w3.org/TR/prov-o/" target="_blank">http://www.w3.org/TR/prov-o/</a><br>

            5. also NIF: <a href="http://persistence.uni-leipzig.org/nlp2rdf/ontologies/nif-core#" target="_blank">http://persistence.uni-leipzig.org/nlp2rdf/ontologies/nif-core#</a><br>

            6. lemon: <a href="http://lemon-model.net/" target="_blank">http://lemon-model.net/</a><br>

            7. a) DCAT and DataId: <a href="http://wiki.dbpedia.org/coop/DataIDUnit" target="_blank">http://wiki.dbpedia.org/coop/DataIDUnit</a><br>

               b) NERD: <a href="http://nerd.eurecom.fr/ontology" target="_blank">http://nerd.eurecom.fr/ontology</a><br>

            <br>

            <br>

            <br>

            On 08.08.2014 06:11, John F Sowa wrote:<br>

          </div>

          <blockquote style="margin-top:0px;margin-bottom:0px">On 8/7/2014 10:57

            PM, Ken Litkowski wrote: <br>

            <blockquote style="margin-top:0px;margin-bottom:0px">It

              would seem to me that our goal should be a classification

              <br>

              of all existing things (not to exclude the narrower

              types). <br>

            </blockquote>

            <br>

            Yes, but note the slides I suggested in my first note: <br>

            <br>

               <a href="http://www.jfsowa.com/talks/kdptut.pdf" target="_blank">http://www.jfsowa.com/talks/kdptut.pdf</a>

            <br>

            <br>

            Slides 7 to 9:  Cyc project.  30 years of work (since 1984).

            <br>

            After the first 25 years, 100 million dollars and 1000

            person-years <br>

            of work (one person-millennium!), 600,000 concepts, defined

            by <br>

            5,000,000 axioms, organized in 6,000 microtheories -- and

            counting. <br>

            <br>

            Slide 10:  2300 years of universal ontology schemes -- and

            counting. <br>

            <br>

            <blockquote style="margin-top:0px;margin-bottom:0px">The

              Brandeis Shallow Ontology attempts to do this, and

              incidentally <br>

              is being used to characterize arguments of verbs in

              Patrick Hanks <br>

              corpus pattern analysis, i.e., in the imperfect world of

              language. <br>

            </blockquote>

            <br>

            I strongly believe in shallow, underspecified ontologies --

            especially <br>

            when they're supplemented with lots of lexical information

            about verbs <br>

            and their characteristic patterns. <br>

            <br>

            But I also believe that the key to having an open-ended

            variety of <br>

            specialized ontologies is to make the computers do what

            people do: <br>

            extend their ontologies automatically by reading books. <br>

            <br>

            Lenat made the mistake of assuming that you need to

            hand-code <br>

            a huge amount of knowledge before a system can start to read

            <br>

            by itself.  But that's wrong.  You need to design a system

            that <br>

            can automatically augment its ontology every step of the

            way. <br>

            <br>

            John <br>

            <br>

            _______________________________________________ <br>

            UNSUBSCRIBE from this page: <a href="http://mailman.uib.no/options/corpora" target="_blank">http://mailman.uib.no/options/corpora</a>

            <br>

            Corpora mailing list <br>

            <a href="mailto:Corpora@uib.no" target="_blank">Corpora@uib.no</a>

            <br>

            <a href="http://mailman.uib.no/listinfo/corpora" target="_blank">http://mailman.uib.no/listinfo/corpora</a>

            <br>

            <br>

          </blockquote>

          <br>

          <br>

          <div>-- <br>

            <small>Sebastian Hellmann<br>

              AKSW/NLP2RDF research group<br>

              Institute for Applied Informatics (InfAI) and DBpedia

              Association<br>

              Events: <br>

              * <b>Sept. 1-5, 2014</b> Conference Week in Leipzig,

              including <br>

              ** <b>Sept 2nd</b>, <a href="http://mlode2014.nlp2rdf.org/" target="_blank">MLODE

                2014</a> <br>

              ** <b>Sept 3rd</b>, <a href="http://wiki.dbpedia.org/meetings/Leipzig2014" target="_blank">2nd DBpedia Community Meeting</a><br>

              ** <b>Sept 4th-5th</b>, <a href="http://semantics.cc/" target="_blank">SEMANTiCS

                (formerly i-SEMANTICS) </a><br>

              Come to Germany as a PhD: <a href="http://bis.informatik.uni-leipzig.de/csf" target="_blank">http://bis.informatik.uni-leipzig.de/csf</a><br>

              Projects: <a href="http://dbpedia.org" target="_blank">http://dbpedia.org</a>,

              <a href="http://nlp2rdf.org" target="_blank">http://nlp2rdf.org</a>, <a href="http://linguistics.okfn.org" target="_blank">http://linguistics.okfn.org</a>,

              <a href="http://www.w3.org/community/ld4lt" target="_blank">https://www.w3.org/community/ld4lt</a><br>

              Homepage: <a href="http://aksw.org/SebastianHellmann" target="_blank">http://aksw.org/SebastianHellmann</a><br>

              Research Group: <a href="http://aksw.org" target="_blank">http://aksw.org</a><br>

              Thesis:<br>

              <a href="http://tinyurl.com/sh-thesis-summary" target="_blank">http://tinyurl.com/sh-thesis-summary</a><br>

              <a href="http://tinyurl.com/sh-thesis" target="_blank">http://tinyurl.com/sh-thesis</a><br>

            </small></div>

        </div>

      </div>

    </blockquote>

    <br>

    <br>


  </div></div></div>

</blockquote></div></div></div><br></div></div>


<br></blockquote></div><br></div>