[Corpora-List] Moving Lexical Semantics from Alchemy to Science

Katrin Erk katrin.erk at mail.utexas.edu
Fri Jan 28 19:59:23 UTC 2011


> OK but the problem is that you have to know what it is you are looking
> for closeness TO. Why would you seek relative proximity to toy and food
> unless you already knew those were the words corresponding to or
> capturing the ambiguity--in other words you already have to know what
> the choices are (and in the case of "rubber chicken" it has both senses
> and proximity in a space cannot show ambiguity, can it?). There's nothing
> about toys in any of the three components rubber/duck/chicken, surely?
> Cohen and Margalit were asking one right question, which is whether and
> how one could determine combination meaning from component meanings--I
> don't see how the proximity analysis you cite can do that.

Baroni and Zamparelli actually don't use predefined words (like toy)
to compare to, but determine the nearest neighbors of an expression in
space. Then you can interpret each word through its nearest neighbors.
So the interpretation of each expression is a list of paraphrases.
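
As a minimal sketch of that nearest-neighbor reading (not Baroni and
Zamparelli's actual implementation): the Python below uses an invented
three-dimensional space and a made-up matrix for "rubber". The vectors,
the dimension labels and the function names are all assumptions for
illustration only. It composes adjective and noun by matrix-vector
multiplication, then ranks the other words in the space by cosine
similarity to the result.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def apply_adjective(matrix, noun_vec):
    """Compose adjective and noun: multiply the adjective matrix by the noun vector."""
    return [sum(m * x for m, x in zip(row, noun_vec)) for row in matrix]

def nearest_neighbors(target, space, k=2):
    """Return the k words in `space` whose vectors are most similar to `target`."""
    ranked = sorted(space, key=lambda w: cosine(target, space[w]), reverse=True)
    return ranked[:k]

# Invented toy space; the dimensions loosely stand for (bath, kitchen, animal).
space = {
    "toy":  [0.9, 0.1, 0.2],
    "food": [0.1, 0.9, 0.3],
    "bird": [0.1, 0.2, 0.9],
}
duck = [0.3, 0.3, 0.9]

# Hypothetical matrix for "rubber"; in the paper such matrices are learned
# from corpus data, here the numbers are simply made up.
rubber = [
    [2.0, 0.0, 0.5],
    [0.0, 0.5, 0.0],
    [0.0, 0.0, 0.2],
]

rubber_duck = apply_adjective(rubber, duck)
print(nearest_neighbors(rubber_duck, space))
```

With these made-up numbers "toy" comes out as the closest neighbor of
"rubber duck", without "toy" having been singled out beforehand: it is
just one of the words in the space that gets ranked.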

But I think you mean more than that by "determining combination
meaning". Something along the lines of being able to list all
appropriate inferences to draw?

Assuming you mean that, my answer would be: Yes, in the end that's the
goal, but for now this goal is too large. For now we need goals that
work for the intermediate, smaller steps. And I think models like this
one are on the right track because they can use corpus data to predict
the meaning of words and expressions in context.

Katrin


> Y
>
> On 28 Jan 2011, at 14:41, Katrin Erk wrote:
>
>> On Fri, Jan 28, 2011 at 1:32 PM, Yorick Wilks <Y.Wilks at dcs.shef.ac.uk> wrote:
>>> Hmmm...not quite sure what "doing the right thing" for the rubber duck and chicken would be. Surely no method like this could provide a representation so what could it give at best?
>>
>> It does provide a representation, the question is just how to
>> interpret it and draw inferences from it. The most straightforward way
>> is by testing closeness of the representation of the adj+noun pair to
>> the representations of other expressions, say "toy" and "food". If the
>> model gets rubber duck and chicken right, then it should predict (by
>> measuring similarity/proximity in semantic space) that rubber duck is
>> closer to "toy" than "food", and the other way round for the rubber
>> chicken.
>>
>> Katrin
>>
>>>
>>> On 28 Jan 2011, at 14:25, Katrin Erk wrote:
>>>
>>>> Hi all,
>>>>
>>>> On Fri, Jan 28, 2011 at 10:42 AM, Yorick Wilks <Y.Wilks at dcs.shef.ac.uk> wrote:
>>>>> This discussion has been going on in AI and linguistics (and in philosophy a bit) for at least 40 years and I'm worrying now that we aren't making progress: if there was any justice corpora would help here! In 1972 Cohen and Margalit discussed what properties you could predict of a "rubber duck" from its components--i.e. that were different from a regular old duck: their claim, if I remember right, was that you couldn't make any.
>>>>
>>>> In fact, there has recently been a paper that uses corpus evidence on
>>>> related cases, namely adjective-noun pairs:
>>>>
>>>> Marco Baroni; Roberto Zamparelli
>>>> Nouns are Vectors, Adjectives are Matrices: Representing
>>>> Adjective-Noun Constructions in Semantic Space. Proceedings of EMNLP
>>>> 2010, http://aclweb.org/anthology/D/D10/D10-1115.pdf
>>>>
>>>> They use the contexts in which the adj/noun pairs are found to learn a
>>>> representation for each adjective that maps noun vectors to new
>>>> vectors for the adj/noun pair. Their approach should do the right
>>>> thing for "rubber duck"/"rubber chicken", or at least it should be
>>>> able to if "rubber" is reasonably frequent in the corpus.
>>>>
>>>> Best,
>>>> Katrin
>>>>
>>>>
>>>>> Later AI people weighed in for a couple of decades--including
>>>>> me--arguing that, well, with some reasonable assumptions about the
>>>>> state of the world you could make some reasonable predictions in at
>>>>> least some cases. Though this would, inevitably, be dependent on an
>>>>> individual's knowledge state as well--it is not just a matter of some
>>>>> objective linguistic base or widely shared knowledge--and this is how
>>>>> poets work, as we all know. I wrote on this with colleagues in 1991
>>>>> under the title "Your metaphor or mine?". But those were still
>>>>> pre-corpus days, by and large, so we must have moved on a bit from
>>>>> examples now, no? I worked with a student a few years ago on
>>>>> extracting novel compounds from very large web corpora, e.g. hardly
>>>>> present in say 1995 but much represented in 2000--there was an
>>>>> interesting, and related, set of examples that emerged but I couldn't
>>>>> see any way to publish them so as to make any claims.
>>>>> Examples are more fun than computing, of course, and I'm still obsessed with things like "rubber duck" (in the bath) doesn't go the same way as "rubber chicken" (banquet food, as well as being a comedy prop)--I suppose enough facts about the distribution of meats at banquets might make this predictable, but I'm not confident.
>>>>> Yorick Wilks
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> On 28 Jan 2011, at 10:53, Dominic Widdows wrote:
>>>>>
>>>>>> On Fri, Jan 28, 2011 at 10:36 AM,  <amsler at cs.utexas.edu> wrote:
>>>>>>> Technically, yes; but what I think makes a truly interesting combination is
>>>>>>> when the alternate meaning arises accidentally to serve a necessary purpose.
>>>>>>
>>>>>> Technically yes, but in practice, no - compounds have a well-known
>>>>>> property of (usually) only taking on some of the available meanings.
>>>>>>
>>>>>> There's some good literature on this, but being a parent of small
>>>>>> children my favourite by a long way is the song "When I see an
>>>>>> elephant fly."
>>>>>>
>>>>>> http://lyricsplayground.com/alpha/songs/w/wheniseeanelephantfly.shtml
>>>>>>
>>>>>> Best wishes,
>>>>>> Dominic
>>>>>>
>>>>>>> The reason 'solar system' is interesting is that I don't think the people
>>>>>>> who coined it were intentionally trying to be funny. Their domain used
>>>>>>> 'solar' in a whole array (sorry) of compounds consistent with only one
>>>>>>> meaning until they accidentally coined one compound that collided with the
>>>>>>> other meaning.
>>>>>>>
>>>>>>> I suppose one could distinguish between 'the solar system' and 'a solar
>>>>>>> system' (at least until recently, when astronomers started looking for
>>>>>>> extra-solar planets), but what I'm trying to say is that the ambiguous ones
>>>>>>> I'm most interested in are those that came about via evolutionary processes
>>>>>>> and somehow managed to both get established thus demonstrating two
>>>>>>> decompositional principles that are sustainable within the language.
>>>>>>>
>>>>>>> The fragility of these combinations is obvious as they violate a fundamental
>>>>>>> principle of discourse, i.e., being clear as to what one means. The BBC
>>>>>>> examples are excellent because they are 'real'. One should force the other
>>>>>>> out of existence once the perception of the ambiguity dawns on most people.
>>>>>>> Either that or force the addition of words for clarification, as in
>>>>>>> 'astronomical solar system' vs. 'solar energy system'.
>>>>>>>
>>>>>>> Quoting "Krishnamurthy, Ramesh" <r.krishnamurthy at aston.ac.uk>:
>>>>>>>
>>>>>>>> Hi all
>>>>>>>>
>>>>>>>> a) Surely any multi-word item involving at least one polysemous element
>>>>>>>> would be a candidate?
>>>>>>>> e.g. civil service [service = an act or an organization]
>>>>>>>>
>>>>>>>> b) Or indeed, any pair of words, as they have the potential to engage in
>>>>>>>> a variety of case relationships?
>>>>>>>> e.g. walking stick
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> c) Then there's the problem of segmentation/sequence, i.e. "(a+b) + c" or
>>>>>>>> "a + (b+c)"?
>>>>>>>>
>>>>>>>> e.g. hot water tap
>>>>>>>>
>>>>>>>> Best
>>>>>>>> Ramesh Krishnamurthy
>>>>>>>> Lecturer in English Studies, School of Languages and Social Sciences,
>>>>>>>> Aston University, Birmingham B4 7ET, UK
>>>>>>>> Tel: +44 (0)121-204-3812 ; Fax: +44 (0)121-204-3766 [Room NX08, 10th
>>>>>>>> Floor, North Wing of Main Building]
>>>>>>>> http://www1.aston.ac.uk/lss/staff/krishnamurthyr/
>>>>>>>> Director, ACORN (Aston Corpus Network project): http://acorn.aston.ac.uk/
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> Message: 6
>>>>>>>>
>>>>>>>> Date: Fri, 28 Jan 2011 10:43:45 +0000
>>>>>>>>
>>>>>>>> From: Justin Washtell <lec3jrw at leeds.ac.uk>
>>>>>>>>
>>>>>>>> Subject: Re: [Corpora-List] Moving Lexical Semantics from Alchemy to
>>>>>>>>      Science
>>>>>>>>
>>>>>>>> To: David Wible <wible at stringnet.org>, John Williams
>>>>>>>>      <j0hnwh0ever.corpora at gmail.com>
>>>>>>>>
>>>>>>>> Cc: "Corpora at uib.no" <Corpora at uib.no>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> Ancient history teachers.
>>>>>>>>
>>>>>>>> Or, a little tenuously, comprehensive ancient history teachers.
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> Justin Washtell
>>>>>>>>
>>>>>>>> University of Leeds
>>>>>>>>
>>>>>>>> ________________________________________
>>>>>>>>
>>>>>>>> From: corpora-bounces at uib.no [corpora-bounces at uib.no]
>>>>>>>>  On Behalf Of David Wible [wible at stringnet.org]
>>>>>>>>
>>>>>>>> Sent: 28 January 2011 09:17
>>>>>>>>
>>>>>>>> To: John Williams
>>>>>>>>
>>>>>>>> Cc: Corpora at uib.no
>>>>>>>>
>>>>>>>> Subject: Re: [Corpora-List] Moving Lexical Semantics from Alchemy to
>>>>>>>> Science
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> How about 'heavy metal fans'?
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> David
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On Thu, Jan 27, 2011 at 7:57 PM, John Williams
>>>>>>>>  <j0hnwh0ever.corpora at gmail.com> wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> ... P.S. Anyone have some other ambiguous open compounds they are
>>>>>>>>  familiar with, besides 'solar system'?
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> 'golf club' springs to mind
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> j0hn
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> -----------
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> John Williams
>>>>>>>>
>>>>>>>> Lecturer in English Language and Linguistics
>>>>>>>>
>>>>>>>> School of Languages and Area Studies
>>>>>>>>
>>>>>>>> PK 2.18, University of Portsmouth
>>>>>>>>
>>>>>>>> Portsmouth PO1 2DZ
>>>>>>>>
>>>>>>>> Tel: (0239 284) 2162
>>>>>>>>
>>>>>>>> Email: john.x.williams at port.ac.uk
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> _______________________________________________
>>>>>>> Corpora mailing list
>>>>>>> Corpora at uib.no
>>>>>>> http://mailman.uib.no/listinfo/corpora
>>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> Katrin Erk, Department of Linguistics
>>>> The University of Texas at Austin
>>>> http://comp.ling.utexas.edu/people/katrin_erk
>>>
>>>
>>>
>>
>>
>>
>
>
>



-- 
Katrin Erk, Department of Linguistics
The University of Texas at Austin
http://comp.ling.utexas.edu/people/katrin_erk

_______________________________________________
Corpora mailing list
Corpora at uib.no
http://mailman.uib.no/listinfo/corpora


