[Corpora-List] Moving Lexical Semantics from Alchemy to Science

Rich Cooper rich at englishlogickernel.com
Fri Jan 28 20:04:31 UTC 2011


Hi Yorick, Katrin, Ramesh,

I think the main value of "rubber duck" examples of compounds is just in
showing that compounds should be treated as case-based knowledge scraps,
not as components combined in an algebraic expression of word-combination
semantics.  Syntax alone cannot predict them effectively, so the case base
of outliers is a useful way of partitioning the phrases in the corpora.

"Rubber" would compound with some words in a completely different semantic
construction than with other words, and not at all with still other words,
as discoverable within a corpus collection.  But the sequence of words that
consistently appears within a corpus collection immediately after (and
before) "rubber" could sieve the meanings of the word "rubber" as it is
actually used in compounds within the corpora in a useful way.  Same for
"duck" (eat it or quickly move out of the way) and other strange compound
vocabularies.  
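
A minimal sketch of that sieve, assuming the corpus is already tokenized
into a flat list of lowercase words.  The function name and the toy corpus
are invented for illustration, and sentence boundaries are ignored:

```python
from collections import Counter

def neighbours(tokens, target):
    """Count the words seen immediately before and after `target`."""
    before, after = Counter(), Counter()
    for i, tok in enumerate(tokens):
        if tok != target:
            continue
        if i > 0:
            before[tokens[i - 1]] += 1
        if i + 1 < len(tokens):
            after[tokens[i + 1]] += 1
    return before, after

# Toy corpus, made up for this example only.
toy = ("the rubber duck floats while the rubber chicken flops "
       "and the rubber band snaps").split()
before, after = neighbours(toy, "rubber")
print(sorted(after))   # words that follow "rubber" in the toy corpus
```

On a real collection the "after" counts would be the raw material for
deciding which right-hand neighbours pattern together.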

So the previously suggested matrix of Boolean marks in pairs of vocabulary
classes sounds like a useful representation.  Often, it is useful to sort
the columns and the rows, maintaining the stored verbal relationships, to
cluster the 1's and 0's in the matrix into something that can be expressed
in terms of subclasses.  That is where the meaning seems to be pinned, IMHO.
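
Here is a rough sketch of the sorting step, under my own assumptions: the
matrix is built from a hand-made set of observed (modifier, head) pairs,
and the "sort" is a plain lexicographic ordering of row and column
patterns, which is only one crude way to pull the 1's together.  The
labels travel with their rows and columns, so the word relationships are
preserved:

```python
def boolean_matrix(pairs, rows, cols):
    """1 where the (row word, column word) pair was observed, else 0."""
    seen = set(pairs)
    return [[1 if (r, c) in seen else 0 for c in cols] for r in rows]

def sort_to_cluster(matrix, rows, cols):
    """Reorder rows, then columns, so similar 0/1 patterns sit together."""
    row_order = sorted(range(len(rows)), key=lambda i: matrix[i], reverse=True)
    col_order = sorted(
        range(len(cols)),
        key=lambda j: [matrix[i][j] for i in row_order],
        reverse=True,
    )
    new_rows = [rows[i] for i in row_order]
    new_cols = [cols[j] for j in col_order]
    new_matrix = [[matrix[i][j] for j in col_order] for i in row_order]
    return new_matrix, new_rows, new_cols

# Invented observations, for illustration only.
pairs = [("rubber", "duck"), ("rubber", "chicken"), ("rubber", "band"),
         ("roast", "duck"), ("roast", "chicken"), ("elastic", "band")]
rows = ["rubber", "elastic", "roast"]
cols = ["duck", "band", "chicken"]

m = boolean_matrix(pairs, rows, cols)
new_matrix, new_rows, new_cols = sort_to_cluster(m, rows, cols)
for label, line in zip(new_rows, new_matrix):
    print(f"{label:8s}", line)
```

After sorting, "duck" and "chicken" end up in adjacent columns and "band"
stands apart, which is the kind of block one might then name as a
subclass.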


Frequent versus Rare words, or Frequent versus Other words, or Rare versus
Other word classes, or any other adjacency sensitive transition matrices,
can be useful tools for discovering meanings.  
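
As an illustration only, a frequent-versus-rare transition table could be
counted as below.  The threshold of 2 and the two class names are
arbitrary assumptions of mine, not anything fixed by the discussion:

```python
from collections import Counter

def class_transitions(tokens, threshold=2):
    """Count adjacent (class, class) pairs, classing words by frequency."""
    freq = Counter(tokens)

    def label(word):
        return "frequent" if freq[word] >= threshold else "rare"

    trans = Counter()
    for a, b in zip(tokens, tokens[1:]):
        trans[(label(a), label(b))] += 1
    return trans

# Toy token sequence, made up for this example.
toy_tokens = "the rubber duck and the rubber chicken".split()
trans = class_transitions(toy_tokens)
for pair, n in sorted(trans.items()):
    print(pair, n)
```

Swapping in other class definitions (rare versus other, and so on) only
changes the label function.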

JMHO,
-Rich
 
Sincerely,
Rich Cooper
EnglishLogicKernel.com
Rich AT EnglishLogicKernel DOT com
9 4 9 \ 5 2 5 - 5 7 1 2

-----Original Message-----
From: corpora-bounces at uib.no [mailto:corpora-bounces at uib.no] On Behalf Of Yorick Wilks
Sent: Friday, January 28, 2011 11:33 AM
To: Katrin Erk
Cc: Krishnamurthy, Ramesh; Corpora at uib.no
Subject: Re: [Corpora-List] Moving Lexical Semantics from Alchemy to Science

Hmmm...not quite sure what "doing the right thing" for the rubber duck and
chicken would be. Surely no method like this could provide a representation,
so what could it give at best?
YW

On 28 Jan 2011, at 14:25, Katrin Erk wrote:

> Hi all,
> 
> On Fri, Jan 28, 2011 at 10:42 AM, Yorick Wilks <Y.Wilks at dcs.shef.ac.uk> wrote:
>> This discussion has been going on in AI and linguistics (and in
>> philosophy a bit) for at least 40 years and I'm worrying now that we
>> aren't making progress: if there was any justice corpora would help here!
>> In 1972 Cohen and Margalit discussed what properties you could predict of
>> a "rubber duck" from its components--i.e. that were different from a
>> regular old duck: their claim, if I remember right, was that you couldn't
>> make any.
> 
> In fact, there has recently been a paper that uses corpus evidence on
> related cases, namely adjective-noun pairs:
> 
> Marco Baroni; Roberto Zamparelli
> Nouns are Vectors, Adjectives are Matrices: Representing
> Adjective-Noun Constructions in Semantic Space. Proceedings of EMNLP
> 2010, http://aclweb.org/anthology/D/D10/D10-1115.pdf
> 
> They use the contexts in which the adj/noun pairs are found to learn a
> representation for each adjective that maps noun vectors to new
> vectors for the adj/noun pair. Their approach should do the right
> thing for "rubber duck"/"rubber chicken", or at least it should be
> able to if "rubber" is reasonably frequent in the corpus.
> 
> Best,
> Katrin
> 
> 
>> Later AI people weighed in for a couple of decades --including
>> me--arguing that, well, with some reasonable assumptions about the
>> state of the world you could make some reasonable predictions in at
>> least some cases. Though this would, inevitably, be dependent on an
>> individual's knowledge state as well--it is not just a matter of some
>> objective linguistic base or widely shared knowledge---and this is how
>> poets work, as we all know. I wrote on this with colleagues in 1991
>> under the title "Your metaphor or mine?". But those were still
>> pre-corpus days, by and large, so we must have moved on a bit from
>> examples now, no? I worked with a student a few years ago on
>> extracting novel compounds from very large web corpora e.g. hardly
>> present in say 1995 but much represented in 2000--there was an
>> interesting, and related, set of examples that emerged but I couldn't
>> see any way to publish them so as to make any claims.
>> Examples are more fun than computing, of course, and I'm still obsessed
>> with things like the fact that "rubber duck" (in the bath) doesn't go the
>> same way as "rubber chicken" (banquet food, as well as being a comedy
>> prop)--I suppose enough facts about the distribution of meats at banquets
>> might make this predictable, but I'm not confident.
>> Yorick Wilks
>> 
>> 
>> 
>> 
>> 
>> On 28 Jan 2011, at 10:53, Dominic Widdows wrote:
>> 
>>> On Fri, Jan 28, 2011 at 10:36 AM,  <amsler at cs.utexas.edu> wrote:
>>>> Technically, yes; but what I think makes a truly interesting
>>>> combination is when the alternate meaning arises accidentally to serve
>>>> a necessary purpose.
>>> 
>>> Technically yes, but in practice, no - compounds have a well-known
>>> property of (usually) only taking on some of the available meanings.
>>> 
>>> There's some good literature on this, but being a parent of small
>>> children my favourite by a long way is the song "When I see an
>>> elephant fly."
>>> 
>>> http://lyricsplayground.com/alpha/songs/w/wheniseeanelephantfly.shtml
>>> 
>>> Best wishes,
>>> Dominic
>>> 
>>>> The reason 'solar system' is interesting is that I don't think the
>>>> people who coined it were intentionally trying to be funny. Their domain
>>>> used 'solar' in a whole array (sorry) of compounds consistent with only
>>>> one meaning until they accidentally coined one compound that collided
>>>> with the other meaning.
>>>>
>>>> I suppose one could distinguish between 'the solar system' and 'a solar
>>>> system' (at least until recently, when astronomers started looking for
>>>> extra-solar planets), but what I'm trying to say is that the ambiguous
>>>> ones I'm most interested in are those that came about via evolutionary
>>>> processes and somehow managed to both get established thus demonstrating
>>>> two decompositional principles that are sustainable within the language.
>>>>
>>>> The fragility of these combinations is obvious as they violate a
>>>> fundamental principle of discourse, i.e., being clear as to what one
>>>> means. The BBC examples are excellent because they are 'real'. One
>>>> should force the other out of existence once the perception of the
>>>> ambiguity dawns on most people. Either that or force the addition of
>>>> words for clarification, as in 'astronomical solar system' vs. 'solar
>>>> energy system'.
>>>> 
>>>> Quoting "Krishnamurthy, Ramesh" <r.krishnamurthy at aston.ac.uk>:
>>>> 
>>>>> Hi all
>>>>> 
>>>>> a) Surely any multi-word item involving at least one polysemous
>>>>> element would be a candidate?
>>>>> e.g. civil service [service = an act or an organization]
>>>>>
>>>>> b) Or indeed, any pair of words, as they have the potential to engage
>>>>> in a variety of case relationships?
>>>>> e.g. walking stick
>>>>>
>>>>> c) Then there's the problem of segmentation/sequence, i.e. "(a+b) + c"
>>>>> or "a + (b+c)"?
>>>>> 
>>>>> e.g. hot water tap
>>>>> 
>>>>> Best
>>>>> Ramesh Krishnamurthy
>>>>> Lecturer in English Studies, School of Languages and Social Sciences,
>>>>> Aston University, Birmingham B4 7ET, UK
>>>>> Tel: +44 (0)121-204-3812 ; Fax: +44 (0)121-204-3766 [Room NX08, 10th
>>>>> Floor, North Wing of Main Building]
>>>>> http://www1.aston.ac.uk/lss/staff/krishnamurthyr/
>>>>> Director, ACORN (Aston Corpus Network project): http://acorn.aston.ac.uk/
>>>>> 
>>>>> 
>>>>> 
>>>>> Message: 6
>>>>> Date: Fri, 28 Jan 2011 10:43:45 +0000
>>>>> From: Justin Washtell <lec3jrw at leeds.ac.uk>
>>>>> Subject: Re: [Corpora-List] Moving Lexical Semantics from Alchemy to Science
>>>>> To: David Wible <wible at stringnet.org>, John Williams <j0hnwh0ever.corpora at gmail.com>
>>>>> Cc: "Corpora at uib.no" <Corpora at uib.no>
>>>>> 
>>>>> 
>>>>> 
>>>>> Ancient history teachers.
>>>>> 
>>>>> Or, a little tenuously, comprehensive ancient history teachers.
>>>>> 
>>>>> 
>>>>> 
>>>>> Justin Washtell
>>>>> 
>>>>> University of Leeds
>>>>> 
>>>>> ________________________________________
>>>>> 
>>>>> From: corpora-bounces at uib.no [corpora-bounces at uib.no] On Behalf Of David Wible [wible at stringnet.org]
>>>>> Sent: 28 January 2011 09:17
>>>>> To: John Williams
>>>>> Cc: Corpora at uib.no
>>>>> Subject: Re: [Corpora-List] Moving Lexical Semantics from Alchemy to Science
>>>>> 
>>>>> 
>>>>> 
>>>>> How about 'heavy metal fans'?
>>>>> 
>>>>> 
>>>>> 
>>>>> David
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>> On Thu, Jan 27, 2011 at 7:57 PM, John Williams <j0hnwh0ever.corpora at gmail.com> wrote:
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>> ... P.S. Anyone have some other ambiguous open compounds they are
>>>>>  familiar with, besides 'solar system'?
>>>>> 
>>>>> 
>>>>> 
>>>>> 'golf club' springs to mind
>>>>> 
>>>>> 
>>>>> 
>>>>> j0hn
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>> -----------
>>>>> 
>>>>> 
>>>>> 
>>>>> John Williams
>>>>> 
>>>>> Lecturer in English Language and Linguistics
>>>>> 
>>>>> School of Languages and Area Studies
>>>>> 
>>>>> PK 2.18, University of Portsmouth
>>>>> 
>>>>> Portsmouth PO1 2DZ
>>>>> 
>>>>> Tel: (0239 284) 2162
>>>>> 
>>>>> Email: john.x.williams at port.ac.uk
>>>>> 
>>>>> 
>>>>> 
>>>> 
>>>> 
>>>> 
>>>> _______________________________________________
>>>> Corpora mailing list
>>>> Corpora at uib.no
>>>> http://mailman.uib.no/listinfo/corpora
>>>> 
>>> 
>> 
>> 
>> 
>> 
> 
> 
> 
> -- 
> Katrin Erk, Department of Linguistics
> The University of Texas at Austin
> http://comp.ling.utexas.edu/people/katrin_erk



_______________________________________________
Corpora mailing list
Corpora at uib.no
http://mailman.uib.no/listinfo/corpora

