[Corpora-List] Moving Lexical Semantics from Alchemy to Science

Krishnamurthy, Ramesh r.krishnamurthy at aston.ac.uk
Sat Jan 29 11:20:46 UTC 2011


Hi Ken

Excellent idea!

I would certainly be interested in a concerted corpus-driven effort in this direction. 

I am taking 'early' retirement from my full-time Lecturer post at Aston on Feb 11th 2011.

I hope to be appointed Visiting Academic Fellow, an unpaid post, which will allow
me to maintain a presence in the virtual academic world, and to contribute to such a 
project... 

...but from now on I will need to acquire external funding to cover any academic activities...

Best
Ramesh

Ramesh Krishnamurthy
Lecturer in English Studies, School of Languages and Social Sciences,
Aston University, Birmingham B4 7ET, UK
Tel: +44 (0)121-204-3812 ; Fax: +44 (0)121-204-3766 [Room NX08, 10th
Floor, North Wing of Main Building]
http://www1.aston.ac.uk/lss/staff/krishnamurthyr/
Director, ACORN (Aston Corpus Network project): http://acorn.aston.ac.uk/ 

-----Original Message-----
From: corpora-bounces at uib.no [mailto:corpora-bounces at uib.no] On Behalf Of corpora-request at uib.no
Sent: 29 January 2011 11:00
To: corpora at uib.no
Subject: Corpora Digest, Vol 43, Issue 34

Today's Topics:

   1. Re:  Moving Lexical Semantics from Alchemy to Science
      (Ken Litkowski)
   2. Re:  Moving Lexical Semantics from Alchemy to Science
      (Yorick Wilks)
   3. Re:  Moving Lexical Semantics from Alchemy to Science
      (Marco Baroni)


----------------------------------------------------------------------

Message: 1
Date: Fri, 28 Jan 2011 15:02:24 -0500
From: Ken Litkowski <ken at clres.com>
Subject: Re: [Corpora-List] Moving Lexical Semantics from Alchemy to
	Science
To: corpora at uib.no

On 1/28/2011 1:04 PM, Ted Pedersen wrote:
> What a fun thread. :)

One part alchemy and one part science. Since I kinda kicked off this 
thread with my concern about us not looking for primitives, I'd like to 
add a few further cents and a lament.

PBS newshour did a piece on Google's n-grams a few weeks ago ("Word 
Nerding Just Got Easier") with the ever delightful Erin McKean. This 
thread has partially followed that notion with all the humorous noun 
compounds. I hope we don't focus on those so much, except as needed to 
do crossword puzzles.

Yorick expressed his long experience with an apparent lack of progress. 
Certainly, Robert has clear scientific goals in mind and we have gotten 
some nice "scientific" observations, particularly from John, Ramesh, 
Anne-Kathrin, and Ted. It would be nice if we could get some 
community-wide effort into this. We need a vehicle, perhaps transforming 
Wiktionary. It would be nice if we could apply John's rules to Ted's 
compounds and *put those findings into a dictionary* (lexicographers 
have only barely done so, while lexicologists need that information).

     Ken

-- 
Ken Litkowski                     TEL.: 301-482-0237
CL Research                       EMAIL: ken at clres.com
9208 Gue Road                     Home Page: http://www.clres.com
Damascus, MD 20872-1025 USA       Blog: http://www.clres.com/blog


------------------------------

Message: 2
Date: Fri, 28 Jan 2011 15:25:02 -0500
From: Yorick Wilks <Y.Wilks at dcs.shef.ac.uk>
Subject: Re: [Corpora-List] Moving Lexical Semantics from Alchemy to
	Science
To: Katrin Erk <katrin.erk at mail.utexas.edu>
Cc: "Corpora at uib.no" <Corpora at uib.no>, "Krishnamurthy,	Ramesh"
	<r.krishnamurthy at aston.ac.uk>

I'm sure you can get something useful from word proximity computed over large corpora; to me the issues are what mechanisms best do that and what you can get out of it. There's a whole tradition of corpus-based clustering leading to clumps or graphs, from Karen Sparck Jones's thesis (1960s!) to the Pathfinder algorithms that gave nice weighted graphs *** in the 80s. I don't quite see what's new unless vectors and matrices add a lot to those simpler methods--until proved wrong I'm prepared to bet they don't--it's just that people have forgotten them as they forget everything, and "matrix" sounds mathematically posher (though in fact Sparck Jones did try to compute matrices, but EDSAC2 at Cambridge was too small to cope in those days).
BUT, I just can't see that what you're describing leads to helpful paraphrases---listing out the "interpretations" of nearby words won't help, will it?
YW

***
McDonald, J. E., Plate, T. A., & Schvaneveldt, R. W. (1990). Using Pathfinder to extract semantic information from text. In R. Schvaneveldt (Ed.), Pathfinder associative networks: Studies in knowledge organization. (pp. 149-164). Norwood, NJ: Ablex.

> 

On 28 Jan 2011, at 14:59, Katrin Erk wrote:

>> OK but the problem is that you have to know what it is you are looking for closeness TO. Why would you seek relative proximity to
>> toy and food unless you already knew those were the words corresponding to or capturing the ambiguity--in other words you already have to know what the choices are (and in the case of "rubber chicken" it has both senses, and proximity in a space cannot show ambiguity, can it?). There's nothing about toys in any of the three components rubber/duck/chicken, surely? Cohen and Margalit were asking one right question, which is whether and how one could determine combination meaning from component meanings--I don't see how the proximity analysis you cite can do that.
> 
> Baroni and Zamparelli actually don't use predefined words (like toy)
> to compare to, but determine the nearest neighbors of an expression in
> space. Then you can interpret each word through its nearest neighbors.
> So the interpretation of each expression is a list of paraphrases.
> 
> But I think you mean more than that by "determining combination
> meaning". Something along the lines of being able to list all
> appropriate inferences to draw?
> 
> Assuming you mean that, my answer would be: Yes, in the end that's the
> goal, but for now this goal is too large. For now we need goals that
> work for the intermediate, smaller steps. And I think models like this
> one are on the right track because they can use corpus data to predict
> the meaning of words and expressions in context.
> 
> Katrin
> 
> 
>> Y
>> 
>> On 28 Jan 2011, at 14:41, Katrin Erk wrote:
>> 
>>> On Fri, Jan 28, 2011 at 1:32 PM, Yorick Wilks <Y.Wilks at dcs.shef.ac.uk> wrote:
>>>> Hmmm...not quite sure what "doing the right thing" for the rubber duck and chicken would be. Surely no method like this could provide a representation, so what could it give at best?
>>> 
>>> It does provide a representation, the question is just how to
>>> interpret it and draw inferences from it. The most straightforward way
>>> is by testing closeness of the representation of the adj+noun pair to
>>> the representations of other expressions, say "toy" and "food". If the
>>> model gets rubber duck and chicken right, then it should predict (by
>>> measuring similarity/proximity in semantic space) that rubber duck is
>>> closer to "toy" than "food", and the other way round for the rubber
>>> chicken.
>>> 
>>> Katrin
>>> 
>>>> 
>>>> On 28 Jan 2011, at 14:25, Katrin Erk wrote:
>>>> 
>>>>> Hi all,
>>>>> 


------------------------------

Message: 3
Date: Sat, 29 Jan 2011 00:06:21 +0100
From: Marco Baroni <marco.baroni at unitn.it>
Subject: Re: [Corpora-List] Moving Lexical Semantics from Alchemy to
	Science
To: Yorick Wilks <Y.Wilks at dcs.shef.ac.uk>
Cc: "Corpora at uib.no" <Corpora at uib.no>, "Krishnamurthy,	Ramesh"
	<r.krishnamurthy at aston.ac.uk>,	Roberto Zamparelli
	<roberto.zamparelli at unitn.it>

Dear Prof. Wilks,

I am one of the co-authors of the paper that Katrin kindly mentioned 
(thanks, Katrin!).

Similar ideas are currently being explored by others, including Emiliano 
Guevara, Daoud Clarke and colleagues, and Edward Grefenstette and 
colleagues.

We are using a mathematical tool from the mid 19th century (matrices) in 
order to apply intuitions from early seventies formal semantics 
(Montague and others) to corpus-based semantic models that were 
developed in the early nineties (LSA, HAL, ...), so we are not very posh 
-- we are a tad musty, if anything.

We represent adjectives as matrices because they are a simple way to 
encode a function from and onto vectors.

We are trying to capture, in "distributional semantics", the intuition 
(expressed by Montague and many others) that adjectives are functions 
that map nouns onto other nouns, where what the function does crucially 
depends on the input noun (so that "rubber" -- seen as an adjective -- 
is a function that can have a different effect when it maps "ball" onto 
"rubber ball" from the one it has when it maps "duck" onto "rubber duck").

Since nouns, in many corpus-based approaches, are represented as vectors 
of co-occurrence counts with collocates (documents), we treat adjectives 
as matrices that encode linear functions from and onto such vectors.
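As a sketch of that idea with toy numbers: the contexts, noun vectors, and "rubber" matrix below are all invented for illustration (in the actual model the matrix would be learned from corpus data, not hand-picked):

```python
# A noun is a vector of co-occurrence counts; an adjective is a matrix,
# i.e. a linear function from noun vectors to adjective+noun vectors.
# All numbers here are invented for illustration.

contexts = ["float", "squeak", "stretch"]  # hypothetical collocates

duck = [5.0, 1.0, 0.0]
ball = [1.0, 0.0, 4.0]

# "rubber" as one 3x3 matrix: a single function whose effect nonetheless
# depends on the input noun, as in the Montague-style intuition above.
rubber = [
    [1.0, 0.5, 0.0],
    [0.5, 2.0, 0.0],
    [0.0, 0.0, 1.5],
]

def matvec(M, v):
    """Apply the linear map M to the vector v (matrix-vector product)."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

rubber_duck = matvec(rubber, duck)  # [5.5, 4.5, 0.0] -- boosts "squeak"
rubber_ball = matvec(rubber, ball)  # [1.0, 0.5, 6.0] -- boosts "stretch"
```

The point of the sketch is only that one matrix yields different effects on different input nouns, which is what distinguishes this from composing fixed vectors by addition or multiplication.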

I am (partially) aware of the literature on Pathfinder and other earlier 
work on measuring word proximity, but it does not seem to me to 
tackle the same challenge. We are using word/construction proximity to 
evaluate our method, but the core of what the method does is building 
larger constituents (adj+noun) from simpler ones (noun), which seems 
like something different from what Pathfinder does (from what little I 
know of it).

I fully agree with you and Katrin that the major challenge for our model 
and its alternatives is to find convincing ways to evaluate whether it 
learned what it purports to learn.

Best regards,

Marco



-- 
Marco Baroni
Center for Mind/Brain Sciences (CIMeC)
University of Trento
http://clic.cimec.unitn.it/marco



----------------------------------------------------------------------
Send Corpora mailing list submissions to
	corpora at uib.no

To subscribe or unsubscribe via the World Wide Web, visit
	http://mailman.uib.no/listinfo/corpora
or, via email, send a message with subject or body 'help' to
	corpora-request at uib.no

You can reach the person managing the list at
	corpora-owner at uib.no

When replying, please edit your Subject line so it is more specific
than "Re: Contents of Corpora digest..."


_______________________________________________
Corpora mailing list
Corpora at uib.no
http://mailman.uib.no/listinfo/corpora


End of Corpora Digest, Vol 43, Issue 34
***************************************

