[Corpora-List] Moving Lexical Semantics from Alchemy to Science

Fri Jan 28 23:06:21 UTC 2011

Dear Prof. Wilks,

I am one of the co-authors of the paper that Katrin kindly mentioned 
(thanks, Katrin!).

Similar ideas are currently being explored by others, including Emiliano 
Guevara, Daoud Clarke and colleagues, and Edward Grefenstette and 
colleagues.

We are using a mathematical tool from the mid-19th century (matrices) to 
apply intuitions from early-seventies formal semantics (Montague and 
others) to corpus-based semantic models that were developed in the early 
nineties (LSA, HAL, ...), so we are not very posh -- we are a tad musty, 
if anything.

We represent adjectives as matrices because matrices are a simple way to 
encode a function from vectors to vectors.

We are trying to capture, in "distributional semantics", the intuition 
(expressed by Montague and many others) that adjectives are functions 
that map nouns onto other nouns, where what the function does crucially 
depends on the input noun (so that "rubber" -- seen as an adjective -- 
is a function that can have a different effect when it maps "ball" onto 
"rubber ball" from the one it has when it maps "duck" onto "rubber duck").

Since nouns, in many corpus-based approaches, are represented as vectors 
of co-occurrence counts with collocates (or documents), we treat 
adjectives as matrices that encode linear functions from such vectors to 
vectors of the same kind.
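
To make the paragraph above concrete, here is a minimal sketch in plain 
Python of an adjective matrix applied to noun vectors. Everything in it is 
an assumption made for illustration: the three-dimensional vectors, the 
numbers, and the helper name apply_adjective are invented, and the sketch 
says nothing about how such a matrix would be estimated from a corpus.

```python
# Toy illustration only: every number and name below is hypothetical.

def apply_adjective(adj_matrix, noun_vector):
    """Compose adjective and noun as a matrix-vector product."""
    return [sum(w * x for w, x in zip(row, noun_vector)) for row in adj_matrix]

# Hypothetical noun vectors: co-occurrence counts with three collocates.
ball = [4.0, 1.0, 0.0]
duck = [0.0, 2.0, 5.0]

# Hypothetical matrix for "rubber", seen as an adjective.
rubber = [
    [1.0, 0.0, 0.0],
    [0.5, 1.0, 0.0],
    [0.0, 0.0, 0.5],
]

rubber_ball = apply_adjective(rubber, ball)
rubber_duck = apply_adjective(rubber, duck)

# How much each dimension changed relative to the bare noun.
ball_shift = [c - n for c, n in zip(rubber_ball, ball)]
duck_shift = [c - n for c, n in zip(rubber_duck, duck)]

print(rubber_ball, ball_shift)
print(rubber_duck, duck_shift)
```

The same matrix shifts "ball" and "duck" along different dimensions, which 
is the sense in which the effect of the function depends on its input noun.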

I am (partially) aware of the literature on Pathfinder and other earlier 
literature on measuring word proximity, but it does not seem to me to 
tackle the same challenge. We are using word/construction proximity to 
evaluate our method, but the core of what the method does is building 
larger constituents (adj+noun) from simpler ones (noun), which seems 
like something different from what Pathfinder does (from what little I 
know of it).
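
As a rough illustration of proximity-based evaluation, the sketch below 
compares a composed adj+noun vector with a vector for the same phrase 
observed directly in a corpus. The message does not say which proximity 
measure is used; cosine similarity is assumed here as one common choice, 
and all vectors are invented toy values.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)

# Hypothetical vectors: one built by composition, one "observed" for the
# same phrase, and one for an unrelated phrase.
composed_rubber_ball = [4.0, 3.0, 0.0]
observed_rubber_ball = [3.0, 4.0, 0.0]
observed_unrelated = [0.0, 0.0, 5.0]

sim_same = cosine(composed_rubber_ball, observed_rubber_ball)
sim_other = cosine(composed_rubber_ball, observed_unrelated)
print(sim_same, sim_other)
```

Under this assumption, a composition method does well when the composed 
vector is closer to the observed vector of its own phrase than to the 
vectors of other phrases.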

I fully agree with you and Katrin that the major challenge for our model 
and its alternatives is to find convincing ways to evaluate whether they 
have learned what they purport to learn.

Best regards,

Marco

-- 
Marco Baroni
Center for Mind/Brain Sciences (CIMeC)
University of Trento
http://clic.cimec.unitn.it/marco
