[Corpora-List] Moving Lexical Semantics from Alchemy to Science
Marco Baroni
marco.baroni at unitn.it
Fri Jan 28 23:06:21 UTC 2011
Dear Prof. Wilks,
I am one of the co-authors of the paper that Katrin kindly mentioned
(thanks, Katrin!).
Similar ideas are currently being explored by others, including Emiliano
Guevara, Daoud Clarke and colleagues, and Edward Grefenstette and
colleagues.
We are using a mathematical tool from the mid-19th century (matrices) in
order to apply intuitions from early-seventies formal semantics
(Montague and others) to corpus-based semantic models that were
developed in the early nineties (LSA, HAL, ...), so we are not very posh
-- we are a tad musty, if anything.
We represent adjectives as matrices because a matrix is a simple way to
encode a function from vectors to vectors.
We are trying to capture, in "distributional semantics", the intuition
(expressed by Montague and many others) that adjectives are functions
that map nouns onto other nouns, where what the function does crucially
depends on the input noun (so that "rubber" -- seen as an adjective --
is a function that can have a different effect when it maps "ball" onto
"rubber ball" from the one it has when it maps "duck" onto "rubber duck").
Since nouns, in many corpus-based approaches, are represented as vectors
of co-occurrence counts with collocates (or documents), we treat
adjectives as matrices that encode linear functions mapping such vectors
to vectors of the same kind.
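
A minimal sketch of this idea, as a toy illustration only: the context
dimensions, the counts, and the entries of the "rubber" matrix are all
invented, and the helper name apply_adjective is hypothetical. It shows
only that one and the same matrix yields different results for different
input nouns; it says nothing about how such a matrix would be estimated
from a corpus.

```python
# Toy illustration: every number below is invented for the example.

def apply_adjective(adjective, noun):
    """Apply an adjective matrix to a noun vector (matrix-vector product)."""
    return [sum(w * x for w, x in zip(row, noun)) for row in adjective]

# Hypothetical context dimensions: ["bounce", "bath", "toy"]
ball = [8.0, 0.0, 3.0]
duck = [0.0, 2.0, 1.0]

# Hypothetical matrix for "rubber". Each row says how one output dimension
# is computed from all of the input dimensions.
rubber = [
    [1.5, 0.0, 0.0],  # output "bounce" depends on input "bounce" only
    [0.0, 2.0, 1.0],  # output "bath" depends on input "bath" and "toy"
    [0.5, 1.0, 1.0],  # output "toy" depends on all three inputs
]

rubber_ball = apply_adjective(rubber, ball)
rubber_duck = apply_adjective(rubber, duck)

print(rubber_ball)
print(rubber_duck)
```

Because the off-diagonal entries mix dimensions, the change that "rubber"
makes to "ball" is not the same as the change it makes to "duck", even
though the function itself is fixed.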
I am (partially) aware of the literature on Pathfinder and other earlier
literature on measuring word proximity, but it does not seem to me to
tackle the same challenge. We are using word/construction proximity to
evaluate our method, but the core of what the method does is building
larger constituents (adj+noun) from simpler ones (noun), which seems
like something different from what Pathfinder does (from what little I
know of it).
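
One way to picture the proximity-based evaluation, as a sketch under
stated assumptions: cosine similarity is used here as the proximity
measure (the message does not say which measure is used), and all three
vectors are invented. The idea is that a composed adj+noun vector should
lie closer to a vector observed for the same phrase than to a vector for
unrelated material.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)

# All values are hypothetical.
composed = [12.0, 3.0, 7.0]   # vector built by applying an adjective matrix
observed = [10.0, 4.0, 6.0]   # vector counted directly for the same phrase
unrelated = [0.0, 9.0, 0.0]   # vector for some unrelated phrase

# True when the composed vector is closer to the observed phrase vector.
print(cosine(composed, observed) > cosine(composed, unrelated))
```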
I fully agree with you and Katrin that the major challenge for our model
and its alternatives is to find convincing ways to evaluate whether they
have learned what they purport to learn.
Best regards,
Marco
--
Marco Baroni
Center for Mind/Brain Sciences (CIMeC)
University of Trento
http://clic.cimec.unitn.it/marco