[Corpora-List] Using patterns for disambiguation

Yorick Wilks Y.Wilks at dcs.shef.ac.uk
Thu Jul 21 11:12:24 UTC 2011


I think you'll find the general notion of taking all such slot satisfactions/violations TOGETHER was known more than 20 years ago! The problem in the example of "throwing the Party into disarray" is not so much with the combination of the slots and their fillers for "throw" as with the complex "throw X into disarray": this fixed form/metaphor/idiom/Fillmore frame/what you will has to be picked up first. I don't think this can be done properly just by manipulating/creating patterns for "throw".
Yorick Wilks


On 21 Jul 2011, at 11:40, Patrick Hanks wrote:

> [First, an apology. Due to technological incompetence, I have apparently invited ALL subscribers to be my contacts on LinkedIn. Aaargh. Be warned -- it's easily done! There must be some very surprised subscribers out there.]
> 
> The purpose of this posting  is to draw attention to the need to correlate arguments when doing or using corpus pattern analysis. 
> 
> With colleagues at RIILP (if funded), I'm planning to develop a system for corpus tagging of lexical items in verb arguments with their semantic type, i.e. "populating" the CPA empirical ontology with nouns and noun phrases. This can be based in part on preparatory work that was done last year by Martin Holub in Prague. 
> 
> Are semantic types alone sufficient for disambiguation?  No! They are more relevant than thematic roles, but they are only part of the story -- not sufficient, because patterns need to be identified as a whole, with correlation of arguments.  Here's an example I came across last week:
> 
>                   PDEV PATTERN throw a party = organize a social gathering
> 
> No problem with that. But then I discovered that, using the Sketch Engine, I had (carelessly) globally tagged all occurrences of throw+party with this pattern. So, on re-visiting 'throw', in among the party throwers I found sentences like:
> 
>                    Recent events have thrown the Party into disarray,
> 
> where the reference is to a political party, not a social gathering. This is an example of a fairly common kind of error in doing CPA, no doubt due to an excessively zealous desire to achieve quantity and speed. More haste, less speed! This means that, while most CPA patterns are well supported by corpus evidence, there is a need for systematic validation before they can be released as a gold standard for NLP applications. 
> 
> One thing that computational linguists might infer from this is that unambiguous identification of a meaning often depends crucially on correlation of arguments within a pattern -- what Ken Church and I 20 years ago called "triangulation".
> 
> There are of course some cases where the semantic type of the direct object alone is sufficient, e.g.:
> 
> execute a person vs. execute an order.
> 
> but this is true only of a minority of verbs. Patterns must be taken as a whole. Thematic roles are all very well in their way, but they don't get us very far down the road towards message understanding. And it is a source of confusion to call them "semantic roles". There is a huge difference between A) thematic roles such as "Agent", "Patient", "Beneficiary", "Instrument" and B) semantic types such as [[Human]], [[Physical Object]], [[Weapon]]. Semantic types denote intrinsic semantic properties. Thematic roles don't.
> 
> Comments/feedback welcome. 
> 
> 
> _______________________________________________
> UNSUBSCRIBE from this page: http://mailman.uib.no/options/corpora
> Corpora mailing list
> Corpora at uib.no
> http://mailman.uib.no/listinfo/corpora
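
The two points made in this exchange (arguments must be correlated within a pattern, and a fixed form such as "throw X into disarray" has to be picked up before the plain verb+object pattern) can be illustrated with a rough Python sketch. Everything in it is an assumption made for illustration only: the toy lexicon, the type names, the Clause and Pattern classes, and the gloss for the "disarray" sense are hypothetical and are not taken from PDEV, CPA or the Sketch Engine. Clauses are assumed to be already parsed into verb, object and an optional prepositional phrase.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical toy lexicon (noun -> possible semantic types); NOT the CPA ontology.
SEMANTIC_TYPES = {
    "party": frozenset({"Event", "Institution"}),  # social gathering OR political party
    "government": frozenset({"Institution"}),
    "disarray": frozenset({"State"}),
    "confusion": frozenset({"State"}),
}


@dataclass(frozen=True)
class Clause:
    """A pre-parsed clause: lemmatised verb, direct object, optional PP."""
    verb: str
    obj: str
    prep: Optional[str] = None
    prep_obj: Optional[str] = None


@dataclass(frozen=True)
class Pattern:
    """A verb pattern; prep=None means 'no constraint on the adverbial'."""
    verb: str
    obj_types: frozenset
    sense: str
    prep: Optional[str] = None
    prep_obj_types: frozenset = frozenset()


def types_of(noun: Optional[str]) -> frozenset:
    """Look up the semantic types of a noun (empty set if unknown)."""
    return SEMANTIC_TYPES.get((noun or "").lower(), frozenset())


# Order matters: the fixed form is listed first so that it is tried first.
PATTERNS = [
    Pattern("throw", frozenset({"Institution", "Human"}),
            "cause to be in a state of disorder",  # illustrative gloss
            prep="into", prep_obj_types=frozenset({"State"})),
    Pattern("throw", frozenset({"Event"}), "organize a social gathering"),
]


def matches(pattern: Pattern, clause: Clause) -> bool:
    """True only if ALL constrained arguments fit the pattern together."""
    if clause.verb != pattern.verb:
        return False
    if not (types_of(clause.obj) & pattern.obj_types):
        return False
    if pattern.prep is None:
        return True
    return (clause.prep == pattern.prep
            and bool(types_of(clause.prep_obj) & pattern.prep_obj_types))


def disambiguate(clause: Clause) -> Optional[str]:
    """Return the sense of the first matching pattern, or None."""
    for pattern in PATTERNS:
        if matches(pattern, clause):
            return pattern.sense
    return None


def object_only_sense(clause: Clause) -> Optional[str]:
    """The careless shortcut: tag every throw+party with the 'gathering' sense."""
    if clause.verb == "throw" and clause.obj.lower() == "party":
        return "organize a social gathering"
    return None


social = Clause("throw", "party")
political = Clause("throw", "Party", "into", "disarray")
print(disambiguate(social))
print(disambiguate(political))
print(object_only_sense(political))  # the shortcut mis-tags this clause
```

In this sketch the object-only shortcut assigns the "social gathering" sense to both clauses, whereas checking the object together with the "into [[State]]" adverbial, and trying the fixed form first, keeps them apart. Whether a real pattern dictionary should rely on ordering or on a separate multiword-expression step is left open here.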
