[Corpora-List] Using patterns for disambiguation

Khurshid Ahmad kahmad at scss.tcd.ie
Thu Jul 21 11:44:47 UTC 2011


Dear Patrick
The example you have selected to 'disambiguate' - the verb 'throw'- has
about 35 different senses as recorded by our colleagues at OED.  The
genesis of the different senses of the verb is an interesting one:

In English the orig. sense ‘twist, turn’ remained in the north, and in
certain technical uses (see branch I); otherwise it passed in Middle
English into that of branch II, = Old English weorpan, perhaps through an
unrecorded sense ‘throw by a turn or twist of the arm, or with a sling’

(By north I think our Oxford lexicographer means Northern England)

My point is that the senses of the word emerge largely from the
pragmatics of use, the fusion of different languages (throw: Old English-->
Middle Dutch--> High German--> Middle English --> US/American English) and
different socio-political influences.

I am interested to find out how an 'algorithm', Sketch Engined or
otherwise, will help in distinguishing
Sense 18b of the verb 'throw': 'throw a party' (To give or hold (a party),
esp. one of an informal or impromptu nature. colloq. (orig. U.S.), first
recorded use in 1922)

from

Sense 18a of the verb 'throw': 'throw a fit' (To perform, execute (a
somersault or a leap, in which the body is thrown with force); also to
throw a fit, to have a fit (slang (orig. U.S.)). Chiefly fig., first
recorded use 1826)?
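
To make the question concrete, here is a minimal sketch of what a naive
pattern-based 'algorithm' might look like. Everything in it is an assumption
made for illustration: the Clause record, the two hand-made lexical sets and
the function sense_of_throw are hypothetical, and none of it comes from OED,
PDEV or the Sketch Engine. It keys on the head noun of the direct object, and
it also checks for an adverbial, so that 'thrown the Party into disarray'
(from the quoted message below) is not swept up with the party throwers.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical lexical sets, invented for this sketch only.
SOCIAL_EVENT = {"party", "dinner", "reception"}
OUTBURST = {"fit", "tantrum"}


@dataclass
class Clause:
    verb: str                        # lemma of the verb
    obj: str                         # head noun of the direct object, lower-cased
    adverbial: Optional[str] = None  # e.g. "into disarray", if present


def sense_of_throw(clause: Clause) -> Optional[str]:
    """Return an OED-style sense label for 'throw', or None if unresolved."""
    if clause.verb != "throw":
        return None
    # An adverbial such as "into disarray" signals a different pattern,
    # so the object alone is not trusted in that case.
    if clause.adverbial is not None:
        return None
    if clause.obj in SOCIAL_EVENT:
        return "18b"  # give or hold (a party)
    if clause.obj in OUTBURST:
        return "18a"  # throw a fit
    return None


print(sense_of_throw(Clause("throw", "party")))                   # 18b
print(sense_of_throw(Clause("throw", "fit")))                     # 18a
print(sense_of_throw(Clause("throw", "party", "into disarray")))  # None
```

A sketch like this only works as far as the lexical sets have been populated
by hand, and it says nothing about how the clause was parsed in the first
place, which is where much of the difficulty lies.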

I hope we keep linked in,
Best wishes

> [ First, an apology. Due to technological incompetence, I have apparently
> invited ALL subscribers to LinkedIn to be my contacts.  Aaargh. Be warned
> -- it's easily done! There must be some very surprised subscribers out
> there.]
>
> The purpose of this posting  is to draw attention to the need to correlate
> arguments when doing or using corpus pattern analysis.
>
> With colleagues at RIILP (if funded), I'm planning to develop a system for
> corpus tagging of lexical items in verb arguments with their semantic type,
> i.e. "populating" the CPA empirical ontology with nouns and noun phrases.
> This can be based in part on preparatory work that was done last year by
> Martin Holub in Prague.
>
> Are semantic types alone sufficient for disambiguation?  No! They are more
> relevant than thematic roles, but they are only part of the story -- not
> sufficient, because patterns need to be identified as a whole, with
> correlation of arguments.  Here's an example I came across last week:
>
>                   PDEV PATTERN throw a party = organize a social gathering
>
> No problem with that. But then I discovered that, using the Sketch Engine, I
> had (carelessly) *globally* tagged all occurrences of throw+party with this
> pattern. So, on re-visiting 'throw', in among the party throwers I found
> sentences like:
>
>                    Recent events have thrown the Party into disarray,
>
> where the reference is to a political party, not a social gathering. This is
> an example of a fairly common kind of error in doing CPA, no doubt due to an
> excessively zealous desire to achieve quantity and speed. More haste, less
> speed! This means that, while most CPA patterns are well supported by corpus
> evidence, there is a need for systematic validation before they can be
> released as a gold standard for NLP applications.
>
> One thing that computational linguists might infer from this is that
> unambiguous identification of a meaning often depends crucially on
> *correlation* of arguments within a pattern -- what Ken Church and I 20 years
> ago called "triangulation".
>
> There are of course some cases where the semantic type of the direct object
> alone is sufficient, e.g.:
>
> execute a person vs. execute an order.
>
> but this is true only of a minority of verbs.  Patterns must be taken as a
> whole.  Thematic roles are all very well in their way, but they don't get us
> very far down the road towards message understanding. And it is a source of
> confusion to call them "semantic roles".  There is a huge difference between
> A) thematic roles such as "Agent", "Patient", "Beneficiary", "Instrument" and
> B) semantic types such as [[Human]], [[Physical Object]], [[Weapon]].
> Semantic types denote intrinsic semantic properties.  Thematic roles don't.
>
> Comments/feedback welcome.
> _______________________________________________
> UNSUBSCRIBE from this page: http://mailman.uib.no/options/corpora
> Corpora mailing list
> Corpora at uib.no
> http://mailman.uib.no/listinfo/corpora
>


Khurshid Ahmad

Professor of Computer Science
Department of Computer Science
Trinity College,
DUBLIN-2
IRELAND
Phone 00 353 1 896 8429

Web Page: http://people.tcd.ie/kahmad

