[Corpora-List] Using patterns for disambiguation

Patrick Hanks patrick.w.hanks at gmail.com
Thu Jul 21 10:40:35 UTC 2011


[ First, an apology. Due to technological incompetence, I have apparently
invited ALL subscribers to LinkedIn to be my contacts.  Aaargh. Be warned
-- it's easily done! There must be some very surprised subscribers out
there.]

The purpose of this posting is to draw attention to the need to correlate
arguments when doing or using corpus pattern analysis.

With colleagues at RIILP (if funded), I'm planning to develop a system for
corpus tagging of lexical items in verb arguments with their semantic type,
i.e. "populating" the CPA empirical ontology with nouns and noun phrases.
This can be based in part on preparatory work that was done last year by
Martin Holub in Prague.

Are semantic types alone sufficient for disambiguation?  No! They are more
relevant than thematic roles, but they are only part of the story -- not
sufficient, because patterns need to be identified as a whole, with
correlation of arguments.  Here's an example I came across last week:

                  PDEV PATTERN throw a party = organize a social gathering

No problem with that. But then I discovered that, using the Sketch Engine, I
had (carelessly) *globally* tagged all occurrences of throw+party with this
pattern. So, on re-visiting 'throw', in among the party throwers I found
sentences like:

                   Recent events have thrown the Party into disarray,

where the reference is to a political party, not a social gathering. This is
an example of a fairly common kind of error in doing CPA, no doubt due to an
excessively zealous desire to achieve quantity and speed. More haste, less
speed! This means that, while most CPA patterns are well supported by corpus
evidence, there is a need for systematic validation before they can be
released as a gold standard for NLP applications.

One thing that computational linguists might infer from this is that
unambiguous identification of a meaning often depends crucially on
*correlation* of arguments within a pattern -- what Ken Church and I 20 years
ago called "triangulation".
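
As a rough illustration of the point about correlation, here is a minimal
Python sketch that contrasts matching on the direct object alone with
matching on the pattern as a whole. Everything in it is an assumption made
for the example: the toy lexicon, the type labels, the glosses and the two
pattern definitions are hypothetical and are not taken from PDEV, and the
clauses are assumed to be already parsed into slots.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical toy lexicon: noun -> set of semantic types it can realise.
LEXICON = {
    "host": {"Human"},
    "events": {"Eventuality"},
    "party": {"Social Gathering", "Human Group"},
    "disarray": {"State"},
}

@dataclass(frozen=True)
class Clause:
    subject: str
    verb: str
    obj: str
    prep: Optional[str] = None
    prep_obj: Optional[str] = None

@dataclass(frozen=True)
class Pattern:
    sense: str
    subject: str                    # semantic type required of the subject
    obj: str                        # semantic type required of the object
    prep: Optional[str] = None      # required preposition, if any
    prep_obj: Optional[str] = None  # semantic type of its complement

# Illustrative patterns for 'throw' (not actual PDEV entries).
PATTERNS = [
    Pattern("organize a social gathering", "Human", "Social Gathering"),
    Pattern("cause a group to be in a state",
            "Eventuality", "Human Group", "into", "State"),
]

def has_type(noun, sem_type):
    return sem_type in LEXICON.get(noun, set())

def object_only(clause):
    """Senses whose direct-object type fits, ignoring every other slot."""
    return [p.sense for p in PATTERNS if has_type(clause.obj, p.obj)]

def correlated(clause):
    """Senses for which every slot of the pattern fits the clause."""
    senses = []
    for p in PATTERNS:
        if not has_type(clause.subject, p.subject):
            continue
        if not has_type(clause.obj, p.obj):
            continue
        if p.prep != clause.prep:
            continue
        if p.prep is not None and not has_type(clause.prep_obj, p.prep_obj):
            continue
        senses.append(p.sense)
    return senses

social = Clause("host", "throw", "party")
political = Clause("events", "throw", "party", "into", "disarray")

print(object_only(political))  # both senses: the object alone is ambiguous
print(correlated(social))
print(correlated(political))
```

In this toy setup the object-only check cannot separate the two uses of
"party", while the whole-pattern check assigns each clause a single sense.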

There are of course some cases where the semantic type of the direct object
alone is sufficient, e.g.:

execute a person vs. execute an order.

but this is true only of a minority of verbs.  Patterns must be taken as a
whole.  Thematic roles are all very well in their way, but they don't get us
very far down the road towards message understanding. And it is a source of
confusion to call them "semantic roles".  There is a huge difference between
A) thematic roles such as "Agent", "Patient", "Beneficiary", "Instrument" and
B) semantic types such as [[Human]], [[Physical Object]], [[Weapon]].
Semantic types denote intrinsic semantic properties.  Thematic roles don't.
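
The 'execute' case can be sketched in the same hedged way. In the Python
fragment below the lexicon, the type label [[Command]], the glosses and the
deliberately crude role labeller are all assumptions made for illustration.
The semantic type is looked up from the noun itself, whereas the role label
is assigned from the object's position in the clause and so comes out the
same for both objects.

```python
# Hypothetical lexicon: the semantic type is a property of the noun itself.
SEMANTIC_TYPE = {"prisoner": "Human", "order": "Command"}

# Illustrative glosses for 'execute', keyed by the object's semantic type.
EXECUTE_SENSES = {"Human": "put to death", "Command": "carry out"}

def execute_sense(direct_object):
    """Pick a sense of 'execute' from the object's semantic type alone."""
    return EXECUTE_SENSES.get(SEMANTIC_TYPE.get(direct_object))

def object_role(clause):
    """Toy role labeller (an assumption): the direct object of a transitive
    clause is labelled 'Patient' whatever noun fills the slot."""
    subject, verb, direct_object = clause
    return "Patient"

for clause in [("they", "execute", "prisoner"), ("they", "execute", "order")]:
    obj = clause[2]
    print(obj, SEMANTIC_TYPE[obj], object_role(clause), execute_sense(obj))
```

Under these assumptions the role column is identical for the two clauses and
only the type column distinguishes the senses.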

Comments/feedback welcome.