[Corpora-List] Variant verbal government extraction

Kopotev Mihail.Kopotev at Helsinki.fi
Fri Feb 23 14:08:00 UTC 2007


Thank you, Adam.
That's the approach we had in mind, but the problem seems to be more
complicated.

First, keep in mind that we are working with a language that has
rich noun morphology. So the right periphery of a verb can be
realized both by prepositional phrases and by noun phrases (and for the
noun phrases more than one case may be possible).

Let me give an example to explain this.

In Russian one can say:

strelyat' po utkam / utok / v utok,

or, literally, in English:

to shoot at ducks / ducks / into ducks ‘to shoot at ducks’

So, even if we have a list of verbs, we can hardly search for all
possible prepositional phrases and noun phrases in which the same noun
(‘duck’ in our example) can be expressed in a variety of ways (the
accusative and two PPs in our example).
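The three variants above differ only in their abstract government frame (po + dative, bare accusative, v + accusative), not in the noun itself, so one option is to match frames rather than word forms. Below is a minimal Python sketch of that idea, assuming a corpus that is already lemmatized and morphologically tagged; the `Token` type, the tag names and the `government_frame` helper are hypothetical illustrations, not the output format of any particular tagger:

```python
from typing import NamedTuple, Optional


class Token(NamedTuple):
    form: str
    lemma: str
    pos: str             # hypothetical coarse POS tags: VERB, PREP, ADJ, NOUN, PRON, ...
    case: Optional[str]  # hypothetical case tags: NOM, GEN, DAT, ACC, INS, LOC


def government_frame(tokens: list[Token], verb_index: int, window: int = 4) -> Optional[str]:
    """Abstract the first nominal dependent to the right of the verb into a frame:
    "CASE" for a bare NP, "prep+CASE" for a PP. The noun's lemma is deliberately
    ignored, so every noun or pronoun falls into one of a few frames."""
    prep = None
    for tok in tokens[verb_index + 1 : verb_index + 1 + window]:
        if tok.pos == "PREP":
            prep = tok.lemma
        elif tok.pos == "ADJ":
            continue  # skip modifiers between the verb/preposition and the noun
        elif tok.pos in ("NOUN", "PRON"):
            return f"{prep}+{tok.case}" if prep else tok.case
        else:
            return None  # something else intervenes: no frame for this verb token
    return None


# The duck example, hand-tagged with the hypothetical tags above.
V = Token("strelyat'", "strelyat'", "VERB", None)
sentences = [
    [V, Token("po", "po", "PREP", None), Token("utkam", "utka", "NOUN", "DAT")],
    [V, Token("utok", "utka", "NOUN", "ACC")],
    [V, Token("v", "v", "PREP", None), Token("utok", "utka", "NOUN", "ACC")],
]
frames = [government_frame(s, 0) for s in sentences]
print(frames)  # ['po+DAT', 'ACC', 'v+ACC']
```

Because the lemma of the noun is never consulted, each verb needs to be checked against a handful of frames rather than against every noun and pronoun in every case.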

In other words, using the algorithm you suggested

… Find how often it occurs in pattern <VERB PRONOUN>

Find how often it occurs in pattern <VERB to PRONOUN> …

we would have to check all nouns and pronouns in all cases, as well as
all possible PPs, in the PRONOUN position.
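Adam's steps (count the patterns for each verb, relate the counts to the verb's overall frequency, sort) can be run over abstract government frames (a case such as "ACC", or a preposition plus case such as "po+DAT") instead of the two fixed patterns <VERB PRONOUN> and <VERB to PRONOUN>. A minimal Python sketch, assuming (verb, frame) pairs have already been extracted from a tagged corpus: Adam did not specify the statistic, so the score used here (the share of the verb's second most frequent frame), the thresholds `min_share` and `min_freq`, and the function name `rank_variant_government` are all illustrative assumptions, and the counts are invented toy data:

```python
from collections import Counter, defaultdict


def rank_variant_government(pairs, min_share=0.1, min_freq=10):
    """Rank verbs that occur with at least two well-attested government frames.

    pairs: iterable of (verb_lemma, frame) observations, e.g. ("strelyat'", "po+DAT").
    Illustrative statistic (an assumption): a frame counts as established for a
    verb if it covers at least min_share of the verb's observations; verbs seen
    fewer than min_freq times are skipped; the score is the share of the second
    most frequent frame, i.e. how strong the weaker of the two main variants is.
    """
    by_verb = defaultdict(Counter)
    for verb, frame in pairs:
        by_verb[verb][frame] += 1

    ranked = []
    for verb, frames in by_verb.items():
        total = sum(frames.values())  # overall frequency of the verb
        if total < min_freq:
            continue
        shares = sorted((n / total for n in frames.values()), reverse=True)
        established = [s for s in shares if s >= min_share]
        if len(established) >= 2:
            ranked.append((verb, established[1], dict(frames)))
    ranked.sort(key=lambda item: item[1], reverse=True)  # strongest variation first
    return ranked


# Invented toy observations, not real corpus counts.
pairs = (
    [("strelyat'", "po+DAT")] * 12
    + [("strelyat'", "ACC")] * 5
    + [("strelyat'", "v+ACC")] * 3
    + [("chitat'", "ACC")] * 19
    + [("chitat'", "o+LOC")] * 1
)
result = rank_variant_government(pairs)
print(result)  # [("strelyat'", 0.25, {'po+DAT': 12, 'ACC': 5, 'v+ACC': 3})]
```

Taking the weaker of the two main variants as the score puts verbs whose alternatives are both well attested at the top, while verbs with one dominant frame and a rare exception drop out at the `min_share` threshold.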

That would take a lot of time. Is there any other way to collect all
these verbs?

Thanks,

MK

Mikhail Kopotev
Researcher
Department of Slavonic
and Baltic Languages and Literatures
University of Helsinki



Adam Kilgarriff wrote:
>
> Mikhail,
>
> The algorithm you want is
>
> In a large corpus
>
> For each verb
>
> Find how often it occurs in pattern <VERB PRONOUN>
>
> Find how often it occurs in pattern <VERB to PRONOUN>
>
> Compute a statistic to see how high both these numbers are, relative
> to the overall frequency of the verb
>
> Sort verbs according to the statistic
>
> Now you have a starter set for examining which verbs show the 
> behaviour you want to investigate.
>
> All relevant frequencies are available for, e.g., the BNC, in the Sketch
> Engine http://www.sketchengine.co.uk, where you can define the patterns
> in CQL (Corpus Query Language, from the University of Stuttgart). We
> don’t currently have a convenient web interface for robots, but will
> have one shortly; in the meantime, ask us and we can set things up to
> help you (e.g. by allowing you robot access, after which you’d need to
> scrape web pages).
>
> Regards,
>
> Adam
>
> -----Original Message-----
> From: owner-corpora at lists.uib.no
> On Behalf Of Mikhail Kopotev
> Sent: 22 February 2007 13:15
> Cc: CORPORA at UIB.NO
> Subject: [Corpora-List] Variant verbal government extraction
>
> Dear all,
>
> does anyone know how to recognize and extract variations in verbal
> government, such as “to write you / to you”, from a corpus?
>
> Since I am interested in Russian morphosyntactic changes, I would
> like you to point me to any tools or methods, rather than results,
> concerning English or any other relevant ;) languages.
>
> Many thanks,
>
> Mikhail Kopotev
> Researcher
> Department of Slavonic
> and Baltic Languages and Literatures
> University of Helsinki


