[Corpora-List] Frequency of grammatical constructions or

Martin Reynaert reynaert at uvt.nl
Wed Jul 8 08:47:25 UTC 2009


Hi Linas,

Nice graph!

From what I have learned, mainly from the work of Ramon Ferrer i Cancho
(http://www.lsi.upc.edu/~rferrericancho/publications_by_year.html), I
would say that adding syntactic patterns to the words makes the natural
language more like a formal language. For more formal language, a
power-law exponent well above 1 is `natural'.
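
For anyone who wants to check an exponent such as the 1.5 quoted below on
their own counts, here is a minimal sketch. It assumes the exponent is
estimated as the negated least-squares slope of log(frequency) against
log(rank), which is only a rough estimator and not necessarily how the
graph was fitted. The function name zipf_exponent, its min_rank and
max_rank parameters, and the synthetic counts are all made up for
illustration.

```python
import math


def zipf_exponent(counts, min_rank=1, max_rank=None):
    """Estimate a Zipf exponent from raw frequency counts.

    Sorts the counts into rank order (rank 1 = most frequent) and fits a
    straight line to log(frequency) vs. log(rank) by ordinary least
    squares. Returns the negated slope. Raising min_rank leaves the
    top-ranked items (e.g. a "knee") out of the fit.
    All counts in the fitted range must be positive.
    """
    freqs = sorted(counts, reverse=True)
    if max_rank is None:
        max_rank = len(freqs)
    ranks = range(min_rank, max_rank + 1)
    xs = [math.log(r) for r in ranks]
    ys = [math.log(freqs[r - 1]) for r in ranks]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return -sxy / sxx


# Synthetic (not real) counts that follow frequency ~ rank^-1.5 exactly.
synthetic = [1e6 * r ** -1.5 for r in range(1, 1001)]
estimate = zipf_exponent(synthetic)

# Same fit, but skipping the 20 most frequent items.
tail_estimate = zipf_exponent(synthetic, min_rank=21)
```

On real disjunct counts, comparing the fit over all ranks with a fit that
skips the top ranks is one simple way to see how much a knee pulls the
estimated exponent.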

Greetings,

Martin Reynaert
ILK


Linas Vepstas wrote:
> I recently made a curious graph of the frequency of grammatical
> constructions in English, and am fishing for an explanation of its shape.
>
> I'm using a parser (link-grammar) which allows me to attach to
> every word of a sentence a pattern (a "disjunct") that defines how
> that word was used in the sentence. One can think of the disjunct
> as being a very fine-grained part of speech: for example, it
> distinguishes not only transitive and intransitive verbs, but transitive
> verbs from ditransitive ones, or those that took particles, or even
> had singular vs. plural objects, etc. The disjunct precisely captures the
> syntactical usage of a given word in a given sentence.
>
> The attached graph shows rank versus frequency of usage, taken
> from a corpus of about 1M sentences from Wikipedia articles.
> There is a nice long tail, showing a Zipfian power-law distribution,
> with exponent 1.5. There is also a knee at the highest ranks: the
> most frequent disjuncts are less frequent than they "should be" for
> a pure Zipfian distribution.
>
> The questions are then:
> 1) Why a power-law exponent of 1.5?
> 2) Why is there a knee?
> 3) What about other languages?
>
> I blogged this in slightly more detail at:
>
> http://opencog.wordpress.com/2009/07/06/frequency-of-grammatical-disjuncts/
>
> --linas
>
> _______________________________________________
> Corpora mailing list
> Corpora at uib.no
> http://mailman.uib.no/listinfo/corpora
>   

