[Corpora-List] Readability statistics

John F. Sowa sowa at bestweb.net
Mon Feb 11 06:57:36 UTC 2008


Serge and Steve,

Many different kinds of constructions involve closed class words.
Some might be easier and some harder for a native speaker, a learner,
or a computer to process.  And learners whose native languages have
very different structures might experience different degrees of difficulty.

SS> The only point I disagree with is:

SF>> Sentences of the same length containing more closed class
 >> words are likely to be easier to process.

For example, the word 'that' is optional in the following sentences:

    This is the house [that] Jack built.

    Tom believes [that] the moon is made of green cheese.

Including the word 'that' in such sentences increases the number of
closed class words, but it can sometimes speed up the parsing.
Without 'that', a parser might interpret 'the moon' as the direct
object of 'believes' and switch to a different interpretation when
it finds the word 'is'.
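
To make the count concrete, here is a minimal sketch that computes the
proportion of closed class words in each version of the second sentence.
The CLOSED_CLASS set and the helper names are hypothetical and deliberately
tiny, chosen only for illustration; a real readability study would need a
full inventory of closed class words.  The sketch only shows that the raw
statistic goes up when 'that' is included.  It does not model how easy
either version is to parse, which is the point at issue.

```python
# Hypothetical, deliberately tiny closed class word list (illustration only).
# A real study would use a full inventory of determiners, prepositions,
# pronouns, conjunctions, complementizers, auxiliaries, etc.
CLOSED_CLASS = {
    "a", "an", "the", "this", "that", "is", "of", "in", "on", "and", "or", "to",
}


def tokenize(sentence: str) -> list[str]:
    """Split on whitespace, strip surrounding punctuation, and lowercase."""
    words = (w.strip(".,;:!?").lower() for w in sentence.split())
    return [w for w in words if w]


def closed_class_ratio(sentence: str) -> tuple[int, int, float]:
    """Return (closed class count, total word count, ratio) for one sentence."""
    words = tokenize(sentence)
    closed = sum(1 for w in words if w in CLOSED_CLASS)
    return closed, len(words), closed / len(words)


for s in [
    "Tom believes the moon is made of green cheese.",
    "Tom believes that the moon is made of green cheese.",
]:
    closed, total, ratio = closed_class_ratio(s)
    print(f"{closed}/{total} = {ratio:.2f}  {s}")
# prints:
# 3/9 = 0.33  Tom believes the moon is made of green cheese.
# 4/10 = 0.40  Tom believes that the moon is made of green cheese.
```

A statistic like this would rate the version with 'that' as denser in
closed class words, even though it is the version that avoids the
temporary misreading of 'the moon' as a direct object.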

Another example is a long chain of noun-noun modifiers, which might
be easier to process if some prepositions were added to break it up.

Many constructions pose different levels of difficulty for
humans and machines.  The word 'antidisestablishmentarianism' is a
long word that might confuse a human reader, but a computer would
immediately recognize it as a noun that has exactly one word sense.

John Sowa


_______________________________________________
Corpora mailing list
Corpora at uib.no
http://mailman.uib.no/listinfo/corpora