[Corpora-List] Uses of N-grams?

Kilian Evang maschinenraum at texttheater.net
Thu Jul 18 08:35:26 UTC 2013


Dear Cedric,

n-grams are also widely used in computational linguistics to process
natural language automatically, based on statistical information.

A simple example is a language model: given the beginning of a sentence,
which word is likely to appear next? Counting n-grams in a large corpus
can help predict this. To some extent, the same counts can be used for
fluency ranking, i.e. automatically assessing how "natural" a text
sounds, for example one produced by a language learner or by a machine
translation system.
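
To make this concrete, here is a minimal sketch in Python of a bigram
(n = 2) model. The three-sentence corpus, the function names and the
add-one smoothing are my own assumptions for illustration; a real system
would use a far larger corpus and better smoothing.

```python
import math
from collections import Counter, defaultdict

def train_bigrams(sentences):
    """Count how often each word follows each other word."""
    follows = defaultdict(Counter)
    for sentence in sentences:
        tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
        for prev, nxt in zip(tokens, tokens[1:]):
            follows[prev][nxt] += 1
    return follows

def predict_next(follows, word):
    """Return the most frequent continuation of `word`, or None if unseen."""
    counts = follows.get(word.lower())
    if not counts:
        return None
    return counts.most_common(1)[0][0]

def fluency_score(follows, sentence, vocab_size):
    """Average log-probability per bigram, with add-one smoothing."""
    tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
    total = 0.0
    for prev, nxt in zip(tokens, tokens[1:]):
        counts = follows.get(prev, Counter())
        total += math.log((counts[nxt] + 1) / (sum(counts.values()) + vocab_size))
    return total / (len(tokens) - 1)

# Hypothetical toy corpus, standing in for a large one.
corpus = [
    "the cat sat on the mat",
    "the cat ate the fish",
    "the dog sat on the rug",
]
follows = train_bigrams(corpus)
# Number of distinct tokens that can follow something (includes "</s>").
vocab_size = len({w for counts in follows.values() for w in counts})

print(predict_next(follows, "the"))
print(fluency_score(follows, "the cat sat on the mat", vocab_size))
print(fluency_score(follows, "mat the on sat cat the", vocab_size))
```

The scrambled sentence gets a lower score than the natural one, which is
the idea behind fluency ranking.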

Another example is part-of-speech tagging: the word "cap" can be either
a verb or a noun, but the context usually disambiguates it. The n-grams
in which the word occurs can provide that context: "the cap" suggests a
noun, "they cap" a verb.
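
Here is a rough sketch of that idea in Python. It is not how production
taggers work (those typically use tag sequences and richer features);
it only picks the tag seen most often for a word after a given previous
word, and falls back to the word's most frequent tag. The tiny tagged
corpus, the tag names and the function names are all made up for
illustration.

```python
from collections import Counter, defaultdict

# Hypothetical hand-tagged toy corpus.
TAGGED = [
    [("they", "PRON"), ("cap", "VERB"), ("the", "DET"), ("bottle", "NOUN")],
    [("the", "DET"), ("cap", "NOUN"), ("is", "VERB"), ("red", "ADJ")],
    [("a", "DET"), ("cap", "NOUN"), ("fell", "VERB")],
    [("we", "PRON"), ("cap", "VERB"), ("prices", "NOUN")],
]

def train(tagged_sentences):
    """Count tags per (previous word, word) bigram and per single word."""
    by_bigram = defaultdict(Counter)
    by_word = defaultdict(Counter)
    for sentence in tagged_sentences:
        prev = "<s>"
        for word, tag in sentence:
            by_bigram[(prev, word)][tag] += 1
            by_word[word][tag] += 1
            prev = word
    return by_bigram, by_word

def tag_words(words, by_bigram, by_word, default="NOUN"):
    """Tag each word using its bigram context, then unigram counts, then a default."""
    tags = []
    prev = "<s>"
    for word in words:
        w = word.lower()
        if (prev, w) in by_bigram:
            counts = by_bigram[(prev, w)]
        elif w in by_word:
            counts = by_word[w]
        else:
            counts = None
        tags.append(counts.most_common(1)[0][0] if counts else default)
        prev = w
    return tags

by_bigram, by_word = train(TAGGED)
print(tag_words(["the", "cap"], by_bigram, by_word))
print(tag_words(["they", "cap", "prices"], by_bigram, by_word))
```

In this toy data "cap" is a noun and a verb equally often, so the word
alone cannot decide; the preceding word does.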

Best,
Kilian

