[Corpora-List] Language complexity for textual processing

Taras Zagibalov taras8055 at gmail.com
Thu Jan 26 10:04:30 UTC 2012


Hello

I wonder if anyone knows of research on evaluating language complexity
with respect to text processing. Intuitively, I would assume, for
example, that English is easier to process than French because French
is more heavily inflected, which requires more complex lemmatisation.
German is probably more complex than French because of "word-chaining"
(compounding) on top of inflection. Chinese is much easier in that it
lacks inflection, but the absence of word delimiters makes it
difficult for traditional "word-based" processing (please note that I
mean text processing, thus ignoring the complex tonal phonetics of the
language). Russian and many other Slavic languages are difficult due
to their morphology and free word order. Arabic is difficult due to
its variety of regional dialects and its syllabic-consonant writing
system. Hebrew should be similar to Arabic, 'minus' the regional
diversity.

Has anyone tried to rank or group these languages according to the
amount of labour required to build an NLP system for them? I do not
mean the availability of already developed tools, but rather
development 'from scratch'.

Thanks a lot.

Taras

PS I am aware of an existing language complexity ranking, but it was
developed with regard to second language acquisition, which involves
phonetics.
