[Corpora-List] Why do some languages have complex morphology while others do not?
Majid Laali
mjlaali at gmail.com
Mon Dec 12 16:45:45 UTC 2011
Thank you for your replies. However, I think I should shed some light on my question:
I formally define complex morphology as the role that clitics play in a language; for example, see this (1):
"Similarly, the following list, excerpted from Hakkani-Tür et al. (2002), shows a few of the words producible in Turkish from the root uyu-, ‘sleep’:
uyuyorum ‘I am sleeping’
uyuduk ‘we slept’
uyuman ‘your sleeping’
uyutmak ‘to cause someone to sleep’"
As you can see, clitics in Turkish play more syntactic and semantic roles than they do in English, so in such a language we find many more inflected surface forms of a single word. This property causes problems in many NLP tasks, such as POS tagging, SMT, and IR. Perhaps the most important problem is that any statistical approach to these tasks needs more training data (text).
Another point to consider: can we group languages into types by features like the one I mentioned, so that languages of the same type can be morphologically analyzed in similar ways?
Although the complexity of a language, and where a language puts this complexity, is very interesting, I think it deserves a separate thread to discuss.
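To make the surface-form point concrete for a stemmer, here is a minimal Python sketch of greedy right-to-left suffix stripping over the four Turkish forms quoted above. The suffix inventory (`TOY_SUFFIXES`) and the minimum stem length (`MIN_STEM`) are hypothetical, chosen only to cover these four examples; real Turkish analysis has to handle vowel harmony (e.g. -du/-dı/-di/-dü), buffer consonants, and suffix ordering, and is usually done with a lexicon-backed finite-state analyzer rather than a bare suffix list.

```python
# Hypothetical toy suffix inventory: covers only the four quoted Turkish forms,
# not a real description of Turkish morphology.
TOY_SUFFIXES = ["yor", "mak", "um", "du", "ma", "k", "n", "t"]
MIN_STEM = 3  # hypothetical: never strip a word below three characters


def strip_suffixes(word: str) -> tuple[str, list[str]]:
    """Repeatedly strip the longest matching toy suffix from the right."""
    stripped: list[str] = []
    by_length = sorted(TOY_SUFFIXES, key=len, reverse=True)
    changed = True
    while changed:
        changed = False
        for suffix in by_length:
            if word.endswith(suffix) and len(word) - len(suffix) >= MIN_STEM:
                word = word[: -len(suffix)]
                stripped.insert(0, suffix)  # keep suffixes in surface order
                changed = True
                break
    return word, stripped


if __name__ == "__main__":
    for form in ["uyuyorum", "uyuduk", "uyuman", "uyutmak"]:
        stem, suffixes = strip_suffixes(form)
        print(form, "->", stem, "+", "-".join(suffixes))
```

All four forms reduce to the root uyu-, yet a purely word-based statistical model would count them as four unrelated types, which is exactly why such languages need more training data. Greedy longest-match can also over-strip stems that happen to end in a listed suffix; the `MIN_STEM` guard only partly mitigates that.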
Regards,
Majid Laali,
Natural Language and text Processing Laboratory(http://ece.ut.ac.ir/NLP),
School of Electrical and Computer Engineering,
College of Engineering, University of Tehran, Tehran, Iran
m.laali at ut.ac.ir
(1) The example is from the book "Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition".
On Dec 12, 2011, at 3:36 PM, Graham White wrote:
> I suspect a lot of it is simply random drift: languages have to put their complexity somewhere, but there is a lot of choice as to where they put it. French, for example, has lost the noun inflections which
> Latin has, but it's acquired a complex system of clitics, which
> Latin doesn't have. And even English, though it's not as morphologically complex as its predecessors, has a very complex
> tense and aspect system (which non-native speakers seem to find
> very hard to acquire). People tend to notice morphological complexity
> because it's fairly visible, but there are many other ways of being
> complex which aren't so obvious at first sight.
>
> Graham
>
> On 12/12/11 11:27, Grzegorz Chrupała wrote:
>> Dear Majid,
>>
>> On Mon, Dec 12, 2011 at 13:52, Majid Laali<mjlaali at gmail.com> wrote:
>>> Dear Corpora List,
>>>
>>> I am working on developing a stemmer/lemmatization system for Persian.
>>> However, I am curious to know why some languages, like Persian, Turkish,
>>> and Chinese, have a complex morphological system, while other languages like
>>> English have a much simpler one.
>>
>> Actually Chinese has virtually no morphology. Persian morphology is
>> also relatively simple compared to many other languages (e.g. Slavic).
>>
>>> On the other hand, are there
>>> any criteria that cause such differences, like their historical change, their
>>> lexicon properties, or their type (Indo-European, or a more specific type like
>>> Romance)?
>>>
>>
>> There is usually a trade-off between complexity in the morphology and
>> complexity in the syntax. Regarding historical origins, one factor
>> causing a simplification of morphology seems to be creolization. But
>> of course it is only one of many factors.
>>
>> Best,
>> --
>> Grzegorz
>>
>> _______________________________________________
>> UNSUBSCRIBE from this page: http://mailman.uib.no/options/corpora
>> Corpora mailing list
>> Corpora at uib.no
>> http://mailman.uib.no/listinfo/corpora
>