[Corpora-List] Proposal for initiating a global non continental Arabic Language Academy

Linas Vepstas linasvepstas at gmail.com
Tue Nov 17 03:09:54 UTC 2009


2009/11/16 Linas Vepstas <linasvepstas at gmail.com>:
> Hi,
>
> 2009/11/16 El-Haj, Mahmoud <melhaj at essex.ac.uk>:
>> Dear Hamed,
>>
>>  Arabic shares characteristics with other living languages, which could be very helpful as Arabic NLP is still in its preliminary phases compared to others.
>
> There may be more to this than it seems. I notice that
> Regina Barzilay (MIT CSAIL) is giving seminars entitled
> "Embracing Language Diversity: Unsupervised Multilingual
> Learning?", wherein I think she tries to establish the idea that
> unsupervised training of parsers works even better if you
> train them on multiple languages simultaneously (or something
> like that; I haven't seen the lectures yet).
>
> The upshot is that I think that it would be a mistake to pretend
> that Arabic NLP is an island, needing only short-term help until
> it catches up with other languages. More likely, one has to
> consider it in context, and fully embrace "comparative linguistics",
> in both its traditional and, now, maybe more modern senses.

Specifically, see the abstract below, which presumably will mention ties
between ancient languages and modern Arabic.  "I" is Regina.

Talk Title: "Embracing Language Diversity: Unsupervised Multilingual Learning?"

Talk Abstract:
For centuries, the deep connection between human languages has
fascinated scholars, and driven many important discoveries in
linguistics and anthropology. In this talk, I will show that this
connection can empower unsupervised methods for language analysis. The
key insight is that joint learning from several languages reduces
uncertainty about the linguistic structure of each individual
language.

I will present multilingual generative unsupervised models for
morphological segmentation, part-of-speech tagging, and parsing. In
all of these instances we model the multilingual data as arising
through a combination of language-independent and language-specific
probabilistic processes. This feature allows the model to identify and
learn from recurring cross-lingual patterns to improve prediction
accuracy in each language. I will also discuss ongoing work on
unsupervised decoding of ancient Ugaritic tablets using data from
related Semitic languages.

This is joint work with Benjamin Snyder, Tahira Naseem and Jacob Eisenstein.

_______________________________________________
Corpora mailing list
Corpora at uib.no
http://mailman.uib.no/listinfo/corpora