request for suggestions/direction for Masters project

Andrew Freeman andyf at UMICH.EDU
Sun Oct 19 20:58:10 UTC 2008


Hi,

I'm in my 2nd year of the Professional Master's in Computational Linguistics
at the University of Washington here in Seattle. I'm attending classes at 2/3
time and will probably be working on a Master's thesis/project/internship by
July 2009.

I am floundering on picking a project.

So, I'm hoping that the community of Arabic linguists might be able to give
me some direction on what I could work on that would be most useful and
well received.

I, of course, have my own pet projects, which include:

1)  Developing a tool for, and then automatically annotating, my corpus of
    Yemeni TV & radio shows with POS tags and eventually syntax tags
    a.  Automatically identify/extract the segments that are in
        San'aani/Dhamari
    b.  Automatically identify/extract the segments that are in the
        southern-oriented dialects (Adeni & Ta'izzi)
    c.  Say something empirically informed about the structure of the
        various varieties
    d.  Say something smart about their social distribution and meaning (or
        lack thereof)
    e.  Use this as a seed for a machine-learning tool that learns to
        identify different varieties in other mixed-lect corpora:
        i.    Chat rooms
        ii.   Moroccan TV, radio and recordings of "authentic" speech
        iii.  Code-switched Spanish-English and English-Swahili

2)  Annotating some literary texts
    a.  from the Levant
    b.  from the Maghreb
    c.  See if I can identify features that reliably distinguish the two
        categories
        i.    (I suspect that the incidence & complexity of the Idaafa
              construction might be one)
    d.  Say something empirically informed about the structure of literary
        Arabic as a function of the writer's native vernacular variety

3)  Developing some automatic annotation tools that can be used to produce
    Rich Internet Application documents, so that an Intermediate Mid to
    Advanced Low learner of Standard Arabic can use "bio-feedback" loops to
    learn and self-test for the following skills:
    a.  Identifying the dictionary stem and maybe the root
    b.  Recovering the short vowels
    c.  Segmenting words into their constituent parts:
        i.    Nouns
              1.  Optional conjunction
              2.  Optional preposition
              3.  Optional article
              4.  Stem
              5.  Optional dual or plural marker
              6.  Possessive pronouns
        ii.   Imperfect verbs
              1.  Optional conjunction
              2.  Optional modal marker
              3.  Subject pronoun marker
              4.  Stem
              5.  Optional feminine, dual or plural marker
              6.  Optional object pronouns
        iii.  Perfect verbs
              1.  Optional conjunction
              2.  Stem
              3.  Subject pronoun marker
              4.  Optional object pronouns
        iv.   Prepositions
              1.  Optional conjunction
              2.  Stem
              3.  Optional object pronouns
    d.  Identifying the word gloss
    e.  As this project develops, maybe even introducing
        i.    some automatic speech recognition for pronunciation training
        ii.   mood & case vowel recovery

4)  Developing a reasonable search engine for Arabic (and, by analogy,
    Hebrew) that will try to locate all instances of a stem regardless of
    the attached affixes.
    a.  Currently, Google returns a different set of documents for ktAb than
        it does for AlktAb.
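
One common approach to item 4 in Arabic information retrieval is light
stemming: strip a few frequent prefixes and suffixes from both the indexed
words and the query, so that forms like ktAb and AlktAb fall under one key.
The sketch below assumes Buckwalter-transliterated, whitespace-tokenized text;
the affix lists, the minimum stem length of 3, and the toy documents are all
hypothetical. Light stemming will sometimes over-strip (e.g. a root-initial w
or l) and cannot connect broken plurals such as ktb to ktAb.

```python
from collections import defaultdict

# Hypothetical affix lists (Buckwalter transliteration), longest first so that
# e.g. "wAl" is tried before "w".
PREFIXES = sorted(["wAl", "fAl", "bAl", "kAl", "ll", "Al", "w", "f", "b", "l"],
                  key=len, reverse=True)
SUFFIXES = sorted(["hmA", "hm", "hA", "nA", "At", "An", "wn", "yn", "h", "k", "y"],
                  key=len, reverse=True)


def light_stem(token, min_len=3):
    """Strip at most one prefix and one suffix, keeping at least min_len characters."""
    for p in PREFIXES:
        if token.startswith(p) and len(token) - len(p) >= min_len:
            token = token[len(p):]
            break
    for s in SUFFIXES:
        if token.endswith(s) and len(token) - len(s) >= min_len:
            token = token[:-len(s)]
            break
    return token


def build_index(docs):
    """Map each light stem to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in text.split():
            index[light_stem(token)].add(doc_id)
    return index


def search(index, query):
    """Look up a single-word query by its light stem."""
    return index.get(light_stem(query), set())


docs = {  # hypothetical toy documents
    1: "qrA AlwlAd AlktAb",
    2: "hdA ktAb jdyd",
    3: "ktAbhm mfqwd",
    4: "Alqlm hnA",
}
index = build_index(docs)
print(sorted(search(index, "ktAb")))    # [1, 2, 3]
print(sorted(search(index, "AlktAb")))  # [1, 2, 3]
```

Because the query goes through the same stemmer as the documents, ktAb,
AlktAb and ktAbhm all retrieve the same set, trading some precision for
recall.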

5)  Working on ways to improve the performance of statistical machine
    translation between Arabic and various other languages (as source or
    target) by improving word alignment through word segmentation and vowel
    recovery.

So: does anybody have any suggestions on what I could do in about 480
staff-hours?

Here is my motivating force:

I am trying to find a niche where:

1)  I can be a software developer without giving up on my experience and
    training as a linguist and an Arabist;
2)  I can draw on my training and make a contribution as a sociolinguist
    without completely giving up on being a software developer or an
    Arabist; and
3)  I can continue to make a contribution as an Anglophone with serious
    knowledge of Arabic and Arab culture without entirely abandoning my
    roles as a software developer or a linguist.

Best regards,

Andy

Andrew Freeman, PhD (Linguistics & Near Eastern Studies)
BS Computer Science

More information about the Arabic-l mailing list