[Corpora-List] CFP: FIRST WORKSHOP+TUTORIAL ON LANGUAGE TECHNOLOGIES FOR INDIAN SOCIAL MEDIA (सOCIAL-ईNDIA)

Amitava Das amitava.santu at gmail.com
Sat Oct 11 08:03:04 UTC 2014


CALL FOR PAPERS
=================

FIRST WORKSHOP ON LANGUAGE TECHNOLOGIES FOR INDIAN SOCIAL MEDIA
(सOCIAL-ईNDIA)
===============================================================================

RATIONALE
==========
The evolution of social media texts – such as blogs, micro-blogs (e.g.,
Twitter), and chats (e.g., Facebook messages) – has created many new
opportunities for information access and language technology, but also many
new challenges, making it one of the prime present-day research areas.
Automatic processing of these types of texts warrants new strategies, in
particular since they often are very ‘noisy’, that is, they are
characterised by having a high percentage of spelling errors and containing
creative spellings (gr8 for ‘great’), word play (goooood for ‘good’),
abbreviations (OMG for ‘Oh my God!’), meta tags (URLs, hashtags), and so
forth. So far, most of the research on social media texts has concentrated
on English, whereas most of these texts are now in non-English languages.
In social media, non-English speakers do not always use Unicode to write in
their own language; they use phonetic typing, frequently insert English
elements (through code-mixing and Anglicisms; see Example 1 below),
and often mix multiple languages to express their thoughts, making
automatic language processing of social media texts a very challenging
task. Thus, even though English is still the principal language for web
communication, there is a growing need to develop technologies for other
languages. Here we will concentrate on social media text in the languages
of India, a nation with more than 20 official languages.
ICON is a well-established gathering for the industrial and academic
research communities both internationally and in India. Therefore, we
believe that it is the best place to draw research attention to
developing language technologies for Indian social media text. The workshop
will include an embedded tutorial on code-mixing in social media. The three
primary goals of the proposed workshop are:

1. To focus community awareness on language technologies for Indian social
media.
2. To share corpora and resources to promote future research.
3. To exchange ideas and experiences amongst researchers.

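Example 1. ICON isbar Goa mein ho raha hai! Great chance to visit Goa!
(Romanized Hindi mixed with English: ‘This time ICON is being held in Goa!
Great chance to visit Goa!’)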

EMBEDDED TUTORIAL ON CODE-MIXING IN SOCIAL MEDIA
================================================
Abstract: Code-mixing, or the mixing of more than one language in a single
conversation or utterance, is a common phenomenon in any multilingual
society. India’s extreme multilinguality makes code-mixing extremely
common in social media content posted in Indian languages and by Indian
users. In this tutorial, we will talk about why code-mixing is, on the one
hand, a computational challenge that must be solved to effectively process
Indian language (IL) content and, on the other hand, a wonderful linguistic
resource for studying several allied phenomena. The tutorial will also
introduce some basic NLP techniques for code-mixed data.
DURATION: Half Day

TUTORIAL COORDINATORS
======================
Monojit Choudhury
Microsoft Research Lab India
Website: http://research.microsoft.com/en-us/people/monojitc/

Kalika Bali
Microsoft Research Lab India
Website: http://research.microsoft.com/en-us/people/kalikab/

LIST OF TOPICS
===============
We welcome original and unpublished submissions on all aspects of language
technologies for Indian languages in the social media context. Topics of
interest include, but are not limited to:
• Part of Speech (POS) Tagging
• Language Detection
• Morphological Analysis
• Named Entity Recognition (NER)
• Dependency Parsing
• Lexical Resources
• Annotated Corpora
• Transliteration
• Sentiment Analysis

WORKSHOP ORGANIZERS
====================
Amitava Das
University of North Texas, USA
Website: http://amitavadas.com/

Björn Gambäck
Norwegian University of Science and Technology, Trondheim, Norway
Website: http://www.ntnu.edu/employees/gamback

Dipankar Das
Jadavpur University
Website: http://www.dasdipankar.com/

Thanks,
Amitava
----------------------------------------------------------------
*Dr. AMITAVA DAS*
Research Scientist
Department of Computer Science and Engineering
University of North Texas, Denton, Texas, USA
Phone: +1 940 442 7560
Web Page: http://www.amitavadas.com/
----------------------------------------------------------------