22.5043, FYI: Crowdsourcing the Development of Underserved Langs

linguist at LINGUISTLIST.ORG
Wed Dec 14 16:20:10 UTC 2011


LINGUIST List: Vol-22-5043. Wed Dec 14 2011. ISSN: 1069 - 4875.

Subject: 22.5043, FYI: Crowdsourcing the Development of Underserved Langs

Moderators: Anthony Aristar, Eastern Michigan U <aristar at linguistlist.org>
            Helen Aristar-Dry, Eastern Michigan U <hdry at linguistlist.org>

Reviews: Veronika Drake, U of Wisconsin-Madison
Monica Macaulay, U of Wisconsin-Madison
Rajiv Rao, U of Wisconsin-Madison
Joseph Salmons, U of Wisconsin-Madison
Anja Wanner, U of Wisconsin-Madison
       <reviews at linguistlist.org>

Homepage: http://linguistlist.org

The LINGUIST List is funded by Eastern Michigan University,
and donations from subscribers and publishers.

Editor for this issue: Brent Miller <brent at linguistlist.org>
================================================================  

To post to LINGUIST, use our convenient web form at
http://linguistlist.org/LL/posttolinguist.cfm.

===========================Directory==============================  

1)
Date: 13-Dec-2011
From: Mark Mandel [mamandel at ldc.upenn.edu]
Subject: Crowdsourcing the Development of Underserved Langs


-------------------------Message 1 ---------------------------------- 
Date: Wed, 14 Dec 2011 11:20:00
From: Mark Mandel [mamandel at ldc.upenn.edu]
Subject: Crowdsourcing the Development of Underserved Langs

(I am not connected with this project; please do not contact me about it. 
-- M. Mandel)

Crowdsourcing the Development of Underserved Language Resources
(http://www.rhok.org/problems/crowdsourcing-development-underserved-language-resources)

Affordable, accessible and sustainable data, tools and technologies in 
local languages are necessary if populations in the developing world 
are to gain access to the knowledge society and economy, both to 
consume and to generate relevant content. This includes access to 
appropriate networks and Information and Communication Technologies 
(ICTs) supported by adequate Human Language Technologies (HLT). There 
is an urgent need to realize the fundamental rights of the world's 
citizens to have access to information in their own language, 
information that will allow them to improve their economic situation, 
their education, their legal rights, and their health. A major 
challenge still facing the development of a truly inclusive and 
diverse global information society is the extreme scarcity of language 
resources that researchers and practitioners can use to build HLT for 
countries in the developing world. Unless resolved, this issue will 
prevent the vast majority of the world's next billions of citizens, 
who rely exclusively on their native languages to consume and produce 
information, from participating in the global information society.
 
This project aims to tackle this challenge by leveraging open content, 
mobile technologies and crowd-sourcing to create language resources 
for the world's underserved languages and make them available under 
open licenses to stimulate research and development in the area of 
Human Language Technologies (HLT). The project will use existing 
open text repositories (such as Wikipedia) in languages such as Swahili, 
Arabic and Urdu, and will create a crowd-sourcing mechanism for 
developing these text repositories into language corpora. This could 
include, for example, tagging the words in the corpus based on part of 
speech (a process known as Part of Speech Tagging). For this 
purpose, a platform can be built to extract sentences from the corpus 
and send each one to a group of contributors by text message. Each 
contributor can examine the sentence, determine the tag for each 
word in it (verb, noun, adjective, etc.) and send the tags back to 
the platform. Redundant responses from several contributors will be 
used to ensure the correctness of the answers and to flag any potential 
errors. Participation in the platform can be encouraged through several 
means. For example, contributors may be rewarded for their 
participation with mobile credit they can use on their phones, or a 
badge system could be applied to acknowledge active contributors. 
Participation could also be structured as a game. 
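
The redundancy check described above could be sketched roughly as 
follows. This is only an illustration, not part of the project 
description: the function name aggregate_tags, the 0.75 agreement 
threshold, the tag labels and the sample sentence are all assumptions 
made for the example. It takes the tag sequences returned by several 
contributors for one sentence, keeps the majority tag for each word, 
and flags words where agreement falls below the threshold.

```python
from collections import Counter


def aggregate_tags(responses, min_agreement=0.75):
    """Combine redundant part-of-speech tags by majority vote.

    responses: list of tag sequences, one per contributor, all for the
    same sentence. Returns one (tag, agreement, flagged) tuple per word.
    The threshold is a hypothetical choice, not a project setting.
    """
    if not responses:
        raise ValueError("at least one response is required")
    length = len(responses[0])
    if any(len(r) != length for r in responses):
        raise ValueError("all responses must tag the same number of words")

    results = []
    for position in range(length):
        # Count how many contributors chose each tag for this word.
        votes = Counter(r[position] for r in responses)
        tag, count = votes.most_common(1)[0]
        agreement = count / len(responses)
        # Low agreement marks the word as a potential error for review.
        results.append((tag, agreement, agreement < min_agreement))
    return results


if __name__ == "__main__":
    # Illustrative Swahili sentence: "mtoto anasoma kitabu".
    words = ["mtoto", "anasoma", "kitabu"]
    responses = [
        ["NOUN", "VERB", "NOUN"],
        ["NOUN", "VERB", "NOUN"],
        ["NOUN", "VERB", "ADJ"],
        ["NOUN", "NOUN", "NOUN"],
    ]
    for word, (tag, agreement, flagged) in zip(words, aggregate_tags(responses)):
        note = "  <- flag for review" if flagged else ""
        print(f"{word}: {tag} ({agreement:.2f}){note}")
```

A real platform would also need to handle replies that arrive late, 
are malformed, or tag the wrong number of words; the sketch simply 
rejects mismatched lengths.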



Linguistic Field(s): Computational Linguistics
                     Text/Corpus Linguistics

-----------------------------------------------------------
LINGUIST List: Vol-22-5043	
----------------------------------------------------------


