[Corpora-List] Developing a Speech Corpus

True Friend true.friend2004 at gmail.com
Sat Oct 15 14:30:30 UTC 2011


Aslam-o-Alikum (peace be upon you), Fatima
We have been developing a corpus of Pakistani Spoken English at Government
College University, Faisalabad, for the last three years. This work is part of
compiling the Pakistani component of the International Corpus of English (ICE).
Although our data is in English, I believe that certain problems and
experiences will be common to both languages in the Pakistani setting. In the
case of English, 80% of our data comes from the media, i.e. news channels,
talk shows, radio, commentary and speeches. This data is publicly available or
broadcast, so at least no ethical issues are involved. Copyright issues might
arise, but we have maintained a proper referencing system. The remaining 20%
comes from classrooms, a few telephone conversations, etc., for which prior
permission was obtained from the speakers in most cases (usually the data
collectors take part in the conversations they record, and the teachers or
other participants know about the data collection and allow it).
Regarding the annotation or tagging scheme for spoken data (e.g. marking
repetitions), markup guidelines are available on the ICE website. For Pushto,
you might start with media-related genres and then add conversations,
lectures and other non-media settings. For annotation, the ICE scheme or a
modified version of it could be used.
Hopefully this helps.
Regards


-- 
*Muhammad Shakir Aziz* *محمد شاکر عزیز*
*Masters in Applied Linguistics
Translator, Course Developer, Linguist for Urdu, Punjabi and English*
Urdu:- http://awaz-e-dost.blogspot.com/
English:- http://linguisticslearner.blogspot.com/
Facebook:- http://www.facebook.com/truefriend2004
Skype:- true_friend2004