<div dir="ltr"><font size="4"><font face="tahoma,sans-serif">Aslam-o-Alikum Fatima,<br>We
have been developing a corpus of Pakistani spoken English at Govt. College
University, Faisalabad, for the last 3 years. This is part of the effort to
collect the Pakistani component of the International Corpus of English (ICE).
Although it is English data, I believe certain problems and
experiences will be common to both languages in the Pakistani setting. In
the case of English, 80% of the data comes from the media, i.e. news channels, talk
shows, radio, commentary and speeches. This data is publicly
available or has been broadcast, so at least no ethical issues are involved.
There might be copyright issues, but we have maintained a
proper referencing system. The remaining 20% comes from classrooms, a
few telephone conversations, etc., for which prior permission has been
obtained from the speakers involved in most cases (usually the data
collectors are part of the conversation they are recording, and the teachers
or other participants know about the data collection and allow it).
Regarding the annotation or tagging scheme for spoken data (e.g. marking
repetitions), markup guidelines are available on the ICE website. In the
case of Pushto, you might start with media-related genres and then
include conversations, lectures and other non-media settings.
For the annotation, the ICE scheme or a modified version of it could be used. <br>
I hope this helps.<br>Regards<br></font></font><font color="#888888"><br clear="all"><br>-- <br><div dir="ltr"><font style="color:rgb(51, 102, 255)" size="4"><b>Muhammad Shakir Aziz</b></font><span style="color:rgb(51, 102, 255)"> </span><font style="color:rgb(51, 102, 255)" size="4"><b><span style="font-family:tahoma,sans-serif">محمد شاکر عزیز</span></b></font><br style="color:rgb(51, 102, 255)">
<b><span style="color:rgb(51, 102, 255);font-family:comic sans ms,sans-serif">Masters in Applied Linguistics</span><br style="color:rgb(51, 102, 255);font-family:comic sans ms,sans-serif"><span style="color:rgb(51, 102, 255);font-family:comic sans ms,sans-serif">Translator, Course Developer, Linguist for Urdu, Punjabi and English</span></b><br style="color:rgb(51, 102, 255)">
<span style="color:rgb(51, 102, 255);font-family:courier new,monospace">Urdu:- </span><a style="color:rgb(51, 102, 255);font-family:courier new,monospace" href="http://awaz-e-dost.blogspot.com/" target="_blank">http://awaz-e-dost.blogspot.com/</a><br style="color:rgb(51, 102, 255);font-family:courier new,monospace">
<span style="color:rgb(51, 102, 255);font-family:courier new,monospace">English:- </span><a style="color:rgb(51, 102, 255);font-family:courier new,monospace" href="http://linguisticslearner.blogspot.com/" target="_blank">http://linguisticslearner.blogspot.com/</a><br style="color:rgb(51, 102, 255);font-family:courier new,monospace">
<span style="color:rgb(51, 102, 255);font-family:courier new,monospace">Facebook:- </span><a style="color:rgb(51, 102, 255);font-family:courier new,monospace" href="http://www.facebook.com/truefriend2004" target="_blank">http://www.facebook.com/truefriend2004</a><br style="color:rgb(51, 102, 255);font-family:courier new,monospace">
<span style="color:rgb(51, 102, 255);font-family:courier new,monospace">Skype:- true_friend2004</span></div></font></div>