<DIV>Dear Hardie,</DIV> <DIV> </DIV> <DIV>Thanks for your e-mail and its valuable suggestions. I will indeed act on your advice to expand the corpus. I have been working with Xaira for a few days, and I have found it a very useful tool. </DIV> <DIV> </DIV> <DIV>Sir, I would like to ask: what factors led you to prefer, e.g., SQL for larger corpora, i.e. in the case of Urdu, Nepali etc.? Isn't XML better for larger corpora? If not, then why, Sir? </DIV> <DIV> </DIV> <DIV>Regards.</DIV> <DIV> </DIV> <DIV><BR><BR><BR><B><I>"Hardie, Andrew" &lt;a.hardie@lancaster.ac.uk&gt;</I></B> wrote:</DIV> <BLOCKQUOTE class=replbq style="PADDING-LEFT: 5px; MARGIN-LEFT: 5px; BORDER-LEFT: #1010ff 2px solid"> <META content="MSHTML 6.00.6000.16608" name=GENERATOR> <DIV><SPAN class=397591607-21042008><FONT face=Arial color=#0000ff size=2>Dear Fatima,</FONT></SPAN></DIV> <DIV><SPAN
class=397591607-21042008><FONT face=Arial color=#0000ff size=2></FONT></SPAN> </DIV> <DIV><SPAN class=397591607-21042008><FONT face=Arial color=#0000ff size=2>I am sure others will have responded to your queries, but I thought I'd add my voice. For the kind of data you describe, Xaira is indeed a good option. The web addresses you need are:</FONT></SPAN></DIV> <DIV><SPAN class=397591607-21042008><FONT face=Arial color=#0000ff size=2></FONT></SPAN> </DIV> <DIV><SPAN class=397591607-21042008><FONT face=Arial color=#0000ff size=2><A href="http://www.oucs.ox.ac.uk/rts/xaira/">http://www.oucs.ox.ac.uk/rts/xaira/</A><BR><A href="http://www.natcorp.ox.ac.uk/tools/">http://www.natcorp.ox.ac.uk/tools/</A><BR><A href="http://sourceforge.net/projects/xaira/">http://sourceforge.net/projects/xaira/</A><BR><A href="http://xaira.sourceforge.net/">http://xaira.sourceforge.net/</A></FONT></SPAN></DIV> <DIV><SPAN class=397591607-21042008><FONT face=Arial color=#0000ff
size=2></FONT></SPAN> </DIV> <DIV><SPAN class=397591607-21042008><FONT face=Arial color=#0000ff size=2>However, when you have a larger corpus, you might also consider whether a web-accessible solution (e.g. one based on an SQL database) would be more convenient. I have found this to be the case when working with corpora of Urdu, Nepali, Sinhala etc.</FONT></SPAN></DIV> <DIV><SPAN class=397591607-21042008><FONT face=Arial color=#0000ff size=2></FONT></SPAN> </DIV> <DIV><SPAN class=397591607-21042008><FONT face=Arial color=#0000ff size=2>In terms of your future research, I would recommend working primarily on expanding your corpus. 30,000 words is not a lot of data in corpus terms. You will find, I think, that effort spent enhancing your corpus collection will be much more fruitful than developing software, especially given how much ready-made corpus analysis software is freely available.</FONT></SPAN></DIV> <DIV><SPAN class=397591607-21042008><FONT face=Arial
color=#0000ff size=2></FONT></SPAN> </DIV> <DIV><SPAN class=397591607-21042008><FONT face=Arial color=#0000ff size=2>best regards,</FONT></SPAN></DIV> <DIV><SPAN class=397591607-21042008><FONT face=Arial color=#0000ff size=2></FONT></SPAN> </DIV> <DIV><SPAN class=397591607-21042008><FONT face=Arial color=#0000ff size=2>Andrew Hardie.</FONT></SPAN></DIV> <DIV><SPAN class=397591607-21042008><FONT face=Arial color=#0000ff size=2></FONT></SPAN> </DIV> <DIV><SPAN class=397591607-21042008> <DIV align=left> <DIV><FONT face=Arial size=2><SPAN class=300082312-04052006> <DIV dir=ltr align=left><FONT face=Verdana color=#800080 size=2><EM>Andrew Hardie</EM></FONT></DIV> <DIV dir=ltr align=left><FONT face=Verdana color=#800080 size=2><EM>Linguistics<SPAN class=958515708-22052007> & English Language</SPAN></EM></FONT></DIV> <DIV dir=ltr align=left><FONT face=Verdana color=#800080 size=2><EM>Bowland College</EM></FONT></DIV> <DIV dir=ltr align=left><FONT
face=Verdana color=#800080 size=2><EM>Lancaster University</EM></FONT></DIV> <DIV dir=ltr align=left><FONT face=Verdana color=#800080 size=2><EM>Lancaster LA1 4YT</EM></FONT></DIV> <DIV dir=ltr align=left><FONT face=Verdana color=#800080 size=2><EM>United Kingdom</EM></FONT></DIV> <DIV dir=ltr align=left><FONT face=Verdana size=2><EM></EM></FONT> </DIV> <DIV dir=ltr align=left><FONT face=Verdana size=2><SPAN class=343085411-09102007><FONT face=Verdana size=2><EM><A href="http://www.ling.lancs.ac.uk/staff/hardie">www.ling.lancs.ac.uk/staff/hardie</A></EM></FONT></SPAN></FONT></DIV></SPAN></FONT></DIV></DIV></SPAN></DIV><BR> <DIV class=OutlookMessageHeader lang=en-us dir=ltr align=left> <HR tabIndex=-1> <FONT face=Tahoma size=2><B>From:</B> corpora-bounces@uib.no [mailto:corpora-bounces@uib.no] <B>On Behalf Of </B>fatima zuhra<BR><B>Sent:</B> 19 April 2008 03:25<BR><B>To:</B> Corpora@uib.no<BR><B>Subject:</B> [Corpora-List] Corpus Development<BR></FONT><BR></DIV>
<DIV></DIV> <DIV>Hi All,</DIV> <DIV> </DIV> <DIV>Thanks a lot to all who paid attention to my message and provided me with their valuable suggestions.</DIV> <DIV> </DIV> <DIV>Dear Laxmi, my corpus is a general-purpose corpus of written Pashto. Dear Mr. Adam, the corpus currently contains 30,000 words and its size is increasing. I haven't used Xaira, but am interested in using it. Dear Lou, I would be very thankful if you could help me further by forwarding some guidelines about Xaira. The web page <A href="http://www.xaira.net/">http://www.xaira.net/</A> cannot be displayed in my browser. </DIV> <DIV> </DIV> <DIV>Dear Gee Raza, I am also glad to see someone from Pakistan on the list. Well, I only know the three languages you mentioned, but am interested in learning Arabic and Persian. I hope I'll soon learn these two.</DIV> <DIV> </DIV> <DIV>Dear Oliver, I meant to ask whether I am going in the right
direction for a general-purpose Pashto corpus? By fully functional, I mean something that can be rightly called a corpus. I also wanted to investigate the appropriate statistical measures that can be used to evaluate newly developed software. In our country, there are statisticians who know each and every statistical measure but cannot advise us on which one to use for which purpose. If there are some who can advise, we do not have access to them.</DIV> <DIV> </DIV> <DIV>Thanks to Sir Ramesh for his encouragement and valuable suggestions.</DIV> <DIV> </DIV> <DIV>I have also developed a finite-state morphological analyzer for Pashto. I will provide the details from time to time. </DIV> <DIV> </DIV> <DIV>Regards.</DIV> <div> <HR SIZE=1> Be a better friend, newshound, and know-it-all with Yahoo! Mobile. <A href="http://us.rd.yahoo.com/evt=51733/*http://mobile.yahoo.com/;_ylt=Ahu06i62sR8HDtDypao8Wcj9tAcJ">Try it now.</A>
</BLOCKQUOTE><BR><p>