<div>Hi All,</div> <div> </div> <div>I am an MS student at Peshawar University. I have developed a Pashto corpus as part of my MS research project. The corpus contains written Pashto data and is XML-tagged. It consists of three cells: one containing Pashto novels, another containing Pashto essays, and a third containing Pashto letters. All the data is tagged up to the sentence level. I have also developed a user interface for the corpus: it takes a word or phrase from the user, searches the corpus for it, and displays every sentence that contains the query word or phrase. I have a few questions:</div> <div> </div> <div>1. Am I heading in the right direction with the corpus development?</div> <div>2. What further work is needed to turn this into a fully functional corpus?</div> <div>3. Which statistical measures can I use to measure the accuracy of this corpus?</div> <div> </div> <div>I would be very thankful for any helpful suggestions in this regard.</div> <div> </div> <div>Thanks. </div><p>
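<div>For readers following along, the sentence-level search described above can be sketched roughly as follows. This is only a minimal illustration under stated assumptions, not the actual interface: the element names <code>corpus</code>, <code>text</code> and <code>s</code>, the <code>genre</code> attribute, the helper names, and the three sample sentences are all hypothetical, since the post does not show the real markup. The tree is built in memory so the example is self-contained.</div>

```python
import xml.etree.ElementTree as ET


def build_sample_corpus():
    """Build a tiny in-memory corpus (hypothetical markup, illustrative data)."""
    corpus = ET.Element("corpus")
    samples = {
        "novel": ["دا کتاب دی.", "زه ښوونځي ته ځم."],
        "letter": ["هغه کتاب لولي."],
    }
    for genre, sentences in samples.items():
        text = ET.SubElement(corpus, "text", genre=genre)
        for sentence in sentences:
            s = ET.SubElement(text, "s")  # one element per tagged sentence
            s.text = sentence
    return corpus


def search_sentences(corpus, query):
    """Return (genre, sentence) pairs for every sentence containing the query."""
    hits = []
    for text in corpus.iter("text"):
        genre = text.get("genre")
        for s in text.iter("s"):
            sentence = "".join(s.itertext()).strip()
            if query in sentence:  # plain substring match on the raw string
                hits.append((genre, sentence))
    return hits


corpus = build_sample_corpus()
results = search_sentences(corpus, "کتاب")
for genre, sentence in results:
    print(genre, sentence)
```

<div>A plain substring match like this also matches inside longer words and is sensitive to variant spellings or differing Unicode encodings of the same letter, so a real search would probably want tokenization and normalization first.</div>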