[Corpora-List] Corpus Development

fatima zuhra fateeshah at yahoo.com
Thu Apr 17 13:48:07 UTC 2008


Hi All,
   
  I am an MS student at Peshawar University. As part of my MS research project, I have developed a Pashto corpus. The corpus contains written Pashto data and is tagged in XML. It consists of three sections: one containing Pashto novels, one containing Pashto essays, and one containing Pashto letters. All of the data is tagged up to sentence level. I have also developed a user interface for the corpus: it takes a word or phrase that the user wants to search for, searches the corpus for it, and displays every sentence in the corpus that contains the query word or phrase. I have a few questions:
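  The sentence search described above might look roughly like the following minimal sketch. It assumes a hypothetical markup in which each of the three sections is a <section> element with a "genre" attribute and each sentence is an <s> element with an "id". The post does not give the actual element names, and the sample sentences are illustrative only. Matching is on whole whitespace-separated tokens, so a query does not match inside a longer word.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: the post only says the corpus is XML-tagged up to
# sentence level with three sections; these element/attribute names are assumed.
SAMPLE_XML = """<corpus>
  <section genre="novels">
    <s id="n1">زه کتاب لولم</s>
  </section>
  <section genre="essays">
    <s id="e1">دا کتاب ښه دی</s>
  </section>
  <section genre="letters">
    <s id="l1">دا لیک اوږد دی</s>
  </section>
</corpus>"""


def contains_phrase(tokens, query_tokens):
    """Return True if query_tokens occurs as a contiguous run inside tokens."""
    n = len(query_tokens)
    return any(tokens[i:i + n] == query_tokens for i in range(len(tokens) - n + 1))


def search(xml_text, query):
    """Return (genre, sentence id, sentence text) for each sentence containing the query."""
    root = ET.fromstring(xml_text)
    query_tokens = query.split()
    if not query_tokens:
        return []
    hits = []
    for section in root.iter("section"):
        for s in section.iter("s"):
            # itertext() also collects text from any nested (e.g. word-level) tags;
            # split/join collapses runs of whitespace to single spaces.
            text = " ".join("".join(s.itertext()).split())
            if contains_phrase(text.split(), query_tokens):
                hits.append((section.get("genre"), s.get("id"), text))
    return hits


if __name__ == "__main__":
    for genre, sid, text in search(SAMPLE_XML, "کتاب"):
        print(genre, sid, text)
```

  Keeping the section label with each hit would let the interface show which part of the corpus (novels, essays or letters) a sentence comes from.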
   
  1. Am I going in the right direction with the development of this corpus?
  2. What more can be done to turn it into a fully functional corpus?
  3. What statistical measures can I use to evaluate the accuracy of this corpus?
   
  I would be very grateful for any helpful suggestions in this regard.
   
  Thanks. 

       
_______________________________________________
Corpora mailing list
Corpora at uib.no
http://mailman.uib.no/listinfo/corpora

