[Corpora-List] Corpus Development

Thu Apr 17 14:29:18 UTC 2008

Congratulations, Fatima!
You seem to have done a great deal already!

1. Am I going in the right direction with the corpus development?
Yes, I would say so.
2. What can further be done, in order to convert the corpus to a fully functional corpus?
There are so many directions... adding wildcards, discontinuous phrases, and tag searches to the query system; displaying longer contexts; providing frequency lists for words, lemmas, and tags for each of your 3 datasets; adding collocation and keyword functions; search by author; search by date; making the corpus publicly accessible; if any of the texts are available in translation, a parallel-text facility; etc.
3. What are the statistical measures that I can use to measure the accuracy of this corpus?
I'm not sure what you mean; do you mean the accuracy of your tagging? I'm no expert in this area, but I'm sure others will help you with this question.
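Two of the suggestions under question 2, frequency lists per dataset and wildcard searches, could be sketched roughly as follows. This is only an illustration under stated assumptions: the dataset names, the placeholder sentences, the whitespace tokenizer, and the function names `frequency_list` and `wildcard_search` are all hypothetical and are not taken from the actual corpus or its interface.

```python
from collections import Counter
from fnmatch import fnmatchcase

# Hypothetical stand-in data: each dataset maps to a list of sentences.
# The tokens are placeholders; real input would come from the XML-tagged files.
DATASETS = {
    "novels": ["kor ta lar", "kor loy day"],
    "essays": ["kitab loy day"],
    "letters": ["kitabuna ta lar"],
}


def tokenize(sentence):
    """Naive whitespace tokenizer (an assumption; real text may need more care)."""
    return sentence.split()


def frequency_list(sentences):
    """Return a Counter of word frequencies for one dataset."""
    counts = Counter()
    for sentence in sentences:
        counts.update(tokenize(sentence))
    return counts


def wildcard_search(sentences, pattern):
    """Return the sentences that have at least one token matching a
    shell-style pattern: * matches any run of characters, ? exactly one."""
    return [
        s for s in sentences
        if any(fnmatchcase(tok, pattern) for tok in tokenize(s))
    ]


if __name__ == "__main__":
    # One frequency list per dataset, most frequent words first.
    for name, sentences in DATASETS.items():
        print(name, frequency_list(sentences).most_common(3))

    # Wildcard query across all three datasets.
    all_sentences = [s for group in DATASETS.values() for s in group]
    print(wildcard_search(all_sentences, "kitab*"))
```

Matching is done token by token, so a pattern such as `kitab*` finds whole words beginning with that string rather than arbitrary substrings of a sentence. Lemma and tag frequency lists would need annotation beyond the sentence level.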
Best
Ramesh

Ramesh Krishnamurthy
Lecturer in English Studies, School of Languages and Social Sciences,
Aston University, Birmingham B4 7ET, UK
Tel: +44 (0)121-204-3812 ; Fax: +44 (0)121-204-3766 [Room NX08, 10th
Floor, North Wing of Main Building]
http://www.aston.ac.uk/lss/school/staff/krishnamurthyr.jsp
Director, ACORN (Aston Corpus Network project): http://acorn.aston.ac.uk/
________________________________
From: corpora-bounces at uib.no [mailto:corpora-bounces at uib.no] On Behalf Of fatima zuhra
Sent: 17 April 2008 14:48
To: Corpora at uib.no
Subject: [Corpora-List] Corpus Development

Hi All,

I am an MS student at Peshawar University. I have developed a Pashto corpus as part of my MS research project. The corpus contains written Pashto data and is XML tagged. It consists of three cells: one containing Pashto novels, another containing Pashto essays, and a third containing Pashto letters. All the data is tagged up to sentence level. I have developed a user interface for the corpus, which takes a word or phrase from the user, searches for it in the corpus, and displays all the sentences containing that word or phrase. I have a couple of questions:

1. Am I going in the right direction with the corpus development?
2. What can further be done, in order to convert the corpus to a fully functional corpus?
3. What are the statistical measures that I can use to measure the accuracy of this corpus?

I would be very grateful for any helpful suggestions in this regard.

Thanks.
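
For readers of the archive, the sentence-level search described in this message might look something like the sketch below. The markup is an assumption: the message says only that the corpus is XML tagged up to sentence level, so the element names `corpus`, `text`, and `s`, the `genre` attribute, the placeholder sentences, and the function name `find_sentences` are all hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: element and attribute names are assumptions,
# and the sentences are placeholder tokens, not real corpus data.
SAMPLE_XML = """
<corpus>
  <text genre="novel">
    <s>kor ta lar</s>
    <s>kor loy day</s>
  </text>
  <text genre="letter">
    <s>kitab loy day</s>
  </text>
</corpus>
"""


def find_sentences(xml_string, query):
    """Return (genre, sentence) pairs for every sentence that contains
    the query word or phrase as a contiguous run of whole tokens."""
    query_tokens = query.split()
    if not query_tokens:
        return []
    n = len(query_tokens)
    root = ET.fromstring(xml_string.strip())
    hits = []
    for text in root.iter("text"):
        for s in text.iter("s"):
            tokens = (s.text or "").split()
            # Slide a window of n tokens over the sentence.
            if any(tokens[i:i + n] == query_tokens
                   for i in range(len(tokens) - n + 1)):
                hits.append((text.get("genre"), " ".join(tokens)))
    return hits


if __name__ == "__main__":
    for genre, sentence in find_sentences(SAMPLE_XML, "loy day"):
        print(genre, sentence)
```

Comparing token windows rather than raw substrings keeps a query for one word from matching inside a longer word.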

_______________________________________________
Corpora mailing list
Corpora at uib.no
http://mailman.uib.no/listinfo/corpora