[Corpora-List] (no subject)

Fri Jan 18 10:16:36 UTC 2013

Hi,
I have developed a morphological analyzer. For this, I divided the corpus into two parts: training and testing. From the training part, I collected stems and rules. I used the same Xerox tools, i.e. lexc and xfst. After developing the analyzer, I collected test words from the testing part and listed their correct analyses. Then I gave these test words to the system and compared its output with the listed analyses.

Regards.
--- On Fri, 18/1/13, Eirini LS <eirini_ls at yahoo.com> wrote:

From: Eirini LS <eirini_ls at yahoo.com>
Subject: Re: [Corpora-List] (no subject)
To: "maxwell" <maxwell at umiacs.umd.edu>
Cc: "corpora at uib.no" <corpora at uib.no>
Received: Friday, 18 January, 2013, 1:49 AM

Thank you very much for your answer.
Sincerely, Irina L

From: maxwell <maxwell at umiacs.umd.edu>
To: Eirini LS <eirini_ls at yahoo.com>
Cc: corpora at uib.no
Sent: Thursday, January 17, 2013 11:18 PM
Subject: Re: [Corpora-List] (no subject)

On 2013-01-17 09:57, Eirini LS wrote:
> I mean that I have two different scripts for the same word (e.g. two
> scripts for "cat") written by different people. The first script
> generates 358 words (and only 107 words are correct), and the second
> script generates 497 words (and 471 words are correct). Can I say that
> the result of the first script is worse or not?

Clearly the recall and precision on the second script are higher.  Of course, without knowing what the total number of words that should be generated is, it's hard to say more.  In particular, it's hard to say whether 471 is good.  (Is the second script getting 471 out of 500 possible, or 471 out of 50,000?)
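
As a rough illustration of the arithmetic, precision for each script follows directly from the counts quoted above. Recall also needs the total number of words that should be generated, which the thread does not give, so the `gold_total` values below are only the two hypothetical figures from the question in parentheses (500 and 50,000), and the function names are made up for this sketch.

```python
def precision(correct, generated):
    """Fraction of generated words that are correct."""
    return correct / generated


def recall(correct, gold_total):
    """Fraction of the words that should be generated that were generated."""
    return correct / gold_total


# (generated, correct) counts as quoted in the thread.
scripts = {"first": (358, 107), "second": (497, 471)}

for name, (generated, correct) in scripts.items():
    print(f"{name} script: precision = {precision(correct, generated):.3f}")

# Assumption: the true gold total is unknown; these two values only show
# how much the recall of the second script depends on it.
for gold_total in (500, 50_000):
    print(f"gold_total = {gold_total}: recall = {recall(471, gold_total):.4f}")
```

With the same gold total for both scripts, the script with more correct words always has the higher recall, which is why the ordering is clear even though the absolute recall is not.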

In general, though, I think comparing at this gross level is only going to give a general sort of answer.  What you really want is a test set where each input word is paired with its expected output word, so you can do error analysis and regression testing.
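
A minimal sketch of such a paired test set, assuming each script can be wrapped as a function from an input word to one output word. The `toy_generate` function and the three gold pairs are invented for illustration and stand in for a real script and a real test set.

```python
def evaluate(generate, gold_pairs):
    """Run `generate` on every input word and collect mismatches.

    Returns (accuracy, failures), where each failure is a tuple
    (input word, expected output, actual output).
    """
    failures = []
    for word, expected in gold_pairs:
        actual = generate(word)
        if actual != expected:
            failures.append((word, expected, actual))
    accuracy = (len(gold_pairs) - len(failures)) / len(gold_pairs)
    return accuracy, failures


def toy_generate(word):
    """Hypothetical stand-in for a real script: a naive pluralizer."""
    return word + "s"


# Hypothetical gold test set: (input word, expected output word).
GOLD = [("cat", "cats"), ("dog", "dogs"), ("fox", "foxes")]

accuracy, failures = evaluate(toy_generate, GOLD)
print(f"accuracy: {accuracy:.2f}")
for word, expected, actual in failures:
    print(f"{word}: expected {expected!r}, got {actual!r}")
```

The failure list is what supports error analysis, and rerunning the same pairs after every change to a script gives a simple regression test.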

   Mike Maxwell

_______________________________________________
UNSUBSCRIBE from this page: http://mailman.uib.no/options/corpora
Corpora mailing list
Corpora at uib.no
http://mailman.uib.no/listinfo/corpora