If a verb occupies the structural subject position and is followed by a verb, it is tagged as subj
Brian MacWhinney
macw at cmu.edu
Tue Jun 17 08:05:31 UTC 2014
Dear Zhuo Chen,
Many thanks for detecting the mismarking of Subject in some of the children's sentences in the Manchester corpus. There is no separate training for different English corpora. Once the post.db database is trained, I run it automatically over both Eng-UK-MOR and Eng-NA-MOR.
In general, we have found MOR tagging for English to be about 96% accurate. GRASP tagging is closer to 92% accurate. These are basically state-of-the-art figures for current taggers. For example, you could try using the Stanford tagger for CHILDES and you would find that our taggers do much better. Because both MOR and GRASP use statistical methods, it is not possible to simply go into their databases to make individual changes. However, if you can detect consistent errors that arise in particular, narrowly stated configurations, you could use a regular expression editor to correct these across files. For example, it is probably the case that verbs cannot be subjects. You could locate all such violations and fix them. The training corpus is inside the English MOR folder. Fixing any errors of this type in the training corpus would be particularly helpful.
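As a rough illustration of locating such violations, here is a minimal Python sketch that pairs the items on a %mor tier with the items on the matching %gra tier and reports any item whose part of speech is a verb but whose relation is SUBJ. Everything in it is an assumption for illustration only: the helper names are hypothetical, "verb" is taken to mean a tag of v or one starting with v:, only the plain SUBJ relation is checked (not clausal-subject relations), and the sample tiers are invented rather than copied from the Manchester corpus.

```python
import re

def tier_body(line, name):
    """Return the text that follows a tier prefix such as '%mor:'."""
    return re.sub(r"^%" + name + r":\s*", "", line.strip())

def mor_tokens(mor_line):
    """Split a %mor tier into (pos, item) pairs, one per %gra index."""
    tokens = []
    for item in tier_body(mor_line, "mor").split():
        # Assumption: clitic groups joined by "~" are counted separately on %gra.
        for piece in item.split("~"):
            # Drop any "prefix#" material and keep the tag before "|".
            pos = piece.split("|", 1)[0].split("#")[-1]
            tokens.append((pos, piece))
    return tokens

def gra_relations(gra_line):
    """Parse a %gra tier into (index, head, relation) triples."""
    relations = []
    for item in tier_body(gra_line, "gra").split():
        index, head, rel = item.split("|")
        relations.append((int(index), int(head), rel))
    return relations

def is_verb(pos):
    # Assumption: verbs carry the tag "v" or a subtype such as "v:...".
    return pos == "v" or pos.startswith("v:")

def verbs_tagged_subj(mor_line, gra_line):
    """Return (index, item) for every verb whose %gra relation is SUBJ."""
    hits = []
    pairs = zip(mor_tokens(mor_line), gra_relations(gra_line))
    for (pos, item), (index, _head, rel) in pairs:
        if rel == "SUBJ" and is_verb(pos):
            hits.append((index, item))
    return hits

# Invented tiers for illustration, not taken from the corpus.
mor = "%mor:\tv|get&PAST v|have pro:per|it ."
gra = "%gra:\t1|2|SUBJ 2|0|ROOT 3|2|OBJ 4|2|PUNCT"
print(verbs_tagged_subj(mor, gra))
```

A script of this kind only lists candidates. Each hit would still need to be checked by hand before the %gra tier is changed, since an apparent violation may instead reflect an error on the %mor tier.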
The case you note here of "got have it" is, of course, ungrammatical in the first place. Although MOR tags this correctly, it is difficult to imagine how GRASP could know what is going on in such erroneous forms. Of course, English is rife with noun-verb ambiguities. Still, one would expect GRASP to learn that something tagged as a verb cannot be a subject.
-- Brian MacWhinney
On Jun 17, 2014, at 12:13 AM, Zhuo Chen <czcindy426 at gmail.com> wrote:
> Dear CHIBOLTS,
>
> I found that if a verb is in the structural subject position and is immediately followed by another verb, it will be tagged as a subject. I attached a picture here to show this problem. I'm working with the most recent version of the Manchester corpus.
>
> Thanks!
>
>
> --
> You received this message because you are subscribed to the Google Groups "chibolts" group.
> To unsubscribe from this group and stop receiving emails from it, send an email to chibolts+unsubscribe at googlegroups.com.
> To post to this group, send email to chibolts at googlegroups.com.
> To view this discussion on the web visit https://groups.google.com/d/msgid/chibolts/5f509854-ad91-4e95-a2dd-9c7d1ef2e3c8%40googlegroups.com.
> For more options, visit https://groups.google.com/d/optout.
> <verb tagged as subject.tiff>