[Corpora-List] Where can I find an English for Children corpus?

chris brew cbrew at acm.org
Thu Mar 10 17:48:05 UTC 2011


Another of Charniak's papers from that time has my favourite title ever:


@inproceedings{Charniak:1973:JJS:1624775.1624816,
 author = {Charniak, Eugene},
 title = {Jack and Janet in search of a theory of knowledge},
 booktitle = {Proceedings of the 3rd International Joint Conference on Artificial Intelligence},
 year = {1973},
 location = {Stanford, USA},
 pages = {337--343},
 numpages = {7},
 url = {http://portal.acm.org/citation.cfm?id=1624775.1624816},
 acmid = {1624816},
 publisher = {Morgan Kaufmann Publishers Inc.},
 address = {San Francisco, CA, USA},
}


On Thu, Mar 10, 2011 at 11:18 AM, John F. Sowa <sowa at bestweb.net> wrote:

> On 3/10/2011 8:46 AM, Michael Israel wrote:
>
>> There is also a great deal of research based on this data showing that
>> the words and grammatical constructions which children learn are in many
>> (but not all) respects highly correlated with the frequency with which these
>> occur in the spoken input that the children hear. So, CHILDES might be
>> more relevant than you think.
>>
>
> An analysis of the stages of language learning may provide some useful
> clues to the underlying mechanisms.
>
> But stories written for children are notoriously difficult to interpret.
> The major problem is that they depend very heavily on background
> knowledge that is not easy to verbalize.
>
> Charniak discovered that point 40 years ago:
>
> Charniak 1972: Eugene Charniak, “Toward a Model Of Children's Story
> Comprehension,” PhD thesis 1972, MIT, MIT Artificial Intelligence
> Laboratory Technical Report TR-266. Also at
> ftp://publications.ai.mit.edu/ai-publications/pdf/AITR-266.pdf
>
> A notorious example is the first story in the Dick & Jane series.
> Every page is filled with a picture and one line of text,
> such as "Oh, look." and "Oh, Oh, Oh."  Eventually it reaches
> the level of "See Spot run."
>
> A machine learning system might learn simple grammatical patterns.
> But if it can't interpret pictures, it won't learn semantics.
>
> John Sowa
>
>
> _______________________________________________
> Corpora mailing list
> Corpora at uib.no
> http://mailman.uib.no/listinfo/corpora
>