[Corpora-List] Where can I find an English for Children corpus?

Michael Israel michael.israel at gmail.com
Thu Mar 10 21:23:19 UTC 2011


On Thu, Mar 10, 2011 at 11:18 AM, John F. Sowa <sowa at bestweb.net> wrote:

> A machine learning system might learn simple grammatical patterns.
> But if it can't interpret pictures, it won't learn semantics.

In fact, of course, this caveat holds just as much for the naturalistic data
in CHILDES as it does for a corpus of language from children's picture
books. The transcripts in CHILDES let us know what was said when and by
whom, but they do not include the rest (i.e. the vast majority) of the rich
context in which the language is produced and interpreted. Since most of
these corpora do not have video (or in many cases even audio) documentation,
even if a computer could interpret pictures, it still wouldn't be able to
learn semantics from these transcripts. Or at least not much semantics:
something might still be inferred from a distributional analysis of lexical
and grammatical constructions.

But a great deal (if not all) of children's lexical-conceptual knowledge is
not learned through language per se; rather, it is something they learn to
map onto language. A computational model that begins language acquisition
without some (proto-)semantics in place therefore won't be very realistic
(though it might be interesting in many ways nonetheless).

Michael
_______________________________________________
Corpora mailing list
Corpora at uib.no
http://mailman.uib.no/listinfo/corpora

