Question about word tokens and new words on CLAN
Leonid Spektor
spektor at andrew.cmu.edu
Thu May 28 19:29:35 UTC 2009
I am sorry, but I have to STOP this discussion at this point. Please
let's continue this discussion on chibolts at googlegroups.com. I am going to
post the answer to this message to chibolts.
Thanks,
Leonid.
On 28-05-09 15:21, "Schneider, Phyllis" <phyllis.schneider at ualberta.ca>
wrote:
>
>
> Leonid,
>
> Could it be simply that the input from the command is so great that the first
> part is lost due to space limitations? I have had that happen with large data
> sets, but I don't know whether that still happens. The entire set of results
> appears, but the earliest ones disappear at some point as more results are
> added to the end.
>
> --Phyllis Schneider
>
>
> From: info-childes at googlegroups.com [mailto:info-childes at googlegroups.com] On
> Behalf Of Leonid Spektor
> Sent: Thursday, May 28, 2009 11:43 AM
> To: info-childes at googlegroups.com; chibolts at googlegroups.com
> Subject: Re: Question about word tokens and new words on CLAN
>
> Carolyn,
>
> The proper place to post this kind of question is on
> chibolts at googlegroups.com. I am posting a reply to both chibolts and
> info-childes address in case you do not subscribe to chibolts Google Group
> yet.
>
> There is no database size limit set in CLAN except for the memory size on
> your computer. The freq command should be able to count all words in the
> database. The problem might be the format of the email messages in the CLAN
> system or the commands and options you use. Also, please make sure that you
> have the latest version of CLAN. To diagnose the problems you are describing,
> it would be very helpful to me if you could email me a sample of your data and
> the command line you are using at spektor at andrew.cmu.edu. If you are not
> comfortable emailing me a sample of your data, then please just describe it
> and exactly what kind of analyses you are trying to run. Is your data in
> plain text format or in fully legal CHAT format? If you are trying to simply
> get a frequency count of all words, you can use the "freq +y *.cha" command.
> That assumes that your data file(s) have the ".cha" extension. If you are
> trying to get a count of words for just one particular speaker and your data
> is in legal CHAT format, then you can add the "+t*CHI" option to the above
> command. The "*CHI" refers to a speaker name in your data files.
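
As a rough illustration of what a per-speaker word count tallies, here is a minimal Python sketch. It is not CLAN's implementation and does not reproduce freq's options or output format. The function name count_speaker_words and the sample lines are hypothetical, and the sketch assumes a simplified view of CHAT in which a main tier starts with "*SPEAKER:", continuation lines start with a tab, and "@" headers and "%" dependent tiers are skipped.

```python
from collections import Counter
import re


def count_speaker_words(lines, speaker="CHI"):
    """Count word tokens on one speaker's main tiers in CHAT-style lines.

    Simplifying assumptions (not CLAN's actual rules): a main tier starts
    with "*<speaker>:", a continuation line starts with a tab, and a word
    is any run of letters or apostrophes, compared case-insensitively.
    """
    counts = Counter()
    prefix = f"*{speaker}:"
    in_tier = False  # True while we are inside the chosen speaker's tier
    for line in lines:
        if line.startswith(prefix):
            in_tier = True
            text = line[len(prefix):]
        elif line.startswith("\t") and in_tier:
            text = line  # continuation of the speaker's utterance
        else:
            in_tier = False  # header, dependent tier, or another speaker
            continue
        for token in re.findall(r"[A-Za-z']+", text.lower()):
            counts[token] += 1
    return counts


# Hypothetical sample data, invented for illustration only.
sample = [
    "@Begin",
    "*CHI:\tmore cookie .",
    "%mor:\tqn|more n|cookie .",
    "*MOT:\tyou want more ?",
    "*CHI:\tmore juice .",
    "@End",
]

counts = count_speaker_words(sample)
for word in sorted(counts):
    print(counts[word], word)
```

Sorting the words alphabetically before printing makes it easy to see at a glance whether part of the alphabet is missing from a result.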
>
> Hope this helps,
> Leonid.
>
>
> On 28-05-09 10:22, "Carolyn Piazza" <carolynpiazza at gmail.com> wrote:
>
>> To the info-childes google group,
>>
>> I'm not sure if this is the place to send questions, but if it is not,
>> perhaps someone can point me to the right place.
>> I am working on data based on email messages and have placed some 366
>> emails into the CLAN system. When I run a frequency count, the outcome is
>> words beginning with the letters O to Z. The words beginning with A-N are
>> missing. Is there a certain database size beyond which the program will not
>> process accurately? Thanks, in advance, to whoever can answer this.
>>
>> Best regards,
>> Carolyn Piazza
>> cpiazza at fsu.edu