Question about word tokens and new words on CLAN

Leonid Spektor spektor at andrew.cmu.edu
Thu May 28 19:29:35 UTC 2009


    I am sorry, but I have to STOP this discussion at this point. Please
let's continue this discussion on chibolts at googlegroups.com. I am going to
post the answer to this message to chibolts.

Thanks, 
Leonid.

On 28-05-09 15:21, "Schneider, Phyllis" <phyllis.schneider at ualberta.ca>
wrote:

> 
>  
> Leonid,
>  
> Could it be simply that the output from the command is so large that the first
> part is lost due to space limitations?  I have had that happen with large data
> sets, but I don't know whether that still happens.  The entire set of results
> appears, but the earliest ones disappear at some point as more results are
> added to the end.
>  
> --Phyllis Schneider
> 
> 
> From: info-childes at googlegroups.com [mailto:info-childes at googlegroups.com] On
> Behalf Of Leonid Spektor
> Sent: Thursday, May 28, 2009 11:43 AM
> To: info-childes at googlegroups.com; chibolts at googlegroups.com
> Subject: Re: Question about word tokens and new words on CLAN
> 
> Carolyn,
> 
>     The proper place to post this kind of question is on
> chibolts at googlegroups.com. I am posting a reply to both chibolts and
> info-childes address in case you do not subscribe to chibolts Google Group
> yet.
> 
>     There is no database size limit set in CLAN other than the memory size of
> your computer. The freq command should be able to count all words in the
> database. The problem might be the format of the email messages in the CLAN
> system or the commands and options you use. Also, please make sure that you
> have the latest version of CLAN. To diagnose the problems you are describing,
> it would be very helpful to me if you could email a sample of your data and
> the command line you are using to spektor at andrew.cmu.edu. If you are not
> comfortable emailing me a sample of your data, then please just describe it
> and exactly what kind of analyses you are trying to run. Is your data in
> plain text format or is it in fully legal CHAT format? If you are simply
> trying to get a frequency count of all words, you can use the "freq +y *.cha"
> command. That assumes that your data file(s) have the ".cha" extension. If
> you are trying to get a count of only the words associated with a particular
> speaker and your data is in legal CHAT format, then you can add the "+t*CHI"
> option to the above command. The "*CHI" refers to a speaker name in your data
> files.
> 
> Hope this helps,
> Leonid.
> 
> 
> On 28-05-09 10:22, "Carolyn Piazza" <carolynpiazza at gmail.com> wrote:
> 
>> To the info-childes Google group,
>>  
>>    I'm not sure if this is the place to send questions, but if it is not,
>> perhaps someone can point me to the right place.
>>    I am working on data based on email messages and have placed some 366
>> emails into the CLAN system. When I run a frequency count, the output
>> contains only words beginning with the letters O to Z. The words beginning
>> with A-N are missing. Is there a certain database size beyond which the
>> program will not process accurately? Thanks, in advance, to whoever can
>> answer this.
>>  
>> Best regards,
>> Carolyn Piazza
>> cpiazza at fsu.edu

