Question about word tokens and new words on CLAN

Leonid Spektor spektor at andrew.cmu.edu
Thu May 28 19:36:17 UTC 2009


Phyllis,

    If the list of words is too long, then the list should be placed in a
file "file.cut", and that file should be used with the "+s@file.cut" option.
If the list of data files were too long, then only some words would be
missing, and not only the A-N words. This, of course, assumes that the
email messages themselves are not just lists of alphabetically ordered
words.
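
As a rough sketch of the word-list approach, the following Python script
writes a list of search words to a file and prints the matching command line.
It assumes that the word-list file holds one word per line and that the data
files end in ".cha". The helper names write_cut_file and freq_command and the
example words are hypothetical, and the script only builds the command text;
it does not run CLAN.

```python
from pathlib import Path


def write_cut_file(words, path="file.cut"):
    """Write one search word per line (assumed layout of the word-list file)."""
    # Drop empty entries and surrounding whitespace before writing.
    cleaned = [w.strip() for w in words if w.strip()]
    Path(path).write_text("\n".join(cleaned) + "\n", encoding="utf-8")
    return cleaned


def freq_command(cut_path="file.cut", data_glob="*.cha"):
    """Build the command line to type into CLAN; nothing is executed here."""
    return f"freq +s@{cut_path} {data_glob}"


if __name__ == "__main__":
    # Hypothetical example words; replace with the real search list.
    write_cut_file(["apple", "banana", "orange"])
    print(freq_command())
```

The printed line can then be typed into the CLAN Commands window in place of
a long list of individual +s options.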

Thanks, 
Leonid.

On 28-05-09 15:21, "Schneider, Phyllis" <phyllis.schneider at ualberta.ca>
wrote:

> 
>  
> Leonid,
>  
> Could it be simply that the input from the command is so great that the first
> part is lost due to space limitations?  I have had that happen with large data
> sets, but I don't know whether that still happens.  The entire set of results
> appears, but the earliest ones disappear at some point as more results are
> added to the end.
>  
> --Phyllis Schneider
> 
> 
> From: info-childes at googlegroups.com [mailto:info-childes at googlegroups.com] On
> Behalf Of Leonid Spektor
> Sent: Thursday, May 28, 2009 11:43 AM
> To: info-childes at googlegroups.com; chibolts at googlegroups.com
> Subject: Re: Question about word tokens and new words on CLAN
> 
> Carolyn,
> 
>     The proper place to post this kind of question is
> chibolts at googlegroups.com. I am posting a reply to both the chibolts and
> info-childes addresses in case you do not subscribe to the chibolts Google
> Group yet.
> 
>     There is no database size limit set in CLAN except for the memory size of
> your computer. The freq command should be able to count all words in the
> database. The problem might be the format of the email messages in the CLAN
> system or the commands and options you use. Also, please make sure that you
> have the latest version of CLAN. To diagnose the problems you are describing,
> it would be very helpful if you could email me a sample of your data and the
> command line you are using, at spektor at andrew.cmu.edu. If you are not
> comfortable emailing me a sample of your data, then please just describe it
> and exactly what kind of analyses you are trying to conduct. Is your data in
> plain text format, or is it in fully legal CHAT format? If you are simply
> trying to get a frequency count of all words, you can use the
> "freq +y *.cha" command. That assumes that your data files have the ".cha"
> extension. If you are trying to get a count of only the words associated
> with a particular speaker and your data is in legal CHAT format, then you can
> add the "+t*CHI" option to the above command. The "*CHI" refers to a speaker
> name in your data files.
> 
> Hope this helps,
> Leonid.
> 
> 
> On 28-05-09 10:22, "Carolyn Piazza" <carolynpiazza at gmail.com> wrote:
> 
>> To the info-childes Google group,
>>
>>    I'm not sure if this is the place to send questions, but if it is not,
>> perhaps someone can point me to the right place.
>>    I am working on data based on email messages and have placed some 366
>> emails into the CLAN system. When I run a frequency count, the outcome is
>> words beginning with the letters O to Z. The words beginning with A-N are
>> missing. Is there a certain database size beyond which the program will not
>> process accurately? Thanks, in advance, to whoever can answer this.
>>  
>> Best regards,
>> Carolyn  Piazza
>> cpiazza at fsu.edu