Question about word tokens and new words on CLAN

Thu May 28 19:21:04 UTC 2009

Leonid,

Could it be simply that the output from the command is so large that the
first part is lost due to space limitations? I have had that happen
with large data sets, but I don't know whether that still happens. The
entire set of results appears, but the earliest ones disappear at some
point as more results are added to the end.
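As a rough illustration of this hypothesis, the Python sketch below models an output window that keeps only its most recent lines. It assumes that the window has a fixed line capacity and that freq prints its word list in alphabetical order, which the A-N/O-Z symptom suggests. The function name `visible_output` and the capacity value are hypothetical, and this is not how CLAN's output window is actually implemented.

```python
from collections import deque

# Hypothetical model of a scrollback-limited output window: it keeps only
# the most recent `capacity` lines, so the earliest lines are discarded.
# This is an illustrative assumption, not CLAN's actual window code.
def visible_output(lines, capacity):
    window = deque(maxlen=capacity)
    for line in lines:
        window.append(line)  # once full, each append drops the oldest line
    return list(window)

# Assuming the word list is printed alphabetically, losing the earliest
# lines means losing the words at the start of the alphabet.
words = sorted(["apple", "ball", "dog", "nose", "orange", "toy", "zebra"])
print(visible_output(words, capacity=3))  # ['orange', 'toy', 'zebra']
```

If this is the cause, the counts themselves would be complete; only the display would be truncated, so saving the results to a file instead of reading them from the window should show the missing words.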

--Phyllis Schneider

________________________________

From: info-childes at googlegroups.com
[mailto:info-childes at googlegroups.com] On Behalf Of Leonid Spektor
Sent: Thursday, May 28, 2009 11:43 AM
To: info-childes at googlegroups.com; chibolts at googlegroups.com
Subject: Re: Question about word tokens and new words on CLAN

Carolyn,

    The proper place to post this kind of question is
chibolts at googlegroups.com. I am posting a reply to both the chibolts
and info-childes addresses in case you do not yet subscribe to the
chibolts Google Group.

    CLAN sets no database size limit other than the memory available on
your computer, so the freq command should be able to count all the words
in your database. The problem might be the format of the email messages
you have entered into CLAN, or the commands and options you are using.
Also, please make sure that you have the latest version of CLAN. To
diagnose the problem you are describing, it would be very helpful if you
could email a sample of your data and the command line you are using to
me at spektor at andrew.cmu.edu. If you are not comfortable emailing me a
sample of your data, please just describe it and exactly what kind of
analyses you are trying to conduct. Is your data in plain text format,
or is it in fully legal CHAT format? If you simply want a frequency
count of all words, you can use the command "freq +y *.cha", assuming
that your data file(s) have the ".cha" extension. If you want a count of
only the words associated with a particular speaker, and your data is in
legal CHAT format, you can add the "+t*CHI" option to the command above.
Here "*CHI" refers to the speaker code used in your data files.
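To make the idea of restricting a count to one speaker concrete, here is a minimal Python sketch of a per-speaker word count over CHAT main tiers. It is only a rough analogue of what "+t*CHI" selects, not CLAN's freq algorithm: it assumes main-tier lines look like "*CHI:<tab>words ." and treats every whitespace-separated token other than final punctuation as a word. The function `count_speaker_words` and the sample lines are hypothetical.

```python
from collections import Counter

# Rough analogue of a per-speaker word count over CHAT main tiers.
# Assumption: main-tier lines look like "*CHI:\tword word ." and every
# whitespace-separated token other than final punctuation counts as a word.
# CLAN's freq applies many more rules than this sketch does.
PUNCTUATION = {".", "?", "!"}

def count_speaker_words(chat_lines, speaker="*CHI"):
    counts = Counter()
    prefix = speaker + ":"
    for line in chat_lines:
        # Skip other speakers, headers (@...), dependent tiers (%...),
        # and tab-indented continuation lines.
        if not line.startswith(prefix):
            continue
        for token in line[len(prefix):].split():
            if token not in PUNCTUATION:
                counts[token] += 1
    return counts

sample = [
    "@Begin",
    "*CHI:\tmore juice .",
    "*MOT:\tyou want more juice ?",
    "%mor:\tqn|more n|juice .",
    "*CHI:\tmore !",
    "@End",
]
for word, n in sorted(count_speaker_words(sample).items()):
    print(word, n)  # prints "juice 1", then "more 2"
```

Only the two *CHI lines contribute; the mother's utterance and the %mor tier are ignored, which is the kind of filtering the speaker option is meant to provide.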

Hope this helps,
Leonid.

On 28-05-09 10:22, "Carolyn Piazza" <carolynpiazza at gmail.com> wrote:

	To the info-childes google group,

	   I'm not sure whether this is the place to send questions, but if
	it is not, perhaps someone can point me to the right place.
	   I am working on data based on email messages and have placed
	some 366 emails into the CLAN system. When I run a frequency count,
	the output contains only words beginning with the letters O to Z;
	the words beginning with A-N are missing. Is there a database size
	beyond which the program will not process data accurately? Thanks,
	in advance, to whoever can answer this.

	Best regards,
	Carolyn Piazza
	cpiazza at fsu.edu

