Analyze word and phrase frequency

Herb Stahlke hfwstahlke at GMAIL.COM
Sun Apr 5 19:03:16 UTC 2009


Or programmed around.
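
For illustration, here is a rough Python sketch of what "programmed around" might look like. It is not how Primitive Word Counter works, nor the culling process described below. It applies rules 1, 3, 5, 6 and 7 from the quoted list, then rejoins any two-word compounds found in a lookup list. The names tokenize, merge_compounds, count_words and COMPOUNDS are hypothetical, the compound list is a one-entry placeholder, and treating all-caps strings as acronyms is an assumption. Rule 2 (no typos) is left out because it needs a dictionary.

```python
import re
from collections import Counter

# Hypothetical list of open compounds to be counted as one word each.
COMPOUNDS = {("tidal", "wave")}


def tokenize(text):
    """Split text into 'compwords' following rules 1, 3, 5, 6 and 7."""
    # Rule 6: reattach words hyphenated across a line break.
    text = re.sub(r"-\n\s*", "", text)
    # Rule 5: remaining hyphens count as spaces.
    text = text.replace("-", " ")
    words = []
    for raw in text.split():  # Rule 1: strings bordered by whitespace.
        # Rule 7: a single letter followed by a period is not a word.
        if re.fullmatch(r"[A-Za-z]\.", raw):
            continue
        token = raw.strip(".,;:!?\"'()")
        # Rule 3: drop numbers, leftover punctuation, and acronyms
        # (assumed here to be all-caps strings of two or more letters).
        if not token.isalpha() or (len(token) > 1 and token.isupper()):
            continue
        words.append(token.lower())
    return words


def merge_compounds(words, compounds=COMPOUNDS):
    """Join adjacent words that form a listed two-word compound."""
    merged = []
    i = 0
    while i < len(words):
        if i + 1 < len(words) and (words[i], words[i + 1]) in compounds:
            merged.append(words[i] + " " + words[i + 1])
            i += 2
        else:
            merged.append(words[i])
            i += 1
    return merged


def count_words(text, compounds=COMPOUNDS):
    """Count compwords, treating listed compounds as single words."""
    return Counter(merge_compounds(tokenize(text), compounds))
```

With this sketch, count_words("The tidal wave hit. A tidal wave is not a wave of the tide.") counts "tidal wave" twice and "wave" once. Rule 7 still drops the final "I" from "She ran as fast as I.", which is the loss raised in the quoted message below.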

On Sun, Apr 5, 2009 at 2:14 PM, Tom Zurinskas <truespel at hotmail.com> wrote:
> ---------------------- Information from the mail header -----------------------
> Sender:       American Dialect Society <ADS-L at LISTSERV.UGA.EDU>
> Poster:       Tom Zurinskas <truespel at HOTMAIL.COM>
> Subject:      Re: Analyze word and phrase frequency
> -------------------------------------------------------------------------------
>
> It's not my decision, but the computer's. It doesn't know that "tidal wave" is one word. Thus a computer-identified word count has some constraints: by definition, a computer-counted word can't be a hyphenated word or a two-word word. It's just something to be cognizant of.
>
>
>
> Tom Zurinskas, USA - CT20, TN3, NJ33, FL5+
> see truespel.com
>
> ----------------------------------------
>> Date: Sat, 4 Apr 2009 19:34:45 -0400
>> From: hfwstahlke at GMAIL.COM
>> Subject: Re: Analyze word and phrase frequency
>> To: ADS-L at LISTSERV.UGA.EDU
>>
>> ---------------------- Information from the mail header -----------------------
>> Sender: American Dialect Society
>> Poster: Herb Stahlke
>> Subject: Re: Analyze word and phrase frequency
>> -------------------------------------------------------------------------------
>>
>> But why did you decide that there are no compound words in which the
>> words are separated orthographically by spaces? That orthographic
>> convention is a cultural convention and not clearly grounded in
>> linguistic structure.
>>
>> Herb
>>
>> On Sat, Apr 4, 2009 at 5:56 PM, Tom Zurinskas wrote:
>>> ---------------------- Information from the mail header -----------------------
>>> Sender: American Dialect Society
>>> Poster: Tom Zurinskas
>>> Subject: Re: Analyze word and phrase frequency
>>> -------------------------------------------------------------------------------
>>>
>>> Point is that word lists from counters need manual culling. One obvious outcome is that there are no two-word words (like "tidal wave").
>>>
>>> If typos could be figured out and retyped, that would be ideal for the word count.
>>>
>>> These decisions show what I went through in my culling process for the 5000 word list used in book 4.
>>>
>>>
>>> Tom Zurinskas, USA - CT20, TN3, NJ33, FL5+
>>> see truespel.com
>>>
>>>> ---------------------- Information from the mail header -----------------------
>>>> Sender: American Dialect Society
>>>> Poster: Herb Stahlke
>>>> Subject: Re: Analyze word and phrase frequency
>>>> -------------------------------------------------------------------------------
>>>>
>>>> Tom,
>>>>
>>>> Have you tested your definitions for accuracy? I understand that a
>>>> computer count won't get everything right and so you have to program
>>>> in certain common strings that should be omitted from the count. But
>>>> excluding single letters followed by a period? You would lose cases
>>>> like
>>>>
>>>> She ran as fast as I.
>>>>
>>>> Since you'd be counting all sorts of unprepared text, the treatment of
>>>> hyphenated words would treat "non-" like a word. Counting all
>>>> two-word compounds as two words ignores the morphology of compounding,
>>>> which is not reflected accurately in our orthography.
>>>>
>>>> Ignoring misspellings means running spellcheck on your results, but
>>>> the problem with spellcheck is that it only excludes words that don't
>>>> match anything in its dictionary. I spent a lot of time in the 80s
>>>> working in computer assisted instruction, and one of the projects I
>>>> devoted time to was developing a probabilistic spelling checker, a
>>>> program that could look at a word that doesn't match the dictionary and
>>>> judge, by using letter frequency by position by length of word,
>>>> whether a misspelling is an otherwise correct answer. Most CAI simply
>>>> rejected all answers that weren't an exact match, which isn't very
>>>> useful in the language arts.
>>>>
>>>> So what do you lose by your rules and what does this do to the
>>>> accuracy of your word counts?
>>>>
>>>> Herb
>>>>
>>>> On Sat, Apr 4, 2009 at 11:13 AM, Tom Zurinskas wrote:
>>>>> ---------------------- Information from the mail header -----------------------
>>>>> Sender: American Dialect Society
>>>>> Poster: Tom Zurinskas
>>>>> Subject: Re: Analyze word and phrase frequency
>>>>> -------------------------------------------------------------------------------
>>>>>
>>>>> The word counter is good. It results in what looks like a spreadsheet, which is good, but I need to be able to copy/paste it into a regular spreadsheet. Problem - Only one line can be selected at a time. Not good. Anyone else have that problem?
>>>>>
>>>>> http://lifehacker.com/5190716/primitive-word-counter-analyzes-word-and-phrase-frequency
>>>>>
>>>>> To download it click on the blue word "link" at middle right.
>>>>>
>>>>> Computers do a great job at counting words. So we need to define what computer-counted words ("compwords") are.
>>>>>
>>>>> 1. A letter string bordered by spaces.
>>>>> 2. Intelligible (no typos).
>>>>> 3. Does not include numbers, punctuation, acronyms.
>>>>> 4. Two-word words (like tidal wave) are two words.
>>>>> 5. Hyphens count as spaces so hyphenated words are two words.
>>>>> 6. Reattach hyphenated words at end of line (or ignore).
>>>>> 7. Single letters followed by periods are not words.
>>>>> any more?
>>>>>
>>>>> Tom Zurinskas, USA - CT20, TN3, NJ33, FL5+
>>>>> see truespel.com
>>>>>
>>>>> ----------------------------------------
>>>>>> Date: Wed, 1 Apr 2009 21:45:21 -0400
>>>>>> From: jharbeck at SYMPATICO.CA
>>>>>> Subject: Fwd: Analyze word and phrase frequency
>>>>>> To: ADS-L at LISTSERV.UGA.EDU
>>>>>>
>>>>>> ---------------------- Information from the mail header -----------------------
>>>>>> Sender: American Dialect Society
>>>>>> Poster: James Harbeck
>>>>>> Subject: Fwd: Analyze word and phrase frequency
>>>>>> -------------------------------------------------------------------------------
>>>>>>
>>>>>> This looks like it could be useful for some kinds of analysis.
>>>>>>
>>>>>> -----Original Message-----
>>>>>>
>>>>>> http://lifehacker.com/5190716/primitive-word-counter-analyzes-word-and-phrase-frequency
>>>>>>
>>>>>> You can check the number of words in just about any word processing
>>>>>> program, but what about the distribution of those words?
>>>>>>
>>>>>> Primitive Word Counter analyzes text from your clipboard or file and
>>>>>> returns the frequency of words and phrases in the text. You can set a
>>>>>> minimum word length and have it ignore numbers to trim down the
>>>>>> volume of replies it returns.
>>>>>>
>>>>>> ------------------------------------------------------------
>>>>>> The American Dialect Society - http://www.americandialect.org

