Analyze word and phrase frequency

Herb Stahlke hfwstahlke at GMAIL.COM
Sat Apr 4 23:34:45 UTC 2009


But why did you decide that there are no compound words in which the
words are separated orthographically by spaces?  That orthographic
convention is a cultural convention and not clearly grounded in
linguistic structure.

Herb

On Sat, Apr 4, 2009 at 5:56 PM, Tom Zurinskas <truespel at hotmail.com> wrote:
> ---------------------- Information from the mail header -----------------------
> Sender:       American Dialect Society <ADS-L at LISTSERV.UGA.EDU>
> Poster:       Tom Zurinskas <truespel at HOTMAIL.COM>
> Subject:      Re: Analyze word and phrase frequency
> -------------------------------------------------------------------------------
>
> Point is that word lists from counters need manual culling.  One obvious outcome is that there are no two-word words (like "tidal wave").
>
> If typos could be figured out and retyped, that would be ideal for the word count.
>
> These decisions show what I went through in my culling process for the 5000 word list used in book 4.
>
>
> Tom Zurinskas, USA - CT20, TN3, NJ33, FL5+
> see truespel.com
>
>
>
>
>
>> ---------------------- Information from the mail header -----------------------
>> Sender: American Dialect Society
>> Poster: Herb Stahlke
>> Subject: Re: Analyze word and phrase frequency
>> -------------------------------------------------------------------------------
>>
>> Tom,
>>
>> Have you tested your definitions for accuracy? I understand that a
>> computer count won't get everything right and so you have to program
>> in certain common strings that should be omitted from the count. But
>> excluding single letters followed by a period? You would lose cases
>> like
>>
>> She ran as fast as I.
>>
>> Since you'd be counting all sorts of unprepared text, the treatment of
>> hyphenated words would treat "non-" like a word. Counting all
>> two-word compounds as two words ignores the morphology of compounding,
>> which is not reflected accurately in our orthography.
>>
>> Ignoring misspellings means running spellcheck on your results, but
>> the problem with spellcheck is that it only excludes words that don't
>> match anything in its dictionary. I spent a lot of time in the 80s
>> working in computer assisted instruction, and one of the projects I
>> devoted time to was developing a probabilistic spelling checker, a
>> program that could look at a word that doesn't match the dictionary and
>> judge, by using letter frequency by position by length of word,
>> whether a misspelling is an otherwise correct answer. Most CAI simply
>> rejected all answers that weren't an exact match, which isn't very
>> useful in the language arts.
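For readers who want to see what "letter frequency by position by length of word" could look like in practice, here is a minimal Python sketch. It is not a reconstruction of the 1980s program described above, whose details are not given in this thread. The function names `train` and `score`, the add-one smoothing, and the use of an average log-probability are all assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict


def train(words):
    """Count letters per (word length, position) slot. Hypothetical helper."""
    counts = defaultdict(Counter)
    for word in words:
        word = word.lower()
        for position, letter in enumerate(word):
            counts[(len(word), position)][letter] += 1
    return counts


def score(word, counts, alphabet_size=26):
    """Average log-probability of each letter in its slot.

    Assumption: add-one smoothing, so unseen letters get a small
    nonzero probability instead of ruling the word out entirely.
    """
    word = word.lower()
    total = 0.0
    for position, letter in enumerate(word):
        slot = counts.get((len(word), position), Counter())
        probability = (slot[letter] + 1) / (sum(slot.values()) + alphabet_size)
        total += math.log(probability)
    return total / max(len(word), 1)


# Toy word list, invented for this example only.
model = train(["cat", "cot", "cut", "bat", "bit"])
print(score("cat", model) > score("xqz", model))
```

A higher score only says that a string looks like the words the table was built from. Deciding whether a near miss should count as a correct answer would still need a threshold and a comparison with the expected answer, neither of which is specified in the message above.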
>>
>> So what do you lose by your rules and what does this do to the
>> accuracy of your word counts?
>>
>> Herb
>>
>> On Sat, Apr 4, 2009 at 11:13 AM, Tom Zurinskas wrote:
>>> ---------------------- Information from the mail header -----------------------
>>> Sender: American Dialect Society
>>> Poster: Tom Zurinskas
>>> Subject: Re: Analyze word and phrase frequency
>>> -------------------------------------------------------------------------------
>>>
>>> The word counter is good. It results in what looks like a spreadsheet, which is good, but I need to be able to copy/paste it into a regular spreadsheet. Problem - Only one line can be selected at a time. Not good. Anyone else have that problem?
>>>
>>> http://lifehacker.com/5190716/primitive-word-counter-analyzes-word-and-phrase-frequency
>>>
>>> To download it click on the blue word "link" at middle right.
>>>
>>> Computers do a great job at counting words. So we need to define what computer-counted words ("compwords") are.
>>>
>>> 1. A letter string bordered by spaces.
>>> 2. Intelligible (no typos).
>>> 3. Does not include numbers, punctuation, acronyms.
>>> 4. Two-word words (like tidal wave) are two words.
>>> 5. Hyphens count as spaces so hyphenated words are two words.
>>> 6. Reattach hyphenated words at end of line (or ignore).
>>> 7. Single letters followed by periods are not words.
>>> any more?
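The seven rules above can be sketched in a few lines of Python. This is one possible reading, not the method actually used for the 5000-word list. The function name `compwords`, the optional `known_words` set standing in for rule 2, the treatment of all-caps tokens as acronyms, and the decision to keep internal apostrophes are assumptions.

```python
import re
import string

LETTERS = re.compile(r"[A-Za-z]+(?:'[A-Za-z]+)*")   # letters, optional internal apostrophes
ACRONYM = re.compile(r"(?:[A-Z]\.?){2,}")            # e.g. NASA, U.S (assumed definition)
INITIAL = re.compile(r"[A-Za-z]\.")                  # single letter plus period


def compwords(text, known_words=None):
    """Return the list of 'compwords' in text under rules 1-7 (a sketch)."""
    # Rule 6: rejoin words hyphenated across a line break.
    text = re.sub(r"-[ \t]*\n[ \t]*", "", text)
    # Rule 5: every remaining hyphen counts as a space.
    text = text.replace("-", " ")
    words = []
    # Rule 1: split on whitespace. Rule 4 follows, since "tidal wave" splits in two.
    for token in text.split():
        # Rule 7: a single letter followed by a period is not a word.
        if INITIAL.fullmatch(token):
            continue
        core = token.strip(string.punctuation)
        # Rule 3: drop acronyms, numbers, and anything that is not letters.
        if ACRONYM.fullmatch(core) or not LETTERS.fullmatch(core):
            continue
        word = core.lower()
        # Rule 2: approximate "intelligible" with a caller-supplied word set, if any.
        if known_words is not None and word not in known_words:
            continue
        words.append(word)
    return words


print(compwords("She ran as fast as I."))
```

Passing the result to `collections.Counter` gives the frequency table. Under this reading of rule 7, the sentence "She ran as fast as I." yields five words and the final pronoun is dropped.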
>>>
>>> Tom Zurinskas, USA - CT20, TN3, NJ33, FL5+
>>> see truespel.com
>>>
>>>
>>>
>>>
>>> ----------------------------------------
>>>> Date: Wed, 1 Apr 2009 21:45:21 -0400
>>>> From: jharbeck at SYMPATICO.CA
>>>> Subject: Fwd: Analyze word and phrase frequency
>>>> To: ADS-L at LISTSERV.UGA.EDU
>>>>
>>>> ---------------------- Information from the mail header -----------------------
>>>> Sender: American Dialect Society
>>>> Poster: James Harbeck
>>>> Subject: Fwd: Analyze word and phrase frequency
>>>> -------------------------------------------------------------------------------
>>>>
>>>> This looks like it could be useful for some kinds of analysis.
>>>>
>>>> -----Original Message-----
>>>>
>>>> http://lifehacker.com/5190716/primitive-word-counter-analyzes-word-and-phrase-frequency
>>>>
>>>> You can check the number of words in just about any word processing
>>>> program, but what about the distribution of those words?
>>>>
>>>> Primitive Word Counter analyzes text from your clipboard or file and
>>>> returns the frequency of words and phrases in the text. You can set a
>>>> minimum word length and have it ignore numbers to trim down the
>>>> volume of replies it returns.
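To make the description above concrete, here is a minimal Python sketch of word and phrase frequency counting with a minimum word length and an option to ignore numbers. It does not show how Primitive Word Counter itself is implemented. The function name `frequencies` and its parameters are assumptions for illustration.

```python
import re
from collections import Counter


def frequencies(text, phrase_len=1, min_word_len=1, ignore_numbers=True):
    """Count phrases of phrase_len consecutive words (hypothetical helper)."""
    tokens = re.findall(r"[A-Za-z0-9']+", text.lower())
    if ignore_numbers:
        tokens = [t for t in tokens if not t.isdigit()]
    # Note: filtering happens before phrases are formed, so a phrase can
    # span a position where a short word or a number was removed.
    tokens = [t for t in tokens if len(t) >= min_word_len]
    # Slide a window of phrase_len tokens across the text.
    windows = zip(*(tokens[i:] for i in range(phrase_len)))
    return Counter(" ".join(window) for window in windows)


sample = "tidal wave after tidal wave"
print(frequencies(sample, phrase_len=2).most_common(1))
```

Setting `phrase_len=2` counts adjacent word pairs, which is one simple way to surface candidate two-word items such as "tidal wave" for manual review.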
>>>>
>>>> ------------------------------------------------------------
>>>> The American Dialect Society - http://www.americandialect.org