Analyze word and phrase frequency

Tom Zurinskas truespel at HOTMAIL.COM
Sat Apr 4 21:56:52 UTC 2009


The point is that word lists produced by counters need manual culling.  One obvious consequence of automatic counting is that the list contains no two-word words (like "tidal wave").

If typos could be identified and corrected, that would be ideal for the word count.

These decisions show what I went through in my culling process for the 5000-word list used in book 4.


Tom Zurinskas, USA - CT20, TN3, NJ33, FL5+
see truespel.com





> ---------------------- Information from the mail header -----------------------
> Sender: American Dialect Society
> Poster: Herb Stahlke
> Subject: Re: Analyze word and phrase frequency
> -------------------------------------------------------------------------------
>
> Tom,
>
> Have you tested your definitions for accuracy? I understand that a
> computer count won't get everything right and so you have to program
> in certain common strings that should be omitted from the count. But
> excluding single letters followed by a period? You would lose cases
> like
>
> She ran as fast as I.
>
> Since you'd be counting all sorts of unprepared text, the treatment of
> hyphenated words would treat "non-" like a word. Counting all
> two-word compounds as two words ignores the morphology of compounding,
> which is not reflected accurately in our orthography.
>
> Ignoring misspellings means running spellcheck on your results, but
> the problem with spellcheck is that it only excludes words that don't
> match anything in its dictionary. I spent a lot of time in the 80s
> working in computer assisted instruction, and one of the projects I
> devoted time to was developing a probabilistic spelling checker, a
> program that could look at a word that doesn't match the dictionary and
> judge, by using letter frequency by position by length of word,
> whether a misspelling is an otherwise correct answer. Most CAI simply
> rejected all answers that weren't an exact match, which isn't very
> useful in the language arts.
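
The description above ("letter frequency by position by length of word") gives no implementation details, so the following is only one possible reading of it, sketched in Python. The names build_table and plausibility are hypothetical, and the scoring rule (mean relative frequency of each letter at its position among dictionary words of the same length) is an assumption, not the method used in the 1980s program.

```python
from collections import Counter, defaultdict

def build_table(words):
    """Count letters by (word length, position) over a word list."""
    table = defaultdict(Counter)   # (length, position) -> letter counts
    totals = Counter()             # length -> number of words seen
    for w in words:
        w = w.lower()
        totals[len(w)] += 1
        for pos, ch in enumerate(w):
            table[(len(w), pos)][ch] += 1
    return table, totals

def plausibility(word, table, totals):
    """Assumed scoring rule: mean relative frequency of each letter at
    its position, among known words of the same length. Returns 0.0
    when no word of that length was seen."""
    word = word.lower()
    n = len(word)
    if n == 0 or totals[n] == 0:
        return 0.0
    score = sum(table[(n, pos)][ch] / totals[n] for pos, ch in enumerate(word))
    return score / n

# Tiny illustrative word list; a real table would need a full dictionary.
table, totals = build_table(["cat", "cot", "cut", "dog"])
print(plausibility("cat", table, totals) > plausibility("xqz", table, totals))
```

A score like this only says whether a string looks like a word of its length. Deciding which intended word a misspelling stands for would need a further step, such as comparing it against the expected answer.
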
>
> So what do you lose by your rules and what does this do to the
> accuracy of your word counts?
>
> Herb
>
> On Sat, Apr 4, 2009 at 11:13 AM, Tom Zurinskas wrote:
>> ---------------------- Information from the mail header -----------------------
>> Sender: American Dialect Society
>> Poster: Tom Zurinskas
>> Subject: Re: Analyze word and phrase frequency
>> -------------------------------------------------------------------------------
>>
>> The word counter is good. It results in what looks like a spreadsheet, which is good, but I need to be able to copy/paste it into a regular spreadsheet. Problem - Only one line can be selected at a time. Not good. Anyone else have that problem?
>>
>> http://lifehacker.com/5190716/primitive-word-counter-analyzes-word-and-phrase-frequency
>>
>> To download it click on the blue word "link" at middle right.
>>
>> Computers do a great job at counting words. So we need to define what computer-counted words ("compwords") are.
>>
>> 1. A letter string bordered by spaces.
>> 2. Intelligible (no typos).
>> 3. Does not include numbers, punctuation, acronyms.
>> 4. Two-word words (like tidal wave) are two words.
>> 5. Hyphens count as spaces so hyphenated words are two words.
>> 6. Reattach hyphenated words at end of line (or ignore).
>> 7. Single letters followed by periods are not words.
>> any more?
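
The seven rules above can be sketched as a small Python function. This is a minimal illustration under stated assumptions, not the Primitive Word Counter's code or the procedure used for the book list: the function name compwords is hypothetical, "acronym" is taken to mean an all-capitals string of two or more letters, and rule 2 is applied only if the caller supplies a dictionary set.

```python
import re
from collections import Counter

def compwords(text, dictionary=None):
    """Count 'compwords' under one reading of rules 1-7 (assumptions noted)."""
    # Rule 6: rejoin a word split by a hyphen at the end of a line.
    text = re.sub(r"(\w)-\s*\n\s*(\w)", r"\1\2", text)
    # Rule 5: every remaining hyphen counts as a space.
    text = text.replace("-", " ")
    words = []
    for token in text.split():                 # Rule 1: space-bordered strings
        # Rule 7: a single letter followed by a period is not a word.
        if re.fullmatch(r"[A-Za-z]\.", token):
            continue
        core = token.strip(".,;:!?\"'()[]")    # Rule 3: drop edge punctuation
        # Rule 3: skip numbers and strings with internal punctuation
        # (this also drops contractions such as "don't"), and skip
        # all-capitals strings, assumed here to be acronyms.
        if not core.isalpha() or (len(core) > 1 and core.isupper()):
            continue
        word = core.lower()
        # Rule 2: only enforceable with a word list; optional here.
        if dictionary is not None and word not in dictionary:
            continue
        words.append(word)                     # Rule 4 needs no code:
    return Counter(words)                      # "tidal wave" is two tokens

sample = "The tidal wave hit a well-known sea-\nwall in 2009. She ran as fast as I. NASA agreed."
print(compwords(sample).most_common(3))
```

Run on the sample, the sketch counts "tidal" and "wave" separately, splits "well-known" into two words, rejoins "sea-" and "wall", and drops the sentence-final "I." along with "NASA" and "2009".
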
>>
>> Tom Zurinskas, USA - CT20, TN3, NJ33, FL5+
>> see truespel.com
>>
>>
>>
>>
>> ----------------------------------------
>>> Date: Wed, 1 Apr 2009 21:45:21 -0400
>>> From: jharbeck at SYMPATICO.CA
>>> Subject: Fwd: Analyze word and phrase frequency
>>> To: ADS-L at LISTSERV.UGA.EDU
>>>
>>> ---------------------- Information from the mail header -----------------------
>>> Sender: American Dialect Society
>>> Poster: James Harbeck
>>> Subject: Fwd: Analyze word and phrase frequency
>>> -------------------------------------------------------------------------------
>>>
>>> This looks like it could be useful for some kinds of analysis.
>>>
>>> -----Original Message-----
>>>
>>> http://lifehacker.com/5190716/primitive-word-counter-analyzes-word-and-phrase-frequency
>>>
>>> You can check the number of words in just about any word processing
>>> program, but what about the distribution of those words?
>>>
>>> Primitive Word Counter analyzes text from your clipboard or file and
>>> returns the frequency of words and phrases in the text. You can set a
>>> minimum word length and have it ignore numbers to trim down the
>>> volume of replies it returns.
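
For readers without the tool, the kind of output described above can be approximated in a few lines of Python. This is a rough sketch, not the Primitive Word Counter's actual behavior: the function name frequencies and its parameters (phrase_len, min_len, ignore_numbers) are hypothetical stand-ins for the options the article mentions.

```python
import re
from collections import Counter

def frequencies(text, phrase_len=2, min_len=1, ignore_numbers=True):
    """Return (word counts, phrase counts) for the given text."""
    # Lowercase and keep runs of letters, digits, and apostrophes.
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    if ignore_numbers:
        tokens = [t for t in tokens if not t.isdigit()]
    tokens = [t for t in tokens if len(t) >= min_len]
    words = Counter(tokens)
    # Phrases are runs of phrase_len consecutive surviving tokens, so
    # they can span sentence boundaries and filtered-out tokens.
    phrases = Counter(
        " ".join(tokens[i:i + phrase_len])
        for i in range(len(tokens) - phrase_len + 1)
    )
    return words, phrases

words, phrases = frequencies("The tidal wave hit. The tidal wave of 2009 receded.")
print(words["tidal"], phrases["tidal wave"])
```

Because the result is an ordinary Counter, it can be written out with the csv module and opened in any spreadsheet.
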
>>>
>>> ------------------------------------------------------------
>>> The American Dialect Society - http://www.americandialect.org
>>
>



