dark matter matters -word counts

Tom Zurinskas truespel at HOTMAIL.COM
Sat Dec 18 13:02:01 UTC 2010


Just plain word count data is not unrealistic.  What can we make of it?  Truespel book 4 takes word counts from the Collins Cobuild database: 15.4 million word hits for the top 5k words of English.  It then counts the phonemes in these words to find the frequency of phonemes in English.  The phonemes are ranked and compared to the ways they are spelled in English.  The top six spellings for each phoneme are listed (USA English accent).  This may be a first.

The same was done in truespel book one but for a 57k list of words (frequency of use not considered).  Comparisons can be made.
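To make the two kinds of tally concrete, here is a minimal sketch of the difference between counting phonemes weighted by word hits (as in book 4) and counting each listed word once (as in book one).  The tiny lexicon, its hit counts, and the phoneme symbols are made-up placeholders, not Cobuild or truespel data, and the function name is only for illustration.

```python
from collections import Counter

# Hypothetical toy data: word -> (corpus hits, list of phonemes).
# These numbers and symbols are placeholders, not Cobuild or truespel values.
LEXICON = {
    "the": (100, ["dh", "u"]),
    "cat": (10, ["k", "a", "t"]),
    "tack": (1, ["t", "a", "k"]),
}

def phoneme_frequencies(lexicon, weighted=True):
    """Return each phoneme's share of all phoneme occurrences.

    weighted=True  counts a phoneme once per word hit (frequency of use).
    weighted=False counts a phoneme once per listed word (plain word list).
    """
    counts = Counter()
    for hits, phonemes in lexicon.values():
        weight = hits if weighted else 1
        for phoneme in phonemes:
            counts[phoneme] += weight
    total = sum(counts.values())
    return {phoneme: n / total for phoneme, n in counts.items()}

by_use = phoneme_frequencies(LEXICON, weighted=True)
by_list = phoneme_frequencies(LEXICON, weighted=False)
for phoneme in sorted(by_use, key=by_use.get, reverse=True):
    print(phoneme, round(by_use[phoneme], 3), round(by_list[phoneme], 3))
```

In this toy case the phonemes of "the" dominate the weighted tally but not the list-based one, which is the kind of comparison the two books allow.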

Word frequency is a tremendously skewed curve. The top 100 most frequent words take up over 55% of all words on a page.  The top 500 =  75%.  The top 1,000 = 83%.  The top 1,500 = 87%.  Going from the top 4,000 to 5,000 words only adds about 2%.
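The coverage figures above are cumulative shares: sort the word counts from most to least frequent, add up the top N, and divide by the total.  A minimal sketch, assuming only a list of per-word hit counts (the toy counts below are made up and are not the Cobuild figures):

```python
def coverage(counts, top_n):
    """Share of all word hits accounted for by the top_n most frequent words."""
    ranked = sorted(counts, reverse=True)
    total = sum(ranked)
    if total == 0:
        return 0.0
    return sum(ranked[:top_n]) / total

# Hypothetical toy counts, one entry per word; not real corpus data.
toy_counts = [50, 30, 15, 5]
for n in (1, 2, 3, 4):
    print(n, f"{coverage(toy_counts, n):.0%}")
```

The gain from adding more words is the difference between two such shares, e.g. coverage(counts, 5000) - coverage(counts, 4000) on a real count list.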

How many words, minimally, do we need to "reflect the lexicon"?  The VOA uses 1,500 words in its simplified English broadcasts overseas.  It has a dictionary of them, to which I've added a truespel phonetic pronunciation guide where there was none (book 3, authorhouse.com).  Pronunciation is modeled after the "spoken words" of talking dictionaries.  This VOA guide is, I believe, the most accurate phonetic reflection of USA English because it spells out all schwas and shows glottal stops and ~d for "t" swaps as alternative pronunciations.

Tom Zurinskas, USA - CT20, TN3, NJ33, FL7+
see truespel.com phonetic spelling



>
> ---------------------- Information from the mail header -----------------------
> Sender: American Dialect Society
> Poster: Jonathan Lighter
> Subject: Re: dark matter matters (UNCLASSIFIED)
> -------------------------------------------------------------------------------
>
> But that would be unrealistic.
>
> JL
>
> On Fri, Dec 17, 2010 at 4:55 PM, Mullins, Bill AMRDEC <
> Bill.Mullins at us.army.mil> wrote:
>
> > ---------------------- Information from the mail header -----------------------
> > Sender: American Dialect Society
> > Poster: "Mullins, Bill AMRDEC"
> > Subject: Re: dark matter matters (UNCLASSIFIED)
> >
> > -------------------------------------------------------------------------------
> >
> > Classification: UNCLASSIFIED
> > Caveats: NONE
> >
> > I should have said that it would be useful if it were done well,
> > using accurate datasets, and other presumed standards of scholarship.
> >
> > > -----Original Message-----
> > > From: American Dialect Society [mailto:ADS-L at LISTSERV.UGA.EDU] On Behalf Of Jonathan Lighter
> > > Sent: Friday, December 17, 2010 3:52 PM
> > > To: ADS-L at LISTSERV.UGA.EDU
> > > Subject: Re: dark matter matters (UNCLASSIFIED)
> > >
> > > ---------------------- Information from the mail header -----------------------
> > > Sender: American Dialect Society
> > > Poster: Jonathan Lighter
> > > Subject: Re: dark matter matters (UNCLASSIFIED)
> > >
> > > -------------------------------------------------------------------------------
> > >
> > > If you don't rely on the scans and dates of Google Books.
> > >
> > > JL
> > >
> > > On Fri, Dec 17, 2010 at 3:38 PM, Mullins, Bill AMRDEC <
> > > Bill.Mullins at us.army.mil> wrote:
> > >
> > > > ---------------------- Information from the mail header -----------------------
> > > > Sender: American Dialect Society
> > > > Poster: "Mullins, Bill AMRDEC"
> > > > Subject: Re: dark matter matters (UNCLASSIFIED)
> > > >
> > > >
> > > > -------------------------------------------------------------------------------
> > > >
> > > > Classification: UNCLASSIFIED
> > > > Caveats: NONE
> > > >
> > > > But even if a dictionary isn't designed to "reflect the lexicon",
> > > > studying how well it does so may still be a useful thing.
> > > >
> > > > >
> > > > > "To gauge how well dictionaries reflect the lexicon,."
> > > > >
> > > > >
> > > > >
> > > > > This suggests to me that the traditional scope and purpose (also adequacy)
> > > > > of general dictionaries is being challenged by computer people who may be
> > > > > totally unfamiliar with lexicography as the craft is practiced. I rather
> > > > > doubt they have even read the preface to even a respectable college
> > > > > dictionary let alone the OED. I may very well be wrong and will stand
> > > > > corrected humbly if I am.
> > > > >
> > > >
> > > > Classification: UNCLASSIFIED
> > > > Caveats: NONE
> > > >
> > > > ------------------------------------------------------------
> > > > The American Dialect Society - http://www.americandialect.org
> > > >
> > >
> > >
> > >
> > > --
> > > "If the truth is half as bad as I think it is, you can't handle the truth."
> > >
> > > ------------------------------------------------------------
> > > The American Dialect Society - http://www.americandialect.org
> > Classification: UNCLASSIFIED
> > Caveats: NONE
> >
> > ------------------------------------------------------------
> > The American Dialect Society - http://www.americandialect.org
> >
>
>
>
> --
> "If the truth is half as bad as I think it is, you can't handle the truth."
>
> ------------------------------------------------------------
> The American Dialect Society - http://www.americandialect.org

------------------------------------------------------------
The American Dialect Society - http://www.americandialect.org


