[Lexicog] english word lists

Ronald Moe ron_moe at SIL.ORG
Fri Aug 26 16:58:16 UTC 2011


A template is any kind of form or pattern that can be used to create
multiple copies of something. In manufacturing, a template might be a metal
form used to make many identical parts. A mold for pottery is a kind of
template, and so is a stencil. Children can use plastic stencils with
different shapes cut out of them. A child can trace around a cut-out shape
(for instance a square, circle, or triangle) to draw the same shape many
times on a piece of paper.

The kind of template I am creating is a partially filled-in dictionary
database. For instance, there may be 50 languages in a country. Education
officials might want to produce a simple bilingual glossary for each
language that contains a standard list of words in the national language,
each with a gloss in the local language. What is the most efficient way to
do this? First, someone creates a list of 5,000 words in the national
language. Then someone adds the part of speech and an indication of which
sense of each word should be glossed. For instance, the English word 'wind'
has several meanings, and the one needed for the glossaries might be
'blowing air'. But we might want to create another entry for 'to wrap
something around an object'. Rather than make 50 people do all this work,
we can have one person do all the work on the English/national-language
words. Then we have 50 people, one from each local language, supply the
glosses.
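The division of labor described above can be pictured as one shared template that is copied once per language. The sketch below is a minimal in-memory model under that assumption; `TemplateEntry`, `copies_for`, and the language names `lang_a` and `lang_b` are hypothetical and are not part of Toolbox or WeSay.

```python
from copy import deepcopy
from dataclasses import dataclass


@dataclass
class TemplateEntry:
    # Filled in once, by one person working on the national language:
    headword: str    # national-language word, e.g. "wind"
    pos: str         # grammatical category, e.g. "n" or "v"
    sense: str       # which sense should be glossed
    # Left empty in the template; supplied by a speaker of the local language:
    gloss: str = ""


def copies_for(template, languages):
    """Give every local language its own independent copy of the template."""
    return {lang: deepcopy(template) for lang in languages}


template = [
    TemplateEntry("wind", "n", "blowing air"),
    TemplateEntry("wind", "v", "to wrap something around an object"),
]
copies = copies_for(template, ["lang_a", "lang_b"])  # hypothetical language names
copies["lang_a"][0].gloss = "(lang_a word for blowing air)"
print(copies["lang_b"][0].gloss == "")  # True: the other copies are unaffected
```

The deep copy matters: with a shallow copy, every language would share the same entry objects, and one speaker's gloss would appear in everyone's glossary.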


I'm currently developing the database in Toolbox because I already have a
lot of information in Toolbox databases that I can use. Here are two sample
entries from my template:


\lx wind
\rank 01,096
\ps n
\is 1.1.3.1
\sd Wind

\lx wind
\rank 02,686
\ps v
\is 7.3.7.2
\sd Wrap

The first line is the English word, the second is its frequency rank
('the' is the most frequent word in English and has rank '1'), the third
is the grammatical category, and the fourth and fifth lines give the number
and name of the semantic domain that the word/sense belongs to. Eventually
I would like to have a simple definition and example sentence for each
word. For instance:

\de air that is moving
\xe The clouds brought wind but no rain.
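Because each field is a backslash marker followed by its content, entries like these are easy to read programmatically. The following is a simplified reader for this standard format, written only for illustration (it is not Toolbox's own code, and `parse_sfm` is a hypothetical name). It assumes one field per line, treats non-blank lines without a marker as continuations, skips anything before the first entry, and ignores character-encoding issues. The `\de` and `\xe` lines are attached to the first entry here just to show a fuller record.

```python
def parse_sfm(text, record_marker="lx"):
    """Split standard-format (backslash-marker) text into records.

    Each record is a list of (marker, value) pairs rather than a dict,
    because a marker such as \\is may occur more than once in an entry.
    Simplifying assumptions: a field starts with a backslash at the start
    of a line, each record_marker starts a new record, non-blank lines
    without a backslash continue the previous field, and anything before
    the first record (such as a file header) is ignored.
    """
    records = []
    for line in text.splitlines():
        if line.startswith("\\"):
            marker, _, value = line[1:].partition(" ")
            if marker == record_marker:
                records.append([])
            if records:
                records[-1].append((marker, value.strip()))
        elif line.strip() and records and records[-1]:
            # Continuation line: append it to the previous field's value.
            marker, value = records[-1][-1]
            records[-1][-1] = (marker, value + " " + line.strip())
    return records


sample = r"""\lx wind
\rank 01,096
\ps n
\is 1.1.3.1
\sd Wind
\de air that is moving
\xe The clouds brought wind but no rain.

\lx wind
\rank 02,686
\ps v
\is 7.3.7.2
\sd Wrap
"""

for record in parse_sfm(sample):
    fields = dict(record)  # safe here: no marker repeats within these entries
    print(fields["lx"], fields["ps"], fields["sd"])
```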


Such definitions would have to be tested. It may be difficult to translate a
definition or example sentence. So we would need to work on definition
styles and think about what kind of example sentences would be good in a
language learner's dictionary. Ideally I would like to use a dictionary that
is already published. But that would require getting permission from the
publisher.


My database currently has the top 20,000 English words. But a user might
want only the top 5,000. So the template needs to be developed starting with
the most frequent words. I am refining the list of headwords. The
grammatical category field is accurate. All the words have been assigned to
at least one semantic domain, but I need to refine the semantic domains.
I've added about 150 definitions and example sentences for the most common
words.
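Since every entry carries its frequency rank, cutting the 20,000-word database down to a 5,000-word version amounts to a filter plus a sort. A minimal sketch, assuming the entries have already been read into dictionaries keyed by marker name; the `top_n` helper is hypothetical:

```python
def top_n(entries, n):
    """Return the entries ranked within the top n, most frequent first.

    Assumes each entry is a dict whose "rank" value is written like the
    \\rank field above (e.g. "01,096"); entries with no rank are skipped.
    """
    def rank(entry):
        # "01,096" -> 1096: drop the thousands separator, then convert.
        return int(entry["rank"].replace(",", ""))

    ranked = [e for e in entries if e.get("rank")]
    return sorted((e for e in ranked if rank(e) <= n), key=rank)


entries = [
    {"lx": "wind", "ps": "v", "rank": "02,686"},
    {"lx": "wind", "ps": "n", "rank": "01,096"},
]
print([e["ps"] for e in top_n(entries, 5000)])  # ['n', 'v']
print([e["ps"] for e in top_n(entries, 2000)])  # ['n']
```

Sorting by rank also yields a natural work order, so the most frequent entries can be refined first.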


I want to distribute the template using WeSay, which is very easy for
non-linguists to use. People would be able to download the WeSay database
from the internet, add glosses, and then upload the database back to the
website. There is already a website for this (LanguageDepot) and software
(Chorus) to merge additions back into the master database. For instance,
someone could add a Swahili gloss for each of the English words. Then each
language in Tanzania could download the database and produce a trilingual
Vernacular-Swahili-English glossary. All that would be required of the end
user would be to add a vernacular gloss for each entry. Printing and
publishing could be standardized. We already have ways of preparing the book
for publication: I saw someone demonstrate this process, and it took only an
hour to prepare the dictionary for printing. Preparing the cover and front
matter would take some time, but could be standardized within a country. So
the goal is to have a (relatively) low-tech, easy-to-use template that can
be used over and over to mass-produce simple glossaries.
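The last step for each language could look roughly like the sketch below: once the Swahili and vernacular glosses are in, the entries are turned into rows of a trilingual glossary. This is only an illustration under assumed field names (`en`, `sw`, `vern`) and a hypothetical `trilingual_glossary` function. It does not reproduce how WeSay, Chorus, or the existing publishing tools store or format data, and `VERN-WIND` is a placeholder, not a real vernacular word.

```python
def trilingual_glossary(entries):
    """Format Vernacular-Swahili-English rows, sorted by the vernacular form.

    Hypothetical keys: "en" is the template headword, "sw" the shared
    Swahili gloss, and "vern" the gloss typed in by the end user.
    Entries that do not yet have a vernacular gloss are left out.
    """
    rows = [e for e in entries if e.get("vern")]
    rows.sort(key=lambda e: e["vern"].casefold())
    return [f'{e["vern"]}\t{e["sw"]}\t{e["en"]}' for e in rows]


entries = [
    {"en": "wind (n)", "sw": "upepo", "vern": "VERN-WIND"},  # placeholder form
    {"en": "wind (v)", "sw": "", "vern": ""},  # not glossed yet: skipped
]
for line in trilingual_glossary(entries):
    print(line)
```

Tab-separated columns are used here only because they are easy to hand on to a typesetting step.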


Someone could also use the template to produce an English-vernacular
dictionary. This would require us to add definitions and example sentences
in English, which would then be translated into the vernacular.


Ron Moe


  _____  

From: lexicographylist at yahoogroups.com
[mailto:lexicographylist at yahoogroups.com] On Behalf Of tots cool
Sent: Friday, August 26, 2011 5:43 AM
To: lexicographylist at yahoogroups.com
Subject: Re: [Lexicog] english word lists


Dear Ron Moe,


What do you mean when you say template? 


Sittie


  _____  

From: Ronald Moe <ron_moe at sil.org>
To: lexicographylist at yahoogroups.com
Sent: Friday, August 26, 2011 12:39 AM
Subject: RE: [Lexicog] english word lists


I'm developing a word list that will serve as the basis of an
English-vernacular dictionary or glossary (where "vernacular" is replaced by
any language of the world). I'm basing the list on a frequency list from the
Corpus of Contemporary American English. It looks like there are some flaws
in the frequency list, so I'll have to supplement it with words taken from
language learners' dictionaries. I'm currently classifying the words by
semantic domain. That task is not done. I haven't gotten very far in
defining the words and would appreciate help in developing the database. I
would be interested in collaboration on developing a template for bilingual
dictionaries.

Ron Moe


  _____  

From: lexicographylist at yahoogroups.com
[mailto:lexicographylist at yahoogroups.com] On Behalf Of Martin Benjamin
Sent: Thursday, August 25, 2011 5:58 AM
To: lexicographylist at yahoogroups.com
Subject: [Lexicog] english word lists


I'm compiling a file of various "essential" lists of English words to 
use as a reference point for determining common concepts for a 
multilingual dictionary. The lists that a term belongs to will be 
associated with the dictionary entries, so you'll be able to 
cross-reference, for example, Swadesh list glosses for any language in 
the dictionary.
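Associating each term with the lists it belongs to is essentially an inverted index from term to list names. A minimal sketch, with a hypothetical function name and toy lists (not the real Swadesh, Dolch, or other list contents), matching terms by spelling only:

```python
from collections import defaultdict


def index_memberships(word_lists):
    """Map each term to the sorted names of the lists that contain it.

    word_lists maps a list name to an iterable of terms. Terms are matched
    case-insensitively, which is a simplification: real lists differ in
    spelling, inflection, and which sense of a word they mean.
    """
    memberships = defaultdict(set)
    for list_name, terms in word_lists.items():
        for term in terms:
            memberships[term.casefold()].add(list_name)
    return {term: sorted(names) for term, names in memberships.items()}


# Toy lists with hypothetical names and contents, for illustration only.
index = index_memberships({"list_a": ["wind", "water"], "list_b": ["Wind"]})
print(index["wind"])   # ['list_a', 'list_b']
print(index["water"])  # ['list_a']
```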

Are there any lists not included below that should be added? Some of the 
lists below are based on frequency, some on core concepts. All can be 
cited for flaws or subjectivity. But in the aggregate, they at least 
provide a pretty good starting point for what someone would want in a 
bilingual dictionary with English, and potentially could be a useful 
tool for study across languages.

I'll put the final file online as a Google Doc and send a notice with
the address to Lexicog.

So please, what lists would you include, and how can I access them? Here 
are those I've got already:

Comparative African Word List (SIL)
Reading Teachers List (1000 words from corpus frequency) http://kamu.si/qNfQDA
Clear English Most Commonly Used Words (US) http://kamu.si/riVPsV
General Service List (1995 version) http://jbauman.com/gsl.html
Dolch http://en.wikipedia.org/wiki/Dolch_Word_List
Swadesh http://kamu.si/nO2jyA
Ogden's Basic English (extended) http://kamu.si/pgGDhs
Academic Word List http://kamu.si/oSlYet
VOA Special English http://www.manythings.org/voa/words.htm

Many thanks,
Martin Benjamin
martin at kamusi.org

-- 
________________________________________________

Dr. Martin Benjamin
Executive Director, Kamusi Project International
http://kamusi.org

Full contact information, social networks, blog and photos:
http://about.me/martin.benjamin




More information about the Lexicography mailing list