<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">

<HTML><HEAD>

<META http-equiv=Content-Type content="text/html; charset=iso-8859-1">

<META content="MSHTML 6.00.2600.0" name=GENERATOR>

<STYLE></STYLE>

</HEAD>

<BODY bgColor=#ffffff>

<DIV><FONT face=Arial size=2></FONT> </DIV>

<DIV><FONT face=Arial size=2>Mery --</FONT></DIV>

<DIV><FONT face=Arial size=2></FONT> </DIV>

<DIV><FONT face=Arial size=2>You said: </FONT></DIV>

<DIV><BR></DIV>

<DIV>"1) To create a big general bilingual dictionary, should we start from the
most frequent lemmas in the wordlist extracted from our corpus and then
increase or decrease their number according to the space available in the
dictionary, or should we start from the number of entries generally needed in
a dictionary of that size and then extract that number of the most frequent
lemmas from the corpus?"</DIV>

<DIV><FONT face=Arial size=2></FONT> </DIV>

<DIV><FONT face=Arial size=2>"Most frequent ... extracted from the corpus":
there's a misconception here, I suspect...</FONT></DIV>

<DIV><FONT face=Arial size=2></FONT> </DIV>

<DIV><FONT face=Arial size=2>The question is, is the corpus word list bigger

than the dictionary, or vice versa?</FONT></DIV>

<DIV><FONT face=Arial size=2></FONT> </DIV>

<DIV><FONT face=Arial size=2>My previous message commented on small
dictionaries. For a big dictionary -- even a one-volume big dictionary, such
as the (New) Oxford Dictionary of English (NODE) -- a corpus alone is not
enough. For NODE, after throwing out names and junk (e.g., strings of letters
that are not words at all), we included ALL the words and senses in the
corpus, AND THEN SOME MORE.</FONT></DIV>

<DIV><FONT face=Arial size=2></FONT> </DIV>

<DIV><FONT face=Arial size=2>What more?  Well, here are some

examples:</FONT></DIV>

<DIV><FONT face=Arial size=2></FONT> </DIV>

<DIV><FONT face=Arial size=2>1. Some of the words in NODE that are not in the
British National Corpus (BNC) are words of historical importance: used by
Shakespeare, for example, but since obsolete.</FONT></DIV>

<DIV><FONT face=Arial size=2></FONT> </DIV>

<DIV><FONT face=Arial size=2>2. Other entries are names of plants and
animals, for which a systematic survey of the literature was carried out by my
colleagues David Shirt and Bill Trumble. They constantly had to make difficult
decisions about whether the local name for some non-European plant or animal
should be included in the dictionary, to reflect the "global" nature of modern
English.</FONT></DIV>

<DIV><FONT face=Arial size=2></FONT><FONT face=Arial size=2></FONT> </DIV>

<DIV><FONT face=Arial size=2>3. Sometimes rather rare scientific terms are
included for consistency of coverage. For example, no one would disagree with
the decision to include "carbon" and "hydrogen", but for consistency of sets,
we included ALL the chemical elements, even the rarest ones. A similar
principle was applied to many other fields. I myself wrote the entries for
languages and peoples, using the Oxford Encyclopedia of Language and
Linguistics as a guide, but making only a selection, and trying (no doubt
failing!) to be consistent. I included many names of languages and peoples
that do not occur in any BNC text. Conversely, it is theoretically possible
(at this date, I can't remember an actual case) that the BNC contains several
mentions of some extremely rare language which happened to be in the news in
1991-3 but which did not make it into NODE. A language or people that hit the
headlines briefly in 1991-3 -- for example in a story or review of language
death -- would not necessarily merit inclusion in a 1998 dictionary.
</FONT></DIV>

<DIV><FONT face=Arial size=2></FONT> </DIV>

<DIV><FONT face=Arial size=2>4. Other entries are new words or new senses
discovered by the Oxford Reading Program. An example that comes to mind is
the use of "dope" in Black American English as an adjective of approbation --
"Man, that suit is dope." Not surprisingly, this sense is not in the British
National Corpus. Another example from the same register: "hood" and "burbs".
I'm sad to say that in NODE we pulled our punches when defining these two,
failing to explain their connotations adequately. Somewhere I have a wonderful
(recent) citation about a rowdy rapper who was fired by his recording company
for "bringing the hood to the burbs". But I digress.</FONT></DIV>

<DIV><FONT face=Arial size=2></FONT> </DIV>

<DIV><FONT face=Arial size=2>Thus, to create the word list for NODE, we used
at least three techniques: the corpus (which helped us to shape the entries for
all the common words), literature surveys of special fields, and citations for
new words and senses -- often informal in register -- from the reading
program. The Oxford view is that there is no substitute for a reading program.
Well, we were </FONT></DIV>
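<DIV><FONT face=Arial size=2>For readers who build word lists computationally, here is a minimal sketch, in Python, of merging several sources into one headword list while recording where each headword came from. Everything in it is hypothetical: the word lists, the looks_like_junk filter (a crude "no vowel" test standing in for real junk removal), the proper_names list and the source labels are invented for illustration. It is not a description of how NODE was actually compiled. Keeping the provenance makes it easy to list the entries that the corpus alone would never have supplied.</FONT></DIV>
<PRE>
```python
# Hypothetical inputs: every word list below is invented for illustration.
corpus_lemmas = {"carbon", "hydrogen", "xqzt", "Smith"}
proper_names = {"Smith"}  # assumed list of names to throw out
# Assumed closed sets completed for consistency (cf. "ALL the chemical elements").
closed_sets = {"chemical elements": {"carbon", "hydrogen", "einsteinium"}}
field_survey = {"jacaranda"}  # assumed output of a literature survey of a special field
reading_program = {"hood", "burbs"}  # assumed citations for new words and senses


def looks_like_junk(lemma):
    """Crude stand-in for junk filtering: reject strings containing no vowel."""
    return not any(ch in "aeiouy" for ch in lemma.lower())


def build_headword_list():
    """Union all sources, mapping each headword to the set of sources that supplied it."""
    sources = {}
    # 1. Corpus lemmas, after throwing out names and junk.
    for lemma in corpus_lemmas:
        if lemma in proper_names or looks_like_junk(lemma):
            continue
        sources.setdefault(lemma, set()).add("corpus")
    # 2. Complete closed sets, even where the corpus lacks some members.
    for set_name, members in closed_sets.items():
        for lemma in members:
            sources.setdefault(lemma, set()).add("closed set: " + set_name)
    # 3. Special-field survey and reading-program citations.
    for lemma in field_survey:
        sources.setdefault(lemma, set()).add("field survey")
    for lemma in reading_program:
        sources.setdefault(lemma, set()).add("reading program")
    return sources


headwords = build_headword_list()
# Headwords with no corpus support at all.
not_in_corpus = sorted(w for w, src in headwords.items() if "corpus" not in src)
print(not_in_corpus)
```
</PRE>
<DIV><FONT face=Arial size=2>With these made-up inputs the script prints ['burbs', 'einsteinium', 'hood', 'jacaranda']: the name and the junk string are dropped, and every remaining entry comes from a source other than the corpus.</FONT></DIV>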

<DIV><FONT face=Arial size=2></FONT> </DIV>

<DIV><FONT face=Arial size=2>Something similar happened with the first edition
of Collins English Dictionary (CED, 1979). In those days, we did not have a
corpus at all. CED supplemented the basic word list with a systematic survey
of the literature (mostly course books) in many different specialist
fields.</FONT></DIV>

<DIV><FONT face=Arial size=2></FONT> </DIV>

<DIV><FONT face=Arial size=2>* * *</FONT></DIV>

<DIV><FONT face=Arial size=2></FONT> </DIV>

<DIV><FONT face=Arial size=2>I don't know about bilingual lexicography, but I

suspect that, for dictionary-rich languages, a reasonable starting point would

be a comparison of the word lists in native-speaker dictionaries in the two

languages.  Comment, anyone?</FONT></DIV>

<DIV><FONT face=Arial size=2></FONT> </DIV>

<DIV><FONT face=Arial size=2></FONT> </DIV>

<DIV><FONT face=Arial size=2>Patrick (again). </FONT></DIV>

<DIV><FONT face=Arial size=2></FONT> </DIV>


<!-- |**|begin egp html banner|**| -->

<br>

<tt><hr width="500">

<b>Yahoo! Groups Links</b><br>

<ul>

<li>To visit your group on the web, go to:<br><a href="http://groups.yahoo.com/group/lexicographylist/">http://groups.yahoo.com/group/lexicographylist/</a><br> 


</ul>

</tt>


<!-- |**|end egp html banner|**| -->

</BODY></HTML>