[Lexicog] agreed-upon minimum size for lexicographic corpora

'Sang Yong Lee' sang-yong_lee@sall.com [lexicographylist] lexicographylist at yahoogroups.com
Tue Jul 5 09:39:59 UTC 2016


Hi!

The answer will differ depending on whether the corpus is for a major language or a minority language. For minority and endangered languages, Leonard E. Newell’s Handbook on Lexicography gives a hint about the minimum size of the corpus.

He shared his experience in the Romblomanon (Philippines) project as follows:

For example, a frequency count of words in the Romblomanon project revealed that fully 2,000 words occurred only once in the first million words of text. About forty percent of those words, however, were inflected verb forms. (Newell 1995: 43)

Based on that experience, he recommends three million words of text as a modest target for a project. The next figure shows unique morphemes occurring at various corpus sizes (Newell 1995: 21).

[Figure: unique morphemes occurring at various corpus sizes (Newell 1995: 21)]

The figure shows that a corpus of three million words yields 8,000 unique morphemes with a frequency of “three times or more”.
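For anyone who wants to run the same kind of count on their own texts, here is a minimal Python sketch. It is only an illustration, not Newell's procedure: the function name `frequency_profile`, the naive letter-based tokenizer, and the sample sentence are all assumptions of mine, and it counts word forms rather than morphemes, which would need a morphological analyzer for the language in question.

```python
from collections import Counter
import re


def frequency_profile(text, min_count=3):
    """Count word forms that occur once and those that occur min_count times or more.

    Assumption: a token is any run of letters, lowercased. Real projects
    need language-specific tokenization and morphological analysis.
    """
    tokens = re.findall(r"[^\W\d_]+", text.lower())
    counts = Counter(tokens)
    # Word forms seen exactly once in the text
    hapaxes = sorted(w for w, c in counts.items() if c == 1)
    # Word forms seen at least min_count times
    frequent = sorted(w for w, c in counts.items() if c >= min_count)
    return {
        "tokens": len(tokens),
        "types": len(counts),
        "hapaxes": hapaxes,
        "frequent": frequent,
    }


# Hypothetical sample text, for illustration only
sample = "the dog saw the cat and the cat saw the bird"
profile = frequency_profile(sample)
print(profile["tokens"], profile["types"])
print(profile["hapaxes"])
print(profile["frequent"])
```

Running the function on successive slices of a growing corpus (the first million words, the first two million, and so on) shows how the number of forms at or above the threshold changes with corpus size.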

I hope this information is helpful to you.

Cordially,

Sang Yong

From: lexicographylist at yahoogroups.com [mailto:lexicographylist at yahoogroups.com] 
Sent: Monday, June 20, 2016 12:36 PM
To: lexicographylist at yahoogroups.com
Subject: [Lexicog] agreed-upon minimum size for lexicographic corpora

I wonder if there is any agreed-upon minimum size for lexicographic
corpora, although Atkins and Rundell, for example, write that there is
no such definitive minimum size in their "Oxford Guide to Practical
Lexicography" (2008: 61).

Does anyone on this list know of publications that propose such a
minimum size, even if the proposed numbers have not been accepted by
other lexicographers?

Thank you very much in advance.

-- 
Tsvi Sadan
Senior Lecturer in Hebrew and Semitic Languages
Bar-Ilan University, Israel
http://biu.academia.edu/tsvisadan/
