Welsh lexical database and frequency count (fwd)

Andrew Carnie carnie at linguistlist.org
Mon Jan 21 23:13:46 UTC 2002


Cronfa Electroneg o Gymraeg (CEG)

A 1 million word lexical database and frequency count for Welsh

	Please circulate to those interested

	This is a word frequency analysis of 1,079,032 words of
written Welsh prose, based on 500 samples of approximately 2000 words
each, selected from a representative range of text types to
illustrate modern (mainly post-1970) Welsh prose writing. It was
conceived as providing a Welsh parallel to the Kucera and Francis
analysis for American English, and the LOB corpus for British
English, in the expectation that such an analysed corpus would
provide research tools for a number of academic disciplines:
psychology and psycholinguistics, child and second language
acquisition, general linguistics, and the linguistics of Modern
Welsh, including literary analysis.

      The samples were drawn from novels and short stories, religious
writing, children's literature (both factual and fiction), non-fiction
in fields such as education, science, business, and leisure activities,
public lectures, national and local newspapers and magazines,
reminiscences, academic writing, and general administrative materials
(letters, reports, minutes of meetings).

      The resultant corpus was analysed to produce frequency counts both
of words in their raw form and of lemmas, where each token is
demutated and tagged to its root. The analysis also yields basic
information on the frequencies of different word classes,
inflections, mutations, and other grammatical features.
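
      The difference between a raw count and a lemma count can be
illustrated with a minimal sketch. This is not the CEG procedure
itself: the tiny lookup table TOY_LEXICON, the function name analyse,
and the sample tokens are all hypothetical, chosen only to show how
mutated forms of one word (here cath, "cat") collapse onto a single
lemma once demutated.

```python
from collections import Counter

# Hypothetical toy lexicon (an assumption for illustration, not CEG data):
# surface form -> (lemma, mutation type).
TOY_LEXICON = {
    "cath": ("cath", "none"),
    "gath": ("cath", "soft"),        # c -> g
    "nghath": ("cath", "nasal"),     # c -> ngh
    "chath": ("cath", "aspirate"),   # c -> ch
    "cathod": ("cath", "none"),      # plural, tagged to the same root
}


def analyse(tokens):
    """Return (raw form counts, lemma counts, mutation counts)."""
    raw = Counter()
    lemmas = Counter()
    mutations = Counter()
    for token in tokens:
        form = token.lower()
        raw[form] += 1
        # Forms missing from the toy lexicon are treated as unmutated lemmas.
        lemma, mutation = TOY_LEXICON.get(form, (form, "none"))
        lemmas[lemma] += 1
        mutations[mutation] += 1
    return raw, lemmas, mutations


# Toy token list, not a sample from the corpus.
tokens = "y gath fy nghath ei chath cathod".split()
raw_counts, lemma_counts, mutation_counts = analyse(tokens)

print(raw_counts["gath"])       # count of the raw form "gath"
print(lemma_counts["cath"])     # count of all forms tagged to "cath"
print(mutation_counts["soft"])  # count of soft-mutated tokens
```

      A real analysis would need a full lexicon and contextual
disambiguation, since a surface form can correspond to more than one
root; the table lookup here only stands in for that step.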

      Available on-line:

      Ellis, N. C., O'Dochartaigh, C., Hicks, W., Morgan, M., &
Laporte, N.  (2001). Cronfa Electroneg o Gymraeg (CEG): A 1 million
word lexical database and frequency count for Welsh. [On-line],
Available:
	http://www.bangor.ac.uk/ar/cb/ceg/ceg_eng.html
	http://www.bangor.ac.uk/ar/cb/ceg/ceg_cym.html
	---------------------------------------------------------
--
o.o.o.o.o.o.o.o.o.o

CELTLING
Post: celtling at lists.linguistlist.org OR celtling at listserv.linguistlist.org
Archives: <http://listserv.linguistlist.org/archives/celtling.html>
Subscribe/Unsubscribe - Go to Archives, then click "Join or leave" link

Website: <http://www.personal.psu.edu/ejp10/celtling>


