Corpora: corpus of French academic texts

Tine Greidanus dt.greidanus at let.vu.nl
Thu Jan 20 12:34:51 UTC 2000


Dear list members,

I would like to make a frequency list of French academic words
comparable to the Academic Word List (of English words) by Averil
Coxhead (Victoria University of Wellington, New Zealand). This list
consists of around 600 words (= word families) that are reasonably
frequent in a wide range of academic texts (words like assume, achieve,
concept, for example). These academic words are common in academic
texts, but not so common elsewhere. Several studies have shown that
these words are generally less well known than technical vocabulary. An
Academic Word List is thus very useful for university students doing
their studies in a second language.

To my knowledge, there are no recent frequency lists of French words
made for pedagogical purposes, let alone a list corresponding to the
Coxhead list. There is a list made by the Crédif and published in 1971,
the Vocabulaire général d'orientation scientifique. It covers only the
scientific fields (mathematics, physics, natural sciences). I intend to
make a comparable list for the most important academic areas.

My problem is the compilation of the corpus. The 'Institut de la Langue
Française' (INALF) has an enormous corpus, called Frantext. It consists
mainly of literary texts of the nineteenth and twentieth centuries, but
there is also a subcorpus of 'textes scientifiques et techniques'. These
texts, however, are rather old and rather biased, and thus not suitable
for my purpose. So I have to find something else.

The Coxhead list is based on a corpus of 3,500,000 running words. This
corpus was divided into four 'faculty sections': Arts, Science, Law,
Commerce. Each faculty section was divided into seven subject areas. The
texts were journal articles, book chapters, course workbooks, laboratory
manuals and course notes, and were representative of the academic genre.
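
As a rough illustration of the selection principle described above
(reasonable frequency combined with occurrence across several sections),
a minimal Python sketch might look like the following. The function
names, the thresholds and the simple tokenizer are my own assumptions
and not Coxhead's actual procedure; the sketch also counts word forms
rather than word families.

```python
import re
from collections import Counter


def tokenize(text):
    # Lowercase the text and keep runs of letters, accented letters included.
    return re.findall(r"[^\W\d_]+", text.lower())


def select_candidates(sections, min_total, min_sections):
    """Return (word, total frequency, number of sections) for words that
    occur at least min_total times overall and in at least min_sections
    of the sections. Both thresholds are hypothetical parameters."""
    # One frequency table per section (e.g. Arts, Science, Law, Commerce).
    per_section = {
        name: Counter(tok for text in texts for tok in tokenize(text))
        for name, texts in sections.items()
    }
    # Overall frequency across the whole corpus.
    total = Counter()
    for counts in per_section.values():
        total.update(counts)

    selected = []
    for word, freq in total.items():
        spread = sum(1 for counts in per_section.values() if word in counts)
        if freq >= min_total and spread >= min_sections:
            selected.append((word, freq, spread))
    # Most frequent first, ties broken alphabetically.
    return sorted(selected, key=lambda row: (-row[1], row[0]))
```

Such a filter would still let through general high-frequency words
(articles, prepositions), so a further step excluding words that are
common outside academic texts would be needed.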

For my part, I am thinking of choosing (parts of) textbooks for the
French DEUG program (the first two years of university). A corpus of
3,500,000 words is equal to approximately 10,000 pages. I could purchase
the textbooks and scan the pages, but this amounts to a lot of work I'd
rather avoid. So I would be grateful if anyone could give me tips
concerning existing electronic texts or corpora I could use. Does anyone
have experience with publishers of university textbooks making
electronic versions of the books they publish available to researchers?
Are there copyright difficulties to be foreseen?
All information, ideas, tips, etc. are very welcome!


Tine Greidanus
Vrije Universiteit
Faculteit der Letteren
De Boelelaan 1105
1081 HV Amsterdam
tel. 31 20 44 46 460
dt.greidanus at let.vu.nl


