Corpora: Parallel corpora and French software

Thu Jun 8 18:56:57 UTC 2000

Noelle,

It may be that the following two corpora are not entirely suitable
for your research, because they are primarily political and
legislative in their content.  But they are available from the
LDC, and you can check the LDC catalog web pages for further
information:

UN Parallel Text (English/Spanish/French)
http://morph.ldc.upenn.edu/Catalog/LDC94T4A.html

-- you can request just the English and French data if you
prefer. The full corpus is a three-CD-ROM set, with one language per
CD-ROM, one text document per data file, and alignment at the level
of document/file only.

Canadian Hansards (French/English)
http://morph.ldc.upenn.edu/Catalog/LDC95T20.html

-- a single CD-ROM containing two distinct sets of parallel text:
one set is aligned at the sentence level, and the other (smaller)
set is aligned at the paragraph level (with additional alignment
data for individual word tokens within paragraphs).

Please write to ldc at ldc.upenn.edu if you would like further
information or are interested in purchasing either of these
collections.

Best,

Shannon Sears
Manager, Intellectual Property Rights and Membership
----------------------------------------------------------------------
Linguistic Data Consortium          Phone: (215) 573-1275
3615 Market Street                  Fax:   (215) 573-2175
Suite 200                           email: ssears at ldc.upenn.edu
Philadelphia, PA 19104-2608         www: http://www.ldc.upenn.edu

> From: NOELLE-VERONIQUE SERPOLLET <n.serpollet at lancaster.ac.uk>
> Subject: Corpora: Parallel corpora and French software
> To: CORPORA at hd.uib.no
> Date: Tue, 6 Jun 2000 15:07:16 +0100 (BST)
>
> Apologies if you receive multiple copies of this document
> ***************************************************
> Dear list members,
>
> I am a French PhD student researching Corpus Linguistics at
> Lancaster University. My PhD deals with modality and the
> subjunctive, and my aim is to carry out a contrastive analysis of
> French and English.
>
> I have been working on the Lancaster-Oslo-Bergen corpus (LOB)
> and on the Freiburg-LOB corpus (FLOB) for the English part
> of my data.
> Now I have started working on French corpora.
> I already have some corpora (and am aware of others) that I
> can use, but I was wondering if you could send me a list of data
> to which I could have access and on which I could carry out some
> analyses. Ideally, I would like to compile a French/English parallel
> corpus (with the texts aligned, if possible).
>
> I would appreciate any contributions and help.
>
> Furthermore, are you aware of any corpus tools (taggers/lemmatizers)
> that I could use for my analysis of French?
> (I know about Cordial 6 Universites and will probably purchase it,
> and I am currently working with ParaConc (Barlow, 1995).)
>
> I would be grateful if you could tell me where I could obtain
> a tagger/concordancer which would enable me to retrieve occurrences
> of the French subjunctive.
>
> Thank you in advance for your help, answers and suggestions.
> Noelle
>
> ----------------------------
> Noelle SERPOLLET
> Department of Linguistics and MEL
> Lancaster University,
> LANCASTER, LA1 4YT, UK
> n.serpollet at lancaster.ac.uk