<!doctype html public "-//W3C//DTD W3 HTML//EN">
<html><head><style type="text/css"><!--
blockquote, dl, ul, ol, li { padding-top: 0 ; padding-bottom: 0 }
--></style><title>Arabic-L:LING:Hayat Corpus</title></head><body>
<div>----------------------------------------------------------------------</div>
<div>Arabic-L: Wed 15 Jan 2002</div>
<div>Moderator: Dilworth Parkinson
&lt;dilworth_parkinson@byu.edu&gt;<br>
[To post messages to the list, send them to arabic-l@byu.edu]<br>
[To unsubscribe, send a message to listserv@byu.edu with the first line
reading:<br>
unsubscribe arabic-l ]<br>
<br>
-------------------------Directory-------------------------------------<br>
<br>
1) Subject: Hayat Corpus<br>
<br>
-------------------------Messages--------------------------------------<br>
1)</div>
<div>Date: 15 Jan 2002</div>
<div>From: Magali Duclaux &lt;duclaux@elda.fr&gt;<br>
Subject: Hayat Corpus<br>
<br>
************************************************************<br>
ELRA - European Language Resources Association<br>
************************************************************<br>
<br>
We are pleased to announce two new resources<br>
available in our catalogue of language resources:<br>
<br>
ELRA W0030 Arabic Data Set<br>
ELRA W0031 GeFRePaC - German French Reciprocal<br>
Parallel Corpus<br>
<br>
A short description of these two new resources is given<br>
below.<br>
Please visit the online catalogue to get further details:<br>
http://www.elda.fr/catalog.html<br>
<br>
ELRA W0030 Arabic Data Set:<br>
The corpus contains Al-Hayat newspaper articles, with<br>
value added for the development of Language Engineering<br>
and Information Retrieval applications. The data have<br>
been organised into 7 subject-specific databases according<br>
to the Al-Hayat subject tags. Mark-up, numbers, special<br>
characters and punctuation have been removed. The total<br>
file size is 268 MB. The dataset contains 18,639,264<br>
distinct tokens in 42,591 articles, organised in 7 domains.<br>
<br>
ELRA W0031 GeFRePaC - German French Reciprocal<br>
Parallel Corpus:<br>
GeFRePaC was produced in the framework of the LRsP&amp;P<br>
project. It contains 30 million words: 15 million for the<br>
German language and 15 million for the French language.<br>
<font face="Arial">It covers natural general language as used in<br>
public socio-political discourse, with a focus on<br>
multilingual administration and on commercial and legal<br>
documentation. It</font> was created for the purpose of<br>
developing, enhancing and improving translation aids.<br>
<br>
=====================================<br>
For further information, please contact:<br>
<br>
ELRA/ELDA<br>
55-57 rue Brillat-Savarin<br>
F-75013 Paris, France<br>
<br>
Tel: +33 01 43 13 33 33<br>
Fax: +33 01 43 13 33 30<br>
<br>
E-mail: mapelli@elda.fr<br>
<br>
or visit our Web site:<br>
<font color="#0000FF"><u>http://www.icp.grenet.fr/ELRA/home.html</u></font><br>
or <font color="#0000FF"><u>http://www.elda.fr</u></font><br>
=====================================</div>
<div><br>
--------------------------------------------------------------------------</div>
<div>End of Arabic-L: 15 Jan 2002</div>
</body>
</html>