[Corpora-List] Greek corpus
Georgios Mikros
gmikros at isll.uoa.gr
Thu Feb 10 17:16:14 UTC 2011
Dear Taras,
At the University of Athens we have developed a web interface for three
different Modern Greek corpora:
a) Corpus of Greek Texts: This is the one that Valentini mentioned
previously; it is maintained by Dr. Goutsos. However, you can find another
interface with different features at this address
(http://sek.edu.gr/index.php?en). The paper describing the corpus
compilation can be found here:
http://www.euppublishing.com/doi/abs/10.3366/cor.2010.0002
b) Special corpus for teaching Modern Greek as a foreign language (380
Kwords): This corpus contains texts organized in three thematic areas
(Market, Health and Environment). The topics relate directly to the
teaching units of the curriculum "Intermediate level for Modern Greek" of
the Greek Language Centre.
c) Learner corpus for the study of the interlanguage produced by foreigners
who learn Modern Greek: The corpus contains 333 essays written by
foreigners from 51 different countries who learn Modern Greek at the School
of the University of Athens. The current size of the corpus is 75 Kwords.
Each essay has been transcribed into electronic form, and each error has
been tagged using a custom error taxonomy developed specifically for the
needs of the project. Error tagging was carried out using special software
(Episimiotis), which used XML to encode errors and metalanguage data for
each text. The learner corpus is currently undocumented and we will
redesign it from scratch.
If you need raw texts, I can send you the files of the Special Corpus for
teaching Modern Greek as a foreign language.
Furthermore, you can access the corpora maintained by the Portal of the
Greek Language
(http://www.greek-language.gr/greekLang/modern_greek/tools/corpora/index.html).
There are 3 available corpora:
a) Corpus of the newspaper "TA NEA" with 2 Mwords.
b) Corpus of the newspaper "Macedonia" with 3 Mwords.
c) Corpus of all the books used in the Greek secondary education with 2
Mwords.
All the above mentioned corpora are lemmatized and POS tagged.
Hope this helps.
Kind regards
George Mikros
-----------------------------------
George K. Mikros
Associate Professor
Department of Italian Language and Literature
School of Philosophy
National and Kapodistrian University of Athens
Greece
Tel.: +30 210 7277491
From: corpora-bounces at uib.no [mailto:corpora-bounces at uib.no] On Behalf Of
Taras Zagibalov
Sent: Wednesday, February 09, 2011 4:44 PM
To: Corpora at uib.no
Subject: [Corpora-List] Greek corpus
Dear list members,
Do you know of any freely available plain-text Modern Greek corpus?
Preferably in Unicode.
Best regards,
Taras Zagibalov