[Corpora-List] Falko German learner corpus release version 2.0

Marc Reznicek marc.reznicek at staff.hu-berlin.de
Wed Dec 22 14:12:36 UTC 2010


Release of the Falko Learner Corpus of German as a Foreign Language, Version 2.0
Falko is a freely available corpus of texts written by advanced learners of
German as a foreign language. Version 2.0 has now been released and can be
searched with the ANNIS2 search tool at
http://korpling.german.hu-berlin.de/falko-suche/search.html
Our project site is at
http://www.linguistik.hu-berlin.de/institut/professuren/korpuslinguistik/forschung/falko/standardseite/
The corpus (264,261 tokens) consists of texts written by advanced foreign
students of German for two tasks (argumentative essays and summaries), plus a
German native speaker control group for each task. A wide range of metadata
has been collected for all participants, including age, sex, proficiency, and
a language biography (native languages, second and foreign languages,
phases of instructed learning, participation in study abroad, etc.), by
which the data can easily be filtered. Searches can also be run over more
than one corpus at a time.
In version 2.0, three explicit target hypotheses have been added to the essay
subcorpus for both learners and native speakers. The first corrects
orthographic and grammatical errors only; the second also considers a wide
range of semantic and pragmatic incongruities. The third focuses on errors
concerning complex verbs.
For each target hypothesis, surface deviations from the learner text
(insertion, deletion, change, splitting of a token, merging of tokens,
movement of a token) have been coded in the corpus to enable fast queries for
error candidates. Each target hypothesis has in turn been POS-tagged and
lemmatized.
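To illustrate what such surface deviations look like, here is a minimal
sketch in Python that compares a tokenized learner sentence with a tokenized
target hypothesis. It is not the procedure used to build Falko: the function
name, the tag labels, and the example sentence are assumptions made for this
illustration, and the alignment comes from Python's standard difflib module.
"Insertion" is taken to mean a token that appears only in the target
hypothesis, "deletion" a token that appears only in the learner text. Moved
tokens are not detected by this sketch.

```python
from difflib import SequenceMatcher


def surface_deviations(learner, target):
    """Return (tag, learner_tokens, target_tokens) for each mismatching span.

    Tag labels are hypothetical and chosen for this sketch only.
    """
    deviations = []
    matcher = SequenceMatcher(a=learner, b=target, autojunk=False)
    for op, i1, i2, j1, j2 in matcher.get_opcodes():
        if op == "equal":
            continue
        src, tgt = learner[i1:i2], target[j1:j2]
        if op == "insert":
            tag = "insertion"   # token exists only in the target hypothesis
        elif op == "delete":
            tag = "deletion"    # token exists only in the learner text
        elif "".join(src) == "".join(tgt) and len(tgt) > len(src):
            tag = "split"       # same characters, more tokens in the target
        elif "".join(src) == "".join(tgt) and len(tgt) < len(src):
            tag = "merge"       # same characters, fewer tokens in the target
        else:
            tag = "change"
        deviations.append((tag, src, tgt))
    return deviations


if __name__ == "__main__":
    learner = ["Ich", "habe", "ein", "Buch", "ge", "lesen", "."]
    target = ["Ich", "habe", "gestern", "ein", "Buch", "gelesen", "."]
    for tag, src, tgt in surface_deviations(learner, target):
        print(tag, src, tgt)
```

Tags of this kind mark error candidates only; whether a deviation is really
an error still has to be decided by looking at the annotated text.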
The corpus also contains error annotations for the complex verbs in both
learner and native speaker essays.
For the summaries, canonicity and topological fields have been annotated.
A full description of the contents and the annotation layers (in German) can
be found at
http://www.linguistik.hu-berlin.de/institut/professuren/korpuslinguistik/forschung/falko/pdf/Falko-Handbuch_Korpusaufbau%20und%20Annotationen.pdf
We welcome feedback of any kind. Please write to
falko-korpus at hu-berlin.de. We will be happy to help if you need any
assistance.
Greetings, 
Marc Reznicek
Humboldt-Universität zu Berlin
Marc.Reznicek at staff.hu-berlin.de


_______________________________________________
Corpora mailing list
Corpora at uib.no
http://mailman.uib.no/listinfo/corpora


