28.394, FYI: Reference Corpus of Middle High German (REM)

The LINGUIST List linguist at listserv.linguistlist.org
Thu Jan 19 17:20:07 UTC 2017


LINGUIST List: Vol-28-394. Thu Jan 19 2017. ISSN: 1069 - 4875.

Subject: 28.394, FYI: Reference Corpus of Middle High German (REM)

Moderators: linguist at linguistlist.org (Damir Cavar, Malgorzata E. Cavar)
Reviews: reviews at linguistlist.org (Helen Aristar-Dry, Robert Coté,
                                   Michael Czerniakowski)
Homepage: http://linguistlist.org

*****************    LINGUIST List Support    *****************
                       Fund Drive 2016
                   25 years of LINGUIST List!
Please support the LL editors and operation with a donation at:
           http://funddrive.linguistlist.org/donate/

Editor for this issue: Yue Chen <yue at linguistlist.org>
================================================================


Date: Thu, 19 Jan 2017 12:19:57
From: Stefanie Dipper [dipper at linguistics.rub.de]
Subject: Reference Corpus of Middle High German (REM)

 
We are happy to announce the public release of the Reference Corpus of Middle
High German (REM), which is available from the following website:

https://www.linguistics.rub.de/rem/

The Reference Corpus of Middle High German (1050–1350) consists of more than
two million tokens. It provides a nearly complete collection of written records
from Early Middle High German (1050–1200) as well as a careful selection of
Middle High German texts from 1200 to 1350. The corpus was compiled in the
context of a series of projects at the Universities of Cologne, Bonn, and
Bochum, beginning in the mid-1980s.

The transcriptions of the texts comprise two separate layers. The diplomatic
layer records historical graphemes and conserves original word boundaries.
Layout information, such as page or line breaks, refers to this layer. The
second layer adapts word boundaries to the conventions of modern German and
serves as the basis for all further linguistic annotations. The texts have
been annotated with part-of-speech tags (using the HiTS tagset), morphology,
lemmas and other information. For detailed documentation (in German), see the
project website.
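
As a rough illustration of the two-layer design described above, the following
Python sketch models a diplomatic token layer and a modernized annotation layer
that points back to it. Everything in it is an assumption made for
illustration: the class and field names are hypothetical, the example words
are invented rather than taken from the corpus, the tag values are
placeholders rather than actual HiTS tags, and the structure is not the
corpus's real XML schema.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DiplToken:
    """Token on the diplomatic layer (hypothetical model)."""
    form: str   # historical spelling with the original word boundaries
    line: int   # layout information (line number) refers to this layer


@dataclass
class AnnoToken:
    """Token on the modernized layer that carries the annotations."""
    form: str                 # form after adapting word boundaries
    pos: str                  # placeholder tag, NOT a real HiTS tag
    lemma: str
    dipl_indices: List[int]   # diplomatic tokens this token is drawn from


def dipl_forms(anno: AnnoToken, dipl: List[DiplToken]) -> List[str]:
    """Return the diplomatic form(s) underlying an annotated token."""
    return [dipl[i].form for i in anno.dipl_indices]


# Invented example: one written unit on the diplomatic layer is split
# into two tokens on the annotation layer.
dipl = [DiplToken("inder", 1), DiplToken("burc", 1)]
anno = [
    AnnoToken("in", "PREP", "in", [0]),
    AnnoToken("der", "DET", "der", [0]),
    AnnoToken("burc", "NOUN", "burc", [1]),
]

for tok in anno:
    print(tok.form, tok.pos, tok.lemma, dipl_forms(tok, dipl))
```

Keeping the annotations on the second layer while storing only back-references
to the first means that layout details such as line numbers are looked up
through the diplomatic tokens and never duplicated.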

The corpus can be accessed via ANNIS under the following URL:

https://www.linguistics.rub.de/rem/annis/

There is also a simplified search interface available at:

https://www.linguistics.rub.de/rem/acces/simplesearch.en.html

The corpus is licensed under the Creative Commons Attribution-ShareAlike 4.0
license (CC BY-SA 4.0) and can also be downloaded in XML format from our
website.
 



Linguistic Field(s): Historical Linguistics
                     Text/Corpus Linguistics

Subject Language(s): German, Middle High (gmh)





 


