36.3748, FYI: New Release of the Reference Corpus of Middle High German (ReM)

The LINGUIST List linguist at listserv.linguistlist.org
Fri Dec 5 17:05:02 UTC 2025


LINGUIST List: Vol-36-3748. Fri Dec 05 2025. ISSN: 1069 - 4875.

Moderator: Steven Moran (linguist at linguistlist.org)
Managing Editor: Valeriia Vyshnevetska
Team: Helen Aristar-Dry, Mara Baccaro, Daniel Swanson
Jobs: jobs at linguistlist.org | Conferences: callconf at linguistlist.org | Pubs: pubs at linguistlist.org

Homepage: http://linguistlist.org

Editor for this issue: Daniel Swanson <daniel at linguistlist.org>

================================================================


Date: 05-Dec-2025
From: Adam Roussel [comphist at linguistics.rub.de]
Subject: New Release of the Reference Corpus of Middle High German (ReM)


We are pleased to announce Version 2 of the Reference Corpus of Middle
High German (ReM), which is available for download from the project
website:
  https://linguistics.rub.de/rem

The Reference Corpus of Middle High German (1050–1350) consists of
more than two million tokens. It provides a mostly complete collection
of written records from Early Middle High German (1050–1200) as well
as a careful selection of Middle High German texts from 1200 to 1350.
The corpus was compiled in the context of a series of projects at the
Universities of Cologne, Bonn, and Bochum, beginning in the mid-1980s.

This new version of the corpus contains numerous corrections and
improvements to both the tokenization and the linguistic annotations,
as well as several new documents.

In addition to CorA-XML, several new formats are available for
download, including TEI XML and GraphML, the latter of which can be
used, among other things, with a local ANNIS 4 instance. There is also
a JSON-based format that contains all available annotations and
provides easy access for data analysis scripts.

The new version of the corpus can be accessed via ANNIS 4 at the
following URL:
  https://newannis.linguistics.rub.de/rem

The Reference Corpus of Middle High German is licensed under the
Creative Commons Attribution-ShareAlike 4.0 license (CC BY-SA 4.0).
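
For readers who want to try the JSON-based format from a script, the
following Python sketch shows one schema-agnostic way to get a first
overview of a downloaded file. It makes no claim about the actual ReM
JSON layout, which is not described in this announcement: the helper
names (key_paths, summarize) and the field names in the sample document
("tokens", "form", "pos") are made up purely for illustration.

```python
import json
from collections import Counter


def key_paths(node, prefix=""):
    """Yield a dotted path for every key in a nested JSON value.

    List elements are marked with "[]" so that repeated records
    collapse onto the same path.
    """
    if isinstance(node, dict):
        for key, value in node.items():
            path = f"{prefix}.{key}" if prefix else key
            yield path
            yield from key_paths(value, path)
    elif isinstance(node, list):
        for item in node:
            yield from key_paths(item, prefix + "[]")


def summarize(document):
    """Count how often each key path occurs in a parsed JSON document."""
    return Counter(key_paths(document))


# Hypothetical stand-in for a corpus file; NOT the real ReM schema.
sample_text = '{"tokens": [{"form": "a", "pos": "X"}, {"form": "b"}]}'
sample = json.loads(sample_text)

for path, count in sorted(summarize(sample).items()):
    print(path, count)
```

For a real file, the sample string would be replaced by the parsed
contents of the download, for example via json.load on an opened file.
The resulting counts show which annotation keys exist and how often
they occur, which is usually enough to decide what to extract next.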

Linguistic Field(s): Computational Linguistics
                     General Linguistics
                     Historical Linguistics
                     Text/Corpus Linguistics

Subject Language(s): Middle High German (ca. 1050-1500) (gmh)

Language Family(ies): German


