[Corpora-List] Guidance Needed for Corpus Building

Wed Apr 13 05:59:04 UTC 2005

Daniel,
A relevant initiative is OLAC, the Open Language Archives Community, "an
international partnership of institutions and individuals who are creating a
worldwide virtual library of language resources by: (i) developing consensus on
best current practice for the digital archiving of language resources, and (ii)
developing a network of interoperating repositories and services for housing
and accessing such resources". The website is located at
http://www.language-archives.org/

regards,

   Gregor

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Dr. Gregor Erbach                     http://purl.org/net/gregor/
DFKI GmbH, Language Technology Lab    http://www.dfki.de/
Tel. +49 (681) 302-5354               mailto:erbach at dfki.de

Quoting Daniel Yacob <corpora at geez.org>:

> Greetings,
>
> I'm at the very starting point of compiling an Amharic corpus
> from a large number of files and word lists in my
> possession.  I'm deciding whether to start my own project or
> join an existing effort.
>
> I have found lots of information on the LinguistList site,
> in particular in "David Lee's Bookmarks for Corpus-based
> Linguists".  However, it is a lot of information to sort
> through, and I cannot easily judge the usefulness of some
> resources.  For example, the "XML Corpus Encoding Standard"
> looks promising, but its documentation has not changed in
> nearly 3 years.  Is it widely used, or a dead project?  The
> Linguistic Data Consortium appears to have the right goals
> but also seems to be subscription-based.  I want to keep the
> data freely available.
>
> I would be grateful if people here could send recommendations
> for tools to use and pointers to groups active in
> developing free corpus materials.
>
> thank you,
>
> Daniel