[Corpora-List] Guidance Needed for Corpus Building
Gregor Erbach
gor at acm.org
Wed Apr 13 05:59:04 UTC 2005
Daniel,
A relevant initiative is OLAC, the Open Language Archives Community, "an
international partnership of institutions and individuals who are creating a
worldwide virtual library of language resources by: (i) developing consensus on
best current practice for the digital archiving of language resources, and (ii)
developing a network of interoperating repositories and services for housing
and accessing such resources". The website is located at
http://www.language-archives.org/
Regards,
Gregor
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Dr. Gregor Erbach http://purl.org/net/gregor/
DFKI GmbH, Language Technology Lab http://www.dfki.de/
Tel. +49 (681) 302-5354 mailto:erbach at dfki.de
Quoting Daniel Yacob <corpora at geez.org>:
> Greetings,
>
> I'm at the very starting point of compiling an Amharic corpus
> comprised of a large number of files and word lists in my
> possession. I'm investigating starting my own project vs
> joining an existing effort.
>
> I have found lots of information from the LinguistList site
> and in particular "David Lee's Bookmarks for Corpus-based
> Linguists". However, it is a lot of information to sort through, and
> I cannot judge the usefulness of some resources well.
> For example, the "XML Corpus Encoding Standard" looks promising,
> but its documentation has not changed in nearly 3 years. Is it
> widely used or a dead project? The Linguistic Data
> Consortium appears to have the right goals but also seems to be
> subscription-based, and I want to keep the data freely available.
>
> I would be grateful if people here could send recommendations
> for tools to use and references for groups active in
> developing free corpus materials.
>
> thank you,
>
> Daniel
>
>