[Corpora-List] Guidance Needed for Corpus Building

Daniel Yacob corpora at geez.org
Wed Apr 13 07:59:51 UTC 2005


Greetings,

I'm just starting to compile an Amharic corpus from a large
number of files and word lists in my possession.  I'm weighing
whether to start my own project or join an existing effort.

I have found lots of information on the LinguistList site,
in particular "David Lee's Bookmarks for Corpus-based
Linguists".  However, it is a lot of information to sort
through, and I cannot easily judge how useful some of the
resources are.  For example, the "XML Corpus Encoding Standard"
looks promising, but its documentation has not changed in
nearly 3 years.  Is it widely used, or a dead project?  The
Linguistic Data Consortium appears to have the right goals but
also seems to be subscription based (is that right?).  I want
to keep the data freely available.

I would be grateful if people here could recommend tools to
use and point me to groups that are actively developing free
corpus materials.

Thank you,

Daniel


