[Corpora-List] Using version control software in corpus construction
David Graff
graff at ldc.upenn.edu
Mon Mar 29 15:34:10 UTC 2010
Just a couple comments to supplement the excellent discussion so far...
I think the issues at hand might fall under two distinct functions:
- keeping a release history for the corpus as a whole
- keeping an audit trail of changes to specific elements in the corpus
A version control system is an obvious solution for the first, while a
relational database can be a much easier solution for the second (assuming
the necessary infrastructure is in place for maintaining a DB-mediated
corpus).
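To illustrate the second function, here is a minimal sketch of a DB-side audit trail, assuming a SQLite-backed corpus. The table names (token, token_audit), the columns, and the trigger are all hypothetical and chosen only for illustration; a real corpus database would have its own schema and would probably also record who made each change.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory DB with a hypothetical token table and audit table."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE token (
            id     INTEGER PRIMARY KEY,
            doc_id TEXT NOT NULL,
            form   TEXT NOT NULL,
            pos    TEXT NOT NULL
        );

        -- One row per change to a token's annotation (assumed layout).
        CREATE TABLE token_audit (
            audit_id   INTEGER PRIMARY KEY AUTOINCREMENT,
            token_id   INTEGER NOT NULL,
            old_pos    TEXT,
            new_pos    TEXT,
            changed_at TEXT DEFAULT CURRENT_TIMESTAMP
        );

        -- Log only updates that actually change the POS tag.
        CREATE TRIGGER token_pos_audit
        AFTER UPDATE OF pos ON token
        WHEN OLD.pos <> NEW.pos
        BEGIN
            INSERT INTO token_audit (token_id, old_pos, new_pos)
            VALUES (OLD.id, OLD.pos, NEW.pos);
        END;
        """
    )
    return conn


def audit_trail(conn, token_id):
    """Return the (old_pos, new_pos) history for one token, oldest first."""
    return conn.execute(
        "SELECT old_pos, new_pos FROM token_audit "
        "WHERE token_id = ? ORDER BY audit_id",
        (token_id,),
    ).fetchall()
```

Because the trigger lives in the database, every correction is logged regardless of which tool or script performs the update, which is the main reason a DB can be the easier route for per-element history.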
A thorough and meticulous corpus manager (with adequate schedule and
budget) would want both. Others would choose one or the other based on
what matters most in the given situation.
Of course, there's a third type of issue:
- keeping track of changes in the table/XML structures that organize
the corpus
but this is just a matter of maintaining the release history and audit
trail of the database schema and/or DTD for the corpus.
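One possible way to keep that schema history inside the database itself is sketched below, again assuming SQLite. The schema_history table, the MIGRATIONS list, and the migrate function are hypothetical names for illustration, not part of any particular tool; keeping the schema file or DTD under version control would serve the same purpose.

```python
import sqlite3

# Hypothetical, ordered list of schema changes for the corpus database.
MIGRATIONS = [
    (1, "CREATE TABLE token (id INTEGER PRIMARY KEY, form TEXT NOT NULL)"),
    (2, "ALTER TABLE token ADD COLUMN pos TEXT"),
]


def migrate(conn):
    """Apply pending schema changes and record each one; return the version."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS schema_history ("
        " version INTEGER PRIMARY KEY,"
        " statement TEXT NOT NULL,"
        " applied_at TEXT DEFAULT CURRENT_TIMESTAMP)"
    )
    current = conn.execute(
        "SELECT COALESCE(MAX(version), 0) FROM schema_history"
    ).fetchone()[0]
    for version, statement in MIGRATIONS:
        if version > current:
            # Apply the change, then log it as the audit trail of the schema.
            conn.execute(statement)
            conn.execute(
                "INSERT INTO schema_history (version, statement) VALUES (?, ?)",
                (version, statement),
            )
    conn.commit()
    return conn.execute(
        "SELECT COALESCE(MAX(version), 0) FROM schema_history"
    ).fetchone()[0]
```

Running migrate a second time applies nothing new, so the schema_history table doubles as both the release history (the highest version) and the audit trail (the ordered list of statements).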
Best regards,
-----------
David Graff
graff at ldc.upenn.edu
Linguistic Data Consortium
3600 Market St., Suite 810
Philadelphia, PA 19104