[Corpora-List] Using version control software in corpus construction

David Graff graff at ldc.upenn.edu
Mon Mar 29 15:34:10 UTC 2010


Just a couple of comments to supplement the excellent discussion so far...
I think the issues at hand might fall under two distinct functions:

 - keeping a release history for the corpus as a whole
 - keeping an audit trail of changes to specific elements in the corpus

A version control system is an obvious solution for the first, while a
relational database can be a much easier solution for the second (assuming
the necessary infrastructure is in place for maintaining a DB-mediated
corpus).
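
To make the second point concrete, here is a minimal sketch of a
DB-mediated audit trail, using Python's built-in sqlite3 module and a
trigger.  The table and column names (annotation, annotation_audit,
label, and so on) are hypothetical, invented only for illustration; a
real corpus database would have its own schema, and would probably also
record who made each change.

```python
import sqlite3


def make_corpus_db():
    """Create an in-memory DB with a hypothetical annotation table and audit log."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        -- Hypothetical corpus element: one label per annotated item.
        CREATE TABLE annotation (
            id     INTEGER PRIMARY KEY,
            doc_id TEXT NOT NULL,
            label  TEXT NOT NULL
        );
        -- Audit trail: one row per change to a label.
        CREATE TABLE annotation_audit (
            audit_id      INTEGER PRIMARY KEY AUTOINCREMENT,
            annotation_id INTEGER NOT NULL,
            old_label     TEXT,
            new_label     TEXT,
            changed_at    TEXT DEFAULT CURRENT_TIMESTAMP
        );
        -- The trigger writes the audit row, so no client code can forget to.
        CREATE TRIGGER annotation_update_audit
        AFTER UPDATE OF label ON annotation
        BEGIN
            INSERT INTO annotation_audit (annotation_id, old_label, new_label)
            VALUES (OLD.id, OLD.label, NEW.label);
        END;
    """)
    return conn


def label_history(conn, annotation_id):
    """Return the (old_label, new_label) pairs for one element, oldest first."""
    rows = conn.execute(
        "SELECT old_label, new_label FROM annotation_audit "
        "WHERE annotation_id = ? ORDER BY audit_id",
        (annotation_id,),
    )
    return rows.fetchall()
```

After inserting a row and updating its label, label_history(conn, id)
returns the sequence of changes for that one element, which is exactly
the kind of question a whole-corpus release history answers poorly.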

A thorough and meticulous corpus manager (with adequate schedule and
budget) would want both.  Others would choose one or the other based on
what matters most in the given situation.

Of course, there's a third type of issue:

 - keeping track of changes in the table/XML structures that organize
   the corpus

but this is just a matter of maintaining the release history and audit 
trail of the database schema and/or DTD for the corpus.
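
One possible way to keep such a trail, sketched here under my own
assumptions rather than as an established practice: store a checksum of
the schema or DTD text each time it changes.  The schema_history table
and the record_schema function are hypothetical names; in practice the
schema file itself would more likely just live in the version control
system alongside the corpus.

```python
import hashlib
import sqlite3


def open_history():
    """Create an in-memory DB holding a hypothetical schema_history table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE schema_history ("
        " version INTEGER PRIMARY KEY AUTOINCREMENT,"
        " sha256 TEXT NOT NULL,"
        " note TEXT,"
        " recorded_at TEXT DEFAULT CURRENT_TIMESTAMP)"
    )
    return conn


def record_schema(conn, schema_text, note=""):
    """Record schema_text as a new version unless it matches the latest one.

    Returns the version number that describes schema_text.
    """
    digest = hashlib.sha256(schema_text.encode("utf-8")).hexdigest()
    last = conn.execute(
        "SELECT version, sha256 FROM schema_history "
        "ORDER BY version DESC LIMIT 1"
    ).fetchone()
    if last is not None and last[1] == digest:
        return last[0]  # unchanged since the last recorded version
    cur = conn.execute(
        "INSERT INTO schema_history (sha256, note) VALUES (?, ?)",
        (digest, note),
    )
    return cur.lastrowid
```

Calling record_schema with the current DTD text at each release gives a
version number that changes only when the structure itself changes.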

   Best regards,

-----------
David Graff
graff at ldc.upenn.edu
Linguistic Data Consortium
3600 Market St., Suite 810
Philadelphia, PA 19104
