[Corpora-List] Using version control software in corpus construction

Serge Heiden slh at ens-lyon.fr
Sun Mar 28 21:00:09 UTC 2010


Andrew,

Some French projects use version control for their corpus source files
for the reasons you mentioned.
Several use version control through the Eclipse SVN plugin integrated in
the Millefeuille XML editing platform:
http://ralyx.inria.fr/2008/Raweb/aviz/uid45.html
Others use the SVN client integrated in the Oxygen XML editor:
http://www.oxygenxml.com/doc/ug-oxygen/svn-client.html
Concerning version control of XML, I recall an old (2003) thread
on the TEI-L mailing list about XML diff software that could be helpful:
http://listserv.brown.edu/archives/cgi-bin/wa?A2=ind0305&L=TEI-L&P=R1880
Oxygen has good XML diff support now:
http://www.oxygenxml.com/doc/ug-oxygen/file-comparison.html
If you plan to use Subversion, Syd Bauman has written an XSLT stylesheet
that could be helpful:
http://wiki.tei-c.org/index.php/Extract-svn-id.xslt
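
To illustrate the kind of information Subversion can record inside a corpus
file, here is a minimal Python sketch that reads an expanded $Id$ keyword.
It is not Syd Bauman's stylesheet and makes no claim about how that
stylesheet works; it only assumes that the svn:keywords property has been
set to "Id" on the file, so that Subversion expands the keyword to
"$Id: file revision date time author $". The function name, the sample
element and its values are invented for the example.

```python
import re

# Expanded Subversion Id keyword: "$Id: <file> <revision> <date> <time> <author> $"
SVN_ID = re.compile(
    r"\$Id: (?P<file>\S+) (?P<rev>\d+) (?P<date>\S+) (?P<time>\S+) (?P<author>\S+) \$"
)


def extract_svn_id(text):
    """Return the fields of the first expanded $Id$ keyword in text, or None."""
    match = SVN_ID.search(text)
    if match is None:
        return None
    fields = match.groupdict()
    fields["rev"] = int(fields["rev"])  # revision number as an integer
    return fields


# Hypothetical TEI-like fragment: file name, revision and author are invented.
sample = '<edition n="$Id: corpus01.xml 148 2010-03-28 21:00:09Z slh $"/>'
info = extract_svn_id(sample)
print(info)
```

An unexpanded keyword (a bare "$Id$") yields None, which is a quick way to
spot files on which the property has not been set.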
The eXist XML database could help as a backend for versioning:
http://exist.sourceforge.net/versioning.html
But I haven't used it myself.

Best,
Serge

Hardie, Andrew wrote on 28/03/2010 17:20:
> Hi all,
>
> I am contemplating using a source-code version control system (such as
> Subversion) to store the files of a corpus as it is being constructed,
> (a) to help keep track of changes as I go, (b) to allow several people
> to work on it in a non-confusing way and (c) to simplify backing up and
> aid data security.
>
> Using version control software occurred to me after spending some time
> manually keeping track of a set of encoding and markup changes in an
> older corpus, and finding it a total pain in the neck. Of course, this
> is not exactly what version control software is designed for...
>
> I was wondering, has anyone on the list done this before? If so, are
> there any pitfalls to avoid / particular pointers I should be aware of?
> Or alternative (better) ways of accomplishing the same thing?
>
> All hints and tips gratefully received.
>
> Best
>
> Andrew.
>
>
>
> Andrew Hardie
> Department of Linguistics
> County South
> Lancaster University
> Lancaster LA1 4YL
> United Kingdom
>
> a.hardie at lancaster.ac.uk
>
> _______________________________________________
> Corpora mailing list
> Corpora at uib.no
> http://mailman.uib.no/listinfo/corpora
>    

-- 
Dr. Serge Heiden, slh at ens-lyon.fr, http://textometrie.ens-lsh.fr
ENS de Lyon/CNRS - ICAR UMR5191, Institut de Linguistique Française
15, parvis René Descartes 69342 Lyon BP7000 Cedex, tél. +33(0)622003883




