[Corpora-List] Using version control software in corpus construction

Darren Pearce-Lazard darren.pearce at sussex.ac.uk
Mon Mar 29 19:45:50 UTC 2010


>
> > * Is it possible to compress internal files in SVN or other systems?
> > (What I mean is that SVN would take care of compression of the
> > internal files in the repository but check-in/check-out works with
> > plain text files)
>
> SVN uses zlib to compress all data in your repository. However, your
> working copy is almost always exactly twice the size of the current
> revision. It's quite safe to say that all other modern common version
> control systems (git, mercurial..) are more space efficient than SVN,
> if that matters to you.
>

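I could be wrong, but my understanding is that git clones the entire
repository history by default, so what you get locally is every revision
rather than just the current one. That can make a clone larger than a
Subversion working copy, although git compresses the history it stores, so
how much larger (if at all) depends on the corpus. A shallow clone
(git clone --depth 1) fetches only the most recent history. :-) I don't
know about Mercurial.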


> > * Is it possible to remove specific revisions or even to restrict the
> > history to a specific number of revisions? (but I'm not sure if this
> > would be a good idea anyway)
>
> As far as I know, the only way to do that is to dump an SVN repository
> as a plain file, manually remove the revision data, and re-import the
> edited file. Yikes.
>

I agree with 'yikes'! Removing specific revisions could leave the history
inconsistent or produce strange effects. In fact, I'm not even sure it's
possible: svndumpfilter can filter paths out of a repository dump but
cannot (AFAIK) skip specific revisions. What would skipping a revision mean
for the later revisions that build on its modifications?
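
To make the path filtering concrete, it is a dump/filter/load pipeline.
Here is a minimal Python sketch that only builds the command line and does
not run it. The repository names and the excluded path are made-up
placeholders, and I'm assuming the new repository has already been created
with svnadmin create.

```python
import shlex


def filter_pipeline(old_repo, new_repo, excluded_paths):
    """Build a shell pipeline that copies old_repo into new_repo
    while dropping the given paths from every revision."""
    if not excluded_paths:
        raise ValueError("give at least one path to exclude")
    commands = [
        ["svnadmin", "dump", old_repo],                  # write the dump stream to stdout
        ["svndumpfilter", "exclude", *excluded_paths],   # drop the unwanted paths
        ["svnadmin", "load", new_repo],                  # replay what is left into the new repository
    ]
    # shlex.join quotes each argument so the result is safe to paste into a shell
    return " | ".join(shlex.join(cmd) for cmd in commands)


if __name__ == "__main__":
    # Hypothetical names, for illustration only
    print(filter_pipeline("corpus-repo", "corpus-clean", ["raw/badfile.txt"]))
```

Note that this filters by path across the whole history; it does not
remove individual revisions.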

I assume the use case you have in mind is 'undoing' a modification to some
part of the corpus. If so, you can reverse-merge the changes between two
revisions into your working copy and commit the result. This effectively
(in theory) undoes the change that an arbitrary span of revisions
represents, while leaving those revisions in the history. I can explain
this in more detail if you want.
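
As a rough illustration of the reverse merge, here is a small Python sketch.
The function names and revision numbers are my own inventions for the
example. It assumes you run it inside an up-to-date working copy and that
you review the result (svn diff) before committing.

```python
import subprocess


def reverse_merge_command(first_bad, last_bad, target="."):
    """Build the svn call that undoes revisions first_bad..last_bad inclusive."""
    if first_bad < 1 or last_bad < first_bad:
        raise ValueError("expected 1 <= first_bad <= last_bad")
    # A reversed range (newer:older) asks svn to apply the changes backwards.
    # The range ends at first_bad - 1, the last revision before the unwanted span.
    return ["svn", "merge", "-r", f"{last_bad}:{first_bad - 1}", target]


def undo_revisions(first_bad, last_bad, target="."):
    """Apply the reverse merge to the working copy; nothing is committed here."""
    subprocess.run(reverse_merge_command(first_bad, last_bad, target), check=True)


if __name__ == "__main__":
    # Hypothetical example: undo everything committed in revisions 100 to 120
    print(" ".join(reverse_merge_command(100, 120)))
```

The undo only reaches the repository once you commit it, as a new revision
on top of the existing history.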

:Darren.

-- 
----------------------------------------------------------------------
 :Darren :Pearce-Lazard
----------------------------------------------------------------------
 darrenp at dcs.bbk.ac.uk
 Postdoctoral Researcher
 London Knowledge Lab, University of London
----------------------------------------------------------------------
 darrenp at sussex.ac.uk
 Visiting Research Fellow
 Informatics, University of Sussex
 http://www.informatics.sussex.ac.uk/users/darrenp/
----------------------------------------------------------------------
 darren.pearce at gmail.com
 http://www.linkedin.com/in/darrenpearce
----------------------------------------------------------------------