[RNLD] Git for file management.

Tom Honeyman t.honeyman at gmail.com
Tue Jul 25 22:03:59 EDT 2017

Have you considered Pachyderm instead of Git? Admittedly, I’m not currently using it, so it’d be an experiment unless someone else on this list pipes up with experience. Git does have its limitations with large static files (hence the creation of LFS), and versioning small changes alongside large static files can be problematic. Pachyderm was designed for versioning “data” rather than program code, and so may be a better choice for documentation work. It does have the problem of being a relatively young project. Git, on the other hand, is very widely used and has a long-ish history.

My understanding of Pachyderm is that versions are stored underlyingly as diffs, making it more space efficient. This may be more appropriate in situations with constrained resources (e.g., remote fieldwork). It also has the advantage of (potentially) capturing (computational) transformations of data in a way that can be repeated should you roll back and make changes. So, for instance, if your workflow involves transformation between transcript formats, you can repeat the transformation after rolling back to fix an error in the primary transcript.
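
To make the two ideas above a little more concrete, here is a minimal Python sketch. It is not Pachyderm's API or its actual storage format; the transcript layout ("speaker: utterance") and the function names to_tab_separated and line_diff are hypothetical, chosen only for illustration. It shows that a small correction to a primary transcript can be recorded as a short diff, and that a derived format can be regenerated by rerunning the same transformation on the corrected version.

```python
import difflib


def to_tab_separated(transcript):
    """Hypothetical transformation: 'speaker: utterance' -> 'speaker<TAB>utterance'."""
    rows = []
    for line in transcript.splitlines():
        speaker, _, utterance = line.partition(":")
        rows.append(f"{speaker.strip()}\t{utterance.strip()}")
    return "\n".join(rows)


def line_diff(old, new):
    """Return a unified diff with no context lines, i.e. only what changed."""
    return list(
        difflib.unified_diff(old.splitlines(), new.splitlines(), lineterm="", n=0)
    )


# Invented sample data: a primary transcript containing a typo.
original = "A: helo there\nB: good morning\nA: how are you"

# Fix the error in the primary transcript.
corrected = original.replace("helo", "hello")

# Only the changed line needs to be recorded, not a second full copy.
delta = line_diff(original, corrected)

# Rerun the same transformation on the corrected primary data.
derived = to_tab_separated(corrected)

print("\n".join(delta))
print(derived)
```

The point of the sketch is only that the derived file is never edited by hand: it is always regenerated from the primary transcript, so a fix made upstream propagates by repeating the transformation.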

https://protect-au.mimecast.com/s/rNKkB3t1OaOlfW?domain=pachyderm.io

Then again, as John notes, all members of the team need to have the skills to use it. Git at least has some nice GUI tools, and there are free hosting services built around it, such as GitHub or GitLab.


Tom Honeyman

Corpus Manager
ARC Centre of Excellence for the Dynamics of Language (CoEDL)
College of Asia and the Pacific
HC Coombs (Building 9)
The Australian National University
Canberra ACT 2601

T: +61 6125 2279
Email: tom.honeyman at anu.edu.au

CRICOS Provider # 00120C


> On 26 Jul 2017, at 3:10 am, Hugh Paterson <hugh_paterson at sil.org> wrote:
> Has anyone used git to manage files while doing language documentation? Perhaps using git's LFS feature. We have a lot of files moving across a network of computers as we accomplish various tasks in various workflows. It would be helpful to manage diffs on these files: annotated tiers in .eaf files, Praat TextGrids, etc. Does anyone have any pointers on this? Did you use one large repo or divide the project into several repos, perhaps based on recording sessions? Or has anyone used the git module/tree feature? One advantage of git is git blame; another is the rollback feature. I am hoping to use the git diff feature to check for updated sections of files.
> all the best,
> - Hugh Paterson III
