[RNLD] Git for file management.

Tom Honeyman t.honeyman at gmail.com
Wed Jul 26 00:20:38 EDT 2017


Yes, you’re right. I take that back — git will not duplicate an unchanged file with each commit.
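To illustrate the point about unchanged files: git stores each file's contents as a "blob" whose ID is the SHA-1 hash of a short header plus the bytes themselves. An unchanged file in a later commit therefore resolves to the same object and is not stored again. When objects are packed, git also delta-compresses similar objects, which is roughly what "stored as diffs" refers to. The sketch below reproduces only the blob-ID calculation in Python, using made-up file contents; in a real repository, `git hash-object <file>` reports the same ID.

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """Return the SHA-1 object ID git assigns to a file's contents (a "blob").

    Git hashes the header b"blob <size>\\0" followed by the raw bytes, so the
    ID depends only on the contents, not on the file name or the commit.
    """
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


# Made-up transcript contents as they might appear in three commits.
v1 = b"tier: ortho\nhello world\n"
v2 = b"tier: ortho\nhello world\n"   # unchanged in the next commit
v3 = b"tier: ortho\nhello, world\n"  # edited in a later commit

print(git_blob_id(v1) == git_blob_id(v2))  # True: same object, stored once
print(git_blob_id(v1) == git_blob_id(v3))  # False: the edit creates a new object
```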

> On 26 Jul 2017, at 2:12 pm, Ivan Kapitonov <ikapitonov at student.unimelb.edu.au> wrote:
> 
> Maybe I'm missing the specific point of comparison Tom is making (and I haven't looked into pachyderm; thanks for the heads-up!), but git also stores the changes as the diffs between commits, so it shouldn't be less space efficient. Given that Hugh mentions .eaf files, TextGrids and the like as the objects to version control, that shouldn't be an issue anyway.
> 
> On Wed, Jul 26, 2017 at 12:03 PM, Tom Honeyman <t.honeyman at gmail.com> wrote:
> Have you considered pachyderm instead of git? Admittedly, I’m not currently using it, so it’d be an experiment unless someone else on this list pipes up with experience. Git does have its limitations with large static files (hence the creation of LFS), and versioning small changes alongside large static files can be problematic. Pachyderm was designed for versioning “data” rather than programming code, and so may be a better choice for language documentation. It does have the problem of being a relatively young project. Git, on the other hand, is very widely used and has a long-ish history.
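For context on LFS: with Git LFS, the repository history holds only a small text pointer for each tracked large file (for example after `git lfs track "*.wav"`), while the actual bytes live on a separate LFS store. The following minimal sketch builds a pointer in the format described by the Git LFS v1 pointer specification. It is an illustration, not a replacement for the git-lfs tool, and the "audio" bytes are placeholder data.

```python
import hashlib

# Version line defined by the Git LFS v1 pointer specification.
LFS_SPEC_URL = "https://git-lfs.github.com/spec/v1"


def lfs_pointer(content: bytes) -> str:
    """Build the text pointer that Git LFS commits in place of a large file."""
    oid = hashlib.sha256(content).hexdigest()
    return f"version {LFS_SPEC_URL}\noid sha256:{oid}\nsize {len(content)}\n"


# Placeholder bytes standing in for a large .wav recording (hypothetical data).
audio = b"\x00\x01" * 1_000_000
pointer = lfs_pointer(audio)

print(pointer, end="")
print(len(audio), "bytes on the LFS store vs",
      len(pointer.encode("ascii")), "bytes in git history")
```

Because only the pointer is versioned, re-committing an unchanged recording produces the same pointer and adds essentially nothing, whereas each new version of a large file still costs its full size on the LFS store.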
> 
> My understanding of pachyderm is that versioning is stored underlyingly as diffs, making it more space efficient. This may be more appropriate in situations with constrained resources (e.g., remote fieldwork). It also has the advantage of (potentially) capturing (computational) transformations of data in a way that can be repeated should you roll back and make changes. For instance, if your workflow involves transformation between transcript formats, you can repeat the transformation after a rollback to fix an error in the primary transcript.
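The idea of repeatable transformations can be sketched as follows. This is a toy illustration of provenance tracking and does not use or reproduce Pachyderm's actual API: each derived output is keyed by a hash of the primary transcript it came from, so after a rollback and a correction the conversion is simply re-run on the fixed input. The "start end text" transcript format and the `to_tab_separated` conversion are hypothetical.

```python
import hashlib
from typing import Callable, Dict


def content_id(data: str) -> str:
    """Hash a primary transcript so derived outputs can be traced back to it."""
    return hashlib.sha256(data.encode("utf-8")).hexdigest()


class Pipeline:
    """Toy provenance-tracking pipeline (hypothetical; not Pachyderm's API)."""

    def __init__(self, transform: Callable[[str], str]) -> None:
        self.transform = transform
        self.outputs: Dict[str, str] = {}  # input hash -> derived output

    def run(self, primary: str) -> str:
        key = content_id(primary)
        if key not in self.outputs:  # only recompute for inputs not seen before
            self.outputs[key] = self.transform(primary)
        return self.outputs[key]


def to_tab_separated(transcript: str) -> str:
    """Hypothetical format conversion: 'start end text' lines -> TSV rows."""
    rows = []
    for line in transcript.strip().splitlines():
        start, end, text = line.split(" ", 2)
        rows.append("\t".join([start, end, text]))
    return "\n".join(rows) + "\n"


pipe = Pipeline(to_tab_separated)
draft = "0.0 1.2 helo\n1.2 2.5 world\n"       # contains a transcription error
corrected = "0.0 1.2 hello\n1.2 2.5 world\n"  # fixed after rolling back
print(pipe.run(draft))
print(pipe.run(corrected))  # derived TSV regenerated from the fixed primary data
```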
> 
> https://protect-au.mimecast.com/s/9OVEB1S96EzbHl?domain=pachyderm.io
> 
> Then again, as John notes, all members of the team need to have the skills to use it. Git at least has some nice GUI tools for working with it, and free hosting services built on it, like GitHub or GitLab.
> 
> Cheers,
> 
> Tom Honeyman
> 
> Corpus Manager
> ARC Centre of Excellence for the Dynamics of Language (CoEDL)
> College of Asia and the Pacific
> HC Coombs (Building 9)
> The Australian National University
> Canberra ACT 2601
> 
> T: +61 6125 2279
> Email: tom.honeyman at anu.edu.au
> 
> https://protect-au.mimecast.com/s/lqm9BzfaNkA5tY?domain=dynamicsoflanguage.edu.au
> https://protect-au.mimecast.com/s/z4n0BMSloxVruE?domain=asiapacific.anu.edu.au
> CRICOS Provider # 00120C
> 
>> On 26 Jul 2017, at 3:10 am, Hugh Paterson <hugh_paterson at sil.org> wrote:
>> 
>> 
>> Has anyone used git to manage files while doing language documentation? Perhaps using git's LFS feature. We have a lot of files moving across a network of computers as we accomplish various tasks in various workflows. It would be helpful to manage diffs on these files: annotated tiers in .eaf files, Praat TextGrids, etc. Has anyone any pointers on this? Did they use one large repo or divide the project into several repos, perhaps based on recording sessions? Or has anyone used git's submodule/subtree features? One advantage of git is git blame; another is the rollback feature. I am hoping to use git diff to check for updated sections of files.
>> 
>> all the best,
>> - Hugh Paterson III
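On using diffs to spot updated sections: `git diff` compares plain-text files line by line, so an edited tier in a TextGrid (or in an .eaf XML file, as long as the saving tool does not rewrite the whole file) shows up as a few changed lines. The sketch below uses Python's `difflib.unified_diff`, which emits the same unified format as `git diff` but uses its own matching algorithm, to pull out only the changed lines from two versions of a made-up TextGrid fragment. The file name `session01.TextGrid` is hypothetical.

```python
import difflib
from typing import List

# Made-up fragment of a Praat TextGrid interval tier.
OLD = (
    'item [1]:\n'
    '    class = "IntervalTier"\n'
    '    name = "words"\n'
    '    intervals [1]:\n'
    '        xmin = 0\n'
    '        xmax = 1.2\n'
    '        text = "helo"\n'
)
NEW = OLD.replace('"helo"', '"hello"')  # a corrected annotation


def changed_lines(old: str, new: str) -> List[str]:
    """Return only the removed (-) and added (+) lines of a unified diff."""
    diff = difflib.unified_diff(
        old.splitlines(keepends=True),
        new.splitlines(keepends=True),
        fromfile="a/session01.TextGrid",  # hypothetical file name
        tofile="b/session01.TextGrid",
    )
    result = []
    for line in diff:
        if line.startswith(("---", "+++", "@@")):
            continue  # skip file headers and hunk markers
        if line.startswith(("-", "+")):
            result.append(line.rstrip("\n"))
    return result


print("\n".join(changed_lines(OLD, NEW)))
```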
> 
> 



More information about the Resource-network-linguistic-diversity mailing list