[Corpora-List] Using version control software in corpus construction

Rainer Ottmueller efidetum at googlemail.com
Tue Mar 30 10:40:03 UTC 2010


As your primary concern seems to be querying afterwards, I would
recommend a database rather than a version control system. Any
relational DB is suitable, but only if you need to store plain text.
True, you can store annotations as additional attributes. As I
understood it, though, you need to store annotated text, and that
information is not directly accessible to SQL. On the other hand, most
relational DBs also have an option to handle XML text (for instance).
If you only have to store text, SQL may be sufficient, depending on
the queries you need to run afterwards. In any case, disk space hardly
matters. Also, should you be lucky enough that your requirements are
covered by SQL alone, many stable clients are available (e.g., to
administer and model the DB).
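
To illustrate the "annotations as additional attributes" option, here
is a minimal sketch, assuming a one-row-per-token layout. The table
name token, its columns, the sample rows, and the helper
forms_by_lemma are all made up for illustration, and SQLite merely
stands in for an arbitrary relational DB.

```python
import sqlite3

# Hypothetical schema: one row per token, with each annotation layer
# (lemma, part of speech) stored as an additional column.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE token (
        doc_id   TEXT,
        position INTEGER,
        form     TEXT,
        lemma    TEXT,
        pos      TEXT,
        PRIMARY KEY (doc_id, position)
    )
    """
)

# Invented sample data, only to make the query below return something.
rows = [
    ("doc1", 0, "The", "the", "DET"),
    ("doc1", 1, "dogs", "dog", "NOUN"),
    ("doc1", 2, "bark", "bark", "VERB"),
    ("doc2", 0, "Dogs", "dog", "NOUN"),
    ("doc2", 1, "sleep", "sleep", "VERB"),
]
conn.executemany("INSERT INTO token VALUES (?, ?, ?, ?, ?)", rows)


def forms_by_lemma(conn, lemma, pos):
    """Return (doc_id, position, form) for tokens with this lemma and POS."""
    cur = conn.execute(
        "SELECT doc_id, position, form FROM token "
        "WHERE lemma = ? AND pos = ? ORDER BY doc_id, position",
        (lemma, pos),
    )
    return cur.fetchall()


hits = forms_by_lemma(conn, "dog", "NOUN")
print(hits)
```

With this layout the annotations are ordinary columns, so plain SQL can
filter on them. Annotation kept as inline markup inside a text column
would not be reachable this way.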

Rainer


On 29 March 2010 23:11, maxwell <maxwell at umiacs.umd.edu> wrote:
> On Mon, 29 Mar 2010 20:45:50 +0100, Darren Pearce-Lazard
> <darren.pearce at sussex.ac.uk> wrote:
>> I assume that the use case you are referring to is 'undoing' a mod to
>> some part of the corpus. If this is what you need, you can merge the
>> changeset from one revision to another to the current version of the
>> repository thus effectively (in theory) undoing the change that an
>> arbitrary span of revisions represents. I can explain this in more
>> detail if you want.
>
> I would very much like to know how to do this, but I suspect we'll get
> even more msgs from people wanting to leave Corpora-list if we do that
> on-line.  However, surely this has been written up somewhere.  Can you
> perhaps point us to an explanation?
>
>   Mike Maxwell
>
> _______________________________________________
> Corpora mailing list
> Corpora at uib.no
> http://mailman.uib.no/listinfo/corpora
>



