[RNLD] What’s a good Database for recordings, transcriptions, photos etc?

Fri Jun 22 01:33:57 EDT 2018

Hi Vaso

I agree with Nick's remark that while you are deciding on a database management system (dbms), it is a good idea to keep data in a spreadsheet, taking care to think about what data goes with what and what is dependent on what. Use separate tabs for separate things, e.g. a tab for people data, a tab for 'about a language' data, a tab for 'about a place' data, etc. Most dbmses can easily import data from spreadsheets. However, move to a real dbms as soon as you can.
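
A rough sketch of the spreadsheet-to-dbms step, assuming one tab has been saved as a CSV file. It uses Python's standard csv and sqlite3 modules, with SQLite standing in for whichever dbms is eventually chosen. The table name, column names and sample rows are invented for illustration.

```python
import csv
import io
import sqlite3

# Stand-in for a 'people' tab exported from a spreadsheet as CSV.
# In practice you would open the exported file instead of this string.
PEOPLE_CSV = """name,clan
Person A,Clan 1
Person B,Clan 2
"""

def load_people(conn, csv_file):
    """Create a people table and copy every CSV row into it."""
    conn.execute("CREATE TABLE IF NOT EXISTS people (name TEXT, clan TEXT)")
    reader = csv.DictReader(csv_file)
    rows = [(r["name"], r["clan"]) for r in reader]
    conn.executemany("INSERT INTO people (name, clan) VALUES (?, ?)", rows)
    conn.commit()
    return len(rows)

conn = sqlite3.connect(":memory:")  # temporary in-memory database
loaded = load_people(conn, io.StringIO(PEOPLE_CSV))
people = conn.execute("SELECT name, clan FROM people ORDER BY name").fetchall()
print(loaded, people)
```

Each further tab (languages, places) would get its own table in the same way.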

Filemaker: we currently have a Filemaker database for ethnographic data for fieldsites in southern NG (name, DOB when we can get it, clan, other affiliations, household members, languages spoken etc.). Runs on Windows and Mac. We run as a peer-to-peer network, i.e. everyone who needs to access the db has a copy of Filemaker and connects to a single copy running the database. We currently have a problem in that IT services are not used to running this kind of network, so it relies on my being available to solve problems, and on keeping a dedicated always-up PC. On the database design side of things, you can implement very sophisticated dbs, and very large ones (ours is very small), with multiple indexes enabling very efficient queries across the whole database. Cons: you can implement a relational model but the implementation is a very idiosyncratic proprietary one, meaning a steep learning curve for anything realistic. Cost as I recall is about $250-$350 per seat. There are no other costs for our peer-to-peer implementation, though FM have other network offerings.

Microsoft Access: a much more obvious implementation of a relational db so anyone with a vague understanding of normalisation could do a useful db ('person X may speak language Y at place Z and have affiliations A, B, C' kind of thing). Import and export to other formats easy.  Design of data collection screens is easy for simple database structures, not easy for complex databases. Cons: only for Windows; it's proprietary!
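
A minimal sketch of the 'person X may speak language Y at place Z' structure as normalised tables. It is written in Python with the standard sqlite3 module rather than Access, purely so that it runs anywhere. All table names, column names and rows are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE person   (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE language (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE place    (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
-- One row per 'person X speaks language Y at place Z' fact.
CREATE TABLE speaks (
    person_id   INTEGER NOT NULL REFERENCES person(id),
    language_id INTEGER NOT NULL REFERENCES language(id),
    place_id    INTEGER NOT NULL REFERENCES place(id)
);
INSERT INTO person   VALUES (1, 'Person X');
INSERT INTO language VALUES (1, 'Language Y'), (2, 'Language W');
INSERT INTO place    VALUES (1, 'Place Z');
INSERT INTO speaks   VALUES (1, 1, 1), (1, 2, 1);
""")

def languages_spoken_by(conn, person_name):
    """Return (language, place) pairs for one person, via joins."""
    return conn.execute(
        """
        SELECT language.name, place.name
        FROM speaks
        JOIN person   ON person.id   = speaks.person_id
        JOIN language ON language.id = speaks.language_id
        JOIN place    ON place.id    = speaks.place_id
        WHERE person.name = ?
        ORDER BY language.name
        """,
        (person_name,),
    ).fetchall()

result = languages_spoken_by(conn, "Person X")
print(result)
```

Each name is stored once and the speaks table only links ids, so correcting a spelling in one place corrects it everywhere. Affiliations A, B, C could be handled with a similar link table.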

MySQL: open source implementation of a relational db with SQL (=structured query language) for doing queries and updates. Can do very large databases. Very good, free utilities to do database management - either MySQL Workbench or phpMyAdmin. You design input (data entry) and output screens in any language you like (well, a few). By far most implementations use the free, open source language PHP (could use Perl or Python) and allow access over the Internet. This brings a separate piece of software called a Web server into the picture. The industry standard Web server is Apache, also free and open source. So when doing database development and testing PHP code (if you are developing in PHP), which runs on the server, you need to run your own Web server and your own copy of the database. You do this by setting up an XAMPP environment - a very easy single-download package. X stands for cross-platform, i.e. runs on Linux and Windows and Mac. I currently run some MySQL databases of historical bibliographic data which hold interesting events and time-dependent data which may be parallel to your linguistic data.

All the above can store any type of data or object including text, images, videos etc. 

Why all this talk about 'relational'?  Well, it is now a very mature database structure as well as a very mature methodology for designing your database in the first place. And nowadays it is the latter - the work put in to understanding and modelling the data at the beginning stages of setting up the database - which is the strength of relational.

However, there are also 'NoSQL' databases = not (only) SQL, such as data structured around RDF triples, which have a 'smaller' structure and are therefore more easily extensible, in that they consist of atomic assertions, e.g. 'Paul is-fluent-in-language Nen', 'Peter does-not-know-the-language English' (subject/predicate/object), where the subject is the thing you want to say something about (Paul/Peter), the predicate is the thing you want to say (is-fluent-in-language/does-not-know-the-language) and the object is the value (Nen/English). An RDF triple database is usually created via a LOD (linked open data) approach, i.e. you grab other people's data - any that other groups have made open and accessible - to join with yours either permanently, or for one-off queries.
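
A toy sketch of the subject/predicate/object idea in plain Python. This is not an RDF library: real triple stores identify things by URIs and are normally queried with SPARQL. The 'lives-at' and 'is-spoken-in' predicates, 'Place Z' and the match helper are invented for illustration.

```python
# Each assertion is one (subject, predicate, object) triple.
triples = {
    ("Paul", "is-fluent-in-language", "Nen"),
    ("Peter", "does-not-know-the-language", "English"),
}

def match(store, subject=None, predicate=None, obj=None):
    """Return the triples matching the given parts; None matches anything."""
    return sorted(
        t for t in store
        if (subject is None or t[0] == subject)
        and (predicate is None or t[1] == predicate)
        and (obj is None or t[2] == obj)
    )

# Extending the data needs no schema change:
# a new kind of fact is just a new predicate (invented example).
triples.add(("Paul", "lives-at", "Place Z"))

# Joining another group's open data is a set union (invented example data).
other_groups_data = {("Nen", "is-spoken-in", "Place Z")}
merged = triples | other_groups_data

about_paul = match(merged, subject="Paul")
print(about_paul)
```

The same match call with only predicate or obj filled in answers questions such as 'who is fluent in Nen?' without any table design up front.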

Feel free to contact me if you want to discuss this stuff.

Regards

Susan          

Wellsprings of Linguistic Diversity, ANU  / Digital Humanities Research Group, Western Sydney University 
E: susan.ford at anu.edu.au  / s.ford at westernsydney.edu.au
T: (02) 9685 9891
________________________________________
From: Nick Thieberger [thien at unimelb.edu.au]
Sent: 18 June 2018 15:55
Cc: RNLD mailing list
Subject: Re: [RNLD] What’s a good Database for recordings, transcriptions, photos etc?

Hi Vaso,

This is a very topical issue. I know that IRCA is looking to develop a database for media agencies and I have spoken with a few different cultural centres and language centres who all face the same issue.

A spreadsheet is not a bad place to start, but the problem you quickly run into is keeping consistent in the way that you enter information. That's why a relational database is an advantage, so you can have a dropdown list of people, places, topics and so on from what is already in the database.
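
A minimal sketch of how a relational database keeps entries consistent, assuming Python's standard sqlite3 module as a stand-in for any relational dbms. A recording may only name a place that already exists in the place table, which is the same list a dropdown would be filled from. The table names, column names and sample values are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces references when asked
conn.execute("CREATE TABLE place (name TEXT PRIMARY KEY)")
conn.execute(
    "CREATE TABLE recording ("
    " title TEXT NOT NULL,"
    " place TEXT NOT NULL REFERENCES place(name))"
)
conn.execute("INSERT INTO place VALUES ('Geraldton')")

def add_recording(conn, title, place):
    """Insert a recording; return False if the place is not in the list."""
    try:
        conn.execute("INSERT INTO recording VALUES (?, ?)", (title, place))
        return True
    except sqlite3.IntegrityError:
        return False

def place_choices(conn):
    """The values a data-entry dropdown would offer."""
    return [row[0] for row in conn.execute("SELECT name FROM place ORDER BY name")]

ok = add_recording(conn, "Story 1", "Geraldton")
rejected = add_recording(conn, "Story 2", "Geralton")  # misspelling is refused
print(ok, rejected, place_choices(conn))
```

In a spreadsheet the misspelt entry would simply be accepted, and later searches by place would miss it.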

I've also seen confusion about the difference between presenting material (like in Storylines, Keeping Cultures, Ara Irititja or Mukurtu) and describing and building a catalog of material. These are two different things, and a catalog is the first thing to do; it can feed the display later. Another thing to keep in mind is that none of these actually archives your records, so you still need to figure out how to make offsite backups of digital data, and digitise all analog recordings.

At the ARC Centre of Excellence for the Dynamics of Language we have a working party looking at existing databases and deciding which ones could be adapted for use in these kinds of agencies. I hope to report on results soon.

Nick

On Mon, 18 Jun 2018 at 13:10, Alan Buseman <alan_buseman at sil.org> wrote:
I am prejudiced of course, but I think Field Linguist's Toolbox is a general database
system that can be used to organize almost anything. The database would not
contain the recordings, transcripts and photos, but it would contain references to them.
It is a free download.

I will be interested to hear if people are using it in this way. If you want help setting up a
particular database, we can help you. (Email Toolbox at sil.org).

Alan Buseman
Toolbox development and support

On Sun, Jun 17, 2018 at 9:53 PM, Vaso Elefsiniotis <vasoe at optusnet.com.au> wrote:
Hi everyone, I’m keen to know what databases language centres and other collectors are using to catalogue their information. I’m aware of Microsoft Access and the old FileMaker Pro, which has an annual fee of about $1000. Any suggestions? Pros and cons?
Cheers
Vaso Elefsiniotis
Consultant Linguist
Geraldton WA
