Labelling and metadata on the hodge-podge of recordings on your home computer

Peter Austin pa2 at SOAS.AC.UK
Sun May 2 14:50:19 UTC 2010


I don't believe there is any one "how we're supposed to do it". There are
lots of alternatives, each with strengths and weaknesses.

From my observations of practices by researchers I have spoken to, there are
usually two extremes in file naming that people adopt: (1) the "stuff as
much information in as you can" approach Greg was suggesting, or (2) the
"minimal semantics" approach that Aidan described. Approach (1) is not a
good idea since the file names are unintelligible to anyone who doesn't have
the key and there is always some other semantics that COULD have been
stuffed into the file name but wasn't, e.g. time of recording, speaker's
gender, speaker's date of birth etc. Approach (2) results in file labels
that provide so little information that they cannot easily be searched, or
scanned by humans.

Claire's solution is to include SOME semantics that she will find useful,
but to keep it minimal. What semantics is included is up to the user to
decide, but I would argue for a little more explicitness rather than Claire's
abbreviations that only make sense to her. So, instead of BA-AKL_001.wav it
might be better to use bardi_akliff_001.wav (or if you happen to like ISO
codes, bcj_akliff_001.wav). Mixing upper and lower case can cause problems on
some systems; never put spaces in file names and never use non-ASCII
characters in file names.
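
For anyone who wants to check a folder of recordings against these rules, here is a minimal Python sketch. It is only an illustration: the function name check_name and the pattern (lower-case parts joined by underscores, a three-digit sequence number, then an extension) are my assumptions modelled on bardi_akliff_001.wav, not a standard.

```python
import re

# Hypothetical pattern, modelled on "bardi_akliff_001.wav": lower-case ASCII
# parts joined by underscores, a three-digit sequence number, an extension.
NAME_PATTERN = re.compile(r"[a-z0-9]+(?:_[a-z0-9]+)*_\d{3}\.[a-z0-9]+")


def check_name(filename):
    """Return a list of problems found in filename (empty if none)."""
    problems = []
    if not filename.isascii():
        problems.append("non-ASCII character")
    if any(ch.isspace() for ch in filename):
        problems.append("space in name")
    if filename != filename.lower():
        problems.append("mixed or upper case")
    # Only test the overall shape once the basic rules are satisfied.
    if not problems and not NAME_PATTERN.fullmatch(filename):
        problems.append("does not match language_researcher_NNN.ext")
    return problems


if __name__ == "__main__":
    for name in ["bardi_akliff_001.wav", "BA-AKL_001.wav", "bardi akliff 001.wav"]:
        print(name, check_name(name))
```

The pattern is the part to adapt: whatever semantics you decide to put in the name, write it down once and test every file against it.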

As for encoding metadata, that can be done in a txt file, or an xls
spreadsheet (or a database), and done at various (folder) levels, depending
on the researcher's preferences. David Nathan's chapter on archiving in the
recently published LDD 7 presents some examples of various alternatives that
have been used by depositors to ELAR at SOAS. The important thing is to
establish a system that is transparent and explicit and follow it
rigorously. Oh, and tell other people about it.
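
As one possible illustration of the spreadsheet option, the following Python sketch writes metadata records to a CSV file and reads them back keyed by file name. The column names and the sample row are taken from Aidan's example quoted below; the function names and the use of an in-memory buffer in place of a real file are assumptions made for the sake of a short example.

```python
import csv
import io

# Columns borrowed from Aidan's example spreadsheet; extend as needed.
FIELDS = ["filename", "date", "language", "recordist", "speaker", "location"]


def write_metadata(rows, handle):
    """Write one CSV row per recording, with a header row first."""
    writer = csv.DictWriter(handle, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)


def read_metadata(handle):
    """Return a dict mapping each filename to its metadata row."""
    return {row["filename"]: row for row in csv.DictReader(handle)}


rows = [
    {
        "filename": "20100405-01",
        "date": "2010-04-05",
        "language": "Marra",
        "recordist": "gd",
        "speaker": "fr",
        "location": "Ngukurr",
    },
]

# An in-memory buffer stands in for a real file such as metadata.csv.
buffer = io.StringIO()
write_metadata(rows, buffer)
buffer.seek(0)
index = read_metadata(buffer)
print(index["20100405-01"]["language"])
```

With a real file, open it with newline="" as the csv module documentation recommends. A plain CSV can be opened in any spreadsheet program and is still readable as text, which helps with the "tell other people about it" part.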

Peter

On 3 May 2010 00:11, Claire Bowern <clairebowern at gmail.com> wrote:

> I know this is how we're supposed to do it but I have problems
> identifying recordings quickly when the filename is a long string of
> numbers (e.g. with 5 Elan search windows open). I use this system for
> field methods tapes but I really don't like it.
> My field tapes are done by language and researcher initial (e.g.
> BA-AKL) and then numbered sequentially (apart from the Yan-nhangu
> fieldtrip where I used date/session numbering...). I find that much
> easier to use.
> Claire
>
> On Sun, May 2, 2010 at 7:31 AM, Aidan Wilson <aidan.wilson at sydney.edu.au>
> wrote:
> > Hey Greg,
> >
> > I'd actually leave most of the stuff in your filename to a metadata file and
> > leave the filename like:
> > 20100405-01.wav, .eaf, .mp3, whatever,
> > And have a spreadsheet of metadata for a bunch of recordings, where you keep
> > information like recordist, speaker, language, location, as well as date and
> > a rough breakdown of contents. It's probably a good idea to also have this
> > stuff in a text file; one per file, but also in a general spreadsheet for
> > all files:
> > filename,       date,           language,  recordist,  speaker,  location
> > 20100405-01,    2010-04-05,     Marra,     gd,         fr,       Ngukurr
> > 20100405-02,
> > etc.,
> >
> > While it's a good idea to try and keep as much identifying information in
> > the filename, it can look cluttered, and it may tempt you from proper
> > collection of metadata in a spreadsheet. Also, without a list of
> > abbreviations to inform someone looking at your files how to interpret the
> > elements in your filename, it may go to waste.
> >
> > -Aidan
> >
> > --
> > Aidan Wilson
> >
> > The University of Sydney
> > +612 9036 9558
> > +61428 458 969
> > aidan.wilson at usyd.edu.au
> >
> > On Sun, 2 May 2010, Greg Dickson wrote:
> >
> >> Hello,
> >> I'm trying to tidy up my files on my home laptop, which I've only ever used
> >> secondarily to whatever computer I was assigned by various workplaces.  Over
> >> about four years, I've ended up with a real hodge-podge of recordings and
> >> files in all kinds of languages made in all kinds of situations by all kinds
> >> of people even! (When you lend out your Zoom recorder it can come back with
> >> interesting things on it!).  I thought it's time for a spring clean.
> >>
> >> I'm pretty decided on a way to label my files consistently, but would
> >> appreciate any feedback or shared experiences.
> >>
> >> I thought I'd go with something like:
> >>
> >> 100405MARfrNGUgd01
> >>
> >> Which is DATE (April 5, 2010) LANGUAGE (Marra) speaker (initials: fr)
> >> LOCATION (Ngukurr) "recorded by" (gd = me) Series number (1st in the
> >> series)
> >>
> >> And then any ELAN, metadata, video or text files will have the same name,
> >> just a different file extension.
> >>
> >> I'm wondering though, what should I do about metadata?  What do others do?
> >> How necessary is keeping metadata for such a miscellaneous collection of
> >> files?  And how do I do it?  One place I worked at just kept a store of .txt
> >> files of metadata - 1 file for each recording.  Is that a good way?
> >>
> >> Any help or info appreciated.
> >>
> >> Guda mingi,
> >> (That's all now)
> >>
> >> Greg Dickson
> >>
> >> PO Box 2468
> >> Katherine NT 0852
> >> Ph: 8971 0207 / 0427 391 153
> >> Email: munanga at bigpond.com
> >>
> >>
> >>
> >>
> >>
> >
>



-- 
Prof Peter K. Austin
Marit Rausing Chair in Field Linguistics
Department of Linguistics, SOAS
Thornhaugh Street, Russell Square
London WC1H 0XG
United Kingdom

web: http://www.hrelp.org/aboutus/staff/index.php?cd=pa