<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">

<html>

<head>

  <meta content="text/html;charset=UTF-8" http-equiv="Content-Type">

  <title></title>

</head>

<body bgcolor="#ffffcc" text="#660000">


<font size="+1">Dear Greg, dear all</font><font size="+1">,<br>

<br>

Useful thread indeed. <br>

I am especially curious about the contrast

suggested in the earlier discussion, between trying to include

semantics in filenames, versus using opaque </font><big>filenames

and then searching a database.<br>

<br>

The reason is, during the last decade, I have experienced the two ends

of the spectrum, and I'm not sure where I should stand now.  <br>

<br>

For many years, I was in the habit of naming my audio files with

maximally informative (and therefore rather long) names, such as:<br>

</big>

<ul>

  <li><b><font color="#6600cc"><big>BD04-24 Veraa Harold ch Jesus

mtp-vrs.wav</big></font></b></li>

  <li><b><font color="#6600cc"><big><font color="#cc0000">DD04-13

Lovoko Mamuli leg Laperus2 tnm.wav</font><br>

    </big></font></b></li>

  <li><b><font color="#006600"><big>ED10-30 Yaqane Edwad-Bilis conv

chamanisme hiw.wav</big></font></b></li>

</ul>

<blockquote>[NB:  At that time I used spaces in filenames (I no longer

do), but these can easily be changed to underscores with

some file utility.  Sometimes I even used non-ASCII characters, I

confess!]<br>

</blockquote>
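<big>The space-to-underscore change mentioned in the note is easy to script. Below is a minimal Python sketch of such a utility, not a description of any particular tool: the function name "sanitize" is hypothetical, and it assumes that accented letters may simply lose their diacritics.<br>
</big>
<pre>
```python
import unicodedata

def sanitize(filename):
    """Return an ASCII-only version of filename, with underscores for spaces."""
    # Split accented letters into a base letter plus a combining mark.
    decomposed = unicodedata.normalize("NFKD", filename)
    # Drop every character that is not plain ASCII (this removes the marks).
    ascii_only = decomposed.encode("ascii", "ignore").decode("ascii")
    # Replace each run of whitespace with a single underscore.
    return "_".join(ascii_only.split())

print(sanitize("BD04-24 Veraa Harold ch Jesus mtp-vrs.wav"))
# prints BD04-24_Veraa_Harold_ch_Jesus_mtp-vrs.wav
```
</pre>
<big>Note that a character with no ASCII base letter is dropped entirely, so the result should be checked before any file is actually renamed.<br>
</big>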

<big>These file names would begin with a unique alphanumeric ID, so

that the chronological order of recordings could </big><big>be

easily retrieved by automatic sorting.  The other reason for starting a

long file name with an ID was that, should some software truncate the

filename to its first 8 characters, it would still remain unique.<br>

Here is how my (customised) system worked:<br>

</big>

<ul>

  <li><big>first letter is a code for a whole collection = a single

fieldtrip [A for my first fieldtrip, B for my second.... F for my 6th];

    <br>

    </big></li>

  <li><big>second letter is a code for the medium (D for digital audio

recording, P for photo, V for video…)</big></li>

  <li><big>then 2 digits for a subcollection (in the olden days this

was the

number of a minidisc);  This subcollection ID is also the name of the

folder in the folder-tree.<br>

    </big></li>

  <li><big>then hyphen plus 2 digits for item in this subcollection

(never more than 99)</big></li>

</ul>

<big>and then the <i>Homo Sapiens</i>-friendly stuff came in:</big>

<ul>

  <li><font color="#6600cc"><b><big>location of recording</big></b></font><big>,

spelled out — </big><big>usually a village in Vanuatu: e.g. Veraa

(=Vera'a, a village in Vanua Lava), Yaqane

(a hamlet in Hiw);  or in the Solomons (Lovoko, Vanikoro);<br>

   <br>

    </big></li>

  <li><font color="#6600cc"><b><big>name of main speaker</big></b></font><big>,

    </big><big>spelled out</big><big><br>

("Harold"; "Mamuli"; "Edwad-Bilis" as this was a conversation between

two men);  <br>

names also uttered in full in the recording itself.<br>

  <br>

    </big></li>

  <li><big><font color="#6600cc"><b>genre of recording</b></font>,

using a limited set of abbreviations:  <br>

    </big><big>ch= chant (song), </big><big>ct='conte' (tale), leg</big><big>='legend',

    </big><big>conv='conversation', </big><big>etc.<br>

  <br>

    </big></li>

  <li><big>a very <font color="#6600cc"><b>short title</b></font>:  <br>

"Jesus" (a church song on someone with a name like this); <br>

"Laperus2" (the legend of Lapérouse's wreckage — second version by same

speaker that day); <br>

"chamanisme" (a conversation on shamanism);<br>

  <br>

    </big></li>

  <li><big>a 3-letter <font color="#6600cc"><b>id for the language</b></font>

    <br>

=> Very useful as several languages can be spoken in the same

village, and sometimes </big><big> the very same person would tell me

the same story in 2 different languages</big><big>.  <br>

    </big><big>e.g. tnm=Tanema, hiw=Hiw;  <b>mtp-vrs</b>= Mwotlap and

Vurës, because this church song, unusually, mixed the two

languages. </big><br>

    <big><small>[I'm not using ISO codes because they are opaque, and

poorly designed for my area; but the equivalence between the codes I

use and

ISO codes is made easily accessible in my publications & <a

 moz-do-not-send="true"

 href="http://alex.francois.free.fr/AF-field.htm#Vanuatu">homepage</a>

anyway.]</small><br>

    </big> </li>

</ul>
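<big>Because the fields always come in the same order, the scheme above can be unpacked by a short script. Here is a minimal Python sketch under stated assumptions: the names "Recording" and "parse_name" are hypothetical, and it assumes every filename has exactly six space-separated fields (ID, location, speaker, genre, title, language) with no spaces inside a field.<br>
</big>
<pre>
```python
from collections import namedtuple

Recording = namedtuple(
    "Recording",
    "collection medium subcollection item location speaker genre title language",
)

def parse_name(filename):
    """Split a descriptive filename into its nine fields."""
    stem = filename.rsplit(".", 1)[0]        # drop the extension
    fields = stem.split()                    # fields are separated by spaces
    if len(fields) != 6:
        raise ValueError("expected 6 fields in " + repr(filename))
    rec_id, location, speaker, genre, title, language = fields
    prefix, _, item = rec_id.partition("-")  # "BD04-24" gives "BD04" and "24"
    if len(prefix) != 4 or len(item) != 2:
        raise ValueError("unexpected ID " + repr(rec_id))
    return Recording(prefix[0], prefix[1], prefix[2:], item,
                     location, speaker, genre, title, language)

names = [
    "BD04-24 Veraa Harold ch Jesus mtp-vrs.wav",
    "DD04-13 Lovoko Mamuli leg Laperus2 tnm.wav",
    "ED10-30 Yaqane Edwad-Bilis conv chamanisme hiw.wav",
]
records = [parse_name(name) for name in names]

# Language codes are hyphen-joined when a recording mixes two languages.
with_mwotlap = [r.speaker for r in records if "mtp" in r.language.split("-")]
print(with_mwotlap)
# prints ['Harold']
```
</pre>
<big>The same list of records can be filtered by speaker, genre or village in the same way.<br>
</big>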

<big>Admittedly some info is missing, e.g. my own name, or the date: 

but the date is usually retrievable from the collection &amp;

subcollection, and I always stated it aloud in the recording itself.

Maybe one day I should hardcode it in the filename.  <br>

<br>

These (relatively) transparent long names have proven very useful to me

as I was working on all these files, whether to transcribe them,

compare different versions of similar stories, or whatever.  Because I

have 1150 different sound files in my corpus, it also proved convenient

to perform automatic search queries on filenames, say, to easily

retrieve all recordings with the same storyteller over the years, or to

filter all recordings of the same language.  I don't know if I would

recommend such a system (maybe not) but at least I found it convenient

for myself: the file name says it all. The good thing was

also that most of these filenames were easily interpretable to people

other than myself, with a minimal amount of abbreviations or codes. 

The initial id (BD04-24…) doesn't really need to be interpreted anyway

(it's an id), but the village & speaker's names (+title) are

explicit, and a simple Txt file can help make sense of language names

or genres (and collections).  In parallel I've always used a spreadsheet

for metadata, with the full name of each speaker, their age, precise

location, date, etc.<br>

<br>

And then a few years ago, I wanted to archive these hundreds of files

into our open archive<small> (LACITO's <a moz-do-not-send="true"

 href="http://lacito.vjf.cnrs.fr/archivage/presentation_en.htm">Archivage</a>)</small>.

<br>

When they saw these long file names, our IT people were horrified. 

They insisted that they should all be shortened to a simple id, as

short as possible, getting rid of all the semantics.  They thought it

would be much more convenient, or more elegant perhaps, to handle

filenames like "<font color="#6600cc"><b>AF03-05-02.wav</b></font>" <small><i>[AF03=my

initials + 3rd field trip, etc.]</i></small>, coupled with some

metadata file. Fair enough, they were surely right.  (My earlier use of

spaces and occasionally non-ASCII characters was probably at fault, together with

the sheer length of each string.)<br>

<br>

So I created a copy of my 1150 audio files, and renamed them all

(manually) with these elegant numbers, which are now opaque even to

myself.  It took me ages (weeks? months?).  In parallel I would fill in a

metadata sheet for each item, and send it to the IT people for them to

encode in XML/XSL format on the server. </big><big>(I didn't know

XML/XSL/PHP well enough to create the search interface myself.) </big><big>This

was several years ago, and it never became as convenient as I had

hoped it

would be. In fact a fair part of the metadata is still

waiting to be converted &amp; transferred to a new server,

a process which was stopped halfway due to a shortage of funding… but this is

another

story.<big><br>

</big><br>

In the meantime, I now have my whole audio archive (37 GB) in two

versions: exactly the same sound files, but one set has the old

filenames, one has the numbers. This is very silly, and was meant to be

temporary, yet has lasted for some reason. <br>

What happens in the end is, every time I want to </big><big>quickly </big><big>retrieve

a file from my archives, I basically have the choice between accessing

the set of files with the long, transparent names, which are </big><big>visually

readable, </big><big>easily searchable, and instantly clickable  <br>

OR accessing my metadata spreadsheet, trying to identify the string of

digits I'm looking for, writing it down, then trying to find the

recording among hundreds of files, essentially by hand. 

Now guess which solution I end up choosing.  (*grin*)<br>

<br>
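The second route (spreadsheet, then opaque ID, then file) need not be done by hand. Here is a minimal Python sketch, assuming the metadata spreadsheet can be exported as CSV with one column for the opaque ID and one for the descriptive name. The column names "archive_id" and "descriptive_name", the function names, the IDs, and the pairing of IDs with names are all invented for illustration.<br>
<pre>
```python
import csv
import io

def load_index(csv_file):
    """Map each opaque archive ID to its descriptive filename."""
    return {row["archive_id"]: row["descriptive_name"]
            for row in csv.DictReader(csv_file)}

def find_ids(index, keyword):
    """Return the archive IDs whose descriptive name contains keyword."""
    needle = keyword.lower()
    return sorted(archive_id for archive_id, name in index.items()
                  if needle in name.lower())

# Stand-in for an exported spreadsheet; the ID-to-name pairs are invented.
sample = io.StringIO(
    "archive_id,descriptive_name\n"
    "ID-0001,BD04-24 Veraa Harold ch Jesus mtp-vrs.wav\n"
    "ID-0002,DD04-13 Lovoko Mamuli leg Laperus2 tnm.wav\n"
    "ID-0003,ED10-30 Yaqane Edwad-Bilis conv chamanisme hiw.wav\n"
)
index = load_index(sample)
print(find_ids(index, "mamuli"))
# prints ['ID-0002']
```
</pre>
With an index of this kind, only the opaquely named set of files would need to be kept, while searches could still be run on the descriptive names.<br>
<br>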

</big><font size="+1">There's probably

something I've done wrong (as always) but I'm still wondering what the

ideal combination would be.  It seems that different usages (working on

one's own files vs long-term archiving…) may warrant different

decisions, but of course this is not a good answer to Greg.<br>

I am especially trying to

identify the best procedure in terms of archiving for the future, and

making access easy for other prospective users.<br>

<br>

regards,<br>

Alex.<br>

</font><font size="+1"><font face="Book Antiqua"><br>

</font></font>

<hr size="2" width="100%"><br>

Margaret Carew wrote:

<blockquote

 cite="mid:2F09C791CDBCC84AA29EEB6EA86D7E33F692D5@aspexbe.batchelor.edu.au"

 type="cite">

  <pre wrap="">Useful thread, and I am now looking back at my various drives with one eyebrow raised...

I'm wondering, what is the role of folders in all this?

I have an almost well organised system of audio recordings that is in the main not archived (although carefully backed up!), from various years and places. I have established a folder for each year that has passed since I commenced recording in digital (ie. 2006 2007 etc). Within each of these year folders is a recording session folder with a name that includes the year and month (sometimes day) the place and the event or key topic. Within each of these secondary folders are the recordings that are part of that session, with a date, speaker and other semantic info (eg. 20100209_BP_kurdu_wita.WAV). The metadata files (marked up text files) are stored within each folder, and the name of the folder is entered as a field in the metadata.

Like my erstwhile colleague Greg I'm probably closer to the hodge-podge end of things, doing lots of recordings with students, sometimes in a bit of a random fashion, multi-tasking like crazy, yet trying to keep some order in it. I'm now wondering whether the folder based system is going to be a problem when it comes to archiving - one thing that has popped up is the existence of these lots of folder based metadata files - this might need to be consolidated into one file.

I might also add that I've become fond of using iTunes to make playlists of recordings - usually edited ones - and to use as a secondary database (a kind of partial mirror if you like). You can use the file info to point back to the folderised filenames as described. And it's great for making CDs for students of their recordings, to repatriate materials quickly etc. Also good for compiling files that will be used in a resource (eg. a set of clips for a voiceover). Am I committing an archiving crime by using iTunes in this way?

Regards

Marg Carew

-----Original Message-----

From: Claire Bowern [<a moz-do-not-send="true"

 class="moz-txt-link-freetext" href="mailto:clairebowern@gmail.com">mailto:clairebowern@gmail.com</a>]

Sent: Tue 04/05/2010 00:49

To: David Nathan

Cc: Resource-Network-Linguistic-Diversity

Subject: Re: Labelling and metadata

David, that would work at the end of the documentation (in fact I'm

doing something pretty close to that right now for One Arm Point

School for Bardi stories) but while working on the collection, doing

searches, transcribing, etc, I'm constantly using the underlying

files, and I'm not sure that creating another layer of reference would

solve the problem. It would be useful for managing collections where

there are several numbering systems though (e.g. I have tapes that

have 3 references - the AIATSIS archive tape number, the internal

collection number, and the number they'd get if I put them in my

scheme...)

Claire

On Mon, May 3, 2010 at 6:58 AM, David Nathan <a moz-do-not-send="true"

 class="moz-txt-link-rfc2396E" href="mailto:dn2@soas.ac.uk">&lt;dn2@soas.ac.uk&gt;</a> wrote:

  </pre>

  <blockquote type="cite">

    <pre wrap="">Dear all

About the filenames, there are some excellent suggestions in this

thread, but I think that there is a tendency to conflate the function

of filenames as identifiers with the functions that enable retrieval

and access to resources. This conflation remains invisible only while

we all keep imagining that documentation materials are merely "data" -

without some genres, granularities, interface considerations etc. that

relate to the presentation and usage of the resources. In that sense,

you might think (even hypothetically) of the interface by which you

might wish people to access them, and it is probably likely to be some

kind of link. As those familiar with HTML and related technologies

know, a link has a target as well as a "display text" (and other

possible attributes in semantic web formalisms). Translating this back

to one's local data management, there seems a good case for separating

out the two functions mentioned above, and thinking about a simple

linking system (that you can implement easily in spreadsheet pages, or

HTML), and then the relevant considerations for what you want the

"display text" to be - for yourself, and, quite possibly differently,

for other users. This might help resolve out the different issues that

are most relevant for each function in your contexts.

best wishes

David

At 18:11 03/05/2010, you wrote:

    </pre>

    <blockquote type="cite">

      <pre wrap="">If you are going to include semantics in the file names can I make a plea that your labels are a little more transparent -- why not use:

fm_2009_session10_audio.wav

fm_2009_session10_video.wav

rather than FM09_v10A ?? v could stand for "version" or "volume" or who knows what else, and, as for "A", well that's anyone's guess. Also, if the "09" is a year then write it as 2009 (one might even argue for "felicity" or "meakins" rather than "FM"). I recommend separators like _ as well, as Bill Poser did in his contribution to this discussion. Note also, that if you have more than 99 video sessions you'll need the label to be:

fm_2009_session010_audio.wav

I think there are good reasons for being a little more explicit in file names if you want to put in some (useful) semantics like this -- after all YOU know what "FM" "09" "v" "A" mean but who else could guess? Compare that with:

felicity_2009_session10_video.wav

Best,

Peter

      </pre>

    </blockquote>

    <pre wrap="">On 3 May 2010 18:19, Felicity Meakins <a

 moz-do-not-send="true" class="moz-txt-link-rfc2396E"

 href="mailto:f.meakins@uq.edu.au">&lt;f.meakins@uq.edu.au&gt;</a> wrote:

This is a good point, particularly if you use two recorders (e.g. audio

recorder plus video camera) to record the same session. I use 'v' and 'a' to

distinguish these. In this respect, it is the recording _session_ that's

primary, not the actual recording.

FM09_v10A

FM=me

09=year (full date is in metadata)

v=video

10=recording session

A=part of recording session

e.g. recording session may have taken place at X place but over two hours we

recorded 3 stories A, B, C.

On 3/5/10 6:13 PM, "Joe Blythe" <a moz-do-not-send="true"

 class="moz-txt-link-rfc2396E" href="mailto:blythe.joe@gmail.com">&lt;blythe.joe@gmail.com&gt;</a> wrote:

    </pre>

    <blockquote type="cite">

      <pre wrap="">The only two cents worth I'd like to add to this discussion is that I had to

modify my numbering system to indicate whether the original

recording was made with a video or dedicated audio recorder. I only mark the

video ones as "vid".

Thus video files might be

20100503JBvid01.mov

Because you sometimes need to extract audio files from video files,

such an extracted audio file would be

20100503JBvid01.wav

This ensures that any files recorded on the same date from a dedicated audio

recorder (e.g., 20100503JBv01.wav) don't end up with the same file name.

Joe</pre>

    </blockquote>

    <pre wrap="">--

Prof Peter K. Austin

Marit Rausing Chair in Field Linguistics

Department of Linguistics, SOAS

Thornhaugh Street, Russell Square

London WC1H 0XG

United Kingdom

web: <a moz-do-not-send="true" class="moz-txt-link-freetext"

 href="http://www.hrelp.org/aboutus/staff/index.php?cd=pa">http://www.hrelp.org/aboutus/staff/index.php?cd=pa</a>

-------------

David Nathan

Endangered Languages Archive

SOAS

-------------</pre>

  </blockquote>

</blockquote>

<hr size="2" width="100%">

<pre class="moz-signature" cols="72">Dr Alex FRANÇOIS

LACITO - CNRS, France

2009-2011:  Visiting Fellow

        Dept of Linguistics

        School of Culture, History and Language

        Australian National University

        ACT 0200, Australia

        <a moz-do-not-send="true" class="moz-txt-link-freetext"

 href="http://alex.francois.free.fr">http://alex.francois.free.fr</a>

</pre>

<br>

</body>

</html>