Labelling and metadata

William J Poser wjposer at LDC.UPENN.EDU
Sun May 2 20:06:25 UTC 2010


A few comments if you do use informative file names:

(a) as Aidan did in his example, I strongly recommend using full four-digit
    years, not two-digit truncations. Someday somebody isn't going to know
    which century that recording is from, and if it ever gets mixed in
    with files with full years, it won't sort properly. 

(b) consider separating components with separators that make the
    components identifiable, e.g. writing a date in ISO 8601 format
    as 2010-05-02 rather than as 20100502. 

(c) don't include spaces in file names. This works on MS Windows but
    is a pain in the neck on Unix systems and makes trouble for various
    kinds of software. All of the major operating systems are happy with
    underscores and hyphens in file names. Many other punctuation characters
    (e.g. < > : " / \ | ? *) are forbidden in MS Windows and may cause
    trouble on other systems.

(d) use constant-width integers with zero-padding for numbering sequences,
    e.g. TodaysTreasure01.wav, TodaysTreasure02.wav, ...TodaysTreasure10.wav.
    This will make them sort properly on systems or with software that
    does not provide hybrid lexicographic/numerical sorting.

(e) since in most cases the date will not be the information of greatest
    interest, rather than putting it in the leading position, put it
    later in the file name.
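
As a rough illustration of points (a) through (e), here is a minimal Python
sketch that assembles a file name from its parts. The function name
make_name, the underscore separator and the order label/date/number are
my own assumptions for the example, not a fixed convention.

```python
import datetime
import re


def make_name(label, date, seq, ext, width=2):
    """Build a name such as Todays_Treasure_2010-05-02_01.wav (hypothetical scheme)."""
    # (c) no spaces: collapse each run of whitespace into one underscore
    safe_label = re.sub(r"\s+", "_", label.strip())
    # (a), (b) ISO 8601 date: four-digit year, hyphen-separated components
    iso_date = date.isoformat()
    # (d) constant-width, zero-padded sequence number
    number = f"{seq:0{width}d}"
    # (e) the informative label leads; the date comes later
    return f"{safe_label}_{iso_date}_{number}.{ext}"


names = [
    make_name("Todays Treasure", datetime.date(2010, 5, 2), n, "wav")
    for n in (10, 2, 1)
]

# Plain lexicographic sorting now gives the intended order: 01, 02, 10.
print(sorted(names))

# Without zero-padding, the same kind of sort puts 10 before 2.
print(sorted(["T1.wav", "T10.wav", "T2.wav"]))
```

If a sequence may run past 99 items, pass a larger width so that every
number still has the same number of digits.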

A possible solution to the problem noted by Claire of long informative
file names being troublesome in some circumstances is to use aliases, that
is, additional names that point at the same file. You can, for example,
create aliases suitable for your current purpose for the subset of files
you are currently working on. ("Alias" is the Mac terminology. On Unix
the closest equivalent is a symbolic link, and in MS Windows aliases are
called "shortcuts".)
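
On Unix-like systems an alias of this kind is typically a symbolic link.
The following Python sketch shows one way to give a long file name a short
working name. The helper make_alias and the file names are hypothetical,
and on MS Windows os.symlink may need extra privileges and is not the same
thing as a shortcut file.

```python
import os
import tempfile


def make_alias(target, alias):
    """Create a symbolic link at `alias` that points at `target`."""
    # Use an absolute target so the link still resolves if it is moved.
    os.symlink(os.path.abspath(target), alias)


with tempfile.TemporaryDirectory() as workdir:
    # A long, informative name (placeholder content, not real audio).
    long_name = os.path.join(workdir, "TodaysTreasure_2010-05-02_01.wav")
    with open(long_name, "wb") as f:
        f.write(b"placeholder audio data")

    # A short alias for the current working session.
    short_name = os.path.join(workdir, "tt01.wav")
    make_alias(long_name, short_name)

    # Reading through the alias reaches the original file.
    with open(short_name, "rb") as f:
        data = f.read()
```

Deleting the alias afterwards removes only the link, not the original file.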
  
Bill


