Labelling and metadata
William J Poser
wjposer at LDC.UPENN.EDU
Sun May 2 20:06:25 UTC 2010
A few comments if you do use informative file names:
(a) as Aidan did in his example, I strongly recommend using full 4 digit
years, not two digit truncations. Someday somebody isn't going to know
which century that recording is from, and if it ever gets mixed in
with files with full years, it won't sort properly.
(b) consider separating components with separators that make the
components identifiable, e.g. writing a date in ISO 8601 format
as 2010-05-02 rather than as 20100502.
(c) don't include spaces in file names. This works on MS Windows but
is a pain in the neck on Unix systems and makes trouble for various
kinds of software. All of the major operating systems are happy with
underscores and hyphens in file names. Several other punctuation characters
(for example : ? * and |) are forbidden in MS Windows, and many of the rest
may cause trouble on other systems.
(d) use constant-width integers with zero-padding for numbering sequences,
e.g. TodaysTreasure01.wav, TodaysTreasure02.wav, ...TodaysTreasure10.wav.
This will make them sort properly on systems, or with software, that
do not provide hybrid lexicographic/numerical sorting.
(e) since in most cases the date will not be the information of greatest
interest, rather than putting it in the leading position, put it
later in the file name.
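
Points (a) through (e) can be sketched in a few lines of Python. This is
only an illustration, not a recommended standard: the function name
make_name, the underscore separator, and the default padding width of 2
are my own assumptions, and the label is taken from the example in (d).

```python
import re
from datetime import date


def make_name(label: str, recorded: date, seq: int,
              ext: str = "wav", width: int = 2) -> str:
    """Build a name such as TodaysTreasure_2010-05-02_01.wav (hypothetical scheme)."""
    # (c) turn runs of whitespace into underscores, then drop everything
    # except ASCII letters, digits, underscores and hyphens.
    safe = re.sub(r"\s+", "_", label.strip())
    safe = re.sub(r"[^A-Za-z0-9_-]", "", safe)
    # (a), (b) isoformat() gives a four-digit year and hyphen separators.
    # (d) the sequence number is zero-padded to a constant width.
    # (e) the date comes after the descriptive label, not before it.
    return f"{safe}_{recorded.isoformat()}_{seq:0{width}d}.{ext}"


day = date(2010, 5, 2)
padded = [make_name("TodaysTreasure", day, n) for n in (10, 2, 1)]
unpadded = ["TodaysTreasure10.wav", "TodaysTreasure2.wav", "TodaysTreasure1.wav"]

# Plain lexicographic sorting keeps the padded names in numeric order,
# but puts the unpadded ...10.wav ahead of ...2.wav.
print(sorted(padded))
print(sorted(unpadded))
```

If a series might run past 99 items, pass a larger width from the start,
since changing the width later breaks the sort order again.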
A possible solution to the problem noted by Claire of long informative
file names being troublesome in some circumstances is to use aliases, that
is, additional names that point at the same file. You can, for example,
create aliases suitable for your current purpose for the subset of files
you are currently working on. ("Alias" is the Mac terminology; on Unix
the equivalent is a link, usually a symbolic link, and in MS Windows the
nearest equivalent is a "shortcut".)
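
As a rough sketch of the alias idea, the following Python creates a short
symbolic link pointing at a long informative file name. The function name
make_alias and the example paths are made up for illustration. It assumes
a system where symbolic links are available; on MS Windows creating them
may require extra privileges, and a Windows shortcut is a different kind
of object that this code does not create.

```python
from pathlib import Path


def make_alias(target: Path, alias: Path) -> None:
    """Create a symbolic link at `alias` that points at `target`."""
    # Make sure the directory that will hold the short name exists.
    alias.parent.mkdir(parents=True, exist_ok=True)
    # Replace an existing link of the same name; a regular file at that
    # path is left alone, and symlink_to() will then raise FileExistsError.
    if alias.is_symlink():
        alias.unlink()
    # Link to the absolute path so the alias still works when it lives in
    # a different directory from the file.
    alias.symlink_to(target.resolve())


# Hypothetical usage: a short working name for a long archival name.
# make_alias(Path("archive/TodaysTreasure_2010-05-02_01.wav"),
#            Path("work/tt01.wav"))
```

Deleting the alias afterwards removes only the link, not the recording.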
Bill