query about format.sourcecode
Steven Bird
sb at UNAGI.CIS.UPENN.EDU
Mon Sep 16 22:13:15 UTC 2002
Baden Hughes <baden at compuling.net> wrote:
> I've got a query about matters related to the element format.sourcecode
It's good to see discussion of software resources for a change, and I hope
the maintainers of software archives (DFKI, TRACTOR) will contribute to
this discussion.
> Currently the spec at http://www.language-archives.org/OLAC/olacms.html
> assumes that software resources indexed by OLAC will be in source code
> (and hence appropriate entries will be made under this tagset).
Not quite - all OLAC elements are optional, and some elements are simply
inappropriate for some resources. Software distributed in binary form only
doesn't need to be given any sourcecode descriptor.
> The recommendation is currently:
>
> <format.sourcecode
> code="PROGRAMMING_LANGUAGE">Comments</format.sourcecode>
>
> There are several questions I have about this.
>
> 1) Do we need to clarify this even further as there are apparently two
> distinct options (from the archive contents I've been working with). One
> is where the sourcecode requires compilation, the other is where
> sourcecode is essentially a script (or series of scripts). Any
> information about the "state" of the source code is likely to be
> inconsistent at best across archives, and I suspect even within a single
> archive. IMHO it's relatively important to the end user of the OLAC
> search engine as to what state the sourcecode is in (ie how applicable
> is this code to the platforms I have access to).
Good, so the end-user requirement here is to be able to answer the
question: "Can I run this software?"
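As a rough illustration of that requirement, here is a minimal Python sketch of the check a search front end might run against a record's OS and CPU descriptors. The `SoftwareRecord` class, its field names, the `can_run` function, and sample values such as "Linux" and "x86" are all hypothetical and are not taken from the OLAC metadata set. An empty descriptor is treated as "unspecified" rather than "incompatible", since all OLAC elements are optional.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class SoftwareRecord:
    """Hypothetical, simplified view of a software resource description."""
    title: str
    os: set[str] = field(default_factory=set)   # hypothetical OS vocabulary terms
    cpu: set[str] = field(default_factory=set)  # hypothetical CPU vocabulary terms
    sourcecode: str | None = None               # programming language; None for binary-only


def can_run(record: SoftwareRecord, user_os: str, user_cpu: str) -> bool:
    """Answer "Can I run this software?" for a single user's platform.

    An empty os or cpu set means the archive did not specify it, so the
    record is not excluded on that basis.
    """
    os_ok = not record.os or user_os in record.os
    cpu_ok = not record.cpu or user_cpu in record.cpu
    return os_ok and cpu_ok
```

For example, `can_run(SoftwareRecord("tagger", os={"Linux"}, cpu={"x86"}), "Windows", "x86")` is `False`, while a record with no OS or CPU descriptors passes for any platform.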
> 2) In the case where software resources indexed by OLAC are distributed
> in compiled form (ie not sourcecode) there's apparently not much more
> room to encode this information either. Apart from not strictly being
> something which belongs in a format.sourcecode element, the
> recommendation I assume would be that you could standardise this again
> by using the comment field, but the same consistency problem arises.
> Again, IMHO it's relatively important to the end user of the OLAC search
> engine as to what state the sourcecode is in (ie can I just install and
> run or is it more complex)
Right, so the end-user requirement here is to be able to answer the
question: "How much effort will be required to get this running?"
> These two points may not represent large issues, but if the archives you
> are dealing with have a lot of software which ranges from source scripts
> in a range of languages, source for compilation for a range of
> compilers, and compiled "ready to run" applications, the granularity of
> this markup can be important and greatly assist with classification and
> indexation of resources in an appropriate manner. Additionally, for the
> less computer literate end users, this distinction is very important in
> them effectively locating a resource which is appropriate to their
> needs.
Absolutely. Currently we have vocabularies for Sourcecode, CPU, and OS.
However, we can modify or scrap them if they don't serve our needs for
resource description and discovery. Perhaps we need a new vocabulary
that better describes the state of the sourcecode.
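A minimal sketch of what such a vocabulary might look like, assuming the three cases described in this thread: compiled "ready to run" applications, scripts, and source that must be compiled. The `SourceState` names, the `EFFORT` ranking, and `rank_by_effort` are hypothetical illustrations, not part of any existing OLAC vocabulary, and the ordering of the effort ranking is itself an assumption.

```python
from enum import Enum


class SourceState(Enum):
    """Hypothetical vocabulary for the state of distributed software.

    The three values follow the categories raised in this thread; the
    names are illustrative, not part of any OLAC schema.
    """
    BINARY = "binary"    # compiled, "ready to run" application
    SCRIPT = "script"    # source that runs under an interpreter
    COMPILE = "compile"  # source that must be compiled before use


# Assumed ranking of setup effort (1 = least); archives might disagree.
EFFORT = {SourceState.BINARY: 1, SourceState.SCRIPT: 2, SourceState.COMPILE: 3}


def rank_by_effort(records):
    """Order (title, SourceState) pairs from least to most setup effort.

    sorted() is stable, so records with equal effort keep their input order.
    """
    return sorted(records, key=lambda r: EFFORT[r[1]])
```

Keeping the effort ranking in a separate table, rather than building it into the vocabulary terms, would let archives agree on the terms while leaving search engines free to weigh them differently.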
One way to proceed here is for Baden (and any others) to identify the full
range of end-user requirements (is it more than these two?) then propose
vocabularies that best serve these requirements...
-Steven
--
Steven.Bird at ldc.upenn.edu http://www.ldc.upenn.edu/sb
Assoc Director, LDC; Adj Assoc Prof, CIS & Linguistics
Linguistic Data Consortium, University of Pennsylvania
3600 Market St, Suite 810, Philadelphia, PA 19104-2653
More information about the Olac-implementers mailing list