proposed revision of format.sourcecode

Baden Hughes baden at COMPULING.NET
Mon Sep 23 07:00:36 UTC 2002


I've updated the format.sourcecode schema draft with:

-unnecessary whitespace removed
-whitespace normalized to underscores in enumeration values
-typos corrected
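
For illustration, the whitespace cleanup above amounts to something like the following Python sketch. The function name normalize_value is hypothetical; it is not part of the schema or of any tool used to produce the draft.

```python
def normalize_value(raw: str) -> str:
    """Trim surrounding whitespace and join the remaining words with underscores.

    Hypothetical helper, shown only to illustrate the normalization.
    """
    # str.split() with no arguments drops leading/trailing whitespace and
    # treats any run of whitespace as a single separator.
    return "_".join(raw.split())


print(normalize_value("  OBJECTIVE   CAML "))  # OBJECTIVE_CAML
```

Values that contain no whitespace, such as "PROLOG", pass through unchanged.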

You can find the updated list here:

http://www.compuling.net/projects/olac/230902-draft-olac-format.sourcecode.xsd

There are currently 285 programming languages listed in this schema. If
anyone has any more to add, drop me an email.

Regards

Baden

> -----Original Message-----
> From: Steven Bird [mailto:sb at unagi.cis.upenn.edu]
> Sent: Monday, 23 September 2002 16:39
> To: baden at compuling.net
> Cc: OLAC-IMPLEMENTERS at LISTSERV.LINGUISTLIST.ORG
> Subject: Re: proposed revision of format.sourcecode
>
>
>
> Baden Hughes <baden at compuling.net> wrote:
> > After a survey of several language archives, I'd like to propose
> > some possible changes to the format.sourcecode schema. Essentially
> > this is a list of programming languages of various types, in which
> > software may be written. This list includes those found at:
> > http://www.hypernews.org/HyperNews/get/computing/lang-list.html
> >
> > A draft can be found online at:
> >
> > http://www.compuling.net/projects/olac/220902-draft-olac-format.sourcecode.xsd
> >
> > Comments welcome.
>
> This is great - a 20-fold increase on the number listed in my
> original 0.4 list.  I grepped for a few obscure languages and
> they were all there.
>
> I'd like to raise two low-level technical issues,
> capitalization and whitespace.
>
> First, 99% of the codes are all-caps, even though some
> programming language names are not written like this (e.g.
> the list gives "PROLOG" but it should really be "Prolog").
> However, rather than having to settle disputes about this
> question, I'd prefer it if we case-normalized everything.
> What do people think - should we standardize on uppercase?
>
> Second, Baden's list includes many items with spaces, e.g.
> "OBJECTIVE CAML".  However, it seems desirable to limit the
> range of characters that can appear in a controlled
> vocabulary item (e.g. no accents) so that there are no
> transmission problems etc.  In some contexts, such as
> hand-crafted CGI GET requests and HTML anchors, it is a pain
> to have to manually escape the space character.  Could we
> live with a restriction of no spaces - i.e. replacing spaces
> with underscore?
>
> ** Note that neither of these issues is substantive, since
> each controlled vocabulary item will be associated with a
> human readable form (including translations into other
> languages).  For example, in Dublin Core, there is a
> refinement named "hasVersion" with the human-readable label
> "Has Version".
> [http://www.dublincore.org/documents/dcmes-qualifiers/].  The
> plan is to do the same thing for OLAC vocabularies.
>
> -Steven
>
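
The two conventions raised in the quoted message (case-normalized codes, and a separate human-readable label for each code) could be sketched roughly as follows in Python. The LABELS table and both function names are hypothetical; the actual OLAC label mechanism is not specified in this thread.

```python
# Hypothetical label table: controlled-vocabulary code -> display label.
LABELS = {
    "PROLOG": "Prolog",
    "OBJECTIVE_CAML": "Objective Caml",
}


def to_code(name: str) -> str:
    """Replace whitespace runs with underscores and case-normalize to uppercase."""
    return "_".join(name.split()).upper()


def label_for(code: str) -> str:
    """Return the display label, falling back to the code itself if none is known."""
    return LABELS.get(code, code)


print(to_code("Objective Caml"))  # OBJECTIVE_CAML
print(label_for("PROLOG"))        # Prolog
```

Keeping the label separate from the code means the code itself never needs escaping in URLs or anchors, while the display form can still preserve the conventional spelling.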


