Specifying content of elements specifying languages

Tue Mar 25 23:54:16 UTC 2008

Dear Gary (and others),

I also wanted to re-raise an issue regarding the nature of the text  
content of elements specifying the languages of a resource (though the  
issue relates primarily to subject languages, not the description  
languages).

The resolution is described as follows, emphasis added:

"There are two different issues here.  The first regards clarifying  
the nature of the text content. The first comment points out that  
there are two main uses of the text content (for an alternate name or  
for a variety name) and asks if there should be a way to distinguish  
these two cases. *This can be done by means of the wording of the text  
content; the most straightforward approach is to add the word  
"dialect" after the name in the case of a variety name.* An example  
like this is given in the document. Other cases of indicating a  
variety, such as "Women's speech," don't involve a name at all and so  
do not pose a problem. Still other cases of using the content (like  
the note that Heidi Johnson gives above as an example) can include  
multiple names, including both alternate names and variety names."

I was actually hoping for something stronger than what's given in the  
highlighted (with "**") sentence. It's been possible for some time to  
specify any number of fine-grained details in the element content. The  
issue that concerned me was that there seem to be a few kinds of  
language refinement likely to be sufficiently frequent that it may  
make sense to have standardized conventions for encoding them.  
"Dialect" is the most obvious one. Of course, if there aren't  
standardized codes, it would be hard for people to search for  
dialects, but it would be good if there were a standardized way for one  
to see that a resource represents some dialect. For large languages,  
people might care a bit about what dialect they're getting, for example.

I can think of two ways to do such standardization, one easier than  
the other. The easy way is just to stipulate how to say a name refers  
to a dialect in the element content. This could be as simple as saying  
the name should be followed by the word "dialect" (as opposed to, say,  
being followed by "variety" or being preceded by "dialect: "). The  
second would be to add a possible refinement attribute, let's call it  
olac:refinement, with a controlled vocabulary consisting of, for  
example, "dialect" and "alternate". Thus, we would adapt this  
guidelines example:

<dc:language xsi:type="olac:language" olac:code="ell">Saracatsan dialect</dc:language>

To this:

<dc:language xsi:type="olac:language" olac:code="ell" olac:refinement="dialect">Saracatsan</dc:language>

I don't know the DC restrictions well enough to know if this is  
appropriate. Maybe it falls under the rubric of qualified Dublin Core,  
in which case nothing can be easily done right now.
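
To make the comparison concrete, here is a rough Python sketch of how a
service consuming the metadata might read either encoding and arrive at
the same result. It is only an illustration under some assumptions: the
namespace URIs are my best guess at the ones in use, the function name
describe_language is made up for this example, and olac:refinement is
the hypothetical attribute proposed above, not part of any existing
schema.

```python
import xml.etree.ElementTree as ET

# Namespace URIs are assumed for illustration; check them against the
# actual OLAC and Dublin Core schemas before relying on them.
DC = "http://purl.org/dc/elements/1.1/"
XSI = "http://www.w3.org/2001/XMLSchema-instance"
OLAC = "http://www.language-archives.org/OLAC/1.1/"
NS_DECLS = f'xmlns:dc="{DC}" xmlns:xsi="{XSI}" xmlns:olac="{OLAC}"'


def describe_language(xml_text):
    """Return the code, name, and refinement of one dc:language element.

    Handles both conventions discussed above:
      1. wording: the name is followed by the word "dialect";
      2. attribute: a hypothetical olac:refinement attribute.
    """
    elem = ET.fromstring(xml_text)
    code = elem.get(f"{{{OLAC}}}code")
    # Collapse whitespace, since the content may be wrapped across lines.
    name = " ".join((elem.text or "").split())
    refinement = elem.get(f"{{{OLAC}}}refinement")
    # Fall back to the wording convention when no attribute is present.
    if refinement is None and name.lower().endswith(" dialect"):
        refinement = "dialect"
        name = name[: -len(" dialect")]
    return {"code": code, "name": name, "refinement": refinement}


# The guidelines example (wording convention).
wording = (
    f'<dc:language {NS_DECLS} xsi:type="olac:language" '
    'olac:code="ell">Saracatsan dialect</dc:language>'
)
# The same record using the proposed (hypothetical) attribute.
attribute = (
    f'<dc:language {NS_DECLS} xsi:type="olac:language" '
    'olac:code="ell" olac:refinement="dialect">Saracatsan</dc:language>'
)

print(describe_language(wording))
print(describe_language(attribute))
```

The sketch also shows why the easy option still needs the wording to be
stipulated: the fallback only works if everyone writes the name followed
by "dialect" and not "variety" or "dialect: ".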

Jeff