Typological databases and the reading public

Fri Apr 20 11:24:52 UTC 2007

Typological databases and the reading public, especially that reading LT
(from the Editorial Board of LT)

Databases are databases, and journal articles are journal articles. 
You compile a database -- on your own or in collaboration, which may 
include sharing data from separate databases -- in order to find out 
something new, or also in order to confirm or disconfirm claims or 
theories that have been around.  If you do find out something that 
you consider worth making public, you try to get it published in a 
journal (or book) or you publicise it otherwise (on your web site or 
in a letter that you send around to your friends and foes, thereby 
perhaps reaching a wider audience).

Naturally, those interested in your findings may also want to 
ascertain that what you publicly claim you have found is valid, in 
terms of both (i) your own data and (ii) other evidence.  You should 
therefore be prepared to make your own data publicly available, at 
least if challenged.  I'm not sure this implies that all one's data 
(i.e., full databases) ought to be published together with one's 
findings:  sometimes this may be sensible and viable;  but it may 
also suffice if one's data can be inspected upon special request 
(from the reading public, or from reviewers asked to decide on the 
publication of one's findings).

While the data in databases, in flux or complete, await analysis and 
theorising, it may be useful to share them (and there are initiatives 
to improve communication in the typological database scene);  having 
them published in a journal or book would not seem to serve any 
obvious purpose.  (Although the line may be hard to draw, the 
publication of text collections would seem to serve purposes which 
the publication of typological databases doesn't.)

Databases, then, are tools:  interesting primarily for what you can 
do with them -- furthering knowledge, in our case knowledge about 
linguistic diversity and unity.

Since there is an obvious communal interest in having the best tools 
possible and in these tools being used in the most expert way 
possible, tool design and tool use are questions of considerable 
interest for everybody keen on furthering knowledge.  They are 
important questions which merit informed and prompt discussion in 
scholarly journals, whether old-fashioned easy-to-read print or 
new-fashioned technology-intensive electronic, whether specialising in some 
limited field of empirical enquiry or generally dedicated to 
questions of the methodology and philosophy of science.

Were it not the whole point of this message, it would almost be 
needless to add that, for linguistic typology, LT, this field's 
dedicated journal, continues to warmly invite scholarly information 
about, and scholarly debate on, typological methodology, obviously 
including database methodology.

As to the different question, also raised in this current lingtyp 
exchange, whether typologists are to be trusted with data, and 
generally are good for anything, readers of LT may rest assured, and 
writers in LT will confirm, that editors and reviewers for this 
journal have always seen to it, to the best of their individual and 
collective abilities, that specific data, as well as specific 
analyses and specific ideas, are properly credited.  (Sure, the line 
may sometimes be hard to draw between what is specific and needs to 
be attributed and what is or has become common knowledge.  But that 
is another question.)

Frans Plank

********

I think it should be pointed out that the international child 
language research community has functioned with an open, accessible 
database for decades.  There's a standardized format for submission, 
programs for search through transcripts, ethical guidelines, etc. 
The typology community is far behind, and could learn from this. 
Here's the url:  http://childes.psy.cmu.edu/

These are the ground principles:

The basic principle behind TalkBank is that researchers would like to 
share their data, because they think they are important and can 
interest others. However, apart from this basic consideration there 
are several additional reasons to share data and some reasons not to 
share data.

First, the reasons to share data:
  Principles of scientific integrity require that ideas be put to a 
test. In order to test your ideas about your data, you need to open 
them up to others who will either support or challenge your ideas.
  Some types of claims can only be tested against large data sets or 
against comparisons of somewhat similar data sets. To make these 
analyses, we often need more and more data.
  Much of the work in science is conducted using public funds. We have 
an obligation to the public to make maximally efficient use of these 
data. For example, the NIH has now issued guidelines on this issue: 
<http://grants.nih.gov/grants/policy/data_sharing/index.htm>

But there are also two reasons not to share data:
  Data should not be shared if you have not secured informed consent 
from your subjects.
  Untenured faculty should not share data until they have 
published the basic findings. In reality, we have never seen a case 
in which a person's ability to publish findings has been limited by 
contributing data. In any case, this is only a concern for faculty 
without tenure.

Dan Slobin