Fwd: RCLT seminar notification -- Thursday, 27th Jan, 2011

Randy LaPolla r.lapolla at latrobe.edu.au
Wed Jan 12 00:41:45 UTC 2011

Research Centre for Linguistic Typology Seminar:
Who:     Andrew Margetts (Monash University) and Dr Anna Margetts (Monash University and RCLT)
What:    Enhancing a text collection with a document-oriented database model: a Toolbox based example (Andrew Margetts)
               Filming with native speaker commentary: making the most of filming for the community (Dr Anna Margetts)             
When:   3:30 - 5:00pm, Thursday, 27 January, 2011
Where: Reading Room, Research Centre for Linguistic Typology,
                   Building NR6, La Trobe University, Bundoora
                   Map on Melways: 19 G5
                   Map on RCLT website: http://www.latrobe.edu.au/rclt/location.htm
Enhancing a text collection with a document-oriented database model: 
a Toolbox based example
As data-sets grow in complexity, it is common to expand them from a flat file to a relational database. This approach offers many advantages: new types of questions can be asked and integrity of the data can be ensured. But there are also costs. The process of converting to this model – i.e. 'normalizing' the data – can be very involved, and the resulting data is difficult to interpret except through the database software. Speed of use can also suffer, since many processes are performed only when a query is actually run.
An alternative approach is to expand the utility of the original database through simple scripts which augment the primary data-set by feeding relevant information from related sets. The result is an enriched, semi-structured document which remains readable to the human eye, yet is capable of handling complex queries comparable to those achievable through Structured Query Language (SQL) and a relational database. In fact some very complicated data relationships become easier to model than in a relational database.
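As a rough illustration of the idea of feeding related data into the primary data-set, the following is a minimal sketch only, not the scripts used in the talk. It assumes Toolbox-style records made of backslash-marked fields separated by blank lines; the sample lexicon and text records, and the function names parse_records and enrich, are invented for the example.

```python
# Hypothetical sample data: the markers (\lx, \ps, \ge, \ref, \tx) follow common
# Toolbox conventions, but the records themselves are invented for illustration.
LEXICON = r"""
\lx kai
\ps n
\ge food

\lx nao
\ps v
\ge go
"""

TEXTS = r"""
\ref T1.001
\tx nao kai

\ref T1.002
\tx kai
"""


def parse_records(raw):
    """Turn blank-line-separated blocks of marker lines into one dict per record."""
    records = []
    for block in raw.strip().split("\n\n"):
        record = {}
        for line in block.splitlines():
            marker, _, value = line.partition(" ")
            record[marker.lstrip("\\")] = value.strip()
        records.append(record)
    return records


def enrich(texts, lexicon):
    """Copy the matching lexicon fields into each word of each text record."""
    by_form = {entry["lx"]: entry for entry in lexicon}
    enriched = []
    for rec in texts:
        words = []
        for form in rec["tx"].split():
            entry = by_form.get(form, {})
            extra = {k: v for k, v in entry.items() if k != "lx"}
            words.append({"form": form, **extra})
        enriched.append({"ref": rec["ref"], "tx": rec["tx"], "words": words})
    return enriched


documents = enrich(parse_records(TEXTS), parse_records(LEXICON))
```

Each resulting document carries its own copy of the lexical information, so it can be read and queried without a join; the trade-off is that the enrichment step has to be re-run whenever the lexicon changes.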
This paper looks at the document-oriented model through the development of a typical Toolbox text collection. It draws on a sample Toolbox project containing interrelated data-sets, plus a set of scripts for manipulating the data. I explain the process of feeding supplementary data to the main data-set, and demonstrate some typical queries. I also discuss one situation where the model is superior to a relational database due to the intricacy of the relationships. I conclude with a brief demonstration of importing the data to MongoDB, a versatile document-oriented database.
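To give a flavour of the kind of query meant here, the sketch below searches a small set of already-enriched documents and then serialises them as one JSON object per line, a layout that import tools such as MongoDB's mongoimport are designed to read. The documents, the field names (ref, tx, words, ps, ge) and the helper refs_with are assumptions made for the example, not the talk's actual data or scripts.

```python
import json

# Hypothetical enriched documents: each sentence embeds its own word-level data.
documents = [
    {
        "ref": "T1.001",
        "tx": "nao kai",
        "words": [
            {"form": "nao", "ps": "v", "ge": "go"},
            {"form": "kai", "ps": "n", "ge": "food"},
        ],
    },
    {
        "ref": "T1.002",
        "tx": "kai",
        "words": [{"form": "kai", "ps": "n", "ge": "food"}],
    },
]


def refs_with(docs, **criteria):
    """Return the refs of sentences containing a word that matches all criteria."""
    return [
        doc["ref"]
        for doc in docs
        if any(
            all(word.get(key) == value for key, value in criteria.items())
            for word in doc["words"]
        )
    ]


# Sentences containing a verb, and sentences containing a word glossed 'food'.
verb_sentences = refs_with(documents, ps="v")
food_sentences = refs_with(documents, ge="food")

# One JSON object per line, ready to be written to a file for import.
json_lines = "\n".join(json.dumps(doc, ensure_ascii=False) for doc in documents)
```

Once imported, a single-field search of this kind corresponds to a MongoDB filter on an embedded field, for example {"words.ps": "v"}.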
Far from being merely a way to avoid building a full-blown relational database, this model is increasingly being used for certain large-scale applications, particularly on the web. The reasons include speed of read/write operations, a more intuitive data model (which implies quicker set-up, revision and maintenance), and the ability to scale (i.e. become larger) without problems. It is much easier to translate a Toolbox database to such a system than to an equivalent relational database, and so this provides a straightforward path to exposing the data on the web and adding functionality not available in Toolbox.
Filming with native speaker commentary: 
making the most of filming for the community
Linguistic fieldworkers often feel the tension of different expectations placed on them by different parties. On one side are the demands of writing a thesis or other academic work based on the fieldwork. Time and funding for fieldwork and research are limited and there is pressure from funding agencies and universities to deliver. On the other side there is a justified expectation (typically not least by the researchers themselves) that there should be a clear benefit to the community. Making materials produced for the community valuable for linguistic research and vice versa can reduce the tension between conflicting demands and make for more productive fieldwork.
In this paper I discuss our experiences in enhancing materials created for the language community so that they were valuable as linguistic recordings. The community we were working with had a prime interest in the video documentation of certain events that were basically of no interest from a linguistic perspective. In one instance we were asked to film a soccer match and we invited a speaker to provide a running commentary of the match. This strategy proved a success and provided a stream of spontaneous spoken language for analysis and a new text type for our database. It also enhanced the value of the recording for the community. We applied this technique in another context with similar success. I discuss the data collected, the limitations of this method, and the equipment and recording set-up.

