[Corpora-List] PhD Studentship available: Computing Department, Lancaster University, UK
Rayson, Paul
rayson at exchange.lancs.ac.uk
Thu Mar 13 14:39:28 UTC 2008
A PhD Studentship is available in the Computing Department at Lancaster University in the UK.
Software Linguistics: Grammaticalisation in Open Source Middleware
http://www.comp.lancs.ac.uk/dta2008/software-linguistics.html
The increasing complexity of distributed middleware platforms is now widely acknowledged as a major obstacle to the delivery of robust and reliable computer-based services. Modern middleware infrastructures, from on-line services to Enterprise Resource Planning, are huge distributed entities collaboratively assembled from heterogeneous technologies. They are made of parts developed by distinct teams in independent organisations, are continuously expanded and corrected, and resemble living organisms, constantly evolving in a loosely controlled manner.
Numerous architecture paradigms have been proposed to help developers address this complexity, but surprisingly little work has sought to systematise our understanding of existing middleware architectures, and of the processes that lead to their emergence. Such an understanding, in turn, appears to be a key step towards assessing and motivating architectural paradigms in a principled manner.
The goal of this Ph.D. is to approach this problem by exploring the relationship between language change and open-source middleware architectures (such as Globus, Axis, JBoss). More precisely, the Ph.D. will investigate to what extent linguistic approaches (experimental protocols and theories) could shed light on the emergence, evolution and compliance of open-source middleware architectures.
This line of investigation is grounded in the observation that open-source middleware and natural languages share structural and evolutionary features that call for a close analysis of how one might help understand the other. Both types of systems are created by humans, more or less organically. Like linguistic systems, middleware architectures (or for that matter any API) comprise a vocabulary (methods, classes, packages) and a syntax (a socket cannot be bound before it has been created). Like middleware, natural languages are evolving systems.
Speakers introduce new idioms and new rules that sediment into accepted practice. Language is also closely interrelated with cognitive processes and inherent to human nature, which suggests that existing language structures and linguistic processes must influence the creation of software artefacts such as middleware. Two areas of linguistics seem particularly promising to approach this relationship: Grammaticalisation Theory and Corpus Analysis. Grammaticalisation Theory originated in studies of language evolution and describes how new syntactic elements tend to emerge from "higher-level" word categories through a directional process. For instance, adverbs and prepositions across all human languages tend to arise from nouns, but almost never the other way round. Corpus Analysis, by contrast, is an experimental methodology for the empirical analysis of large bodies of text through various techniques (word frequencies, collocation analysis, part-of-speech and semantic tagging), and has found applications in a wide range of disciplines (history, business, requirements engineering).
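As a rough illustration of two of the Corpus Analysis techniques mentioned above (word frequencies and collocation analysis), a minimal Python sketch could look as follows. The toy corpus, the function names, and the choice of adjacent word pairs as the collocation unit are assumptions made purely for this example; they are not part of the project description, and real corpus tools use more refined statistics.

```python
from collections import Counter
import re


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def word_frequencies(texts):
    """Count how often each token occurs across all texts."""
    counts = Counter()
    for text in texts:
        counts.update(tokenize(text))
    return counts


def adjacent_collocations(texts):
    """Count adjacent word pairs (bigrams), never crossing text boundaries."""
    pairs = Counter()
    for text in texts:
        tokens = tokenize(text)
        pairs.update(zip(tokens, tokens[1:]))
    return pairs


# Hypothetical mailing-list snippets, invented for illustration only.
corpus = [
    "Create the socket before you bind the socket.",
    "You cannot bind the socket twice.",
]

freqs = word_frequencies(corpus)
pairs = adjacent_collocations(corpus)

print(freqs.most_common(3))   # most frequent words
print(pairs.most_common(2))   # most frequent adjacent word pairs
```

Raw pair counts are only a first approximation of collocation; in practice an association measure (for example mutual information or log-likelihood) would normally be used to separate genuine collocations from pairs of words that are simply frequent.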
In this context, the Ph.D. would look at the following research questions:
* Can patterns be identified in documented external discussions (e-mail lists, tutorials, architectural blueprints) of well-known open-source middleware projects that mirror or inform both the structure and the nature of changes observed in the program across versions? This study could in particular seek to use Corpus Analysis and semantic tagging.
* For platforms or internal subcomponents that are already in use, do the frequency of use of API entities and the context in which they are used influence their evolution? Do entities that are used a lot evolve more, or less? Do entities that are frequently employed together tend to coalesce? (collocation analysis)
* Building on the results of the two points above, can parallels be drawn between the emergence of and changes in API structures in middleware and some of the evolution processes documented by grammaticalisation theory? For instance: can the equivalent of 'linguistic categories' be identified to classify API building blocks (methods, parameters, attributes, classes, packages) into interrelated 'categories' that help account for API evolution across versions, as happens in natural languages?
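To make the second question more concrete, the sketch below counts how often API entities are used, and how often they are used together, across a set of client programs. It is only an illustration under strong assumptions: the clients are tiny invented Python snippets (the middleware platforms named above are largely written in Java), "use" is approximated by the names that appear in call position, and co-use is measured per client file. The helper names `called_names` and `usage_and_co_use` are hypothetical.

```python
import ast
from collections import Counter
from itertools import combinations


def called_names(source):
    """Return the set of function/method names called in a Python source string."""
    names = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Call):
            func = node.func
            if isinstance(func, ast.Attribute):   # e.g. s.bind(...)
                names.add(func.attr)
            elif isinstance(func, ast.Name):      # e.g. socket(...)
                names.add(func.id)
    return names


def usage_and_co_use(sources):
    """Count, per client, which entities are used and which pairs are used together."""
    usage = Counter()
    co_use = Counter()
    for source in sources:
        names = called_names(source)
        usage.update(names)
        # Sort so that each unordered pair always gets the same key.
        co_use.update(combinations(sorted(names), 2))
    return usage, co_use


# Hypothetical client snippets; they are parsed, never executed.
clients = [
    "s = socket()\ns.bind(addr)\ns.listen(5)\n",
    "s = socket()\ns.connect(addr)\n",
    "s = socket()\ns.bind(addr)\n",
]

usage, co_use = usage_and_co_use(clients)
print(usage.most_common())
print(co_use.most_common(3))
```

Repeating such counts for successive releases of a platform, and comparing them with how each entity changes between releases, would be one possible way to start relating frequency of use and co-use to API evolution.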
The above questions could be further extended by linking their findings with current research on open-source development, in particular research based on Social Network Analysis. Finally, the insights gained from this research should be used to reflect on the dominant architectural paradigms that have been proposed for middleware during the last decade.
The studentship award covers fees and an annual stipend (starting £13,500). Due to funding criteria, the studentships are available only to candidates who hold a UK passport or have been ordinarily resident for a period of 3 years immediately prior to the date of application. EU nationals can be offered tuition fees only. International students can receive the full award if they have Indefinite Leave to Remain issued by the Home Office.
Prospective applicants should have the equivalent of a BSc or MSc in Computer Science with strong evidence of an interest in Linguistics. Alternatively, they should possess the equivalent of a BSc or MSc in Linguistics with strong evidence of IT-related knowledge. The successful candidate will join an ambitious research group with a strong track record in the development of middleware paradigms and prototypes. The research will take place within Lancaster's Computing Department, and will also benefit from the leading research performed in other groups of the department on Natural Language Processing and Software Engineering, and in the Linguistics Department on corpus-based approaches.
Suitable candidates are encouraged to make informal inquiries to Dr. François Taïani (f.taiani at lancaster.ac.uk) and Dr. Paul Rayson (rayson at exchange.lancs.ac.uk), providing a statement of their research interest and a brief summary of their qualifications. The application itself will comprise Application and Reference Forms, a curriculum vitae, a degree transcript, and a covering letter detailing the applicant's specific research interest, all of which can be submitted electronically to the Postgraduate Admissions Office (https://www.pgapps.lancs.ac.uk/).
Closing date: 15 May 2008.
Dr. Paul Rayson
Director of UCREL
Computing Department, Infolab21, South Drive, Lancaster University, Lancaster, LA1 4WA, UK.
Web: http://www.comp.lancs.ac.uk/computing/users/paul/
Tel: +44 1524 510357 Fax: +44 1524 510492
_______________________________________________
Corpora mailing list
Corpora at uib.no
http://mailman.uib.no/listinfo/corpora