[Corpora-List] Call for participation: NooJ Workshop, Corpus Linguistics 2007

Váradi Tamás varadi at nytud.hu
Fri Jul 6 10:50:13 UTC 2007


******* Call for participation ***************

	NooJ Workshop
	A sophisticated finite-state linguistic analysis tool for corpora

	Corpus Linguistics 2007 Conference
	Birmingham University
	27 July, 2007
	http://www.corpus.bham.ac.uk/conference2007/


	Workshop Organizers: 
	Max Silberztein, Université de Franche-Comté,
	max.silberztein at univ-fcomte.fr
	
	Tamás Váradi, Hungarian Academy of Sciences
	varadi at nytud.hu
	
	NooJ website: http://nooj4nlp.net

Aim: 
The aim of this 2-hour workshop is to give an overview of the NooJ corpus processing tool. 
The workshop will focus on the linguistic aspects of corpus annotation and querying.

Topic: 
NooJ is a comprehensive linguistic development environment written for the Windows .NET 
platform. It allows linguists to construct large-coverage linguistic resources in the form of 
dictionaries and grammars, and provides tools for their maintenance: contracts, debuggers, 
etc. NooJ is also a corpus processor that can apply these resources to large texts in order to 
annotate them, build sophisticated concordances, analyse complex syntactic and semantic 
phenomena, and retrieve and extract information from them.

The main attraction of NooJ for the ordinary corpus linguist is the ease with which its 
complex functionalities can be handled. This is particularly evident in the graphical interface 
through which sophisticated cascading finite-state grammars can be developed. The finite-state 
capabilities are substantially enhanced through the use of variables, typed features, 
inheritance mechanisms, etc.

Format and content: 
The workshop will consist of four 20-minute presentations and an interactive "how-to" 
session.

-- The first presentation (by M. Silberztein) will be an overview of the design philosophy and 
the architecture of the system, including a discussion of the main components, the most 
important functionalities and some possible applications.

-- The second presentation (by Slim Mesfar) will give a run-down of how NooJ can be applied 
to routine corpus analysis tasks: how to import a corpus and annotate it, how to 
build complex concordances from simple and complex queries, and how to export the results as 
an XML document. 

-- The third presentation (by Tamás Váradi) will focus on the linguistic capabilities of NooJ's 
local grammars, which are finite-state transducers enhanced with features and variables. 
Issues discussed will also include the writing, compiling and prioritized deployment of 
dictionaries and grammars, and disambiguation at various levels.

-- The fourth presentation (by Kata Gabor) will present a robust rule-based syntactic parser 
for Hungarian built with NooJ. Hungarian poses a challenge for parsers because of its 
extremely complex morphology (with the inevitable heavy morphosyntactic ambiguity) 
coupled with free constituent order within the clause. The presentation will outline the 
architecture of the grammar as well as its application in annotating corpora.

The session will conclude with an interactive part during which members of the audience will 
be encouraged to raise "how do I do it?" questions, which the presenters will try to 
answer on the spot, with a quick demonstration where possible.
