[Corpora-List] Call for participation: NooJ Workshop, Corpus Linguistics 2007
Váradi Tamás
varadi at nytud.hu
Fri Jul 6 10:50:13 UTC 2007
******* Call for participation ***************
NooJ Workshop
A sophisticated finite-state linguistic analysis tool for corpora
Corpus Linguistics 2007 Conference
Birmingham University
27 July, 2007
http://www.corpus.bham.ac.uk/conference2007/
Workshop Organizers:
Max Silberztein, Université de Franche-Comté,
max.silberztein at univ-fcomte.fr
Tamás Váradi, Hungarian Academy of Sciences
varadi at nytud.hu
NooJ website: http://nooj4nlp.net
Aim:
The aim of this 2-hour workshop is to give an overview of the NooJ corpus processing tool.
The workshop will focus on the linguistic aspects of corpus annotation and querying.
Topic:
NooJ is a comprehensive linguistic development environment written for the Windows .NET
platform. It allows linguists to construct large-coverage linguistic resources in the form of
dictionaries and grammars, and provides tools for their maintenance: contracts, debuggers,
etc. NooJ is also a corpus processor that can apply these resources to large texts in order to
annotate them, build sophisticated concordances, analyse complex syntactic and semantic
phenomena, and retrieve and extract information from them.
The main attraction of NooJ for the ordinary corpus linguist is the ease with which the
complex functionalities can be handled. This is particularly evident in the graphical interface
through which sophisticated cascading finite-state grammars can be developed. The finite-
state capabilities are substantially enhanced through the use of variables, typed features and
inheritance mechanisms, etc.
Format and content:
The workshop will consist of four 20-minute presentations and an interactive "how-to"
session.
-- The first presentation (by M. Silberztein) will be an overview of the design philosophy and
the architecture of the system, including a discussion of the main components, the most
important functionalities and some possible applications.
-- The second presentation (by Slim Mesfar) will give a run-down of how NooJ can be applied
in the daily routine tasks of corpus analysis: how to import a corpus and annotate it, how to
build complex concordances from simple and complex queries, and how to export results in
an XML document.
-- The third presentation (by Tamas Varadi) will focus on the linguistic capabilities of NooJ's
local grammars, which are enhanced finite-state transducers using features and variables.
Issues discussed will also include the writing, compiling and prioritized deployment of
dictionaries and grammars, and disambiguation at various levels.
-- The fourth presentation (by Kata Gabor) will present a robust rule-based syntactic parser
for Hungarian built with NooJ. Hungarian poses a challenge for parsers because of its
extremely complex morphology (with the inevitable heavy morphosyntactic ambiguity)
coupled with free constituent order within the clause. The presentation will outline the
architecture of the grammar as well as its application in annotating corpora.
The session will conclude with an interactive part during which members of the audience will
be encouraged to raise "how to do it?" questions, which the presenters will try to
answer on the spot, with a quick demonstration where possible.