[Corpora-List] CFP: Special Issue of JNLE: Finite State Methods and Models in Natural Language Processing
Jakub Piskorski
jakub.piskorski at frontex.europa.eu
Fri May 7 22:44:04 UTC 2010
CALL FOR PAPERS
Natural Language Engineering
SPECIAL ISSUE ON
Finite State Methods and Models in Natural Language Processing
Special Issue Description
--------------------------------
The languages described by regular expressions are exactly those
recognized by automata that have a finite number of states (Kleene
1956). This fundamental result has been extended to string
transductions, sets of trees, formal power series, grammars,
semigroups, and finite models. Furthermore, the interest in finite
equivalence relations in these systems has led to the study of further
formal properties of recognizable languages. The study was pioneered
by Schützenberger, McNaughton, Papert and Kamp, who established the
equivalence of star-free generalized regular expressions, counter-free
automata, first-order logic and temporal logic.
Kleene's theorem, its extensions and its restrictions have tremendous
methodological relevance to speech and natural language processing
(NLP). In 1972, C. Douglas Johnson pioneered the study of formal
aspects of phonological descriptions by reducing phonological grammars
to finite state systems. This was the dawn for the currently existing
finite state approached to phonology and morphology with wide-spread
applications to written languages. In addition, many other NLP areas
admit that they employ finite state systems or their extensions. In
particular, we mention decision trees, deterministic parsing, chunk
parsing, edit distance, rational kernels, hidden Markov models,
Viterbi algorithms, semiring parsing, and weighted tree automata. In
sum, finite state methods, both statistical and knowledge-based, are
now being used in various areas of linguistic computation, ranging
from speech and character recognition to message understanding and
statistical machine translation. Finite state methods in NLP continue
to be an area of further research and growing area.
The current special issue has two goals. The first is to summarize
the state of the art in the finite state methods and models of the NLP
community. The second is to increase awareness of the community about
open issues and new perspectives: the study of the adequacy of
finite-state methods and models extends from the traditional richness
and use to new generalizations and ambitious tasks.
The call for papers is open to everyone: New contributions are warmly
encouraged; in addition, the issue is ideal *also* for extended and
updated versions of the papers presented in workshops on Finite State
Methods and Natural Language Processing (FSMNLP Helsinki 2005, Potsdam
2007, Ispra 2008, and Pretoria 2009). Submissions of completely new
papers under the special theme are welcomed as equally appropriate as
extended and updated papers.
Topics of Interest
------------------
1. New or updated work on the traditional topics of FSMNLP workshops
The traditional topics in the series of FSMNLP workshops are
appropriate. The submitted paper must clearly deal with finite
states. An extended version of a meeting paper can be submitted,
provided that the contribution has been updated as appropriate given
the progress in the field. The forums for any preliminary versions of
the paper must be indicated.
2. Study of new questions raised by fundamental results in finite
state phonology and morphology
In linguistics, morphological grammars describe the structure of word
forms in a language, and phonological grammars describe the systematic
use of sounds as units that encode the word forms. Finite state
phonology and morphology consider grammars and rules whose formal
semantics can be defined using finite state transducers. The
fundamental result in these fields states that one can construct
finite state transducers aka lexical transducers from phonological
(Johnson 1972, Kaplan and Kay 1994) and morphological grammars
(Koskenniemi 1983, Karttunen 1994, Beesley and Karttunen 2003). This
result is not the end of inquiries but it paves the way for the study
of many interesting problems, including the following:
- adaptations of the fundamental result to less rigid formalisms and
descriptive approaches used in field linguistics
- correlation between availability of computational morphology and
language development of under-resourced languages
- approaches to the adaptation of lexical transducers to a cluster of
languages
- seamless construction of computational lexicons from dynamically
changing linguistic descriptions
- model-checking and automatic verification of phonological grammars
- finite state approaches to language variation and diachronic
description
- portability and long-term archiving of resources in computational
morphology and morpho-syntax
- constructing refreshed lexical transducers quickly from updated
extended regular expressions
- lexical transducers for tonal languages from auto-segmental accounts
of spreading and alignment of articulatory features
- learning and training finite state grammars from small samples
- designs of weighted phonological and morphological grammars
- feasible finite state restrictions of optimality-theoretic and
multi-tiered phonology
- efficient finite state re-implementation of competitively efficient
ad hoc methods (see the article by S. Wintner in NLE 14(4) 2007).
3. Study of new methods with connections between languages, trees and
finite state automata
The theory of classical string automata has a natural extension to
tree automata, which found applications in NLP. In 1982, Joshi and
Levy pointed out in Computational Linguistics 8(1) that phrase
structure grammars actually generalize to tree automata that bring
more descriptive power. The link between phrase-structure grammars,
tree grammars and tree automata has enriched the study of enumerative
grammars that generate trees and has given birth to model-theoretic
grammars that instead describe trees. The study includes but is not
restricted to:
- representations of languages by tree automata and tree grammars
- non-projectivity and partial commutativity of yields of tree
automata and grammars
- hierarchies of tree automata and tree grammars
- weighted extensions of representations of languages and
transductions
- tree logics and model-theoretic syntax
- language generation through tree logics
- efficient and dynamic construction of automata from updated rules or
logical formulas
- representations of trees and automata-based query languages for tree
banks
- learning, training and minimization of representations of languages
and transductions
- semiring parsing and other generalizations of classical parsing
algorithms
- applications and case studies of methods based on the theory of tree
automata.
In 1963, Chomsky and Schützenberger gave a morphic representation for
the context-free languages i.e. the yields of local tree automata.
The representation involves bracketed strings that encode the
structure of the parse trees. Such use of bracketed strings and local
trees gives rise to methods that decompose tree automata into simpler
components many of which have regular yields. The relevant topics
include:
- representation of tree grammars and tree automata through
decompositions into constraints
- Chomsky-Schützenberger representations for tree grammars and
weighted tree grammars
- feasible constituent, dependency and alignment bracketing schemes
for grammars, tree banks and parallel corpora
- intersection grammars and conjunctive grammars with recognizable
sets of string or trees
- chart parsing, restarting automata, nested word automata and visibly
pushdown languages
- regular bracketed string languages that approximate sets of trees.
4. Study of advantages of restrictions defined in algebraic theories
of automata and languages or in finite model theory
Feasible finite state methods in natural language processing are an
application of a vast body of basic research in mathematics and
computer science. There remain, however, situations where
straightforward and general methods fail to be applicable or
efficient. It is in these situations where NLP applications motivate
interest in special families of regular relations, automata,
semirings, and formal power series. Connections between such families
and NLP tasks have not yet been fully elaborated. If we are lucky,
the study of the connections will lead to new fundamental results that
give rise to freshly identified subfamilies of finite state
methodologies in NLP. The linguistically relevant restrictions on the
power of general-purpose finite state methods might include:
- finite ambiguity
- sequential functions
- step functions
- idempotent semirings
- locally finite semirings
- descriptive complexity
- deterministic transition closure
- first-order definability
- aperiodic, counter-free and star-free sets
- unambiguous concatenation
- partial commutativity
- discounted weights
- finite synchronization delay
- limited center-embedding
- local testability
- finite bandwidth.
In addition, the topics of interest include tools that support
experiments with these restrictions and their specialized algorithms:
- new kinds of implementations of finite state compilers, libraries
and on-demand operations
- optimized algorithms for manipulation of automata or related
formalisms
- benchmarks suitable for evaluation of performance of algorithms
- methods that construct, minimize or decompose automata or
transducers.
Important Dates
---------------
- Deadline for submissions: 23 May 2010
- First decision: 10 July 2010
- Submission of revised version: 15 August 2010
- Final decision: 22 September 2010
- Submission of camera-ready versions: 23 October 2010
Submission
----------
Articles submitted to this special issue must adhere to the NLE
instructions for contributors. We encourage authors to keep their
submissions below 30 pages.
Up to date information will be available at
http://www.ling.helsinki.fi/projects/jnle/.
Editorial Board
-------------------
Guest Editors
- Anssi Yli-Jyrä (Department of Modern Languages, University of
Helsinki, Finland)
firstname.yli-jyra at helsinki.fi<mailto:firstname.yli-jyra at helsinki.fi>
- András Kornai (Budapest Institute of Technology, Hungary and
MetaCarta, Cambridge, USA)
- Jacques Sakarovitch (CNRS and Ecole Nationale Supérieure des
Télécommunications, Paris, France)
Guest Editorial Board
---------------------------
- Julie Berndsen (School of Computer Science & Informatics, University
College Dublin, Ireland)
- Francisco Casacuberta (Instituto Tecnologico De Informática,
Valencia, Spain)
- Jean-Marc Champarnaud (Université de Rouen, France)
- Jan Daciuk (Gdansk University of Technology, Poland)
- Manfred Droste (Institut für Informatik, Universität Leipzig,
Germany)
- Dafydd Gibbon (Fakultät für Linguistik und Literaturwissenschaft,
University of Bielefeld, Germany)
- Colin de la Higuera (University of Nantes, France)
- Lauri Karttunen (Palo Alto Research Center and Department of
Linguistics, Stanford University, USA)
- André Kempe (CADEGE Technologies & Consulting, France)
- Kevin Knight (Computer Science Department, University of Southern
California)
- Hans-Ulrich Krieger (DFKI GmbH, Saarbrücken, Germany)
- Marco Kuhlmann (Department of Linguistics and Philology, Uppsala
University, Sweden)
- Andreas Maletti (Universitat Rovira i Virgili, Spain)
- Stoyan Mihov (Bulgarian Academy of Sciences, Sofia, Bulgaria)
- Mark-Jan Nederhof (School of Computer Science, University of St
Andrews, Scotland)
- Kemal Oflazer (Sabanci University, Turkey)
- Jakub Piskorski (Polish Academy of Sciences, Warsaw, Poland)
- Michael Riley (Google Research, New York, USA)
- Strahil Ristov (Ruder Boskovic Institute, Zagreb, Croatia)
- Max Silberztein (Université de Franche-Comté, France)
- Bruce Watson (Dept. of Computer Science, University of Pretoria,
South Africa)
- Menno van Zaanen (Department of Communication and Information
Sciences, Tilburg University, the Netherlands)
Contact
-------
Anssi Yli-Jyrä
Department of Modern Languages
University of Helsinki
firstname (dot) lastname-with-dash (at) helsinki (dot) fi
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://listserv.linguistlist.org/pipermail/corpora/attachments/20100508/f4806aa2/attachment.htm>
-------------- next part --------------
_______________________________________________
Corpora mailing list
Corpora at uib.no
http://mailman.uib.no/listinfo/corpora
More information about the Corpora
mailing list