Call: MODELLING AND DESCRIBING DISCOURSE ORGANISATION IN THE AGE OF THE DIGITAL DOCUMENT

alexis.nasr at LINGUIST.JUSSIEU.FR
Mon Jan 19 19:27:24 UTC 2004


===============================================

2nd CALL FOR PAPERS:

===============================================

MODELLING AND DESCRIBING DISCOURSE ORGANISATION IN THE AGE OF THE
DIGITAL DOCUMENT

===============================================

A Workshop proposed by ATALA as part of the Digital Document Week
(http://www.univ-lr.fr/sdn2004/) La Rochelle, 22 June 2004


organised by
Marie-Paule Péry-Woodley,
ERSS/Université de Toulouse-Le Mirail (pery at univ-tlse2.fr)



The Digital Document Week aims to gather research communities dealing
with digital documents from a variety of angles: media, technical and
social modes of mediation, relation with human activity. This ATALA
workshop wishes to broach these questions from a linguistic point of
view, focussing on digital documents as discourse, characterised by an
internal organisation which needs to be understood and may be
exploited in computer-based systems. The workshop aims to bring
together three research areas concerned with the development of
digital documents: the study of discourse organisation, corpus
linguistics, and computer-based applications for the exploitation of
digital documents.

For text and discourse linguistics, the proliferation of digital
documents leads to new opportunities and new research questions, such
as:

- the application of corpus analysis methods to discourse: what kind
of data can be regarded as relevant at this level of linguistic
investigation?

- the development of novel ways of accessing documents, which leads to
a new emphasis on text structure and the potential exploitation of
surface markers;

- the impact of new document types on basic concepts in the field:
cohesion, coherence, metadiscursive signalling.

This workshop on written discourse organisation aims to bring together
research from three domains which must seek points of convergence in
the light of these new prospects:


1. Discourse organisation

In order to apprehend a sequence of utterances as discourse, it is
necessary to understand its organisation (to identify its segments and
perceive their hierarchy and their relations). An old and fertile
tradition approaches discourse organisation via the notion of
discourse relations: semantico-pragmatic links between segments
(propositions or sets of propositions) (cf. Péry-Woodley (ed)
2001). Other modes of organisation may be envisaged, via the notion of
theme or topic for instance, or more recently through the discourse
framing hypothesis (Charolles 1997). Research in this field can be
placed in a continuum from pure 'conceptual' modelling to empirical
methods (automatic segmenting, cf. Hearst 1997; shallow analyses, human
or automatic, cf. Teufel & Moens 1999). The challenge is to hold
both ends of the continuum in order to draw connections between the
way 'things are put' in texts and the processes underlying discourse
organisation at different levels of granularity (local vs. global
organisation). The relationship between modelling approaches and
empirical research has often seemed problematic, with empirical
studies running the risk of losing track of structure as they focus on
surface markers, while conceptual models tend to be difficult to test
empirically. Corpus-based approaches, greatly facilitated by the move
into the digital age, are bringing considerable changes to the
discourse field, as they have done elsewhere in linguistics
(Conrad 2002).


2. Corpus-based studies of linguistic correlates of discourse
organisation

As noted by several authors (Biber et al. 1998, inter alia), though
research on discourse organisation tends to make regular use of
authentic data, the corpus is often seen as a source of examples
rather than the object of the analysis as such. The implementation of
a fully-fledged 'corpus approach' in the field of discourse
organisation carries with it many difficulties: corpus construction
(common sampling-based techniques make it impossible?), the role of
quantitative analysis, and above all the definition of relevant data
that make it possible to connect surface markers
(which may be mere epiphenomena) with the multiple principles
underlying complex hierarchical organisation. A gap can also be
observed between linguistic approaches (low coverage and high
reliability) and numerical approaches (high coverage and low
reliability). Combining these approaches may open new prospects,
leading to fresh insights into discourse organisation principles as
well as more operational methods for applications.


3. Computer-based systems for the exploitation of digital documents

Applications for which the relevant unit is the whole document are
little concerned with questions of discourse organisation, but those
concerned with intra-document browsing, selective synthesis or
multi-level visualisation must work their way inside the documents and
therefore cannot consider them as simple 'bags of words': they have to
take into account the organisation into thematic or rhetorical chunks
and text architecture (cf. Luc & Virbel 2001). These objectives raise
new research questions, in particular about how different
organisational levels fit together in long documents (where browsing
aids acquire particular relevance).

This call for papers is addressed to researchers who are already working on
these interactions, as well as those whose work is in one of the
domains referred to but who are interested in a dialogue with other
discourse approaches. Descriptive studies which pay specific attention
to methodology will be particularly welcome.


Some relevant themes (non-exhaustive list):

- identification of objects or text zones corresponding to text or
discourse acts (conclusions, explanations, evaluations, ...)

- discourse organisation markers (from markers to relations: inductive
approach): connection, indexing (discourse frames), textual
metadiscourse

- linguistic characterisation of discourse functions (from functions
to markers: deductive approach)

- segmentation (automatic or manual): 'topic shifts', clues to segment
boundaries (lexico-syntactic, typographical, dispositional)

- articulation between local and global organisation

- impact of discourse genre on discourse organisation and its
linguistic markers

- analysis and exploitation of document architecture

- topological approaches

- discourse annotation


SUBMISSION PROCEDURE

A summary (2-4 pages; Word, PDF or PS) should be e-mailed by 30 January
2004 to Marie-Paule Péry-Woodley (<pery at univ-tlse2.fr>).

Notification of acceptance will be given by March 15th 2004.

***************************************************************************

References

Biber, D., Conrad, S., & Reppen, R. (1998). Corpus linguistics:
Investigating language structure and use. Cambridge: Cambridge
University Press.

Charolles, M. (1997). L'encadrement du discours : Univers, champs,
domaines et espaces (Cahier de Recherche Linguistique 6). Université
de Nancy 2.

Conrad, S. (2002). Corpus linguistics approaches for discourse
analysis. Annual Review of Applied Linguistics, 22, 75-95.

Hearst, M. (1997). TextTiling: segmenting text into multi-paragraph
subtopic passages. Computational Linguistics, 23(1), 33-64.

Luc, C., & Virbel, J. (2001). Le modèle d'architecture textuelle :
fondements et expérimentation. Verbum, 23(1), 103-123.

Péry-Woodley, M.-P. (ed.) (2001). Cohérence et relations de discours à
l'écrit. Présentation. Verbum, 23(1).

Teufel, S., & Moens, M. (1999). Discourse-level argumentation in
scientific articles: human and automatic annotation. In: Towards
Standards and Tools for Discourse Tagging. ACL 1999 Workshop.


___
Marie-Paule PERY-WOODLEY
___________________________________________________________________
ERSS / Sciences du Langage
Universite de Toulouse Le Mirail       Tel.: 33(0)5 61 50 46 76/-36 09
5 allees Antonio-Machado               Fax: 33(0)5 61 50 42 12
F-31058 TOULOUSE CEDEX                 Email: pery at univ-tlse2.fr



