[Corpora-List] 2nd call : DISCOURSE ORGANISATION IN THE AGE OF THE DIGITAL DOCUMENT - ATALA Workshop 22 June 04
Marie-Paule PERY-WOODLEY
pery at univ-tlse2.fr
Mon Jan 19 14:05:00 UTC 2004
2nd CALL FOR PAPERS
==================================================================================
MODELLING AND DESCRIBING DISCOURSE ORGANISATION IN THE AGE OF THE DIGITAL
DOCUMENT
==================================================================================
Workshop proposed by ATALA (Association pour le traitement automatique des
langues)
as part of the SEMAINE DU DOCUMENT NUMERIQUE (Digital Document
Week) (<www.univ-lr.fr/sdn2004>)
La Rochelle, France, 22 June 2004
organised by Marie-Paule Péry-Woodley,
Equipe de recherche en Syntaxe et Sémantique/Université de Toulouse-Le
Mirail (pery at univ-tlse2.fr)
The Digital Document Week aims to gather research communities dealing with
digital documents from a variety of angles: media, technical and social
modes of mediation, relation with human activity. Within this framework, the
ATALA workshop wishes to broach these questions from a linguistic point of
view, focussing on digital documents as discourse, characterised by an
internal organisation which needs to be understood and may be exploited in
computer-based systems. The workshop aims to bring together three research
areas concerned with the development of digital documents: the study of
discourse organisation, corpus linguistics, and computer-based applications for
the exploitation of digital documents.
For text and discourse linguistics, the proliferation of digital documents
leads to new opportunities and new research questions, such as:
- the application of corpus analysis methods to discourse: what kind of
data can be regarded as relevant at this level of linguistic investigation?
- the development of novel ways of accessing documents, which leads to a
new emphasis on text structure and the potential exploitation of surface
markers;
- the impact of new document types on basic concepts in the field:
cohesion, coherence, metadiscursive signalling.
This workshop on written discourse organisation aims to bring together
research from three domains which must seek points of convergence in the
light of these new prospects:
1. Discourse organisation
In order to apprehend a sequence of utterances as discourse, it is
necessary to understand its organisation (to identify its segments and
perceive their hierarchy and their relations). An old and fertile tradition
approaches discourse organisation via the notion of discourse relations:
semantico-pragmatic links between segments (propositions or sets of
propositions) (cf. Péry-Woodley (ed) 2001). Other modes of organisation may
be envisaged, via the notion of theme or topic for instance, or more
recently through the discourse framing hypothesis (Charolles 1997).
Research in this field can be placed in a continuum from pure conceptual
modelling to empirical methods (automatic segmenting, cf. Hearst 1997;
shallow analyses, human or automatic, cf. Teufel & Moens 1999). The
challenge is to hold both ends of the continuum in order to draw
connections between the way things are put in texts and the processes
underlying discourse organisation at different levels of granularity (local
vs. global organisation). The relationship between modelling approaches and
empirical research has often seemed problematic, with empirical studies
running the risk of losing track of structure as they focus on surface
markers, while conceptual models tend to be difficult to test empirically.
Corpus-based approaches, greatly facilitated by progression into the
digital age, are in the process of bringing about considerable changes in the
discourse field, as they have done elsewhere in linguistics (Conrad 2002).
2. Corpus-based studies of linguistic correlates of discourse organisation
As noted by several authors (Biber et al. 1998, inter alia), though research
on discourse organisation tends to make regular use of authentic data, the
corpus is often seen as a source of examples rather than the object of the
analysis as such. The implementation of a fully-fledged corpus approach
in the field of discourse organisation carries with it many difficulties:
corpus construction (common sampling-based techniques make it impossible...),
the role of quantitative analysis, and most of all the definition of relevant
data making it possible to draw the connection between surface markers
(which may be just epiphenomena) and the multiple principles underlying
complex hierarchic organisation.
A gap can also be observed between linguistic approaches (low coverage and
high reliability) and numerical approaches (high coverage and low
reliability). Linking these approaches may open new prospects, leading
to fresh insights into discourse organisation principles as well as more
operational methods for applications.
3. Computer-based systems for the exploitation of digital documents
Applications for which the relevant unit is the whole document are little
concerned with questions of discourse organisation, but those concerned with
intra-document browsing, selective synthesis or multi-level visualisation
must work their way inside the documents and therefore cannot consider them
as simple bags of words: they have to take into account the organisation
into thematic or rhetorical chunks and text architecture (cf. Luc & Virbel
2001). These objectives bring about new research questions, in particular
around the articulation of different organisational levels in long
documents (where browsing aids acquire particular relevance).
This call for papers is addressed to researchers who are already working on these
interactions, as well as those whose work is in one of the domains referred
to but who are interested in a dialogue with other discourse approaches.
Descriptive studies which pay specific attention to methodology will be
particularly welcome.
Some relevant themes (non-exhaustive list):
- identification of objects or text zones corresponding to text or
discourse acts (conclusions, explanations, evaluations, ...)
- discourse organisation markers (from markers to relations: inductive
approach): connection, indexing (discourse frames), textual metadiscourse
- linguistic characterisation of discourse functions (from functions to
markers: deductive approach)
- segmentation (automatic or manual): topic shifts, clues to segment
boundaries (lexico-syntactic, typographical, dispositional)
- articulation between local and global organisation
- impact of discourse genre on discourse organisation and its linguistic
markers
- analysis and exploitation of document architecture
- topological approaches
- discourse annotation
SUBMISSION PROCEDURE
A summary (2-4 pages; Word, PDF or PS) should be e-mailed by January 30th 2004
to Marie-Paule Péry-Woodley (<pery at univ-tlse2.fr>).
Notification of acceptance will be given by March 15th 2004.
***************************************************************************
References
Biber, D., Conrad, S., & Reppen, R. (1998). Corpus linguistics:
Investigating language structure and use. Cambridge: Cambridge University
Press.
Conrad, S. (2002). Corpus linguistics approaches for discourse analysis.
Annual Review of Applied Linguistics, 22, 75-95.
Charolles, M. (1997). L'encadrement du discours : Univers, champs, domaines
et espaces (Cahier de Recherche Linguistique 6). Université de Nancy 2.
Hearst, M. (1997). TextTiling: segmenting text into multi-paragraph
subtopic passages. Computational Linguistics, 23(1), 33-64.
Luc, C., & Virbel, J. (2001). Le modèle d'architecture textuelle :
fondements et expérimentation. Verbum, 23(1), 103-123.
Péry-Woodley, M.-P. (ed.) (2001). Cohérence et relations de discours à
l'écrit. Présentation. Verbum, 23(1).
Teufel, S., & Moens, M. (1999). Discourse-level argumentation in scientific
articles: human and automatic annotation. In: Towards Standards and Tools
for Discourse Tagging. ACL 1999 Workshop.
___
Marie-Paule PERY-WOODLEY
___________________________________________________________________
ERSS / Sciences du Langage
Universite de Toulouse Le Mirail Tel.: 33(0)5 61 50 46 76/-36 09
5 allees Antonio-Machado Fax: 33(0)5 61 50 42 12
F-31058 TOULOUSE CEDEX Email: pery at univ-tlse2.fr