[Corpora-List] Day Workshop : Modelling and describing discourse organisation in the age of the digital document

Marie-Paule PERY-WOODLEY pery at univ-tlse2.fr
Thu Dec 11 12:41:04 UTC 2003


ATALA (Association pour le traitement automatique des langues)
DAY WORKSHOP
as part of the SEMAINE DU DOCUMENT NUMERIQUE (Digital Document 
Week) (<www.univ-lr.fr/sdn2004>)
La Rochelle, France, 22 June 2004

CALL FOR PAPERS

MODELLING AND DESCRIBING DISCOURSE ORGANISATION IN THE AGE OF THE DIGITAL 
DOCUMENT

workshop organised by
Marie-Paule Péry-Woodley,
Equipe de recherche en Syntaxe et Sémantique/Université de Toulouse-Le 
Mirail (pery at univ-tlse2.fr)


The Digital Document Week aims to gather research communities dealing with 
digital documents from a variety of angles: media, technical and social 
modes of mediation, relation with human activity. Within this framework, the 
ATALA workshop wishes to broach these questions from a linguistic point of 
view, focussing on digital documents as discourse, characterised by an 
internal organisation which needs to be understood and may be exploited in 
computer-based systems. The workshop aims to bring together three research 
areas concerned with the development of digital documents: the study of 
discourse organisation, corpus linguistics, computer-based applications for 
the exploitation of digital documents.

For text and discourse linguistics, the proliferation of digital documents 
leads to new opportunities and new research questions, such as:
- the application of corpus analysis methods to discourse: what kind of 
data can be regarded as relevant at this level of linguistic investigation?
- the development of novel ways of accessing documents, which leads to a 
new emphasis on text structure and the potential exploitation of surface 
markers;
- the impact of new document types on basic concepts in the field: 
cohesion, coherence, metadiscursive signalling.

This workshop on written discourse organisation aims to bring together 
research from three domains which must seek points of convergence in the 
light of these new prospects:


1. Discourse organisation

In order to apprehend a sequence of utterances as discourse, it is 
necessary to understand its organisation (to identify its segments and 
perceive their hierarchy and their relations). An old and fertile tradition 
approaches discourse organisation via the notion of discourse relations: 
semantico-pragmatic links between segments (propositions or sets of 
propositions) (cf. Péry-Woodley (ed.) 2001). Other modes of organisation may 
be envisaged, via the notion of theme or topic for instance, or more 
recently through the discourse framing hypothesis (Charolles 1997). 
Research in this field can be placed on a continuum from pure “conceptual” 
modelling to empirical methods (automatic segmenting, cf. Hearst 1997; 
shallow analyses, human or automatic, cf. Teufel & Moens 1999). The 
challenge is to hold both ends of the continuum in order to draw 
connections between the way “things are put” in texts and the processes 
underlying discourse organisation at different levels of granularity (local 
vs. global organisation). The relationship between modelling approaches and 
empirical research has often seemed problematic, with empirical studies 
running the risk of losing track of structure as they focus on surface 
markers, while conceptual models tend to be difficult to test empirically. 
Corpus-based approaches, greatly facilitated by progression into the 
digital age, are in the process of bringing about considerable changes in the 
discourse field, as they have done elsewhere in linguistics (Conrad 2002).


2. Corpus-based studies of linguistic correlates of discourse organisation

As noted by several authors (Biber et al. 1998, inter alia), though research 
on discourse organisation tends to make regular use of authentic data, the 
corpus is often seen as a source of examples rather than the object of the 
analysis as such. The implementation of a fully-fledged “corpus approach” 
in the field of discourse organisation carries with it many difficulties: 
corpus construction (common sampling-based techniques make it impossible...), 
the role of quantitative analysis and, most of all, the definition of relevant 
data that make it possible to connect surface markers 
(which may be mere epiphenomena) to the multiple principles underlying 
complex hierarchical organisation.
A gap can also be observed between linguistic approaches (low coverage and 
high reliability) and numerical approaches (high coverage and low 
reliability). Articulating these approaches may open new prospects, leading 
to fresh insights into discourse organisation principles as well as more 
operational methods for applications.


3. Computer-based systems for the exploitation of digital documents

Applications for which the relevant unit is the whole document have little 
need to address discourse organisation, but those concerned with 
intra-document browsing, selective synthesis or multi-level visualisation 
must work their way inside the documents and therefore cannot consider them 
as simple “bags of words”: they have to take into account the organisation 
into thematic or rhetorical chunks and text architecture (cf. Luc & Virbel 
2001). These objectives bring about new research questions, in particular 
around the articulation of different organisational levels in long 
documents (where browsing aids acquire particular relevance).



This call for papers is addressed to researchers who are already working on these 
interactions, as well as those whose work is in one of the domains referred 
to but who are interested in a dialogue with other discourse approaches. 
Descriptive studies which pay specific attention to methodology will be 
particularly welcome.


Some relevant themes (non-exhaustive list):
- identification of objects or text zones corresponding to text or 
discourse acts (conclusions, explanations, evaluations, ...)
- discourse organisation markers (from markers to relations: inductive 
approach): connection, indexing (discourse frames), textual metadiscourse
- linguistic characterisation of discourse functions (from functions to 
markers: deductive approach)
- segmentation (automatic or manual): “topic shifts”, clues to segment 
boundaries (lexico-syntactic, typographical, dispositional)
- articulation between local and global organisation
- impact of discourse genre on discourse organisation and its linguistic 
markers
- analysis and exploitation of document architecture
- topological approaches
- discourse annotation


SUBMISSION (MODALITIES)

A summary (2-4 pages; Word, PDF or PS) should be e-mailed before January 30th 
2004 to Marie-Paule Péry-Woodley (<pery at univ-tlse2.fr>).

Notification of acceptance will be given by March 15th 2004.

***************************************************************************

References

Biber, D., Conrad, S., & Reppen, R. (1998). Corpus linguistics: 
Investigating language structure and use. Cambridge: Cambridge University 
Press.
Charolles, M. (1997). L'encadrement du discours : Univers, champs, domaines 
et espaces (Cahier de Recherche Linguistique 6). Université de Nancy 2.
Conrad, S. (2002). Corpus linguistics approaches for discourse analysis. 
Annual Review of Applied Linguistics, 22, 75-95.
Hearst, M. (1997). TextTiling: segmenting text into multi-paragraph 
subtopic passages. Computational Linguistics, 23(1), 33-64.
Luc, C., & Virbel, J. (2001). Le modèle d'architecture textuelle : 
fondements et expérimentation. Verbum, 23(1), 103-123.
Péry-Woodley, M.-P. (ed.) (2001). Cohérence et relations de discours à 
l'écrit. Présentation. Verbum, 23(1).
Teufel, S., & Moens, M. (1999). Discourse-level argumentation in scientific 
articles: human and automatic annotation. In: Towards Standards and Tools 
for Discourse Tagging. ACL 1999 Workshop.


___
Marie-Paule PERY-WOODLEY
___________________________________________________________________
ERSS / Sciences du Langage
Universite de Toulouse Le Mirail                Tel.: 33(0)5 61 50 46 76/-36 09
5 allees Antonio-Machado                        Fax: 33(0)5 61 50 42 12
F-31058 TOULOUSE CEDEX                  Email: pery at univ-tlse2.fr