17.3652, FYI: 2nd CfA: Towards a Reference Corpus of Web Genres

Mon Dec 11 16:11:30 UTC 2006

LINGUIST List: Vol-17-3652. Mon Dec 11 2006. ISSN: 1068 - 4875.

Subject: 17.3652, FYI: 2nd CfA: Towards a Reference Corpus of Web Genres

Moderators: Anthony Aristar, Eastern Michigan U <aristar at linguistlist.org>
            Helen Aristar-Dry, Eastern Michigan U <hdry at linguistlist.org>
Reviews: Laura Welcher, Rosetta Project / Long Now Foundation  
         <reviews at linguistlist.org> 

Homepage: http://linguistlist.org/

The LINGUIST List is funded by Eastern Michigan University, Wayne
State University, and donations from subscribers and publishers.

Editor for this issue: Amy Renaud <renaud at linguistlist.org>

Date: 11-Dec-2006
From: Marina Santini < santinim at inwind.it >
Subject: 2nd CfA: Towards a Reference Corpus of Web Genres 

-------------------------Message 1 ---------------------------------- 
Date: Mon, 11 Dec 2006 10:56:37
From: Marina Santini < santinim at inwind.it >
Subject: 2nd CfA: Towards a Reference Corpus of Web Genres 


2nd Call for Abstracts

Corpus Linguistics 2007 Colloquium: 
Towards a reference corpus of web genres


Genres of spoken and written texts are being intensively studied from
various angles, e.g., communication studies, discourse analysis,
computational linguistics, without arriving at a generally accepted
definition. The web is new, so it is not clear how to apply traditional
notions of genre to web pages. In this colloquium we would like to collect
submissions that study characteristics of web genres with respect to
traditional paper genres represented in electronic corpora like the BNC. 
Web documents are often characterised by a high level of genre hybridism,
by a fragmentation of textuality across several documents, by the impact of
technical features such as hyperlinking, posting facilities and
multi-authoring. The web is a huge reservoir of documents that can be
easily mined for building all sorts of corpora, with many collections being
built according to subjective criteria for corpus composition, genre
annotation, genre representativeness and genre granularity. In this
colloquium we would like to invite submissions
contributing to a reference corpus of web genres. The main goal of the
colloquium is to draw up an initial list of characteristics and
requirements for building, annotating and evaluating reference corpora of
web genres. For instance:

* To what extent should genre hybridism and authorial creativity be
represented in a genre collection? These two phenomena appear to be very 
common on the web.

* To what extent is it possible to include "emerging genres", i.e.,
genres still in a transitional phase in genre evolution? The web is
currently thriving with emerging genres.

* How many granularities of the unit of analysis should be included? Only
genres representing web sites? Only genres representing web pages? Both?

* What "format" should be used to store these units in a collection
(e.g., a database-like form, DOM trees, a net of graphs, in HTML format, in
a text-only version, with or without embedded images, with boilerplate
removed)?

* What level of genre granularity and similarity should be applied in the
reference corpus? Genre collections often show different levels of  
granularity, including genres and super-genres. Should similar genres, such
as "tutorial" and "how-to", be accounted for separately?

The topics of interest include but are not limited to:

- Text theory for the development of web corpora
- Modelling corpora of web genres
- Innovative genre classification schemes accounting for multi-genre and
no-genre web documents
- Modelling genre annotation schemes for web documents (metadata organization)
- Assembling a list of web genres for a reference corpus
- Creating comparable corpora of web genres
- Automatic genre classification vs. human genre classification
- How to evaluate the corpus: using statistical measures, relying on corpus
linguists, librarians, or web users?

The aim of this colloquium, the first ever organized on this topic, is to
bring together researchers from different communities such as corpus
linguistics, genre analysis, digital genre studies, computational
linguistics, and information retrieval in order to promote the discussion
and development of new ideas and methods to create new corpora for language
studies and as evaluation resources.

Send abstracts to: webgenres at googlemail.com
Please specify "Colloquium Abstracts" in the subject line.
Abstract submissions should include: 
     * Presenter contact information (mailing address, phone, e-mail & fax) 
     * A paper proposal (250 words max) 
     * An abstract for the program (50 words max) 
The deadline for submissions is Dec 15, 2006.
Notification of acceptance will be sent out by Jan 11, 2007.

The colloquium will take place in the UK at the end of July 2007. The venue
and the exact date of the colloquium will be announced at the end of
January 2007.

Colloquium Organization:

Marina Santini (University of Brighton, UK)
Serge Sharoff (University of Leeds, UK)

Program Committee:
Marco Baroni (University of Bologna, Italy)
Stefan Gries (University of California, USA)
Adam Kilgarriff (Lexmasterclass, UK)
Alexander Mehler (Bielefeld University, Germany)
Sven Meyer zu Eissen (University of Weimar, Germany)
John Paolillo (Indiana University, USA)
Paul Rayson (UCREL, Lancaster Uni, UK)
Georg Rehm (University of Tuebingen, Germany)
Marina Santini (University of Brighton, UK)
Serge Sharoff (University of Leeds, UK)
Benno Stein (University of Weimar, Germany)

Main contact: Serge Sharoff (s.sharoff at leeds.ac.uk)
Other contact: Marina Santini (Marina.Santini at itri.brighton.ac.uk) 

Linguistic Field(s): Text/Corpus Linguistics


