[Corpora-List] Corpora Digest, Vol 41, Issue 21
Prof. Riyad Al-Shalabi
RShalabi at aabfs.org
Sun Nov 21 10:06:22 UTC 2010
Dear Sir,
From now on my new e-mail address is rshalabi64 at yahoo.com or ralshalabi at uop.edu.jo
Professor Riyad Al-Shalabi
University of Petra
Amman-Jordan
________________________________________
From: corpora-bounces at uib.no [corpora-bounces at uib.no] On Behalf Of corpora-request at uib.no [corpora-request at uib.no]
Sent: Saturday, November 20, 2010 1:00 PM
To: corpora at uib.no
Subject: Corpora Digest, Vol 41, Issue 21
Today's Topics:
1. NODALIDA 2011: Second call for Workshop Proposals
(Costanza Navarretta)
2. 3rd CfP: Symposium on "Authenticating Language Learning: Web
Collaboration Meets Pedagogic Corpora" (Johannes Widmann)
3. Re: Annotation layers: missing reference (Piotr Bański)
4. RE : Annotation layers: missing reference (FORT, Karen)
5. Re: RE : Annotation layers: missing reference (John K Pate)
6. Re: RE : Annotation layers: missing reference (Daniel Zeman)
7. Re: RE : Annotation layers: missing reference (Jason Eisner)
8. Re: RE : Annotation layers: missing reference
(Christian Chiarcos)
9. CFP: LRE journal - Special Issue on Analysis of short texts
on the Web (prosso at dsic.upv.es)
----------------------------------------------------------------------
Message: 1
Date: Fri, 19 Nov 2010 12:28:15 +0100
From: Costanza Navarretta <costanza at hum.ku.dk>
Subject: [Corpora-List] NODALIDA 2011: Second call for Workshop
Proposals
To: "corpora at uib.no" <corpora at uib.no>
Apologies for multiple postings
=====================
NODALIDA 2011 - Second Call for Workshop Proposals
----------------------------------------------------------
The 18th Nordic Conference of Computational Linguistics
May 11-13, 2011
Riga, Latvia
The Program Committee of the 18th Nordic Conference of Computational Linguistics (NODALIDA 2011) invites proposals for well-focused workshops to be held in conjunction with the conference in Riga, Latvia. We solicit proposals on any topic of relevance for language technology, including both text and speech processing. The workshops will be held on Wednesday, May 11, the day before the main conference.
SUBMISSION DETAILS
-------------------------
Proposals should be submitted in plain text to nodalida2011 at lumii.lv no later than December 17. The subject line should be NODALIDA 2011 WORKSHOP PROPOSAL. There is no special form, but the proposal should contain information about the following aspects of the proposed workshop:
* Title: Title of the workshop.
* Organizers: Name, affiliation, e-mail address and brief biographical information about the workshop organizers.
* Topic/Purpose: What is the main topic of the workshop? What purpose is it going to achieve?
* Organization: How is the workshop going to be organized? Will there be an open call for papers or other contributions? Would you suggest a half or a full workshop day?
* Target groups: Who are the expected workshop participants?
MORE INFORMATION
----------------
Questions regarding the conference submissions or other issues should be sent to the conference chairs who can be reached at nodalida2011 at lumii.lv
More information about the conference and local information about Riga will be available at the conference website at http://www.lumii.lv/nodalida2011/
PROGRAM COMMITTEE
--------------------------
Bolette Sandford Pedersen (Program Chair), University of Copenhagen, Denmark
Kristiina Jokinen, University of Helsinki, Finland
Jussi Karlgren, Swedish Institute of Computer Science, Sweden
Ruta Marcinkeviciene, Vytautas Magnus University, Lithuania
Meelis Mihkla, Institute of the Estonian Language, Estonia
Costanza Navarretta, University of Copenhagen, Denmark
Anders Nøklestad, University of Oslo, Norway
Eirikur Rögnvaldsson, University of Iceland, Iceland
LOCAL ORGANIZATION COMMITTEE
----------------------------------------
Inguna Skadina (Local Chair), Institute of Mathematics and Computer Science, University of Latvia
Rihards Balodis, Institute of Mathematics and Computer Science, University of Latvia
Gunta Nespore, Institute of Mathematics and Computer Science, University of Latvia
Gunta Plataiskalna, Institute of Mathematics and Computer Science, University of Latvia
Ilmars Poikans, Institute of Mathematics and Computer Science, University of Latvia
Andrejs Spektors, Institute of Mathematics and Computer Science, University of Latvia
------------------------------
Message: 2
Date: Fri, 19 Nov 2010 12:40:56 +0100
From: "Johannes Widmann" <johannes.widmann at uni-tuebingen.de>
Subject: [Corpora-List] 3rd CfP: Symposium on "Authenticating Language
Learning: Web Collaboration Meets Pedagogic Corpora"
To: "Johannes Widmann" <johannes.widmann at uni-tuebingen.de>
3rd CALL FOR PAPERS
Symposium on
“Authenticating Language Learning: Web Collaboration Meets Pedagogic
Corpora”
The symposium will take place from 17-19 February 2011 at the English
Department, University of Tübingen.
Symposium registration and abstract upload:
http://www.ael.uni-tuebingen.de/symposium
Theme of the Symposium
This event will focus on using web collaboration and pedagogic applications
of corpora in order to strengthen authentication in language learning and
teaching. Emphasis is thus on both content and communication-based learning
activities and on how computer mediated communication (CMC) and corpus
methodologies can be pedagogically integrated.
Our aim is to take stock of the current state of affairs in corpus and
CMC-based language learning and teaching. Special attention will be given to
the needs of 'young' researchers in the field. The language of the symposium
is English, but we welcome contributions for all languages.
Pathway lectures will be offered by Ana Frankenberg, Steven Thorne and Bernd
Rüschoff.
Call for Papers
We welcome proposals for 20-minute paper presentations (plus 20 minutes
discussion) or posters focusing on using corpora and CMC to support
authentication in content and communication-based language learning and
teaching.
Please submit a 300 word abstract (including a title) by 22 November 2010.
More information is available on the website.
Notifications of acceptance will be sent out during the week beginning 13
December 2010.
Organisation
This event is being organised in collaboration with EUROCALL, the European
Association for Computer-Assisted Language Learning, and the European LLP
projects "BACKBONE: Corpora for Content and Language Integrated Learning"
and "icEurope: Intercultural Communication between English Language Classes
in Europe".
Organising team:
University of Tübingen, Applied English Linguistics (Germany)
Kurt Kohn, kurt.kohn(at)uni-tuebingen.de
Claudia Warth, claudia.warth(at)uni-tuebingen.de
Johannes Widmann, johannes.widmann(at)uni-tuebingen.de
Eurocall CorpusCALL SIG
Alex Boulton, alex.boulton(at)univ-nancy2.fr
CRAPEL-ATILF/CNRS, Nancy-Université (France)
Eurocall CMC SIG
Sarah Guth, sarah.guth(at)unipd.it
University of Padua (Italy)
------------------------------
Message: 3
Date: Fri, 19 Nov 2010 12:56:13 +0100
From: Piotr Bański <bansp at o2.pl>
Subject: Re: [Corpora-List] Annotation layers: missing reference
To: corpora at uib.no
Don't forget about ATLAS (Steven Bird, Mark Liberman; [1], look around
1999/2000). It is also necessary to mention the work by Nancy Ide since
the Corpus Encoding Standard[2], then, in the context of ISO TC 37 SC 4,
mostly with Laurent Romary and Keith Suderman [3]. Some attempt at
putting order into these notions has been made in Goecke et al., 2010 [4].
Good luck,
Piotr
[1]:
http://www.informatik.uni-trier.de/~ley/db/indices/a-tree/l/Liberman:Mark.html
[2]: http://www.cs.vassar.edu/CES/
[3]: http://www.cs.vassar.edu/~ide/pubs.html
[4]: Goecke, D., Metzing, D., Lüngen, H., Stührenberg, M., Witt, A.
(2010). Different views on markup: Distinguishing levels and layers. In
Linguistic modeling of information and markup languages. Contributions
to language technology. Springer Netherlands, pp. 1-21.
On 2010-11-18 14:06, Philippe Blache wrote:
> Hi Karen,
> I don't know whether the idea comes from there, but it belongs to the NXT data model:
>
> J. Carletta, S. Evert, U. Heid, J. Kilgour, J. Robertson, H. Voormann
> "The NITE XML Toolkit: Flexible annotation for multimodal language data"
> Behavior Research Methods, Instruments, & Computers 2003, 35 (3), 353-363
>
>
> Philippe
>
>
> Le 18 nov. 2010 à 11:04, Karen Fort a écrit :
>
>> Dear members,
>>
>> I'm looking for a reference on annotations layers.
>> Can somebody tell me when the "idea" of annotation layers appeared?
>>
>> I suppose it came from the speech community, but I cannot find a clear reference on that.
>>
>> Thank you for your help!
------------------------------
Message: 4
Date: Fri, 19 Nov 2010 13:30:09 +0100
From: "FORT, Karen" <Karen.FORT at inist.fr>
Subject: [Corpora-List] RE : Annotation layers: missing reference
To: Piotr Bański <bansp at o2.pl>, "corpora at uib.no" <corpora at uib.no>
Yes, I know about most of those, thank you.
But it seems to me it appeared long before that in linguistics (this is confirmed by several Emails I received about this question).
As for NLP as such, the question seems more difficult to answer, but for now, I'd tend to think, like you do, that Bird & Liberman were the first to introduce it in the field, don't you think so?
Thank you for the last reference, I did not know about it!
Regards,
Karën FORT
http://www-lipn.univ-paris13.fr/~fort/
------------------------------
Message: 5
Date: Fri, 19 Nov 2010 12:58:12 +0000
From: John K Pate <j.k.pate at sms.ed.ac.uk>
Subject: Re: [Corpora-List] RE : Annotation layers: missing reference
To: corpora at uib.no
On Fri, 2010-11-19 at 13:30 +0100, FORT, Karen wrote:
> Yes, I know about most of those, thank you.
> But it seems to me it appeared long before that in linguistics (this is confirmed by several Emails I received about this question).
> As for NLP as such, the question seems more difficult to answer, but for now, I'd tend to think, like you do, that Bird & Liberman were the first to introduce it in the field, don't you think so?
>
> Thank you for the last reference, I did not know about it!
The BAS Partitur format [1] was first drafted in 1995, although the link
to the draft on that page appears to be dead.
John
[1] http://www.phonetik.uni-muenchen.de/forschung/Bas/BasFormatseng.html
==
John K Pate
Student, PhD Informatics
Informatics Forum 3.35
The University of Edinburgh
http://homepages.inf.ed.ac.uk/s0930006/
------------------------------
Message: 6
Date: Fri, 19 Nov 2010 14:19:03 +0100
From: Daniel Zeman <zeman at ufal.mff.cuni.cz>
Subject: Re: [Corpora-List] RE : Annotation layers: missing reference
To: "FORT, Karen" <Karen.FORT at inist.fr>
Cc: "corpora at uib.no" <corpora at uib.no>
Dne 19.11.2010 13:30, FORT, Karen napsal(a):
> ...I'd tend to think, like you do, that Bird & Liberman were the first to introduce it in the field, don't you think so?
Depends on what is your definition of "the field" and of "being
introduced in the field" ;-)
Along those lines, you might also want to check the Prague School:
Sgall, Petr, Hajičová, Eva and Panevová, Jarmila (1986): The Meaning of
the Sentence and Its Semantic and Pragmatic Aspects. Reidel: Dordrecht,
Netherlands / Academia: Praha, Czechoslovakia
(main publication I guess)
which has its roots at least in 1967 but that one is in Czech only:
Sgall, Petr (1967): Generativní popis jazyka a česká deklinace
(Generative Description of Language and Czech Declension). Academia:
Praha, Czechoslovakia
and which was instantiated in the three-layer annotation of the Prague
Dependency Treebank (first references about 1997; a more influential one
is the following):
Böhmová, Alena, Hajič, Jan, Hajičová, Eva and Hladká, Barbora (2003):
The Prague Dependency Treebank: A Three-Level Annotation Scenario. In:
Anne Abeillé (ed.): Treebanks: Building and Using Syntactically
Annotated Corpora. Kluwer: Dordrecht, Netherlands. ISBN 1-4020-1334-5
------------------------------
Message: 7
Date: Fri, 19 Nov 2010 08:42:35 -0500
From: Jason Eisner <jason at cs.jhu.edu>
Subject: Re: [Corpora-List] RE : Annotation layers: missing reference
To: "FORT, Karen" <Karen.FORT at inist.fr>
Cc: "corpora at uib.no" <corpora at uib.no>
On Fri, Nov 19, 2010 at 7:30 AM, FORT, Karen <Karen.FORT at inist.fr> wrote:
> As for NLP as such, the question seems more difficult to answer, but for
> now, I'd tend to think, like you do, that Bird & Liberman were the first to
> introduce it in the field, don't you think so?
>
It certainly predates the Bird & Liberman reference around 1999/2000 that
Piotr mentions.
Although this was a bit before my time, I believe that the term "standoff
markup" or "standoff annotation" was introduced by the DARPA TIPSTER program
that started in 1991. This scheme stores the document in one file and
various annotation layers in other files. These annotations may be produced
manually or automatically.
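To make the standoff idea concrete, here is a minimal sketch in Python. It is only an illustration of the general principle (text in one place, layers elsewhere, linked by character offsets) and is not the TIPSTER or GATE encoding; the `Annotation` class, the `spans` helper, the layer names and the sample sentence are all made up for this example.

```python
# Minimal illustration of standoff annotation (hypothetical structures,
# NOT the TIPSTER or GATE format). The source text is never modified;
# each layer only points into it by character offsets.
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    start: int   # inclusive character offset into the text
    end: int     # exclusive character offset into the text
    label: str


def spans(text, layer):
    """Return (covered substring, label) pairs for one annotation layer."""
    return [(text[a.start:a.end], a.label) for a in layer]


# In a real setup the text and each layer would live in separate files.
TEXT = "Bird and Liberman wrote a paper."

LAYERS = {
    "tokens": [
        Annotation(0, 4, "token"),
        Annotation(5, 8, "token"),
        Annotation(9, 17, "token"),
    ],
    "entities": [
        Annotation(0, 4, "PERSON"),
        Annotation(9, 17, "PERSON"),
    ],
}

if __name__ == "__main__":
    for name, layer in LAYERS.items():
        print(name, spans(TEXT, layer))
```

Because the layers refer to the text only through offsets, a new layer (produced by hand or by a tool) can be added or dropped without touching the document or the other layers.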
I am not sure who was responsible for the general idea or the specific
encodings used in TIPSTER --
http://www.fas.org/irp/program/process/tipster.htm suggests that
standardization began in 1994, but the ideas may have been in use in
individual TIPSTER projects before that.
Hamish Cunningham and the folks behind the GATE architecture may know more;
GATE implements the DARPA scheme, I believe.
Thompson & McKelvie (1997) is well-cited for explaining how to encode
standoff markup in SGML or XML:
http://www.ltg.ed.ac.uk/~ht/sgmleu97.html
cheers, jason
------------------------------
Message: 8
Date: Fri, 19 Nov 2010 15:18:38 +0100
From: "Christian Chiarcos" <christian.chiarcos at web.de>
Subject: Re: [Corpora-List] RE : Annotation layers: missing reference
To: "corpora at uib.no" <corpora at uib.no>
Dear Karen,
> As for NLP as such, the question seems more difficult to answer, but for
> now, I'd tend to think, like you do, that Bird & Liberman were the first
> to introduce it in the field, don't you think so?
Bird and Liberman's paper (apparently already circulating in 1998) is
definitely a very influential one, and may have reinforced the use of the
term later on. However, a quick lookup with google scholar reveals that
"annotation layer" was an established term at this time, see, e.g., Core &
Allen (1997) or van Halteren (1998).
In fact already the "EAGLES Recommendations for the Syntactic Annotation
of Corpora" (www.ilc.cnr.it/EAGLES/pub/eagles/corpora/sasg1.ps.gz, 1996)
used the term "layer of annotation", apparently adopted from the "EAGLES
Lexicon architecture"
(http://www.ilc.cnr.it/EAGLES/pub/eagles/lexicons/lexarch.ps.gz, 1993)
where three "layers of articulation" are distinguished (morphology,
syntax, semantics). You will probably find an even older reference for
"annotation layer/layer of annotation" in early EAGLES papers.
Best,
Christian
van Halteren, H. (1998), The Feasibility of Incremental Linguistic
Annotation, Computers and the Humanities 32: 389-409, 1998.
Bird, S. and Liberman, M. (1998), Towards a formal framework for
linguistic annotations, Fifth International Conference on Spoken Language
Processing
Core, M. and Allen, J. (1997), Coding dialogs with the DAMSL annotation
scheme, AAAI Fall Symposium on Communicative Action in Humans and
Machines: 28--35
------------------------------
Message: 9
Date: Fri, 19 Nov 2010 19:27:07 +0100
From: prosso at dsic.upv.es
Subject: [Corpora-List] CFP: LRE journal - Special Issue on Analysis
of short texts on the Web
To: corpora at uib.no
Language Resources and Evaluation Journal
Special issue on Analysis of short texts on the Web
CALL FOR PAPERS
The huge volume of information available on the Web is continuously
growing. There is great interest in analyzing this information in
order to fulfil specific user needs. The challenges that researchers
must deal with when analyzing the content of Web pages are related to
the fact that quite often they are written in natural language, and
very often without any specific helpful structure. In other words, it
is a problem of processing almost pure raw data, often just short
texts which make the task quite challenging. In fact, short texts
typically contain a small number of words whose absolute frequency is
relatively low in comparison with their frequency in long documents.
This makes tasks such as text categorization harder.
The exponential growth in the number of Web documents furnishes
abundant proof of the necessity of analyzing short texts. For
instance, digital libraries and Web-based repositories of scientific
and technical information provide free access only to abstracts and
not to the full texts of the documents. News, document titles,
snippets, FAQs, chats, abstracts etc. are some examples of the high
volume of short texts available on the Web.
With the so-called Web 2.0, the largest communication and
collaborative platform, new short texts are created on a daily basis as
on-line evaluations of commercial products, posts of blogs or comments
in social networks. Twitter, for instance, is a new successful social
network technology of the Web 2.0 genre which is used by millions of
people and thousands of companies to publish very short messages with
the purpose of sharing experiences and/or opinions about a product or
service. Due to the huge amount of information available in social
media, there is a clear need for mining useful information from these
messages in order to discover knowledge about the collective thinking
of the crowds. Tweet analysis is considered to be potentially very
important because comments, opinions, suggestions and complaints can
be used to define new marketing strategies or to obtain information on
companies' reputation.
In recent years there has been considerable interest from the
computational linguistics community in the efficient analysis of short
texts. In fact, several tracks have been organized within different
evaluation campaigns: TREC (blog and Web tracks),
CLEF (Web people search laboratory), NTCIR (opinion analysis pilot
task), INEX (ad-hoc passage retrieval task), ROMIP (track on news
clustering), and FIRE (ad-hoc task on retrieval from technical forums
and mailing lists).
This special issue aims to collect state-of-the-art contributions to
the development and use of techniques for the analysis of short texts
on the Web, with special emphasis on resources of the collaborative
platform of the Web 2.0. Thus, we welcome contributions that include,
but are not limited to, resources of short texts such as posts of
blogs, tweets, text messages, etc, as well as innovative techniques
using linguistic resources for improved understanding of mono or
multi-lingual short texts.
TOPICS OF INTEREST
We are particularly interested in articles showing the benefits of
using such resources and techniques that include, but are not limited to,
the following topics:
* Categorization of short texts
* Cross-lingual short text mining on the Web
* Analysis of weblogs, tweets, text messages and snippets
* Knowledge discovery from Web 2.0
* Opinion mining in social media
* Enterprise 2.0 and market analysis
* Automatic generation of collaborative linguistic resources
* Evaluation of techniques and short text resources
IMPORTANT DATES
* Submission deadline (abstract): March 15, 2011
* Submission deadline (full paper): March 31, 2011
* First-round reviews due: May 31, 2011
* Revised versions due: July 15, 2011
* Second-round reviews due: September 15, 2011
* Final versions due: October 31, 2011
* Special issue publication: sometime in 2012
PROGRAM COMMITTEE
Eneko Agirre, University of the Basque Country
Mikhail Alexandrov, Autonomous University of Barcelona
Enrique Alfonseca, Google Zurich
Benajiba Yassine, Philips Research North America
Andrew Borthwick, Intelius
Pavel Braslavski, Yandex
Paul Clough, University of Sheffield
José Carlos Cortizo, BrainSins
Alexander Gelbukh, National Polytechnic Institute
Alfio Massimiliano Gliozzo, IBM Watson
Julio Gonzalo, UNED
Chu-Ren Huang, The Hong Kong Polytechnic University
Hitoshi Isahara, Toyohashi University of Technology
Jaap Kamps, University of Amsterdam
Pavel Makagonov, Mixtec Technological University
Prasenjit Majumder, DAIICT Gandhinagar
Antonia Martí, University of Barcelona
Patricio Martínez, University of Alicante
Rada Mihalcea, University of North Texas
Mandar Mitra, Indian Statistical Institute
Manuel Montes y Gómez, INAOE Puebla
Roberto Navigli, University of Rome La Sapienza
Boris Novikov, St. Petersburg University
Ted Pedersen, University of Minnesota
Marco Pennacchiotti, Yahoo! Labs Santa Clara
Efstathios Stamatatos, University of the Aegean
Benno Stein, Bauhaus-Universität Weimar
José Antonio Troyano, University of Seville
Dan Tufiş, Romanian Academy
Jan Wiebe, University of Pittsburgh
Xiaofang Zhou, University of Queensland
Xiaoyan Zhu, Tsinghua University Beijing
GUEST EDITORS
Paolo Rosso, Universidad Politécnica de Valencia, Spain
Marcelo Errecalde, Universidad Nacional de San Luís, Argentina
David Pinto, Benemérita Universidad Autónoma de Puebla, Mexico
SUBMISSION INFORMATION
Please follow the submission instructions available from the LRE
webpage at http://chum.edmgr.com/
For the submission of the abstract and additional information, please
contact David Pinto (dpinto at cs.buap.mx)
---
Paolo Rosso
Head of Natural Language Engineering Lab.
Dpto. Sistemas Informáticos y Computación
Universidad Politécnica Valencia
Spain
URL: http://www.dsic.upv.es/~prosso
email: prosso [at] dsic.upv.es
fax: +34 963877359
tel: +34 963877007 ext. 73571
----------------------------------------------------------------------
Send Corpora mailing list submissions to
corpora at uib.no
To subscribe or unsubscribe via the World Wide Web, visit
http://mailman.uib.no/listinfo/corpora
or, via email, send a message with subject or body 'help' to
corpora-request at uib.no
You can reach the person managing the list at
corpora-owner at uib.no
When replying, please edit your Subject line so it is more specific
than "Re: Contents of Corpora digest..."
_______________________________________________
Corpora mailing list
Corpora at uib.no
http://mailman.uib.no/listinfo/corpora
End of Corpora Digest, Vol 41, Issue 21
***************************************