[Corpora-List] Language Resources and Evaluation. Volume 48, Issue 3 available on SpringerLink

Nancy Ide ide at cs.vassar.edu
Sun Aug 31 15:59:05 UTC 2014


We are pleased to announce the electronic publication of Language Resources and Evaluation, Volume 48, Issue 3, available on SpringerLink.

                             Language Resources and Evaluation
                                        Volume 48, Issue 3

Table of Contents

The Linguistic Annotation Framework: a standard for annotation interchange and
merging 
Nancy Ide & Keith Suderman

This paper overviews the International Standards Organization–Linguistic
Annotation Framework (ISO–LAF) developed in ISO TC37 SC4. We describe the XML
serialization of ISO–LAF, the Graph Annotation Format (GrAF), and discuss the
rationale behind the various decisions that were made in determining the
standard. We describe the structure of the GrAF headers in detail and provide
multiple examples of GrAF representation for text and multi-media. Finally, we
discuss the next steps for standardization of interchange formats for linguistic
annotations.

Automatic dialogue act recognition with syntactic features 
Pavel Král & Christophe Cerisara

This work studies the usefulness of syntactic information in the context of
automatic dialogue act recognition in Czech. Several pieces of evidence are
presented in this work that support our claim that syntax might bring valuable
information for dialogue act recognition. In particular, a parallel is drawn
with the related domain of automatic punctuation generation and a set of
syntactic features derived from a deep parse tree is further proposed and
successfully used in a Czech dialogue act recognition system based on
conditional random fields. We finally discuss the possible reasons why so few
works have exploited this type of information before and propose future research
directions to further progress in this area.

A dependency annotation scheme for Bangla treebank 
Sanjay Chatterji, Tanaya Mukherjee Sarkar, Pragati Dhang, Samhita Deb, Sudeshna Sarkar, Jayshree
Chakraborty & Anupam Basu

Dependency grammar is considered appropriate for many Indian languages. In this
paper, we present a study of the dependency relations in the Bangla language. We
have categorized these relations into three levels, namely intrachunk
relations, interchunk relations and interclause relations. Each of these levels
is further categorized and an annotation scheme has been developed. Both
syntactic and semantic features have been taken into consideration for
describing the relations. In our scheme, there are 63 such syntactico–semantic
relations. We have verified the scheme by tagging a corpus of 4167 Bangla
sentences to create a treebank (KGPBenTreebank).

Project Notes

Lexicon+TX: rapid construction of a multilingual lexicon with under-resourced
languages 
Lian Tze Lim, Lay-Ki Soon, Tek Yong Lim, Enya Kong Tang & Bali
Ranaivo-Malançon

Most efforts at automatically creating multilingual lexicons require input
lexical resources with rich content (e.g. semantic networks, domain codes,
semantic categories) or large corpora. Such material is often unavailable and
difficult to construct for under-resourced languages. In some cases,
particularly for some ethnic languages, even unannotated corpora are still in
the process of collection. We show how multilingual lexicons with
under-resourced languages can be constructed using simple bilingual translation
lists, which are more readily available. The prototype multilingual lexicon
developed comprises six member languages: English, Malay, Chinese, French, Thai
and Iban, the last of which is an under-resourced language in Borneo. Quick
evaluations showed that 91.2% of 500 random multilingual entries in the
generated lexicon require minimal or no human correction.

Building the essential resources for Finnish: the Turku Dependency Treebank
Katri Haverinen, Jenna Nyblom, Timo Viljanen, Veronika Laippala, Samuel
Kohonen, Anna Missilä, Stina Ojala, Tapio Salakoski & Filip Ginter

In this paper, we present the final version of a publicly available treebank of
Finnish, the Turku Dependency Treebank. The treebank contains 204,399 tokens
(15,126 sentences) from 10 different text sources and has been manually
annotated in a Finnish-specific version of the well-known Stanford Dependency
scheme. The morphological analyses of the treebank have been assigned using a
novel machine learning method to disambiguate readings given by an existing
tool. As the second main contribution, we present the first open source Finnish
dependency parser, trained on the newly introduced treebank. The parser achieves
a labeled attachment score of 81%. The treebank data as well as the parsing
pipeline are available under an open license at http://bionlp.utu.fi/.

Book Review 

I. Mani and J. Pustejovsky: Interpreting motion: grounded representations for spatial language 
Giovanna Marotta

