27.1010, Calls: Portuguese, Comp Ling, Text/Corpus Ling, Translation/Portugal

Fri Feb 26 18:24:20 UTC 2016

LINGUIST List: Vol-27-1010. Fri Feb 26 2016. ISSN: 1069 - 4875.

Subject: 27.1010, Calls: Portuguese, Comp Ling, Text/Corpus Ling, Translation/Portugal

Moderators: linguist at linguistlist.org (Damir Cavar, Malgorzata E. Cavar)
Reviews: reviews at linguistlist.org (Anthony Aristar, Helen Aristar-Dry, Sara Couture)
Homepage: http://linguistlist.org

*****************    LINGUIST List Support    *****************
                   25 years of LINGUIST List!
Please support the LL editors and operation with a donation at:
           http://funddrive.linguistlist.org/donate/

Editor for this issue: Anna White <awhite at linguistlist.org>
================================================================

Date: Fri, 26 Feb 2016 13:24:10
From: António Branco [Antonio.Branco at di.fc.ul.pt]
Subject: Workshop on Corpora and Tools for Processing Corpora

Full Title: Workshop on Corpora and Tools for Processing Corpora 
Short Title: WCTPC'2016 

Date: 12-Jul-2016 - 12-Jul-2016
Location: Tomar, Portugal 
Contact Person: Hilário Fontes
Meeting Email: hilario.fontes at ec.europa.eu
Web Site: http://propor2016.di.fc.ul.pt/?page_id=383 

Linguistic Field(s): Computational Linguistics; Text/Corpus Linguistics; Translation 

Subject Language(s): Portuguese (por)

Call Deadline: 15-Apr-2016 

Meeting Description:

Workshop on Corpora and Tools for Processing Corpora 
http://propor2016.di.fc.ul.pt/?page_id=383 
July 12, 2016 — Tomar, Portugal 

Co-located with PROPOR 2016 
http://propor2016.di.fc.ul.pt/ 

Motivation:

Much of the popularity of statistical machine translation solutions is due to
the availability of software packages that make it ever easier and faster to
train a working machine translation system. These packages have been seen as
requiring nothing more for deployment than a sufficiently large volume of
data, including some form of parallel corpus of raw text.

While advances in ever more sophisticated aspects of language technology have
made this increasingly feasible, they have overshadowed the fact that the data
needed to feed these systems still require a considerable amount of
preparation. Given the volume of corpora needed, this preparation is practical
only if, on the one hand, suitable datasets are available and, on the other
hand, it is supported by a number of shallow processing tools, such as
boilerplate removers, tokenisers, orthographic normalisers, hyphenators,
foreign word detectors, inflectional analysers, etc.

While building tools of this type is no longer a hot topic in cutting-edge
language technology research, obtaining and using them may in many cases turn
out to be harder than finding and using the far more sophisticated modules
needed to deploy the machine translation systems themselves. The situation is
especially acute for the vast majority of languages, which are less resourced
than English in terms of language technology, and for tools that both perform
at the state of the art and are openly available for reuse.

These adverse circumstances are compounded by the fact that suitable parallel
texts are often unavailable or hard to obtain. Interestingly, such tools and
datasets frequently exist, yet their development has never been documented in
a publication or their availability has never been made known.

Aims:

The present workshop seeks to help improve this state of affairs by mapping
both the available parallel datasets suitable for feeding statistical machine
translation systems and the available language processing tools useful for
preparing them.

In pursuing this goal, the workshop also seeks to foster the exchange of ideas
and the dissemination of best practices that support the ELRC and CEF.AT
(http://www.lr-coordination.eu) initiatives.

Call for Papers:

We therefore invite submissions reporting on language resources suitable for
supporting statistical machine translation from/into Portuguese and on
processing tools for preparing them. Work may be presented as an oral
presentation and/or a demonstration. While the workshop seeks to attract and
promote papers on language resources and tools not yet documented in previous
publications, updated papers on previously documented tools and resources are
also welcome, for the sake of broad representativeness.

Dates:

February 25: First call for papers 
March 21: Final call for papers 
April 15: Deadline for submissions 
May 16: Notification sent to authors 
June 1: Camera-ready papers due 
July 12, 2016: Workshop takes place 

Organization Committee:

Hilário Leal Fontes, DGT — European Commission (chair) 
Paulo Batista, DGT — European Commission 
António Branco, University of Lisbon 

Programme Committee:

Hilário Leal Fontes, European Commission (co-chair) 
António Branco, University of Lisbon (co-chair) 
Alexandru Ceausu, AMPLEXOR Luxembourg 
Aline Villavicencio, Universidade Federal do Rio Grande do Sul 
Amália Mendes, Centro de Linguística da Universidade de Lisboa 
Belinda Maia, Universidade do Porto 
Francis Tyers, Universitetet i Tromsø 
Gabriel Lopes, Faculdade de Ciências e Tecnologia, UNL 
Gorka Labaka, University of the Basque Country 
Jorge Baptista, CECL/U. Algarve and L2F-Spoken Language Lab/INESC ID Lisboa 
José Ramom Pichel Campos, imaxin|software 
Luís Trigo, LIAAD-INESC Porto L.A. 
Luísa Coheur, IST/INESC-ID Lisboa 
M.T. Carrasco Benitez, European Commission 
Maria José Machado, European Commission 
Michael Jellinghaus, European Commission 
Mikel Forcada, DLSI — Universitat d’Alacant 
Paulo Quaresma, Universidade de Évora 
Paulo Correia, European Commission 
Thiago Pardo, Universidade de São Paulo 
Xavier Gómez Guinovart, Universidade de Vigo 

Contact:

Hilário Leal Fontes, hilario.fontes at ec.europa.eu

------------------------------------------------------------------------------