27.3165, Confs: Computational Ling, Text/Corpus Ling/USA

Wed Aug 3 17:27:41 UTC 2016

LINGUIST List: Vol-27-3165. Wed Aug 03 2016. ISSN: 1069 - 4875.

Moderators: linguist at linguistlist.org (Damir Cavar, Malgorzata E. Cavar)
Reviews: reviews at linguistlist.org (Anthony Aristar, Helen Aristar-Dry,
                                   Robert Coté, Michael Czerniakowski)
Homepage: http://linguistlist.org

*****************    LINGUIST List Support    *****************
                       Fund Drive 2016
                   25 years of LINGUIST List!
Please support the LL editors and operation with a donation at:
           http://funddrive.linguistlist.org/donate/

Editor for this issue: Kenneth Steimel <ken at linguistlist.org>
================================================================

Date: Wed, 03 Aug 2016 13:27:20
From: Carl Rubino [carl.rubino at iarpa.gov]
Subject: Machine Translation for English Retrieval of Information in Any Language Proposers Day

Machine Translation for English Retrieval of Information in Any Language Proposers Day 
Short Title: MATERIAL PD 

Date: 27-Sep-2016 - 27-Sep-2016 
Location: Washington DC Area, USA 
Contact: Carl Rubino 
Contact Email: carl.rubino at iarpa.gov 
Meeting URL: https://www.fbo.gov/index?s=opportunity&mode=form&id=b9fe325434c8c668b66b7499cf435b85&tab=core&_cview=0 

Linguistic Field(s): Computational Linguistics; Text/Corpus Linguistics 

Meeting Description: 

The Intelligence Advanced Research Projects Activity (IARPA) will host a
Proposers' Day Conference for the MATERIAL Program on September 27, 2016, in
anticipation of the release of a new solicitation in support of the program. 

Program Description and Goals:

The MATERIAL performers will develop an ''English-in, English-out''
information retrieval system that, given a domain-sensitive English query,
will retrieve relevant data from a large multilingual repository and display
the retrieved information in English as query-biased summaries. MATERIAL
queries will consist of two parts: a domain specification and an English word
(or string of words) that captures the information need of an English-speaking
user, e.g., ''zika virus'' in the domain of Government vs. ''zika virus'' in
the domain of Health, or ''asperger's syndrome'' in the domain of Education
vs. ''asperger's syndrome'' in the domain of Science. The English summaries
produced by the system should convey the relevance of the retrieved
information to the domain-limited query to enable an English-speaking user to
determine whether the document meets the information needs of the query.
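
The two-part query described above can be illustrated with a minimal Python
sketch. This is a toy illustration only, not the MATERIAL design: the names
Query, Document, retrieve, and summarize are hypothetical, the english_text
field stands in for machine-translated or transcribed content, and plain
substring matching stands in for real retrieval and summarization.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Query:
    domain: str  # domain specification, e.g. "Health"
    text: str    # English word or string of words, e.g. "zika virus"


@dataclass(frozen=True)
class Document:
    doc_id: str
    language: str      # source language of the original document
    domain: str        # assumed to be known or predicted upstream
    english_text: str  # assumption: stand-in for MT/ASR output in English


def retrieve(query, documents):
    """Return documents in the query's domain whose English text contains the query string."""
    needle = query.text.lower()
    return [
        d for d in documents
        if d.domain.lower() == query.domain.lower()
        and needle in d.english_text.lower()
    ]


def summarize(query, document):
    """Return the first sentence mentioning the query string (a crude query-biased summary)."""
    needle = query.text.lower()
    for sentence in document.english_text.split(". "):
        if needle in sentence.lower():
            return sentence.strip().rstrip(".") + "."
    return ""
```

With this sketch, the same string "zika virus" returns different documents
depending on whether the domain is Government or Health, which is the
behavior the domain specification is meant to provide.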

Current methods to produce similar technologies require a substantial
investment in training data and/or language specific development and
expertise, entailing many months or years of development. A goal of this
program is to drastically decrease the time and data needed to field systems
capable of fulfilling an English-in, English-out task. Limited machine
translation and automatic speech recognition training data will be provided
from multiple low-resource languages to enable performers to learn how to
quickly adapt their methods to a wide variety of materials in various genres
and domains. As the program progresses, performers will apply and adapt these
methods in increasingly shortened time frames to new languages. Program data
will include formal and informal genres of text and speech that will not be
fully captured by the training data. Image and video are out of scope for this
program.

Performers will be evaluated, relative to a baseline system, on their ability
to accurately retrieve materials relevant to an English domain-specific query
from a database of multi-domain, multi-genre documents in a low-resource
language, and their ability to convey the relevance of those documents through
summaries presented to English-speaking domain experts.

To develop such an end-to-end system, large multi-disciplinary teams will be
required with expertise in a number of relevant technical areas including, but
not limited to, natural language processing, low resource languages, machine
translation, corpora analysis, domain adaptation, computational linguistics,
speech recognition, language identification, semantics, summarization,
information retrieval, and machine learning. Since language-independent
approaches with quick ramp-up time are sought, foreign language expertise in
the languages of the program is not expected. IARPA anticipates that
universities and companies from around the world will participate in this
research program. Researchers will be encouraged to publish their findings in
publicly available academic journals.

------------------------------------------------------------------------------
