<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">

<html>

<head>

</head>

<body bgcolor="#ffffff" text="#000099">

<p class="MsoPlainText" style="text-align: justify;"><span

 style="font-size: 12pt; font-family: 'Times New Roman';" lang="EN-US">[apologies

for cross-postings]</span></p>

<p class="MsoPlainText" style="text-align: justify;"><br>

CALL FOR PAPERS<br>

<b>Workshop on Language Resources (LRs) and Human Language Technologies

(HLT) for Semitic Languages - Status, Updates, and Prospects<br>

</b><br>

To be held in conjunction with the 7th International Language Resources

and Evaluation Conference (LREC 2010)<br>

</p>

<p class="MsoPlainText" style="text-align: justify;"><b>17 May 2010,

Mediterranean Conference Centre, Valletta, Malta</b><br>

</p>

<p class="MsoPlainText" style="text-align: justify;"><b>Deadline for

submission: 26 February 2010<br>

</b><br>

<br>

Description<br>

<br>

The Semitic family includes languages and dialects spoken by a large

number of native speakers (around 300 million). Prominent members of

this family are Arabic (and its varieties), Hebrew, Amharic, Tigrinya,

Aramaic, Maltese and Syriac. Their shared ancestry is apparent through

pervasive cognate sharing, a rich and productive pattern-based

morphology, and similar syntactic constructions. In addition, several

languages used in the same geographic area, such as Amazigh and Coptic,

while not Semitic, share features with Semitic languages, such as

borrowed vocabulary.<br>

<br>

The recent surge in computational work for processing Semitic

languages, particularly Modern Standard Arabic (MSA) and Modern Hebrew

(MH), has brought modest improvements in terms of actual empirical

results for various language processing components (e.g., morphological

analyzers, parsers, named entity recognizers, and audio transcriptions).

Apparently, reusing existing approaches developed for English or French

to process Semitic language text/speech, e.g., for Arabic parsing, is

not as straightforward as initially thought. Apart from the

limited availability of suitable language resources, there is

increasing evidence that Semitic languages demand modeling approaches

and annotations that deviate from those found suitable for

English/French. Issues such as the pattern-based morphology, the

frequently head-initial syntactic structure, the importance of the

interface between morphology and syntax, and the difference between

spoken and written forms (especially in Colloquial Arabic(s)) exemplify

the kind of challenges that may arise when processing Semitic

languages. For language technologies, such as information retrieval and

machine translation, these challenges are compounded by sparse data and

often result in poorer performance than for other languages.<br>

<br>

This workshop intends to follow up on topics of paramount importance for

Semitic-language NLP that were discussed at previous events (LREC,

MEDAR/NEMLAR Conferences, the workshops of the ACL Special Interest

Group for Semitic languages, etc.) and that are worth revisiting.<br>

<br>

The workshop will bring together people who are actively involved in

Semitic language processing in a mono- or cross/multilingual context,

and give them an opportunity to update the community through reports on

completed or ongoing work as well as on the availability of LRs,

evaluation protocols and campaigns, products and core technologies (in

particular open source ones). We also invite authors to address other

languages spoken in the Semitic language area (languages such as

Amazigh, Coptic, etc.).  This should enable participants to develop a

common view on where we stand and to foster the discussion of the

future of this research area.  Particular attention will be paid to

activities involving technologies such as Machine Translation and

Cross-Lingual Information Retrieval/Extraction, Summarization, etc.

Evaluation methodologies and resources for evaluation of HLT will also

be a main focus.<br>

  <br>

We expect to elaborate on the HLT state of the art, identify problems

of common interest, and debate a potential roadmap for the Semitic

languages. Issues related to the sharing of resources, tools, and

standards, the sharing and dissemination of information and expertise,

the adoption of current best practices, and the setting up of joint

projects and technology transfer mechanisms will be an important part

of the workshop.<br>

<br>

Topics of Interest<br>

<br>

This full-day workshop is not intended to be a mini-conference, but

a real workshop aiming at concrete results that should clarify the

situation of Semitic languages with respect to Language Resources and

Evaluation. We expect to launch at least two evaluation campaigns:

comparative evaluation of morphological taggers and of named entity

recognizers.<br>

<br>

Among the many issues to be addressed, a few suggestions follow:<br>

<br>

    Issues in the design, acquisition, creation, management, access,

distribution, and use of Language Resources, in particular in a

bilingual/multilingual setting (Standard Arabic, Hebrew, Colloquial

Arabic, Amazigh, Coptic, Maltese, etc.)<br>

<br>

    Impact on LR collections/processing and NLP of the crucial issues

related to "code switching" between different dialects and languages<br>

<br>

    Specific issues related to the above-mentioned languages such as

the role of morphology, named entities, corpus alignment, etc.<br>

<br>

    Multilinguality issues including relationship between Colloquial

and Standard Arabic<br>

<br>

    Exploitation of LR in different types of applications<br>

<br>

    Industrial LR requirements and the community's response<br>

<br>

    Benchmarking of systems and products; resources for benchmarking

and evaluation for written and spoken language processing<br>

<br>

    Focus on some key technologies such as MT (all approaches e.g.

Statistical, Example-Based, etc.), Information Retrieval, Speech

Recognition, Spoken Documents Retrieval, CLIR, Question-Answering,

Summarization, etc.<br>

<br>

    Local, regional, and international activities and projects; needs,

possibilities, forms, and initiatives of/for regional and

international cooperation<br>

<br>

We invite submissions on computational approaches to processing

text/speech in all Semitic and Semitic-area languages. The call is open

for all kinds of computational work, e.g., work on computational

linguistic processing components (e.g., analyzers, taggers, parsers),

on state-of-the-art NLP applications and systems, on leveraging

resource and tool creation for the Semitic language family, and on

using computational tools to gain new linguistic insight. We especially

welcome submissions on work that crosses individual language

boundaries, heightens awareness amongst Semitic-language researchers of

shared challenges and breakthroughs, and highlights issues and

solutions common to any subset of the Semitic language family.<br>

<br>

<br>

Workshop general chair:   <br>

Khalid Choukri, <a class="moz-txt-link-abbreviated" href="mailto:Choukri@elda.org">Choukri@elda.org</a>, ELRA/ELDA, Paris, France<br>

<br>

Workshop co-chairs:   <br>

Owen Rambow, Columbia University, New York, USA  <br>

Bente Maegaard, University of Copenhagen, Denmark <br>

Ibrahim A. Al-Kharashi, Computer and Electronics Research Institute,

King Abdulaziz City for Science and Technology, Saudi Arabia<br>

<br>

<br>

Organizing Committee Information<br>

The Organizing, Program, and Scientific Committees will be listed

on the web pages.<br>

<br>

Important Dates<br>

<br>

Deadline for abstract submissions:    26 February 2010<br>

Notification of acceptance:        15 March 2010<br>

Final version of accepted paper:    11 April 2010<br>

Workshop full-day:            17 May 2010<br>

<br>

Submission Details<br>

<br>

Submissions should comply with LREC standards (including the LREC Map

initiative) and must be in English. Abstracts for workshop

contributions should not exceed four A4 pages (excluding references).

An additional title page should state: the title; author(s);

affiliation(s); and contact author's e-mail address, as well as postal

address, telephone and fax numbers.<br>

<br>

Submissions will be made through the LREC START facility: <a

 href="https://www.softconf.com/lrec2010/SemiticLanguages2010/">https://www.softconf.com/lrec2010/SemiticLanguages2010/</a><br>

Expected deadline is 26 February 2010.<br>

<br>

Submitted papers will be judged based on relevance to the workshop

aims, as well as the novelty of the idea, technical quality, clarity of

presentation, and expected impact on future research within the area of

focus.<br>

<br>

Registration for LREC 2010 will be required for participation, so

potential participants are invited to refer to the main conference

website for all details not covered in the present call (<a

 href="http://www.lrec-conf.org/lrec2010/">http://www.lrec-conf.org/lrec2010/</a>)<br>

<br>

Formatting instructions for the final full version of papers will be

sent to authors after notification of acceptance and will be identical

to LREC main conference instructions.<br>

<br>

<i>When submitting a paper through the START page, authors will be

kindly asked to provide relevant information about the resources that

have been used for the work described in their paper or that are the

outcome of their research. For further information on this new

initiative, please refer to

<a class="moz-txt-link-freetext" href="http://www.lrec-conf.org/lrec2010/?LREC2010-Map-of-Language-Resources">http://www.lrec-conf.org/lrec2010/?LREC2010-Map-of-Language-Resources</a>.<br>

</i><br>

</p>

</body>

</html>