[Corpora-List] News from LDC
Linguistic Data Consortium
ldc at ldc.upenn.edu
Mon Nov 22 21:47:07 UTC 2010
- Early Renewal Discounts for Membership Year (MY) 2011 -

/New publications:/

LDC2010T13 - Arabic Treebank: Part 1 v 4.1 -
LDC2010T21 - NIST 2008 Open Machine Translation (OpenMT) Evaluation -
------------------------------------------------------------------------
*Early Renewal Discounts for Membership Year (MY) 2011*
LDC values the significant contribution LDC members make through their
continued support of the consortium. We would like to invite new
members to join, and all current and previous members of LDC to renew,
for Membership Year (MY) 2011. For MY2011, LDC is pleased to maintain
membership fees at last year's rates; membership fees will not
increase. Additionally, for the third straight year, LDC will extend
discounts on membership fees to members who keep their membership
current or who join early in the year.
The details of our Early Renewal Discounts for MY2011 are as follows:
* Organizations that joined for MY2010 will receive a 5% discount
when renewing. This discount will apply throughout 2011,
regardless of time of renewal. MY2010 members renewing before
March 1, 2011 will receive an additional 5% discount, for a total
10% discount off the membership fee.
* New members, as well as organizations that did not join for MY2010
but held membership in any of the previous MYs (1993-2009),
will also be eligible for a 5% discount, provided that they
join/renew before March 1, 2011.
The following table provides exact pricing information.
                   MY2011 Fee    MY2011 Fee           MY2011 Fee
                                 with 5% Discount *   with 10% Discount **
*Not-for-Profit*
  Standard         US$2400       US$2280              US$2160
  Subscription     US$3850       US$3657.50           US$3465
*For-Profit*
  Standard         US$24000      US$22800             US$21600
  Subscription     US$27500      US$26125             US$24750

*  For new members, MY2010 Members renewing for MY2011, and any previous
   year Member who renews before March 1, 2011
** For MY2010 Members renewing before March 1, 2011
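
The discount rules and the fee table can be cross-checked with a few lines of Python. This is only an illustrative sketch, not an LDC tool: the function names `discount_percent` and `discounted_fee` and the `BASE_FEES` dictionary are hypothetical, and the code assumes the 10% figure is a flat percentage off the base fee (as the table indicates) rather than two compounded 5% reductions.

```python
from decimal import Decimal

# Base MY2011 fees in US dollars, copied from the pricing table.
BASE_FEES = {
    ("not-for-profit", "standard"): Decimal("2400"),
    ("not-for-profit", "subscription"): Decimal("3850"),
    ("for-profit", "standard"): Decimal("24000"),
    ("for-profit", "subscription"): Decimal("27500"),
}


def discount_percent(was_my2010_member: bool, before_march_1: bool) -> int:
    """Return the discount percentage implied by the rules in the announcement."""
    if was_my2010_member:
        # 5% at any time in 2011, plus an additional 5% before March 1, 2011.
        return 10 if before_march_1 else 5
    # New members and members from MY1993-MY2009: 5% only before March 1, 2011.
    return 5 if before_march_1 else 0


def discounted_fee(base: Decimal, percent: int) -> Decimal:
    """Apply a flat percentage discount to a base fee, rounded to cents."""
    fee = base * (Decimal(100) - Decimal(percent)) / Decimal(100)
    return fee.quantize(Decimal("0.01"))


if __name__ == "__main__":
    # Print the fee for each membership type at 0%, 5% and 10%.
    for (org_type, level), base in BASE_FEES.items():
        fees = [discounted_fee(base, p) for p in (0, 5, 10)]
        print(org_type, level, *fees)
```

For example, a not-for-profit organization that held a MY2010 membership and renews a Subscription membership before March 1, 2011 falls in the 10% column of the table.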
Publications for MY2011 are still being planned. The working titles of
the data sets we intend to provide are:
* Arabic Gigaword Fifth Edition
* Chinese Gigaword Fifth Edition
* Digital Archive of Southern Speech
* English Gigaword Fifth Edition
* Indian Language POS Tagset: Sanskrit
* OntoNotes 4.0
In addition to receiving new publications, current year members of the
LDC also enjoy the benefit of licensing older data at reduced costs;
current year for-profit members may use most data for commercial
applications.
This past year, the LDC members who joined early or kept their
membership current saved almost US$60,000 collectively on membership
fees. In fact, almost 90% of our members for MY2010 didn't pay full
price for membership! Be sure to keep an eye on your mail: all previous
and current LDC members have been sent an invitation-to-join letter and
a renewal invoice for MY2011. Renew early for MY2011 to save today!
*New Publications*
(1) Arabic Treebank: Part 1 v 4.1
<http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC2010T13>
was developed at LDC. It consists of 734 newswire stories from Agence
France Presse with part-of-speech, morphology, gloss and syntactic
treebank annotation in accordance with the Penn Arabic Treebank (PATB)
Guidelines <http://projects.ldc.upenn.edu/ArabicTreebank/> developed in
2008 and 2009. This release represents a significant revision of LDC's
previous ATB1 publications: Arabic Treebank: Part 1 v 2.0 (LDC2003T06)
<http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC2003T06> and
Arabic Treebank: Part 1 v 3.0 (POS with full vocalization + syntactic
analysis) (LDC2005T02)
<http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC2005T02>.
The ongoing PATB project supports research in Arabic-language natural
language processing and human language technology development. The
methodology and work leading to the release of this publication are
described in detail in the documentation accompanying this corpus and in
two research papers: Enhancing the Arabic Treebank: A Collaborative
Effort toward New Annotation Guidelines
<http://papers.ldc.upenn.edu/LREC2008/Enhancing_Arabic_Treebank.pdf> and
Consistent and Flexible Integration of Morphological Annotation in the
Arabic Treebank
<http://papers.ldc.upenn.edu/LREC2010/KulickBiesMaamouri-LREC2010.pdf>.
ATB1 v 4.1 contains a total of 145,386 tokens before clitics are split,
and 167,280 tokens after clitics are separated for the treebank annotation.
(2) NIST 2008 Open Machine Translation (OpenMT) Evaluation
<http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC2010T21>
is a package containing source data, reference translations and scoring
software used in the NIST 2008 OpenMT evaluation. It is designed to help
evaluate the effectiveness of machine translation systems. The package
was compiled and scoring software was developed by researchers at NIST,
making use of broadcast, newswire and web data and reference
translations collected and developed by LDC.
The 2008 task was to evaluate translation from Arabic to English,
Chinese to English, English to Chinese (newswire only) and Urdu to
English. Selected human reference translations and system translations
for the NIST MT08 test sets are contained in NIST Open Machine
Translation 2008 Evaluation (MT08) Selected Reference and System
Translations LDC2010T01
<http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC2010T01>.
This release consists of 494 documents with corresponding sets of four
separate human expert reference translations. The source data
comprises Arabic, Chinese, English and Urdu newswire, broadcast,
weblog and newsgroup data collected by LDC in 2007. The newswire and
broadcast material is from Asharq Al-Awsat (Arabic), Agence
France-Presse (Arabic, Chinese, English), Al-Ahram (Arabic), Al Hayat
(Arabic), Assabah (Arabic), An Nahar (Arabic), Al-Quds Al-Arabi
(Arabic), Xinhua News Agency (Arabic, Chinese, English), Central News
Service (Chinese), Guangming Daily (Chinese), People's Daily (Chinese),
People's Liberation Army Daily (Chinese), British Broadcasting
Corporation (Urdu), Daily Jang (Urdu), Pakistan News Service (Urdu),
Voice of America (Urdu), Associated Press (English), New York Times
(English) and Los Angeles Times/Washington Post Newswire Service (English).
This evaluation kit includes a single Perl script (mteval-v11b.pl) that
may be used to produce a translation quality score for one (or more) MT
systems.
Additional information about these evaluations may be found at the NIST
Open Machine Translation (OpenMT) Evaluation web site
<http://www.itl.nist.gov/iad/mig/tests/mt/>.
------------------------------------------------------------------------
Ilya Ahtaridis
Membership Coordinator
--------------------------------------------------------------------
Linguistic Data Consortium          Phone: 1 (215) 573-1275
University of Pennsylvania          Fax: 1 (215) 573-2175
3600 Market St., Suite 810          ldc at ldc.upenn.edu
Philadelphia, PA 19104 USA          http://www.ldc.upenn.edu