[Corpora-List] News from LDC

Mon Nov 22 21:47:07 UTC 2010

- Early Renewal Discounts for Membership Year (MY) 2011 -

New publications:

LDC2010T13
- Arabic Treebank: Part 1 v 4.1 -

LDC2010T21
- NIST 2008 Open Machine Translation (OpenMT) Evaluation -

------------------------------------------------------------------------
*Early Renewal Discounts for Membership Year (MY) 2011*

LDC values the significant contribution LDC members make through their 
continued support of the consortium.  We would like to invite new 
members to join, and all current and previous members of LDC to renew, 
for Membership Year (MY) 2011.  For MY2011, LDC is pleased to maintain 
membership fees at last year's rates -- membership fees will not 
increase.  Additionally, for the third straight year, LDC will extend 
discounts on membership fees to members who keep their membership 
current and who join early in the year.

The details of our Early Renewal Discounts for MY2011 are as follows:

    * Organizations that joined for MY2010 will receive a 5% discount
      when renewing. This discount will apply throughout 2011,
      regardless of time of renewal. MY2010 members renewing before
      March 1, 2011 will receive an additional 5% discount, for a total
      10% discount off the membership fee.
    * New members, as well as organizations that did not join for MY2010
      but held membership in any of the previous MYs (1993-2009),
      will also be eligible for a 5% discount provided that they
      join/renew before March 1, 2011.

The following table provides exact pricing information.

                    MY2011 Fee    MY2011 Fee with      MY2011 Fee with
                                  5% Discount *        10% Discount **
Not-for-Profit
    Standard        US$2400       US$2280              US$2160
    Subscription    US$3850       US$3657.50           US$3465
For-Profit
    Standard        US$24000      US$22800             US$21600
    Subscription    US$27500      US$26125             US$24750

* For new members, MY2010 Members renewing for MY2011, and any previous 
year Member who renews before March 1, 2011

** For MY2010 Members renewing before March 1, 2011
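
The discount rules and table above can be checked with a short calculation. The following Python sketch is illustrative only and is not an LDC tool: the function names, the dictionary keys, and the reading of "before March 1, 2011" as a strict date comparison are assumptions made for this example. The base fees are copied from the table.

```python
from datetime import date
from decimal import Decimal

# Base MY2011 fees in US dollars, copied from the table above.
BASE_FEES = {
    ("not-for-profit", "standard"): Decimal("2400"),
    ("not-for-profit", "subscription"): Decimal("3850"),
    ("for-profit", "standard"): Decimal("24000"),
    ("for-profit", "subscription"): Decimal("27500"),
}

# Assumption: "before March 1, 2011" means strictly earlier than this date.
EARLY_DEADLINE = date(2011, 3, 1)


def discount_rate(was_my2010_member: bool, payment_date: date) -> Decimal:
    """Return the total discount as a fraction (0, 0.05 or 0.10)."""
    rate = Decimal("0")
    if was_my2010_member:
        # MY2010 members get 5% whenever they renew during 2011.
        rate += Decimal("0.05")
    if payment_date < EARLY_DEADLINE:
        # Anyone joining or renewing early gets 5% (an extra 5% for MY2010 members).
        rate += Decimal("0.05")
    return rate


def my2011_fee(org_type: str, level: str,
               was_my2010_member: bool, payment_date: date) -> Decimal:
    """Return the MY2011 fee after applying the early renewal discounts."""
    base = BASE_FEES[(org_type, level)]
    return base * (1 - discount_rate(was_my2010_member, payment_date))


if __name__ == "__main__":
    # A MY2010 for-profit standard member renewing in February 2011.
    print(my2011_fee("for-profit", "standard", True, date(2011, 2, 15)))
```

Under these assumptions, a new not-for-profit subscription member joining in February 2011 would pay the 5% column price, and a MY2010 member renewing after March 1 would also pay the 5% column price.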

Publications for MY2011 are still being planned. The working titles of 
the data sets we intend to provide are:

Arabic Gigaword Fifth Edition 	English Gigaword Fifth Edition
Chinese Gigaword Fifth Edition 	Indian Language POS Tagset: Sanskrit
Digital Archive of Southern Speech 	OntoNotes 4.0

In addition to receiving new publications, current year members of the 
LDC also enjoy the benefit of licensing older data at reduced costs; 
current year for-profit members may use most data for commercial 
applications.

This past year, LDC members who joined early or kept their 
membership current saved almost US$60,000 collectively on membership 
fees.  In fact, almost 90% of our members for MY2010 didn't pay full 
price for membership! Be sure to keep an eye on your mail: all previous 
and current LDC members have been sent an invitation-to-join letter and 
a renewal invoice for MY2011.  Renew early for MY2011 to save today!


*New Publications*

(1) Arabic Treebank: Part 1 v 4.1 
<http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC2010T13> 
was developed at LDC. It consists of 734 newswire stories from Agence 
France Presse with part-of-speech, morphology, gloss and syntactic 
treebank annotation in accordance with the Penn Arabic Treebank (PATB) 
Guidelines <http://projects.ldc.upenn.edu/ArabicTreebank/> developed in 
2008 and 2009. This release represents a significant revision of LDC's 
previous ATB1 publications: Arabic Treebank: Part 1 v 2.0 (LDC2003T06) 
<http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC2003T06> and 
Arabic Treebank: Part 1 v 3.0 (POS with full vocalization + syntactic 
analysis) (LDC2005T02) 
<http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC2005T02>.

The ongoing PATB project supports research in Arabic-language natural 
language processing and human language technology development. The 
methodology and work leading to the release of this publication are 
described in detail in the documentation accompanying this corpus and in 
two research papers: Enhancing the Arabic Treebank: A Collaborative 
Effort toward New Annotation Guidelines 
<http://papers.ldc.upenn.edu/LREC2008/Enhancing_Arabic_Treebank.pdf> and 
Consistent and Flexible Integration of Morphological Annotation in the 
Arabic Treebank 
<http://papers.ldc.upenn.edu/LREC2010/KulickBiesMaamouri-LREC2010.pdf>.

ATB1 v 4.1 contains a total of 145,386 tokens before clitics are split, 
and 167,280 tokens after clitics are separated for the treebank annotation.



(2) NIST 2008 Open Machine Translation (OpenMT) Evaluation 
<http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC2010T21> 
is a package containing source data, reference translations and scoring 
software used in the NIST 2008 OpenMT evaluation. It is designed to help 
evaluate the effectiveness of machine translation systems. The package 
was compiled and scoring software was developed by researchers at NIST, 
making use of broadcast, newswire and web data and reference 
translations collected and developed by LDC.

The 2008 task was to evaluate translation from Arabic to English, 
Chinese to English, English to Chinese (newswire only) and Urdu to 
English. Selected human reference translations and system translations 
for the NIST MT08 test sets are contained in NIST Open Machine 
Translation 2008 Evaluation (MT08) Selected Reference and System 
Translations LDC2010T01 
<http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC2010T01>.

This release consists of 494 documents with corresponding sets of four 
separate human expert reference translations. The source data 
consists of Arabic, Chinese, English and Urdu newswire, broadcast, 
weblog and newsgroup data collected by LDC in 2007. The newswire and 
broadcast material are from Asharq Al-Awsat (Arabic), Agence 
France-Presse (Arabic, Chinese, English), Al-Ahram (Arabic), Al Hayat 
(Arabic), Assabah (Arabic), An Nahar (Arabic), Al-Quds Al-Arabi 
(Arabic), Xinhua News Agency (Arabic, Chinese, English), Central News 
Service (Chinese), Guangming Daily (Chinese), People's Daily (Chinese), 
People's Liberation Army Daily (Chinese), British Broadcasting 
Corporation (Urdu), Daily Jang (Urdu), Pakistan News Service (Urdu), 
Voice of America (Urdu), Associated Press (English), New York Times 
(English) and Los Angeles Times/Washington Post Newswire Service (English).

This evaluation kit includes a single Perl script (mteval-v11b.pl) that 
may be used to produce a translation quality score for one (or more) MT 
systems.

Additional information about these evaluations may be found at the NIST 
Open Machine Translation (OpenMT) Evaluation web site 
<http://www.itl.nist.gov/iad/mig/tests/mt/>.

------------------------------------------------------------------------

Ilya Ahtaridis
Membership Coordinator
--------------------------------------------------------------------
Linguistic Data Consortium                  Phone: 1 (215) 573-1275
University of Pennsylvania                    Fax: 1 (215) 573-2175
3600 Market St., Suite 810                  ldc at ldc.upenn.edu
Philadelphia, PA 19104 USA                  http://www.ldc.upenn.edu
