30.1219, FYI: March 2019 Newsletter - LDC

The LINGUIST List linguist at listserv.linguistlist.org
Sat Mar 16 02:42:09 UTC 2019


LINGUIST List: Vol-30-1219. Fri Mar 15 2019. ISSN: 1069 - 4875.

Subject: 30.1219, FYI: March 2019 Newsletter - LDC

Moderator: Malgorzata E. Cavar (linguist at linguistlist.org)
Student Moderator: Jeremy Coburn
Managing Editor: Becca Morris
Team: Helen Aristar-Dry, Everett Green, Sarah Robinson, Peace Han, Nils Hjortnaes, Yiwen Zhang, Julian Dietrich
Jobs: jobs at linguistlist.org | Conferences: callconf at linguistlist.org | Pubs: pubs at linguistlist.org

Homepage: http://linguistlist.org

Please support the LL editors and operation with a donation at:
           https://funddrive.linguistlist.org/donate/

Editor for this issue: Everett Green <everett at linguistlist.org>
================================================================


Date: Fri, 15 Mar 2019 22:41:23
From: Membership Office [LDC at LDC.UPENN.EDU]
Subject: March 2019 Newsletter - LDC

 
In this newsletter: 

Call for Papers – LTC 2019, LREC 2020

New Publications:

CALLFRIEND Egyptian Arabic Second Edition 
Penn Discourse Treebank Version 3.0
VAST Chinese Speech and Transcripts

Call for Papers:

The 9th Language & Technology Conference (LTC 2019) will take place on May
17-19, 2019, at the Adam Mickiewicz University in Poznań, Poland. LTC
addresses Human Language Technologies as a challenge for computer science,
linguistics, and related fields. Conference papers are due next week on
Wednesday, March 20, 2019 (midnight, any time zone). For more information,
visit the conference webpage. 

The 12th Conference on Language Resources and Evaluation (LREC 2020) will take
place on May 13-15, 2020, at the Palais du Pharo in Marseille, France. LREC
aims to provide an overview of the state of the art, explore new R&D
directions and emerging trends, and exchange information regarding language
resources and their applications, evaluation methodologies, and tools.
Conference papers are due by November 25, 2019. For more information,
including conference topics, visit the conference webpage.

New Publications:

(1) CALLFRIEND Egyptian Arabic Second Edition was developed by LDC and
consists of approximately 25 hours of unscripted telephone conversations
between native speakers of Egyptian Arabic. This second edition updates the
audio files to wav format, simplifies the directory structure, and adds
documentation and metadata. The first edition is available as CALLFRIEND
Egyptian Arabic (LDC96S49).

All data were collected before July 1997. Participants could speak with a
person of their choice on any topic; most called family members and friends.
All calls originated in North America. The recorded conversations last up to
30 minutes. 

CALLFRIEND Egyptian Arabic Second Edition is distributed via web download. 

2019 Subscription Members will automatically receive copies of this corpus.
2019 Standard Members may request a copy as part of their 16 free membership
corpora. Non-members may license this data for a fee.  


(2) Penn Discourse Treebank Version 3.0 is the third release in the Penn
Discourse Treebank project, the goal of which is to annotate the Wall Street
Journal (WSJ) section of Treebank-2 (LDC95T7) with discourse relations. Penn
Discourse Treebank Version 2 (LDC2008T05) contains over 40,600 tokens of
annotated relations. In Version 3, an additional 13,000 tokens were annotated,
certain pairwise annotations were standardized, new senses were included, and
the corpus was subjected to a series of consistency checks.

This release includes two tools: (1) the Annotator, which is used for
annotation and adjudication and can also be used to view the corpus; and
(2) the Conversion Tool, which converts Version 2 annotation files into the
Version 3 format.

The documentation directory contains a manual describing what is new in
Version 3 and how Version 3 differs from Version 2; the methods and guidelines
used in annotating PDTB Version 3; and a range of statistics on the tokens,
including the frequency of each connective, its sense labels, and its
modifiers. More information about the corpus and research carried out by the
developers and others using the corpus can be found on the PDTB website.

Penn Discourse Treebank Version 3.0 is distributed via web download. 

2019 Subscription Members will automatically receive copies of this corpus.
2019 Standard Members may request a copy as part of their 16 free membership
corpora. Non-members may license this data for a fee.  


(3) VAST Chinese Speech and Transcripts was developed by LDC for the VAST
(Video Annotation for Speech Technologies) project. It consists of
approximately 29 hours of Mandarin Chinese audio extracted from amateur video
content harvested from the web, along with corresponding time-aligned
transcripts.

Audio files were transcribed using XTrans, which supports manual transcription
across multiple channels, languages, and platforms. Transcribers followed a
Quick-Rich Transcription style; transcription guidelines are included in this
release. 

The aim of the VAST project was to collect and annotate data in several
languages to support the development of speech technologies such as speech
activity detection, language identification, speaker identification, and
speech recognition. 

VAST Chinese Speech and Transcripts is distributed via web download.

2019 Subscription Members will automatically receive copies of this corpus.
2019 Standard Members may request a copy as part of their 16 free membership
corpora. Non-members may license this data for a fee.  

Membership Office
Linguistic Data Consortium
University of Pennsylvania
T: +1-215-573-1275
E: ldc at ldc.upenn.edu
M: 3600 Market St. Suite 810
      Philadelphia, PA 19104
 



Linguistic Field(s): Computational Linguistics





 




