31.1757, Media: Enhanced Large Scale Colloquial Persian Language Understanding (LSCP) Corpus

The LINGUIST List linguist at listserv.linguistlist.org
Wed May 27 08:53:38 UTC 2020


LINGUIST List: Vol-31-1757. Wed May 27 2020. ISSN: 1069 - 4875.

Subject: 31.1757, Media: Enhanced Large Scale Colloquial Persian Language Understanding (LSCP) Corpus

Moderator: Malgorzata E. Cavar (linguist at linguistlist.org)
Student Moderator: Jeremy Coburn
Managing Editor: Becca Morris
Team: Helen Aristar-Dry, Everett Green, Sarah Robinson, Lauren Perkins, Nils Hjortnaes, Yiwen Zhang, Joshua Sims
Jobs: jobs at linguistlist.org | Conferences: callconf at linguistlist.org | Pubs: pubs at linguistlist.org

Homepage: http://linguistlist.org

Please support the LL editors and operation with a donation at:
           https://funddrive.linguistlist.org/donate/

Editor for this issue: Everett Green <everett at linguistlist.org>
================================================================


Date: Wed, 27 May 2020 04:53:02
From: Hadi Abdi Khojasteh [hadiabdikhojasteh at gmail.com]
Subject: Enhanced Large Scale Colloquial Persian Language Understanding (LSCP) Corpus

 
I am thrilled to announce our new study on informal language understanding,
which will be presented at LREC 2020.
This is the first public release from our work on understanding informal
spoken Persian (Farsi), together with a multilingual corpus for this
low-resource spoken variety. The oral form of a language is typically much
more dynamic than its written form. The written variety usually involves a
higher level of formality, whereas the spoken form is characterised by many
contractions and abbreviations. In formal written texts, longer and more
complex sentences tend to be used, since the reader can re-read difficult
parts if they lose track.

More information can be found at https://iasbs.ac.ir/~ansari/lscp/ and the
corpus is available in the LINDAT/CLARIN-CZ repository via
http://hdl.handle.net/11234/1-3195. LSCP contains approximately 120M
sentences from 27M casual Persian tweets. The sentences are annotated with
syntactic dependency relations, part-of-speech tags and sentiment polarity,
and are accompanied by translations into English, German, Czech, Italian and
Hindi.
 


Linguistic Field(s): Computational Linguistics

Subject Language(s): Persian, Iranian (pes)

Language Family(ies): Iranian


