[Corpora-List] Seeking bilingual corpora, colloquial register, gaming

Alex Juan alhelsal at posgrado.upv.es
Tue Aug 7 12:51:12 UTC 2012


Thank you,

I'll take a look at that.

2012/8/6 Joerg Tiedemann <jorg.tiedemann at lingfil.uu.se>

> Maybe translated movie subtitles would fit your needs:
> http://opus.lingfil.uu.se/OpenSubtitles_v2.php
> There are plenty of dialogues, swear words, abbreviations and even
> spelling mistakes (though mostly coming from OCR) in the data collection.
>
> Jörg
>
>
> On Mon, Aug 6, 2012 at 11:58 AM, Alex Juan <alhelsal at posgrado.upv.es> wrote:
> > Dear all,
> >
> > I am looking for bilingual/multilingual corpora that could be classified as
> > UGC, that is, user-generated content. This ranges from (but may not be
> > limited to) chat conversations, support forum conversations,
> > phone/sms/email transcripts, etc.
> >
> > As you know, the language here is not always "standard", and this content
> > may not only be rich in abbreviations but also contain spelling mistakes,
> > and even figurative language and swearwords. If there are also collections
> > or repositories of keywords (aka "seed" words) used in similar studies, that
> > would also be of help. In the first instance, the languages of interest are
> > German and English, with the items of the corpora or repositories aligned
> > with one another.
> >
> > I am attempting to build an MT prototype of DE<>EN for the gaming domain.
> >
> > Does anyone know of such a corpus? Any information/orientation will be
> > appreciated (even if it comes from specialists from other HLT fields, such
> > as sentiment analysis or semantic web).
> >
> > Thanks.
> >
> > --
> > Alex Juan
> >
> > _______________________________________________
> > UNSUBSCRIBE from this page: http://mailman.uib.no/options/corpora
> > Corpora mailing list
> > Corpora at uib.no
> > http://mailman.uib.no/listinfo/corpora
> >
>
>
>
> --
>
> **********************************************************************************
>  Jörg Tiedemann
> jorg.tiedemann at lingfil.uu.se
>  Dep. of Linguistics and Philology
> http://stp.lingfil.uu.se/~joerg/
>  Uppsala University                   tel: +46 (0)18 - 471 1412
>  Box 635, SE-751 26 Uppsala/SWEDEN    fax: +46 (0)18 - 471 1094
>



-- 
Alex Helle