[Corpora-List] Looking for Corpora in: English, Swedish, Polish, Italian, Finnish, Estonian, Hungarian

Edyta Jurkiewicz-Rohrbacher edytaj at gmail.com
Sun Mar 23 17:50:04 UTC 2014


Dear Marina,

you can get access to quite decent corpora of Finnish from the
Language Bank of Finland. For that, however, you would need to register
(which is pretty simple); link here:
http://www.csc.fi/english/research/sciences/linguistics/index_html
Other options are:
- the corpus of the Institute for the Languages of Finland, which also
contains some older texts:
 http://kaino.kotus.fi/korpus/meta/korpus_coll_rdf.xml
- Project Gutenberg.

In case of Polish, there is the National Corpus of Polish:
http://nkjp.pl/index.php?page=11&lang=1

You might get some other ideas for finding texts by checking OPUS:
http://opus.lingfil.uu.se/

InterCorp:
http://ucnk.ff.cuni.cz/intercorp/
and

ParaSol:
http://parasol.unibe.ch/

which are quite massive multi-lingual corpora.


All the best,
Edyta Jurkiewicz-Rohrbacher





2014-03-23 18:12 GMT+01:00 Ralf Steinberger <ralf.steinberger at jrc.ec.europa.eu>:

> Dear Marina,
>
> At the JRC's Language Technology page
> http://ipsc.jrc.ec.europa.eu/index.php?id=61, you will find parallel corpora
> for all the languages you are searching for, and more.
>
> All the best,
>
> Ralf
>
> *Ralf Steinberger*
>
> European Commission - Joint Research Centre (JRC)
>
> *From:* corpora-bounces at uib.no [mailto:corpora-bounces at uib.no] *On Behalf Of* Marina Santini
> *Sent:* 23 March 2014 15:26
> *To:* corpora at uib.no; Marina Santini
> *Subject:* [Corpora-List] Looking for Corpora in: English, Swedish, Polish, Italian, Finnish, Estonian, Hungarian
>
> Hi,
>
> I am looking for corpora of any genre in the following languages: English,
> Swedish, Polish, Italian, Finnish, Estonian, and Hungarian.
> I am already aware of a number of corpora (several posts in the WebGenre
> blog are dedicated to the dissemination of corpora-related information).
> These corpora, though, are mostly in English. I would now like to focus on:
> 1) additional languages and 2) additional genres, such as search query
> logs, TV scripts, emails, tweets, WhatsApp messages, etc.
> All genres are welcome! The only requirement is that the corpora must be
> free and publicly available, so that everybody can replicate or extend
> experiments using the same corpora/datasets.
>
> The purpose of the experiments is to explore cross-linguality in different
> settings. Please read the use cases in the blog post to get an idea of
> the type of communicative situations under investigation:
> http://www.forum.santini.se/2014/03/looking-for-corpora-to-explore-cross-linguality/
>
> Thanks in advance for your suggestions and pointers.
>
> --
>
> Marina Santini
>
> http://www.forum.santini.se
> http://www.linkedin.com/groups/WebGenre-R-D-Group-4301498
>
> _______________________________________________
> UNSUBSCRIBE from this page: http://mailman.uib.no/options/corpora
> Corpora mailing list
> Corpora at uib.no
> http://mailman.uib.no/listinfo/corpora
>
>