<div dir="ltr"><div><div><div><div><div><div><div><div><div>Dear Marina,<br><br></div>You can get access to quite decent Finnish corpora from the Language Bank of Finland. You will need to register first (which is pretty simple); the link is here:<br>
<a href="http://www.csc.fi/english/research/sciences/linguistics/index_html">http://www.csc.fi/english/research/sciences/linguistics/index_html</a><br></div>Other options are:<br>- the corpus of the Institute for the Languages of Finland, which also contains some older texts<br>
<a href="http://kaino.kotus.fi/korpus/meta/korpus_coll_rdf.xml">http://kaino.kotus.fi/korpus/meta/korpus_coll_rdf.xml</a><br></div>- Project Gutenberg.<br><br></div>For Polish, there is the National Corpus of Polish:<br>
<a href="http://nkjp.pl/index.php?page=11&lang=1">http://nkjp.pl/index.php?page=11&lang=1</a><br><br></div>You might find further texts by checking OPUS:<br><a href="http://opus.lingfil.uu.se/">http://opus.lingfil.uu.se/</a><br>
<br></div>InterCorp:<br><a href="http://ucnk.ff.cuni.cz/intercorp/">http://ucnk.ff.cuni.cz/intercorp/</a><br></div>and <br><br></div>ParaSol:<br><a href="http://parasol.unibe.ch/">http://parasol.unibe.ch/</a><br><br></div>
which are all quite large multilingual corpora.<br><br><br><div><div><div><div><div>All the best,<br></div><div>Edyta Jurkiewicz-Rohrbacher<br></div><div><div><div><br><br><br></div></div></div></div></div></div></div></div>
<div class="gmail_extra"><br><br><div class="gmail_quote">2014-03-23 18:12 GMT+01:00 Ralf Steinberger <span dir="ltr">&lt;<a href="mailto:ralf.steinberger@jrc.ec.europa.eu" target="_blank">ralf.steinberger@jrc.ec.europa.eu</a>&gt;</span>:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div link="blue" vlink="purple" lang="EN-GB"><div><p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1f497d">Dear Marina,<u></u><u></u></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1f497d"><u></u> <u></u></span></p><p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1f497d">At the JRC’s Language Technology page <a href="http://ipsc.jrc.ec.europa.eu/index.php?id=61" target="_blank">http://ipsc.jrc.ec.europa.eu/index.php?id=61</a>, you will find parallel corpora for all the languages you are searching for, and more.<u></u><u></u></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1f497d"><u></u> <u></u></span></p><p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1f497d">All the best,<u></u><u></u></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1f497d"><u></u> <u></u></span></p><p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1f497d">Ralf<u></u><u></u></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1f497d"><u></u> <u></u></span></p><p class="MsoNormal"><b><span style="font-size:9.0pt;font-family:"Calibri","sans-serif";color:#4a442a" lang="DE">Ralf Steinberger</span></b><span style="font-size:9.0pt;font-family:"Calibri","sans-serif";color:#4a442a" lang="DE"> <u></u><u></u></span></p>
<p class="MsoNormal"><span style="font-size:9.0pt;font-family:"Calibri","sans-serif";color:#4a442a" lang="EN-US">European Commission – Joint Research Centre (JRC)<u></u><u></u></span></p><p class="MsoNormal">
<span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1f497d"><u></u> <u></u></span></p><p class="MsoNormal"><b><span style="font-size:10.0pt;font-family:"Tahoma","sans-serif"" lang="EN-US">From:</span></b><span style="font-size:10.0pt;font-family:"Tahoma","sans-serif"" lang="EN-US"> <a href="mailto:corpora-bounces@uib.no" target="_blank">corpora-bounces@uib.no</a> [mailto:<a href="mailto:corpora-bounces@uib.no" target="_blank">corpora-bounces@uib.no</a>] <b>On Behalf Of </b>Marina Santini<br>
<b>Sent:</b> 23 March 2014 15:26<br><b>To:</b> <a href="mailto:corpora@uib.no" target="_blank">corpora@uib.no</a>; Marina Santini<br><b>Subject:</b> [Corpora-List] Looking for Corpora in: English, Swedish, Polish, Italian, Finnish, Estonian, Hungarian<u></u><u></u></span></p>
<div><div class="h5"><p class="MsoNormal"><u></u> <u></u></p><div><p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Helvetica","sans-serif";color:#333333">Hi, </span><u></u><u></u></p><div>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Helvetica","sans-serif";color:#333333"><br>I am looking for corpora of any genre in the following languages: English, Swedish, Polish, Italian, Finnish, Estonian, and Hungarian. <br>
I am already aware of a number of corpora (several posts in the WebGenre blog are dedicated to the dissemination of corpora-related information). These corpora, though, are mostly in English. I would now like to focus on: 1) additional languages and 2) additional genres, such as search query logs, TV scripts, emails, tweets, WhatsApp messages, etc. <br>
All genres are welcome! The only requirement is: corpora must be free and publicly available. Everybody must be able to replicate or extend experiments using the same corpora/datasets. <br><br>The purpose of the experiments is to explore cross-linguality in different settings. Please read the use cases in the blog post to get an idea of the type of communicative situations under investigation (</span><a href="http://www.forum.santini.se/2014/03/looking-for-corpora-to-explore-cross-linguality/" target="_blank">http://www.forum.santini.se/2014/03/looking-for-corpora-to-explore-cross-linguality/</a><span style="font-size:10.0pt;font-family:"Helvetica","sans-serif";color:#333333">)</span><u></u><u></u></p>
</div><div><p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Helvetica","sans-serif";color:#333333"><br>Thanks in advance for your suggestions and pointers. </span><u></u><u></u></p><div><div>
<p class="MsoNormal">-- <u></u><u></u></p></div><p class="MsoNormal">Marina Santini<u></u><u></u></p></div><div><p class="MsoNormal"><a href="http://www.forum.santini.se" target="_blank">http://www.forum.santini.se</a> <br>
<a href="http://www.linkedin.com/groups/WebGenre-R-D-Group-4301498" target="_blank"><span style="font-size:10.0pt;font-family:"Arial","sans-serif"">http://www.linkedin.com/groups/WebGenre-R-D-Group-4301498</span></a><u></u><u></u></p>
</div></div></div></div></div></div></div><br>
<br></blockquote></div><br></div>