[Corpora-List] Looking for Corpora in: English, Swedish, Polish, Italian, Finnish, Estonian, Hungarian

Marina Santini marinamailinglists at gmail.com
Sun May 11 08:23:43 UTC 2014


Hi Kristian,

I have now updated the blog post with the latest suggestions:
http://www.forum.santini.se/2014/04/corporasummary/

Thanks

Best Regards

Marina Santini
http://www.forum.santini.se
https://www.linkedin.com/in/marinasantini
http://www.linkedin.com/groups/WebGenre-R-D-Group-4301498
https://www.facebook.com/genresontheweb

On 6 May 2014 10:00, Kristian Kankainen <kristian at eki.ee> wrote:
> Dear all,
>
> There is also quite a comprehensive list of all sorts of resources for
> Estonian (wordlists, biographical data collections, dialect data,
> phonetic resources, spoken language, internet language, learner
> language corpora, etc.) here:
>
> http://viki.keeleleek.ee/wiki/Eesti_keele_ressursside_loend
>
> All descriptions are currently in Estonian only, and the page is rather
> crudely organized as a plain textual list or collection of links. It is
> open for everyone to edit.
>
> All the best
> Kristian Kankainen
>
> On a fine day, Sun, 23.03.2014 at 08:11, anne tamm wrote:
>> Dear Marina,
>>
>> The following pages lead to further corpora in Hungarian and Estonian.
>>
>> Hungarian: http://www.nytud.hu/dbases/index.html
>> Estonian: http://www.keeletehnoloogia.ee/projektid/koondkorpus
>>
>> Best,
>> Anne Tamm
>>
>>
>>
>>
>> On Sunday, March 23, 2014 4:01 PM, Marina Santini
>> <marinamailinglists at gmail.com> wrote:
>>
>> Hi,
>>
>> I am looking for corpora of any genre in the following languages:
>> English, Swedish, Polish, Italian, Finnish, Estonian, and Hungarian.
>> I am already aware of a number of corpora (several posts in the
>> WebGenre blog are dedicated to the dissemination of corpora-related
>> information). These corpora, though, are mostly in English. I would
>> now like to focus on: 1) additional languages and 2) additional
>> genres, such as search query logs, TV scripts, emails, tweets, WhatsApp
>> messages, etc.
>> All genres are welcome! The only requirement is: corpora must be
>> free and publicly available. Everybody must be able to replicate or
>> extend experiments using the same corpora/datasets.
>>
>> The purpose of the experiments is to explore cross-linguality in
>> different settings. Please read the use cases in the blog post to
>> get an idea of the type of communicative situations under
>> investigation
>> (http://www.forum.santini.se/2014/03/looking-for-corpora-to-explore-cross-linguality/)
>>
>> Thanks in advance for your suggestions and pointers.
>> --
>>
>> Marina Santini
>> http://www.forum.santini.se
>> http://www.linkedin.com/groups/WebGenre-R-D-Group-4301498
>>
>>
>> _______________________________________________
>> UNSUBSCRIBE from this page: http://mailman.uib.no/options/corpora
>> Corpora mailing list
>> Corpora at uib.no
>> http://mailman.uib.no/listinfo/corpora
>
>



-- 
Marina Santini
http://sites.google.com/site/marinasantiniacademicsite/



