[Corpora-List] Search: Free annotated corpus for German

Hernani Marques (UZH) h2m at access.uzh.ch
Fri Jul 20 13:44:22 UTC 2012


On 20.07.2012 15:38, Yannick Versley wrote:

> Dear Hernani,

Hello Yannick

> are you talking about free-as-in-beer or about
> free-as-in-redistributable corpora?
> To my knowledge, there are many corpora that are available free of cost to the
> user, but none of them are redistributable.

The latter, free-as-in-redistributable in the sense of the Open Source
Initiative (OSI).

> These corpora are freely available for research purposes:
> * the TüBa-D/S corpus, which contains POS- and syntax-annotated versions
>   of Verbmobil dialogues http://www.sfs.uni-tuebingen.de/en/tuebads.shtml
> * The Tiger and Negra treebanks
> * The OPUS corpora at http://opus.lingfil.uu.se/ are quite large but
>   only contain automatic annotations.
> 
> You should be able to use TüBa-D/Z without additional cost if your university
> has a license - do check with your professors/colleagues.

Yeah, I know that's possible, but the idea is to be able to ship the
corpus with the software (which is FOSS, too) without requiring users
to apply for a license.

> To my knowledge, no one has yet made a redistributable hand-annotated
> corpus for German - unlike for English, where the MASC corpus
> (http://www.anc.org/MASC/Home.html)
> is freely downloadable and modifiable by anyone. One of the big
> problems in creating a corpus such as the MASC is to obtain
> permission from the rightsholders of the original texts.

OK, thanks for the hint; I may need one for English, too.

-- 
hernani
Web: https://www.ccczh.ch/Hernani
identi.ca:  https://identi.ca/h2m
Jabber:   hernani at jabber.ccczh.ch
Diaspora*:   hernani at pod.ccczh.ch
* * * * *
I am a computer. I am dumber than any human and smarter than any
administrator.
* * * * *
