[Corpora-List] Fwd: Re: Spanish corpus
"Christiane Hümmer"
C.Huemmer at gmx.net
Thu Nov 1 10:54:33 UTC 2007
-------- Original Message --------
Date: Thu, 01 Nov 2007 11:51:55 +0100
From: "Christiane Hümmer" <C.Huemmer at gmx.net>
To: "Steven Bird" <sb at csse.unimelb.edu.au>
Subject: Re: [Corpora-List] Spanish corpus
Hello,
Try
http://www.bds.usc.es/
Best regards,
Christiane
-------- Original Message --------
> Date: Thu, 1 Nov 2007 07:48:54 +1100
> From: "Steven Bird" <sb at csse.unimelb.edu.au>
> To: "Mario Crespo Miguel" <mario.crespo at uca.es>
> CC: CORPORA at uib.no
> Subject: Re: [Corpora-List] Spanish corpus
> On 11/1/07, Mario Crespo Miguel <mario.crespo at uca.es> wrote:
> > Dear all,
> >
> > I wonder if anyone on the list knows of a syntactically tagged
> > corpus of Spanish that is available and can be used for
> > research purposes. Thank you very much in advance,
>
> NLTK includes the CESS-ESP Treebank, with 6030 parsed sentences,
> distributed with permission of Dr Toni Martí at the University of
> Barcelona.
>
> For details, please see:
> http://nltk.svn.sourceforge.net/viewvc/*checkout*/nltk/trunk/nltk/data/corpora/cess_esp/README
>
> NLTK includes a corpus reader with methods for iterating over the
> words, tagged words, sentences, and parsed sentences of the corpus,
> e.g.:
>
> >>> import nltk
> >>> nltk.corpus.cess_esp.words()
> ['El', 'grupo', 'estatal', 'Electricit\xe9_de_France', ...]
>
> >>> nltk.corpus.cess_esp.sents()
> [['El', 'grupo', 'estatal', 'Electricit\xe9_de_France', '-Fpa-',
> 'EDF', '-Fpt-', 'anunci\xf3', 'hoy', ',', 'jueves', ',', 'la',
> 'compra', 'del', '51_por_ciento', 'de', 'la', 'empresa', 'mexicana',
> 'Electricidad_\xc1guila_de_Altamira', '-Fpa-', 'EAA', '-Fpt-', ',',
> 'creada', 'por', 'el', 'japon\xe9s', 'Mitsubishi_Corporation', 'para',
> 'poner_en_marcha', 'una', 'central', 'de', 'gas', 'de', '495',
> 'megavatios', '.'], ['Una', 'portavoz', 'de', 'EDF', 'explic\xf3',
> 'a', 'EFE', 'que', 'el', 'proyecto', 'para', 'la', 'construcci\xf3n',
> 'de', 'Altamira_2', ',', 'al', 'norte', 'de', 'Tampico', ',',
> 'prev\xe9', 'la', 'utilizaci\xf3n', 'de', 'gas', 'natural', 'como',
> 'combustible', 'principal', 'en', 'una', 'central', 'de', 'ciclo',
> 'combinado', 'que', 'debe', 'empezar', 'a', 'funcionar', 'en',
> 'mayo_del_2002', '.'], ...]
>
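
The \xe9, \xf3 and \xc1 escapes in the output above look like single Latin-1 bytes as displayed by a Python 2 interpreter. The following is a minimal sketch, assuming that encoding, of how such tokens could be decoded in Python 3 and how underscore-joined tokens could be split into their parts. The helper names decode_token and split_multiword are hypothetical and not part of NLTK, and reading the underscores as joins inside multiword units is an assumption based only on the output shown.

```python
# Tokens copied from the quoted output, written as Python 3 bytes literals.
raw_tokens = [
    b"Electricit\xe9_de_France",
    b"anunci\xf3",
    b"Electricidad_\xc1guila_de_Altamira",
    b"japon\xe9s",
]


def decode_token(raw, encoding="latin-1"):
    """Decode one byte-string token (the Latin-1 default is an assumption)."""
    return raw.decode(encoding)


def split_multiword(token):
    """Split an underscore-joined token into its component words."""
    return token.split("_")


decoded = [decode_token(raw) for raw in raw_tokens]
for token in decoded:
    print(token, split_multiword(token))
```

If the corpus files use a different encoding, only the encoding argument would need to change.
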
> >>> print nltk.corpus.cess_esp.parsed_sents()[0]
> (S
>   (sn-SUJ
>     (espec.ms (da0ms0 El))
>     (grup.nom.ms
>       (ncms000 grupo)
>       (s.a.ms (grup.a.ms (aq0cs0 estatal)))
>       (sn
>         (grup.nom.ms
>           (np00000 Electricité_de_France)
>           (sn (grup.nom.ms (Fpa -Fpa-) (np00000 EDF) (Fpt -Fpt-)))))))
>   (grup.verb (vmis3s0 anunció))
>   (sadv-CCT
>     (grup.adv (rg hoy) (sn (Fc ,) (grup.nom.ms (W jueves)) (Fc ,))))
>   (sn-CD
>     (espec.fs (da0fs0 la))
>     (grup.nom.fs
>       (ncfs000 compra)
>       (sp
>         (prep (spcms del))
>         (sn
>           (grup.nom.ms
>             (Zp 51_por_ciento)
>             (sp
>               (prep (sps00 de))
>               (sn
>                 (espec.fs (da0fs0 la))
>                 (grup.nom.fs
>                   (ncfs000 empresa)
>                   (s.a.fs (grup.a.fs (aq0fs0 mexicana)))
>                   (sn
>                     (grup.nom.fs
>                       (np00000 Electricidad_Águila_de_Altamira)
>                       (sn
>                         (grup.nom.fs
>                           (Fpa -Fpa-)
>                           (np00000 EAA)
>                           (Fpt -Fpt-)))))
>                   (S.NF.P
>                     (Fc ,)
>                     (participi (aq0fsp creada))
>                     (sp-CAG
>                       (prep (sps00 por))
>                       (sn
>                         (espec.ms (da0ms0 el))
>                         (grup.nom.ms
>                           (s.a.ms (grup.a.ms (aq0ms0 japonés)))
>                           (np00000 Mitsubishi_Corporation))))
>                     (sp-CC
>                       (prep (sps00 para))
>                       (S.NF.C
>                         (infinitiu (vmn0000 poner_en_marcha))
>                         (sn-CD
>                           (espec.fs (di0fs0 una))
>                           (grup.nom.fs
>                             (ncfs000 central)
>                             (sp
>                               (prep (sps00 de))
>                               (sn
>                                 (grup.nom.ms
>                                   (ncms000 gas)
>                                   (sp
>                                     (prep (sps00 de))
>                                     (sn
>                                       (espec.mp (Z 495))
>                                       (grup.nom.mp
>                                         (ncmp000
>                                           megavatios))))))))))))))))))))
>   (Fp .))
>
>
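
In the bracketed output above, each innermost pair such as (da0ms0 El) holds a part-of-speech tag followed by a word. According to the message, NLTK's corpus reader already provides access to tagged words, so the following is only a standard-library sketch of how that structure can be read from the printed form. The function name tagged_leaves is hypothetical, and the sample is a shortened fragment of the subject noun phrase from the tree above.

```python
import re

# One token per parenthesis or per run of non-space, non-parenthesis characters.
TOKEN_RE = re.compile(r"\(|\)|[^\s()]+")
PARENS = {"(", ")"}


def tagged_leaves(bracketed):
    """Return (tag, word) pairs for every innermost "(tag word)" group."""
    tokens = TOKEN_RE.findall(bracketed)
    pairs = []
    # Slide a window of four tokens and keep those shaped "(", tag, word, ")".
    for i in range(len(tokens) - 3):
        opener, tag, word, closer = tokens[i:i + 4]
        if (opener == "(" and closer == ")"
                and tag not in PARENS and word not in PARENS):
            pairs.append((tag, word))
    return pairs


# Shortened fragment of the subject noun phrase from the parse above.
sample = """
(sn-SUJ
  (espec.ms (da0ms0 El))
  (grup.nom.ms
    (ncms000 grupo)
    (s.a.ms (grup.a.ms (aq0cs0 estatal)))))
"""

for tag, word in tagged_leaves(sample):
    print(tag, word)
```

The sketch ignores phrase-level labels such as sn-SUJ and grup.nom.ms, because they are followed by another opening bracket rather than by a word.
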
> To download NLTK, please visit http://nltk.org/index.php
>
> Steven Bird
> http://www.csse.unimelb.edu.au/~sb/
>
> _______________________________________________
> Corpora mailing list
> Corpora at uib.no
> http://mailman.uib.no/listinfo/corpora