[Corpora-List] Fwd: Re: Spanish corpus

"Christiane Hümmer" C.Huemmer at gmx.net
Thu Nov 1 10:54:33 UTC 2007


-------- Original Message --------
Date: Thu, 01 Nov 2007 11:51:55 +0100
From: "Christiane Hümmer" <C.Huemmer at gmx.net>
To: "Steven Bird" <sb at csse.unimelb.edu.au>
Subject: Re: [Corpora-List] Spanish corpus

Hello,

Try
http://www.bds.usc.es/

Best regards,
Christiane

-------- Original Message --------
> Date: Thu, 1 Nov 2007 07:48:54 +1100
> From: "Steven Bird" <sb at csse.unimelb.edu.au>
> To: "Mario Crespo Miguel" <mario.crespo at uca.es>
> CC: CORPORA at uib.no
> Subject: Re: [Corpora-List] Spanish corpus

> On 11/1/07, Mario Crespo Miguel <mario.crespo at uca.es> wrote:
> > Dear all,
> >
> > I wonder if anyone on the list knows whether a syntactically
> > tagged corpus of Spanish is available that could be used for
> > research purposes. Thank you very much in advance,
> 
> NLTK includes the CESS-ESP Treebank, with 6030 parsed sentences,
> distributed with permission of Dr Toni Martí at the University of
> Barcelona.
> 
> For details, please see:
> http://nltk.svn.sourceforge.net/viewvc/*checkout*/nltk/trunk/nltk/data/corpora/cess_esp/README
> 
> NLTK includes a corpus reader with methods for iterating over the
> words, tagged words, sentences, and parsed sentences of the corpus,
> e.g.:
> 
> >>> import nltk
> >>> nltk.corpus.cess_esp.words()
> ['El', 'grupo', 'estatal', 'Electricit\xe9_de_France', ...]
> 
> >>> nltk.corpus.cess_esp.sents()
> [['El', 'grupo', 'estatal', 'Electricit\xe9_de_France', '-Fpa-',
> 'EDF', '-Fpt-', 'anunci\xf3', 'hoy', ',', 'jueves', ',', 'la',
> 'compra', 'del', '51_por_ciento', 'de', 'la', 'empresa', 'mexicana',
> 'Electricidad_\xc1guila_de_Altamira', '-Fpa-', 'EAA', '-Fpt-', ',',
> 'creada', 'por', 'el', 'japon\xe9s', 'Mitsubishi_Corporation', 'para',
> 'poner_en_marcha', 'una', 'central', 'de', 'gas', 'de', '495',
> 'megavatios', '.'], ['Una', 'portavoz', 'de', 'EDF', 'explic\xf3',
> 'a', 'EFE', 'que', 'el', 'proyecto', 'para', 'la', 'construcci\xf3n',
> 'de', 'Altamira_2', ',', 'al', 'norte', 'de', 'Tampico', ',',
> 'prev\xe9', 'la', 'utilizaci\xf3n', 'de', 'gas', 'natural', 'como',
> 'combustible', 'principal', 'en', 'una', 'central', 'de', 'ciclo',
> 'combinado', 'que', 'debe', 'empezar', 'a', 'funcionar', 'en',
> 'mayo_del_2002', '.'], ...]
> 
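The \xe9-style escapes in the output above are how the Python of the time displayed non-ASCII bytes. As a rough sketch, and assuming the tokens are ISO-8859-1 (Latin-1) byte strings (the message does not state the encoding), they can be decoded for display as follows. The names raw_tokens, decoded and readable are illustrative only, and treating the underscore as a multiword joiner is inferred from tokens such as 'poner_en_marcha'.

```python
# Tokens copied from the interpreter output above, written as byte strings.
raw_tokens = [
    b"Electricit\xe9_de_France",
    b"anunci\xf3",
    b"Electricidad_\xc1guila_de_Altamira",
    b"japon\xe9s",
]

# Assumption: the bytes are ISO-8859-1 (Latin-1); the message does not say so.
decoded = [tok.decode("latin-1") for tok in raw_tokens]

# Assumption: underscores join multiword units, so swap them for spaces
# when showing the tokens to a reader.
readable = [tok.replace("_", " ") for tok in decoded]

print(readable)
```

If the decoded text shows the wrong accented characters, the encoding assumption is the first thing to revisit.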
> >>> print(nltk.corpus.cess_esp.parsed_sents()[0])
> (S
>   (sn-SUJ
>     (espec.ms (da0ms0 El))
>     (grup.nom.ms
>       (ncms000 grupo)
>       (s.a.ms (grup.a.ms (aq0cs0 estatal)))
>       (sn
>         (grup.nom.ms
>           (np00000 Electricité_de_France)
>           (sn (grup.nom.ms (Fpa -Fpa-) (np00000 EDF) (Fpt -Fpt-)))))))
>   (grup.verb (vmis3s0 anunció))
>   (sadv-CCT
>     (grup.adv (rg hoy) (sn (Fc ,) (grup.nom.ms (W jueves)) (Fc ,))))
>   (sn-CD
>     (espec.fs (da0fs0 la))
>     (grup.nom.fs
>       (ncfs000 compra)
>       (sp
>         (prep (spcms del))
>         (sn
>           (grup.nom.ms
>             (Zp 51_por_ciento)
>             (sp
>               (prep (sps00 de))
>               (sn
>                 (espec.fs (da0fs0 la))
>                 (grup.nom.fs
>                   (ncfs000 empresa)
>                   (s.a.fs (grup.a.fs (aq0fs0 mexicana)))
>                   (sn
>                     (grup.nom.fs
>                       (np00000 Electricidad_Águila_de_Altamira)
>                       (sn
>                         (grup.nom.fs
>                           (Fpa -Fpa-)
>                           (np00000 EAA)
>                           (Fpt -Fpt-)))))
>                   (S.NF.P
>                     (Fc ,)
>                     (participi (aq0fsp creada))
>                     (sp-CAG
>                       (prep (sps00 por))
>                       (sn
>                         (espec.ms (da0ms0 el))
>                         (grup.nom.ms
>                           (s.a.ms (grup.a.ms (aq0ms0 japonés)))
>                           (np00000 Mitsubishi_Corporation))))
>                     (sp-CC
>                       (prep (sps00 para))
>                       (S.NF.C
>                         (infinitiu (vmn0000 poner_en_marcha))
>                         (sn-CD
>                           (espec.fs (di0fs0 una))
>                           (grup.nom.fs
>                             (ncfs000 central)
>                             (sp
>                               (prep (sps00 de))
>                               (sn
>                                 (grup.nom.ms
>                                   (ncms000 gas)
>                                   (sp
>                                     (prep (sps00 de))
>                                     (sn
>                                       (espec.mp (Z 495))
>                                       (grup.nom.mp
>                                         (ncmp000 megavatios))))))))))))))))))))
>   (Fp .))
> 
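The leaves of the bracketed tree above pair each word with its part-of-speech tag. As a minimal sketch of that relationship, and not NLTK's own reader, the following standalone snippet pulls (word, tag) pairs out of a shortened excerpt of the printed tree with a regular expression. The names TREE, LEAF and tagged_leaves are hypothetical, and the pattern assumes tokens contain no whitespace or brackets, which holds for this excerpt.

```python
import re

# A shortened excerpt of the bracketed parse printed above.
TREE = """(S
  (sn-SUJ
    (espec.ms (da0ms0 El))
    (grup.nom.ms (ncms000 grupo) (s.a.ms (grup.a.ms (aq0cs0 estatal)))))
  (grup.verb (vmis3s0 anunció))
  (Fp .))"""

# A leaf node looks like "(tag word)": an opening bracket, a label,
# whitespace, one token with no brackets or whitespace, a closing bracket.
LEAF = re.compile(r"\(([^\s()]+)\s+([^\s()]+)\)")


def tagged_leaves(bracketed):
    """Return (word, tag) pairs in sentence order from a bracketed tree."""
    return [(word, tag) for tag, word in LEAF.findall(bracketed)]


pairs = tagged_leaves(TREE)
words = [word for word, _tag in pairs]

print(pairs)
print(words)
```

With NLTK and the corpus data installed, nltk.corpus.cess_esp.tagged_words() should give the same kind of pairs directly, without any hand-written pattern.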
> 
> To download NLTK, please visit http://nltk.org/index.php
> 
> Steven Bird
> http://www.csse.unimelb.edu.au/~sb/
> 
> _______________________________________________
> Corpora mailing list
> Corpora at uib.no
> http://mailman.uib.no/listinfo/corpora
