[Corpora-List] Spanish reference corpus

Emrah ozcanemrah at gmail.com
Fri Feb 2 09:36:01 UTC 2007


Thanks for the Spanish corpus. I have some other questions to ask you all...

   1. Do you know of a similar corpus for Turkish (other than the METU corpus,
   http://www.ii.metu.edu.tr/~corpus/)?

   2. Could you also recommend a morphological parser for Turkish that can
   identify the root and suffixes of a word (and even prefixes, which are
   few, e.g. eş- /esh/ in eşzamanlı --> synchronous)?

   3. Could you recommend an automatic lemmatizer for Turkish?

Thanks in advance...
-- 
EMRAH ÖZCAN (M.A.)
Araş. Gör. / Research Assistant

Yıldız Teknik Üniversitesi / Yildiz Technical University
Eğitim Fakültesi / Faculty of Education
Yabancı Diller Eğitimi Böl. / Foreign Languages Teaching Dept.
Davutpaşa Yerleşkesi / Campus at Davutpasa
Esenler, İstanbul
Türkiye

e-mail: eozcan {@} yildiz.edu.tr
phone: +90 212 449 1616
web page: http://www.dil.yildiz.edu.tr/emrah


On 2/2/07, Serge Sharoff <s.sharoff at leeds.ac.uk> wrote:
>
> yes, the frequency list is also available:
> http://corpus.leeds.ac.uk/frqc/internet-es-forms.num (for word forms)
> http://corpus.leeds.ac.uk/frqc/internet-es.num (for lemmas, though you'd
> better take the results of automatic lemmatisation with caution).
>
> BTW, the frequencies (the second column) are in terms of ipm (instances
> per million words).
>
> Serge
>
> On Thu, 2007-02-01 at 14:17 +0100, Mario Crespo Miguel wrote:
> > Thank you very much for your help, but it would be more
> > convenient for me if the word frequencies for this open-domain /
> > general corpus could be obtained. Does anybody know whether
> > such information is available? Best,
> >
> > Mario
> >
> >
> >
> > On 30 Jan 2007 at 16:10, Serge Sharoff <s.sharoff at leeds.ac.uk>
> > wrote:
> >
> > > one answer is the Spanish Internet corpus with the interface from
> > > http://corpus.leeds.ac.uk/internet.html
> > > and the URL list
> > > http://corpus.leeds.ac.uk/internet/final-url-es.gz
> > >
> > > This is a random snapshot of the Spanish Internet of about 120
> > > million words, see
> > > Sharoff, S (2006) Creating general-purpose corpora using automated
> > > search engine queries. In Marco Baroni and Silvia Bernardini, editors,
> > > WaCky! Working papers on the Web as Corpus. Gedit, Bologna.
> > > http://wackybook.sslmit.unibo.it/
> > >
> > > S
> > >
> > > On Tue, 2007-01-30 at 15:54 +0100, Mario Crespo Miguel wrote:
> > >> Dear everybody,
> > >>
> > >> Thank you again for all the help that I always get from this
> > >> mailing list. This time I would like to ask whether there is
> > >> a reference / open-domain corpus for Spanish that is freely
> > >> available and can be downloaded. Thank you in advance. Best
> > >> wishes,
> > >>
> > >> Mario Crespo Miguel
> > >>
> > >>
> > >
> > >
> >
> >
> >
> >
>
>
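
The frequency lists mentioned in the thread give their values in ipm
(instances per million words), and the corpus is described as holding about
120 million words, so an ipm value can be turned back into a rough raw count.
The following is only a minimal sketch under stated assumptions: the thread
says only that the second column is the frequency, so the whitespace-separated
layout "rank, ipm, item" is a guess, the sample lines are invented, and the
names `Entry`, `parse_frequency_lines` and `estimated_count` are hypothetical.

```python
from typing import Iterable, Iterator, NamedTuple

# "about 120 million words" per the thread; an approximation, not an exact size.
CORPUS_SIZE_WORDS = 120_000_000


class Entry(NamedTuple):
    item: str   # word form or lemma (assumed to be the third column)
    ipm: float  # instances per million words (second column, per the thread)


def parse_frequency_lines(lines: Iterable[str]) -> Iterator[Entry]:
    """Yield entries from lines assumed to look like '<rank> <ipm> <item>'."""
    for line in lines:
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank lines and anything that is not a data row
        try:
            ipm = float(fields[1])
        except ValueError:
            continue  # skip rows whose second column is not numeric
        yield Entry(item=fields[2], ipm=ipm)


def estimated_count(ipm: float, corpus_size: int = CORPUS_SIZE_WORDS) -> int:
    """Convert an ipm value into an approximate raw count for the corpus."""
    return round(ipm * corpus_size / 1_000_000)


# Invented sample rows, for illustration only (not taken from the real list).
sample = ["1 50000.00 de", "2 30000.50 la", "header line"]
entries = list(parse_frequency_lines(sample))
for entry in entries:
    print(entry.item, entry.ipm, estimated_count(entry.ipm))
```

Because the corpus size is itself approximate, the recovered counts are
estimates only; for comparing words with each other, the ipm values can be
used directly.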