12.3165, Sum: Russian Corpora & Natural Language Processing
LINGUIST List
linguist at linguistlist.org
Sat Dec 22 16:24:49 UTC 2001
LINGUIST List: Vol-12-3165. Sat Dec 22 2001. ISSN: 1068-4875.
Subject: 12.3165, Sum: Russian Corpora & Natural Language Processing
Moderators: Anthony Aristar, Wayne State U. <aristar at linguistlist.org>
Helen Dry, Eastern Michigan U. <hdry at linguistlist.org>
Andrew Carnie, U. of Arizona <carnie at linguistlist.org>
Reviews (reviews at linguistlist.org):
Simin Karimi, U. of Arizona
Terence Langendoen, U. of Arizona
Editors (linguist at linguistlist.org):
Karen Milligan, WSU             Naomi Ogasawara, EMU
Jody Huellmantel, WSU           James Yuells, WSU
Michael Appleby, EMU            Marie Klopfenstein, WSU
Ljuba Veselinova, Stockholm U.  Heather Taylor-Loring, EMU
Dina Kapetangianni, EMU         Richard Harvey, EMU
Karolina Owczarzak, EMU         Renee Galvis, WSU
Software: John Remmers, E. Michigan U. <remmers at emunix.emich.edu>
Gayathri Sriram, E. Michigan U. <gayatri at linguistlist.org>
Home Page: http://linguistlist.org/
The LINGUIST List is funded by Eastern Michigan University, Wayne
State University, and donations from subscribers and publishers.
Editor for this issue: Marie Klopfenstein <marie at linguistlist.org>
=================================Directory=================================
1)
Date: Thu, 20 Dec 2001 09:38:59 +0800
From: Xu Hancheng <hanch-xu at jlonline.com.com>
Subject: NLP and Russian language
-------------------------------- Message 1 -------------------------------
Date: Thu, 20 Dec 2001 09:38:59 +0800
From: Xu Hancheng <hanch-xu at jlonline.com.com>
Subject: NLP and Russian language
Dear colleagues,
I posted a message to this list seeking information about Russian
corpora and NLP in general (Linguist 12.2284). I really appreciate the
responses from all of you. I received far more information than I expected.
This summary consists of two parts: information about Russian corpora, papers,
and tools, followed by contact information for the respondents.
I. Russian corpora, papers, NLP tools and systems
1. Uppsala-Tuebingen corpora (Wayles Browne, Daniel Buncic, Dagmar
Divjak, Andrew Hippisley, Ruprecht von Waldenfels)
http://www.sfb441.uni-tuebingen.de/b1/en/korpora.html
and
http://www.slaviska.uu.se/korpus.htm
http://www.sfb441.uni-tuebingen.de/tusnelda.html
Among the best-known Russian corpora. 1 million words. A site well worth visiting.
2. http://purl.org/net/concordance (Serge Sharoff)
An aligned corpus and tools to work with multilingual corpora.
3. http://homepages.uni-tuebingen.de/elisabeth.seitz/pub/corpora.html (Daniel Buncic)
Elisabeth Seitz's paper "Digital Corpora and Databases: New Horizons
in Slavic Linguistics" read on 19.03.1998 in Ljubljana.
4. http://tractor.bham.ac.uk/tractor/catalogue.html#Russian (Daniel Buncic)
The Computer Fund of the Russian Language provided by the Russian
Academy of Sciences, available at the TRACTOR project (http://www.tractor.de/)
5. http://www.ruhr-uni-bochum.de/lilab/Dokumentation/Index-en.htm (Daniel Buncic)
Russian audiotexts and acoustic databases provided by the LiLab at the Bochum
University Seminar of Slavistics.
6. http://titus.uni-frankfurt.de/texte/texte2.htm#aruss. (Daniel Buncic)
The Old Russian texts at the TITUS project (Thesaurus
Indogermanischer Text- und Sprachmaterialien - Thesaurus of
Indo-European Text and Language Material, http://titus.uni-frankfurt.de/ ).
7. "Biblioteka Moshkova" at http://kulichki.com/moshkow/. (Daniel Buncic,
Ruprecht von Waldenfels)
One of the most popular on-line libraries. Fictional and non-fictional texts can be downloaded.
8. http://www.slavistik.uni-bonn.de/links/volltexte.html
Link collection at Bonn University Seminar of Slavistics.
9. "A Computational Phonology of Russian" (1999) by Dr. Peter Chew of
Oxford University. It includes a morphological corpus of Russian as one of
the appendices. Available through inter-library loan. (Peter Chew)
10. The corpus used by Apresjan and his team in their work on the
dictionary of synonyms. The corpus contains about 10 million words but is not available to
'outsiders'. (Dagmar Divjak)
11. Andrew Hippisley carried out a statistical analysis of the nouns.
The dataset is available at
http://www.surrey.ac.uk/LIS/SMG/rusnoms.xls There is a readme file
explaining this dataset at
http://www.surrey.ac.uk/LIS/SMG/readme.html (Andrew Hippisley)
12. http://schools.keldysh.ru/uvk1838/Sciper/catalog.htm (Serge Sharoff, Vera Fluhr-Semenova)
I would advise everyone who is interested in Russian NLP and has
never been there to have a look at this site. Rich information on Russian NLP systems.
Sciper - Societe de Conseil dans le domaine Informatique sur le Pays
de l'Est et Russie (Consulting on Information Technologies of East European Countries and
Russia.)
13. http://clover.slavic.pitt.edu/~djb/slavic.html . (Ruprecht von Waldenfels)
Links.
14. An on-line morphological parser at the Sergei Starostin homepage
(http://starling.rinet.ru/) under the link "Russian dictionaries and
morphology" (http://starling.rinet.ru/morpho.htm) (Alexandre Arkhipov).
15. http://isabase.philol.msu.ru/ (Alexandre Arkhipov, Olga Krivnova)
Dr. Olga Krivnova and her colleagues are developing a Russian speech
synthesis system. Their work also includes the creation of several Russian
speech corpora designed for experimental speech recognition projects.
16. Grigori Sidorov has developed a program for Russian morphological analysis / generation
(also lemmatizing). It works with about 100,000 stems (generating about 1,500,000 wordforms).
Dictionary file size is less than 2 MB. It is free for scientific
purposes. (Available as DLL or EXE for Windows.) (Grigori Sidorov)
And I'd like to add what I've found on the Internet:
17. http://www.rvb.ru (Russian Virtual Library)
18. http://www.philol.msu.ru/~lex/
Laboratory for General and Computational Lexicology and Lexicography
of Moscow University. Corpora of Russian newspapers and other corpora.
II. Contact information of respondents
1. Wayles Browne
Wayles Browne, Assoc. Prof. of Linguistics
Department of Linguistics
Morrill Hall 220, Cornell University
Ithaca, New York 14853, U.S.A.
tel. 607-255-0712 (o), 607-273-3009 (h)
fax 607-255-2044 (write FOR W. BROWNE)
e-mail ewb2 at cornell.edu
2. Dr. Serge Sharoff
Fakultaet fuer Linguistik und Literaturwissenschaft,
Universitaet Bielefeld,
Postfach 10 01 31, D-33501 Bielefeld, Germany,
tel: +49-521-1065275; fax: +49-521-1066447
serge.sharoff at uni-bielefeld.de
3. Daniel Buncic
Bonn University Seminar of Slavonic Philology
Lennestr. 1, D-53113 Bonn
Phone: +49 228 73-7203
Fax & answering-machine: +49 1212 515081457
E-mail: dbuncic at web.de
Homepage: http://www.uni-bonn.de/~dbuncic/
4. Peter Chew (PetChw6 at aol.com )
Oxford University
5. Dagmar Divjak (Dagmar.Divjak at arts.kuleuven.ac.be)
Ph.D. candidate in Russian linguistics.
6. Andrew Hippisley (a.hippisley at eim.surrey.ac.uk)
7. Vera Fluhr-Semenova (vera.fluhr at wanadoo.fr)
8. Ruprecht von Waldenfels (h0444tuv at student.hu-berlin.de)
9. Alexandre Arkhipov
Moscow State University
sarkipo at mail.ru
10. Olga Krivnova (okri at philol.msu.ru)
11. Grigori Sidorov, Ph.D.,
Natural Language Processing Lab,
Center for Computing Research (CIC),
National Polytechnic Institute (IPN),
Av.Juan de Dios Batiz, s/n, esq. Mendizabal, Zacatenco, CP 07738, Mexico
D.F., Mexico
Tel. +52 5729-6000, ext 56618, 56544
Fax +1 (520) 441-18-17, +52 55862936
e-mail: sidorov at cic.ipn.mx
12. Evelina G. Fedorenko (efedoren at fas.harvard.edu), a student from
Alfonso Caramazza's Lab, Harvard, is also interested in our exchange of
information.
I am looking forward to further exchange of information with all.
Best regards,
Xu Hancheng
hanch-xu at jlonline.com
---------------------------------------------------------------------------
LINGUIST List: Vol-12-3165