12.3165, Sum: Russian Corpora & Natural Language Processing

LINGUIST List linguist at linguistlist.org
Sat Dec 22 16:24:49 UTC 2001


LINGUIST List:  Vol-12-3165. Sat Dec 22 2001. ISSN: 1068-4875.

Subject: 12.3165, Sum: Russian Corpora & Natural Language Processing

Moderators: Anthony Aristar, Wayne State U.<aristar at linguistlist.org>
            Helen Dry, Eastern Michigan U. <hdry at linguistlist.org>
            Andrew Carnie, U. of Arizona <carnie at linguistlist.org>

Reviews (reviews at linguistlist.org):
	Simin Karimi, U. of Arizona
	Terence Langendoen, U. of Arizona

Editors (linguist at linguistlist.org):
	Karen Milligan, WSU 		Naomi Ogasawara, EMU
	Jody Huellmantel, WSU		James Yuells, WSU
	Michael Appleby, EMU		Marie Klopfenstein, WSU
	Ljuba Veselinova, Stockholm U.	Heather Taylor-Loring, EMU
	Dina Kapetangianni, EMU		Richard Harvey, EMU
	Karolina Owczarzak, EMU		Renee Galvis, WSU

Software: John Remmers, E. Michigan U. <remmers at emunix.emich.edu>
          Gayathri Sriram, E. Michigan U. <gayatri at linguistlist.org>

Home Page:  http://linguistlist.org/

The LINGUIST List is funded by Eastern Michigan University, Wayne
State University, and donations from subscribers and publishers.



Editor for this issue: Marie Klopfenstein <marie at linguistlist.org>

=================================Directory=================================

1)
Date:  Thu, 20 Dec 2001 09:38:59 +0800
From:  Xu Hancheng <hanch-xu at jlonline.com.com>
Subject:  NLP and Russian language

-------------------------------- Message 1 -------------------------------

Date:  Thu, 20 Dec 2001 09:38:59 +0800
From:  Xu Hancheng <hanch-xu at jlonline.com.com>
Subject:  NLP and Russian language


Dear colleagues,

I posted a message to this list seeking information about Russian
corpora and NLP in general (Linguist 12.2284). I really appreciate the
responses from all of you. Unexpectedly, I received abundant information.
This is my summary, which consists of two parts: information about Russian
corpora, papers, and tools, followed by contact information for the respondents.


I. Russian corpora, papers, NLP tools and systems

1. Uppsala-Tuebingen corpora (Wayles Browne, Daniel Buncic, Dagmar
Divjak, Andrew Hippisley, Ruprecht von Waldenfels)

http://www.sfb441.uni-tuebingen.de/b1/en/korpora.html
and
http://www.slaviska.uu.se/korpus.htm

http://www.sfb441.uni-tuebingen.de/tusnelda.html

One of the best-known Russian corpora. 1 million words. A site well worth visiting.

2. http://purl.org/net/concordance (Serge Sharoff)

An aligned corpus and tools to work with multilingual corpora.

3. http://homepages.uni-tuebingen.de/elisabeth.seitz/pub/corpora.html (Daniel Buncic)

Elisabeth Seitz's paper "Digital Corpora and Databases: New Horizons
in Slavic Linguistics" read on 19.03.1998 in  Ljubljana.

4. http://tractor.bham.ac.uk/tractor/catalogue.html#Russian (Daniel Buncic)

The Computer Fund of the Russian Language, provided by the Russian
Academy of Sciences and available through the TRACTOR project (http://www.tractor.de/).

5. http://www.ruhr-uni-bochum.de/lilab/Dokumentation/Index-en.htm (Daniel Buncic)
Russian audiotexts and acoustic databases provided by the LiLab at the Bochum
University Seminar of Slavistics.

6. http://titus.uni-frankfurt.de/texte/texte2.htm#aruss. (Daniel Buncic)

The Old Russian texts at the TITUS project (Thesaurus
Indogermanischer Text- und Sprachmaterialien, i.e. Thesaurus of
Indo-European Text and Language Material, http://titus.uni-frankfurt.de/ ).

7. "Biblioteka Moshkova" at http://kulichki.com/moshkow/. (Daniel Buncic,
Ruprecht von Waldenfels)
One of the most popular on-line libraries. Fictional and non-fictional texts can be downloaded.

8. http://www.slavistik.uni-bonn.de/links/volltexte.html

A link collection at the Bonn University Seminar of Slavistics.

9. "A Computational Phonology of Russian" (1999) by Dr. Peter Chew of
Oxford University. It includes a morphological corpus of Russian as one of
the appendices. Available through inter-library loan. (Peter Chew)

10. The corpus used by Apresjan and his team in their work on the
dictionary of synonyms. That corpus contains about 10 million words but is
not available to 'outsiders'. (Dagmar Divjak)

11. Andrew Hippisley carried out a statistical analysis of the nouns.
The dataset is available at
http://www.surrey.ac.uk/LIS/SMG/rusnoms.xls  There is a readme file
explaining this dataset at
http://www.surrey.ac.uk/LIS/SMG/readme.html  (Andrew Hippisley)

12. http://schools.keldysh.ru/uvk1838/Sciper/catalog.htm  (Serge Sharoff, Vera Fluhr-Semenova)

I would advise everyone who is interested in Russian NLP and has
never been there to have a look at the site. It offers rich information on
Russian NLP systems.
Sciper - Societe de Conseil dans le domaine Informatique sur le Pays
de l'Est et Russie (Consulting on Information Technologies of East European Countries and
Russia.)

13. http://clover.slavic.pitt.edu/~djb/slavic.html . (Ruprecht von Waldenfels)
 Links.

14. An on-line morphological parser at the Sergei Starostin homepage
(http://starling.rinet.ru/) under the link "Russian dictionaries and
morphology" (http://starling.rinet.ru/morpho.htm)  (Alexandre Arkhipov).

15. http://isabase.philol.msu.ru/  (Alexandre Arkhipov, Olga Krivnova)
Dr. Olga Krivnova and her colleagues are developing a system of
Russian speech synthesis. The work also includes the creation of several
Russian speech corpora designed for experimental speech recognition projects.

16. Grigori Sidorov has developed a program for Russian morphological analysis and generation
(also lemmatizing). It works with about 100,000 stems (generating about 1,500,000 wordforms).
The dictionary file size is less than 2 MB. It is free for scientific
purposes and is available as a DLL or EXE for Windows. (Grigori Sidorov)

I would also like to add what I have found on the Internet:

17. http://www.rvb.ru  (Russian Virtual Library)

18. http://www.philol.msu.ru/~lex/
Laboratory for General and Computational Lexicology and Lexicography
of Moscow University. Corpora of Russian newspapers and other corpora.



II. Contact information of respondents

1. Wayles Browne
Wayles Browne, Assoc. Prof. of Linguistics
Department of Linguistics
Morrill Hall 220, Cornell University
Ithaca, New York 14853, U.S.A.

tel. 607-255-0712 (o), 607-273-3009 (h)
fax 607-255-2044 (write FOR W. BROWNE)
e-mail ewb2 at cornell.edu

2. Dr. Serge Sharoff
Fakultaet fuer Linguistik und Literaturwissenschaft,
Universitaet Bielefeld,
Postfach 10 01 31, D-33501 Bielefeld, Germany,
tel: +49-521-1065275; fax: +49-521-1066447
serge.sharoff at uni-bielefeld.de

3. Daniel Buncic

Bonn University Seminar of Slavonic Philology
Lennestr. 1, D-53113 Bonn
Phone: +49 228 73-7203
Fax & answering-machine: +49 1212 515081457
E-mail: dbuncic at web.de
Homepage: http://www.uni-bonn.de/~dbuncic/

4. Peter Chew (PetChw6 at aol.com )
Oxford University

5. Dagmar Divjak (Dagmar.Divjak at arts.kuleuven.ac.be)
Ph.D. candidate in Russian linguistics.

6. Andrew Hippisley (a.hippisley at eim.surrey.ac.uk)

7. Vera Fluhr-Semenova (vera.fluhr at wanadoo.fr)

8. Ruprecht von Waldenfels (h0444tuv at student.hu-berlin.de)

9. Alexandre Arkhipov
Moscow State University
sarkipo at mail.ru

10. Olga Krivnova (okri at philol.msu.ru)

11. Grigori Sidorov, Ph.D.,
Natural Language Processing Lab,
Center for Computing Research (CIC),
National Polytechnic Institute (IPN),
Av.Juan de Dios Batiz, s/n, esq. Mendizabal, Zacatenco, CP 07738, Mexico
D.F., Mexico
Tel. +52 5729-6000, ext 56618, 56544
Fax +1 (520) 441-18-17, +52 55862936
e-mail: sidorov at cic.ipn.mx

12. Evelina G. Fedorenko (efedoren at fas.harvard.edu), a student from
Alfonso Caramazza's Lab, Harvard, is also interested in our exchange of
information.

I am looking forward to further exchange of information with all.

Best regards,

Xu Hancheng
hanch-xu at jlonline.com

---------------------------------------------------------------------------
LINGUIST List: Vol-12-3165


