[Corpora-List] Re: Chinese-English-Russian parallel corpora:

Jiangping Chen jpchen at unt.edu
Mon Jan 30 16:51:34 UTC 2006


Thanks a lot for sharing these resources.

Jiangping

Jiangping Chen,  Ph.D.
Assistant Professor
School of Library and Information Sciences
University of North Texas
P.O. Box 311068
Denton, TX 76203
Phone: (940) 369-8393
Fax:      (940) 565-3101  

>>> Philip Resnik <resnik at umiacs.umd.edu> 01/30/06 9:44 AM >>>


"Olga Mitrofanova" <alkonost at OM12520.spb.edu> wrote:
> Here is a summary of useful links concerning Chinese-English-Russian
> parallel corpora prepared by Inna Lazareva (St-Petersburg
> University):

Here are three more resources that may be useful to anyone
interested in Chinese-English parallel text:

- The Linguist's Search Engine (http://lse.umiacs.umd.edu) provides
  access to a collection of over 118,000 Chinese pages.  These were
  mined from the Web using a technique that automatically finds
  Chinese-English page pairs, so the English translation is also
  available when you look at a Chinese result.  To search the Chinese
  collection, go to "Query Options" and, under "Collection to Search",
  select "Public Collection: chinese_web"; then, under
  "Example Sentence", change "Language" from English to Chinese.  To
  see the corresponding English for a hit, click "Annotation".

  The LSE Web page links to detailed documentation.  Note that the
  Chinese pages have also been automatically classified by level of
  document difficulty, and this "Level" can be used to narrow the
  search.

- The Linguist's Search Engine also provides English search of the
  Bible (in modern English translation).  When you click "Annotation"
  for a result, it shows the corresponding verse in dozens of other
  languages, including Chinese.

- For a collection of over 500,000 Chinese-English Web page pairs,
  mined automatically, see http://umiacs.umd.edu/~resnik/strand/ under
  the "English-Chinese (July 2003)" link.  A heavily filtered version
  of this collection was used to create the LSE's chinese_web
  collection, above.

Hope this is helpful!

  Philip

  ----------------------------------------------------------------
  Philip Resnik, Associate Professor
  Department of Linguistics and Institute for Advanced Computer Studies

  1401 Marie Mount Hall            UMIACS phone: (301) 405-6760       
  University of Maryland           Linguistics phone: (301) 405-8903
  College Park, MD 20742 USA       Fax: (301) 314-2644 / (301) 405-7104
  http://umiacs.umd.edu/~resnik       E-mail: resnik at umiacs.umd.edu


