[Corpora-List] Re: Chinese-English-Russian parallel corpora:

Philip Resnik resnik at umiacs.umd.edu
Mon Jan 30 15:44:28 UTC 2006


"Olga Mitrofanova" <alkonost at OM12520.spb.edu> wrote:
> Here is a summary of useful links concerning Chinese-English-Russian
> parallel corpora prepared by Inna Lazareva (St-Petersburg University):

Here are three more resources that may be useful to anyone working
with Chinese-English parallel text:

- The Linguist's Search Engine (http://lse.umiacs.umd.edu) provides
  access to a collection of over 118,000 Chinese pages.  These were
  mined from the Web using a technique that automatically finds
  Chinese-English page pairs, so the English translation is also
  available when you look at a Chinese result.  To search the Chinese
  collection, go to "Query Options", and under "Collection to Search",
  select "Public Collection: chinese_web"; then, under "Example
  Sentence", change "Language" from English to Chinese.  To see the
  corresponding English for a hit, click "Annotation".

  The LSE Web page links to detailed documentation.  Note that the
  Chinese pages have also been automatically classified by level of
  document difficulty, and this "Level" can be used to narrow the
  search.

- The Linguist's Search Engine also provides English search of the
  Bible (in modern English translation).  When you click "Annotation"
  for a result, it shows the corresponding verse in dozens of other
  languages, including Chinese.

- For a collection of over 500,000 Chinese-English Web page pairs,
  mined automatically, see http://umiacs.umd.edu/~resnik/strand/ under
  the "English-Chinese (July 2003)" link.  A heavily filtered version
  of this collection was used to create the LSE's chinese_web
  collection described above.

Hope this is helpful!

  Philip

  ----------------------------------------------------------------
  Philip Resnik, Associate Professor
  Department of Linguistics and Institute for Advanced Computer Studies

  1401 Marie Mount Hall            UMIACS phone: (301) 405-6760
  University of Maryland           Linguistics phone: (301) 405-8903
  College Park, MD 20742 USA       Fax: (301) 314-2644 / (301) 405-7104
  http://umiacs.umd.edu/~resnik    E-mail: resnik at umiacs.umd.edu


