Corpora: corpora variety/summary

Vladimir Rykov, PhD in Computational Linguistics, MOCKBA rykov2000 at mail.ru
Wed Aug 30 06:36:59 UTC 2000



     Fans of tagging may skip my letter again.

     After many requests, I am sending a compilation of the answers I
received:

                          -----------------

Hi. Michael Barlow's pages may be a good start:
<http://www.ruf.rice.edu/~barlow/corpus.html>
You are also welcome to take a look at the CoSIH site (see the link below,
and follow the links from there).

http://spinoza.tau.ac.il/hci/dep/semitic/izreel.html

Good luck,
Shlomo Izre'el

                          ------------------

I can recommend the ICAME archive pages. They have all the documentation
of the ICAME CD-ROM online.
See http://www.hd.uib.no/icame/newcd.htm.

For a critical discussion of genre divisions
in corpora, there are a number of sources, such as Kessler et al.
(Proc. ACL '97) or a paper I wrote together with Mathias Kirsten
(Proc. EACL '99).

Cheers, Maria

     My comment: I could not get in contact with Maria or Mathias
Kirsten, which is a pity, especially since their publications are
unavailable to me :-(   Vl R

                          ------------------

There is quite a good article by David Lee of Lancaster University on genre
and corpora (particularly the BNC). It can be downloaded from
http://members.xoom.com/davidlee00/downld.htm

Regards, Veronika Koller

     My comment: it is a really good article.   Vl R

                          ------------------

The Linguistic Data Consortium's Catalog describes the 170 corpora that the
consortium currently distributes. You can find the catalog at:
    http://www.ldc.upenn.edu/Catalog
On that page you can see various summaries of our corpora and search them by
data type, data source (broadcast news, conversation), language, recommended
application, and the sponsored research program, if any, that developed them.

Please let us know if this doesn't answer your questions. You can write to me
or to ldc at ldc.upenn.edu for more information.

Best wishes, Chris

                          ------------------

Since I did not see any replies to Vladimir, here is an answer.
But I think that others may have much better suggestions.

You can go to the archives of this list, going through
http://linguistlist.org/ .  That is possibly the most
comprehensive source.

Old (1996) information below:

There is a site with a survey at each of these addresses:

http://www.hd.uib.no
http://www.clres.com/siglex.html
http://www.ruf.rice.edu/~barlow/corpus.html
http://www.ling.lu.se
http://clr.nmsu.edu/clr/CLR.html
http://www.cogsci.ed.ac.uk/elsenet/eci_summary.html
http://www.ids-mannheim.de/telri/telri.html
http://www.ceth.rutgers.edu

I have no idea whether any of these are still valid.

Happy searching. Bill Mann

                          ------------------

     I sent my thanks to ALL the people mentioned above.

                            Vladimir Rykov

  Linguistic Institute of the RAS

