[Corpora-List] mailing list corpora

Niels Ott niels at drni.de
Thu Jun 15 22:10:45 UTC 2006


Adam ENDRODI wrote:
> Thoughts, hints?  Have you run into similar problems or indeed I am
> the only one to miss the obvious?

Here's my idea:

- Work on Usenet data.
- Do not use archives. If you take postings from a larger
  number of high-traffic groups, you should easily get
  your 10,000 postings.
- Use Mozilla Thunderbird.
- Create a newsgroup account and subscribe to a number of
  groups.
- For each group:
     - Download a lot of headers (you will be asked
       when you click on the group's name for the first
       time).
     - Go to menu "Edit" -> "Newsgroup Properties",
       click on the "Offline" tab, then click the
       "Download now" button.
     - Wait. (This can take a while...)
- Result: In ~/.thunderbird/<someID>/News/<newsaccountname>
  you will find an mbox file for each newsgroup.
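
Once the download has finished, the per-group mbox files can be read with a few lines of Python. This is only a minimal sketch: it assumes Python's standard mailbox module parses the files Thunderbird writes, the function name iter_postings is made up for illustration, and multipart postings are simply skipped to keep it short.

```python
import mailbox


def iter_postings(mbox_path):
    """Yield (subject, body_text) for each single-part posting in one mbox file."""
    # create=False: fail instead of silently creating an empty file on a wrong path.
    box = mailbox.mbox(str(mbox_path), create=False)
    try:
        for msg in box:
            if msg.is_multipart():
                continue  # assumption: plain single-part postings are enough
            payload = msg.get_payload(decode=True) or b""
            charset = msg.get_content_charset() or "latin-1"
            try:
                text = payload.decode(charset, errors="replace")
            except LookupError:
                # Unknown charset label in the posting: fall back to latin-1.
                text = payload.decode("latin-1", errors="replace")
            yield str(msg.get("Subject", "")), text
    finally:
        box.close()
```

Call iter_postings once per group, passing the path of that group's file in the News directory named above, and collect the yielded texts until you have enough postings.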

Best,

   Niels

(Still CL Student at Tübingen Univ.)

--
Me & Myself: http://www.drni.de/niels/
"Freedom's just another word for nothing left to lose..." (Janis Joplin)