[Corpora-List] mailing list corpora
Niels Ott
niels at drni.de
Thu Jun 15 22:10:45 UTC 2006
Adam ENDRODI wrote:
> Thoughts, hints? Have you run into similar problems or indeed I am
> the only one to miss the obvious?
Here's my idea:
- Work on Usenet data.
- Do not use archives. If you take postings from a larger
  number of high-traffic groups, you should easily get
  your 10,000 postings.
- Use Mozilla Thunderbird.
- Create a newsgroup account and subscribe to a number of
  groups.
- For each group:
    - Download a lot of headers (you will be asked
      when you click on the group's name for the first
      time).
    - Go to menu "Edit" -> "Newsgroup Properties",
      click on tab "Offline", click button "Download
      now".
    - Wait. (This can take a while...)
- Result: In ~/.thunderbird/<someID>/News/<newsaccountname>
  you will find an mbox file for each newsgroup.
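Once the download has finished, those per-group files can be read with Python's standard `mailbox` module. The following is a minimal sketch, not a tested recipe: it assumes every file in the account folder except Thunderbird's `.msf` summary files is a plain mbox, and it keeps only the first text/plain part of each posting. `load_postings` and `extract_text` are hypothetical helper names.

```python
import mailbox
from email.message import Message
from pathlib import Path
from typing import Dict, Iterator, List, Tuple


def iter_group_files(news_dir: Path) -> Iterator[Path]:
    """Yield candidate mbox files, one per newsgroup.

    Assumption: Thunderbird keeps a .msf summary file next to each group
    file; those are skipped. Any other non-mbox file simply yields no
    messages, because mbox parsing splits on "From " lines.
    """
    for path in sorted(news_dir.iterdir()):
        if path.is_file() and path.suffix != ".msf" and not path.name.startswith("."):
            yield path


def extract_text(msg: Message) -> str:
    """Return the first text/plain part of a posting, decoded to str."""
    for part in msg.walk():
        if part.get_content_type() == "text/plain":
            payload = part.get_payload(decode=True)  # bytes, or None for containers
            if payload is None:
                continue
            charset = part.get_content_charset() or "latin-1"
            try:
                return payload.decode(charset, errors="replace")
            except LookupError:  # charset name Python does not know
                return payload.decode("latin-1", errors="replace")
    return ""


def load_postings(news_dir: Path) -> Dict[str, List[Tuple[str, str]]]:
    """Map each group file name to a list of (subject, body) pairs."""
    corpus: Dict[str, List[Tuple[str, str]]] = {}
    for path in iter_group_files(news_dir):
        box = mailbox.mbox(str(path), create=False)
        try:
            corpus[path.name] = [
                (msg.get("Subject", ""), extract_text(msg)) for msg in box
            ]
        finally:
            box.close()
    return corpus
```

For example, `load_postings(Path.home() / ".thunderbird" / "<someID>" / "News" / "<newsaccountname>")`, with the placeholders filled in for your own profile, returns one entry per group, so summing the list lengths tells you whether you have reached your 10,000 postings.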
Best,
Niels
(Still CL Student at Tübingen Univ.)
--
Me & Myself: http://www.drni.de/niels/
"Freedom's just another word for nothing left to lose..." (Janis Joplin)