This list of tools is very interesting (thanks!). I have a related question: once you have downloaded the HTML files, what tools do you use to (batch-)convert them into reasonably clean "plain" text?