<div>There are many free tools out there to scrape websites for specific content. This tutorial includes an example that is somewhat comparable:</div><a href="http://net.tutsplus.com/tutorials/javascript-ajax/web-scraping-with-node-js/">http://net.tutsplus.com/tutorials/javascript-ajax/web-scraping-with-node-js/</a><div>
<br></div><div>You might also take a look at Bobik:<div class="gmail_extra"><a href="http://usebobik.com/">http://usebobik.com/</a> </div><div class="gmail_extra"><span style="color:rgb(79,79,79);font-family:'Helvetica Neue',Helvetica,Arial,sans-serif;font-size:14px">Bobik is a cloud-powered service for scraping websites in real time. You can use any language you want as Bobik's own API is entirely HTTP-based. </span><br>
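<br>For a page like the one in the quoted question below, where a "Show more news" control loads further links, one rough approach is to request each additional batch directly and collect the links from every response. The sketch below uses only the Python standard library. The names LinkCollector, extract_links, crawl_all_pages and fetch_page are made up for illustration, and it assumes the site serves its extra items through some page-numbered request, which I have not verified for that newspaper. How to build that request is site-specific and has to be read off the browser's network traffic.<br>

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect the href of every anchor tag in a piece of HTML."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if href:
            # Resolve relative links against the page address.
            self.links.append(urljoin(self.base_url, href))


def extract_links(html, base_url):
    """Return all link targets found in html, as absolute URLs."""
    collector = LinkCollector(base_url)
    collector.feed(html)
    collector.close()
    return collector.links


def crawl_all_pages(fetch_page, base_url, max_pages=50):
    """Fetch successive batches until one contributes no new links.

    fetch_page(n) is a caller-supplied (hypothetical) function that
    returns the HTML of batch n as a string.
    """
    seen = []
    for page in range(1, max_pages + 1):
        added = False
        for url in extract_links(fetch_page(page), base_url):
            if url not in seen:
                seen.append(url)
                added = True
        if not added:
            break
    return seen
```

Passing fetch_page in as a parameter keeps the site-specific request logic separate from the link extraction, so the same loop can be tried against saved HTML before pointing it at a live site. The max_pages limit is only a safety stop.<br>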
<br>Regards,</div><div class="gmail_extra">Bill Fletcher<br><br><div class="gmail_quote">On Sat, Dec 1, 2012 at 2:17 PM, Angus B. Grieve-Smith <span dir="ltr"><<a href="mailto:grvsmth@panix.com" target="_blank">grvsmth@panix.com</a>></span> wrote:<br>
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">
<div bgcolor="#FFFFFF" text="#000000">
<div>On 11/29/2012 10:52 PM, True Friend
wrote:<br>
</div>
<blockquote type="cite">
<div dir="ltr"><font size="4"><font face="tahoma,sans-serif">I have a related question: News websites (these days) are
using AJAX, which hides links while generating them on the fly via
JavaScript. See <a href="http://www.nation.com.pk/pakistan-news-newspaper-daily-english-online/opinions/editorials" target="_blank">this
page</a> for example. Apparently this is
the archive page for all editorials on
the newspaper website, but only a few are shown,
and the user has to click on "Show more news" under the given
stories to get a few more
previous editorials. Would an HTML
crawler be able to
bypass this and get
all links hidden on
this page?<br>
</font></font><br>
</div>
</blockquote>
<br>
It is possible. Certainly, anyone with enough programming skill
could write an HTML crawler that sends an AJAX website the same
requests its own JavaScript would send. In practice, the site may be so obfuscated
that it's not worth the time and effort.<span class=""><font color="#888888"><br>
<br>
<pre cols="72">--
Angus B. Grieve-Smith
<a href="mailto:grvsmth@panix.com" target="_blank">grvsmth@panix.com</a></pre>
</font></span></div>
<br>_______________________________________________<br>
Corpora mailing list<br>
<a href="mailto:Corpora@uib.no">Corpora@uib.no</a><br>
<a href="http://mailman.uib.no/listinfo/corpora" target="_blank">http://mailman.uib.no/listinfo/corpora</a><br>
<br></blockquote></div><br></div></div>