<div>Many free tools can scrape websites for specific content. This tutorial includes a somewhat comparable example:</div><a href="http://net.tutsplus.com/tutorials/javascript-ajax/web-scraping-with-node-js/">http://net.tutsplus.com/tutorials/javascript-ajax/web-scraping-with-node-js/</a><div>

<br></div><div>You might also take a look at Bobik:<div class="gmail_extra"><a href="http://usebobik.com/">http://usebobik.com/</a> </div><div class="gmail_extra"><span style="color:rgb(79,79,79);font-family:'Helvetica Neue',Helvetica,Arial,sans-serif;font-size:14px">Bobik is a cloud-powered service for scraping websites in real time. You can use any language you want as Bobik's own API is entirely HTTP-based. </span><br>
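<div>For pages like the one in the quoted question below, where a "Show more news" link loads further items via AJAX, one option may be to request the underlying endpoint directly and page through it. Here is a minimal Python sketch of that idea, not a tested solution for that site: the endpoint URL, the <code>page</code> parameter, and the JSON shape (a list of objects with a <code>url</code> field) are all assumptions for illustration. The real request would have to be found by watching the browser's network inspector while clicking the link.</div>
<pre>
```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def fetch_json(url):
    """Download one URL and decode its body as JSON."""
    with urlopen(url, timeout=30) as response:
        return json.loads(response.read().decode("utf-8"))


def collect_links(endpoint, fetch=fetch_json, max_pages=50):
    """Replay the paged request behind a "show more" link.

    Assumption (hypothetical): the endpoint takes a "page" query
    parameter and returns a JSON list of objects with a "url" field;
    an empty list means there is nothing left to load.
    """
    links = []
    seen = set()
    for page in range(1, max_pages + 1):
        items = fetch(endpoint + "?" + urlencode({"page": page}))
        if not items:
            break  # no more items: stop paging
        for item in items:
            url = item["url"]
            if url not in seen:  # skip links repeated across pages
                seen.add(url)
                links.append(url)
    return links
```
</pre>
<div>The <code>fetch</code> argument is there so the paging logic can be tried against canned data before pointing it at a live site; <code>max_pages</code> is a safety cap in case the endpoint never returns an empty page.</div>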

<br>Regards,</div><div class="gmail_extra">Bill Fletcher<br><br><div class="gmail_quote">On Sat, Dec 1, 2012 at 2:17 PM, Angus B. Grieve-Smith <span dir="ltr">&lt;<a href="mailto:grvsmth@panix.com" target="_blank">grvsmth@panix.com</a>&gt;</span> wrote:<br>

<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">

  
  <div bgcolor="#FFFFFF" text="#000000">

    <div>On 11/29/2012 10:52 PM, True Friend wrote:<br>

    </div>

    <blockquote type="cite">

      <div dir="ltr"><font size="4"><font face="tahoma,sans-serif">I have a related question: News websites these days are using AJAX, which hides links while generating them via JavaScript. See <a href="http://www.nation.com.pk/pakistan-news-newspaper-daily-english-online/opinions/editorials" target="_blank">this page</a> for example. Apparently this is the archive page for all editorials on the newspaper website, but only a few are shown, and the user has to click on "Show more news" under the given stories to get a few more previous editorials. Would an HTML crawler be able to bypass this and get all the links hidden on this page?<br>
      </font></font><br>

      </div>

    </blockquote>

    <br>

    It is possible. Anyone with enough programming skill could write an HTML crawler that sends an AJAX website the same requests its own JavaScript makes and reads the links from the responses. In practice, those requests may be so obfuscated that it's not worth the time and effort.<span class=""><font color="#888888"><br>

    <br>

    <pre cols="72">-- 

Angus B. Grieve-Smith

<a href="mailto:grvsmth@panix.com" target="_blank">grvsmth@panix.com</a></pre>

  </font></span></div>


<br>_______________________________________________<br>

Corpora mailing list<br>

<a href="mailto:Corpora@uib.no">Corpora@uib.no</a><br>

<a href="http://mailman.uib.no/listinfo/corpora" target="_blank">http://mailman.uib.no/listinfo/corpora</a><br>

<br></blockquote></div><br></div></div>