[Corpora-List] Getting articles from newspapers to compile a corpus
Angus B. Grieve-Smith
grvsmth at panix.com
Sat Dec 1 19:17:03 UTC 2012
On 11/29/2012 10:52 PM, True Friend wrote:
> I have a related question: News websites (these days) are using AJAX,
> which hides links while simultaneously generating them via JavaScript.
> See this page
> <http://www.nation.com.pk/pakistan-news-newspaper-daily-english-online/opinions/editorials>
> for example. Apparently this is the archive page for all Editorials on
> the newspaper website, but only a few are shown, and the user has to click
> on "Show more news" under the given stories to get a few more previous
> editorials. Would an html crawler be able to bypass this and get all
> links hidden on this page?
>
It is possible. Anyone with enough programming skill could write a
crawler that sends the site the same requests its JavaScript sends
when you click "Show more news", and then pulls the links out of the
responses. In practice, the site's scripts may be so obfuscated that
working out those requests is not worth the time and effort.
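
To make the idea concrete, here is a minimal sketch in Python, using
only the standard library. It assumes that the "Show more news" button
fetches HTML fragments from a single endpoint that accepts a page
number. The endpoint URL, the "page" parameter name and the
X-Requested-With header are all assumptions for illustration; I have
not checked what this particular site actually sends, and you would
need to watch the requests in your browser's developer tools to find
the real ones.

```python
import urllib.parse
import urllib.request
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect href values from <a> tags in an HTML fragment."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)


def fetch_fragment(endpoint, page):
    """Request one batch of stories the way the page's script might.

    The 'page' query parameter and the X-Requested-With header are
    assumptions; the real names come from inspecting the site's traffic.
    """
    url = endpoint + "?" + urllib.parse.urlencode({"page": page})
    request = urllib.request.Request(
        url, headers={"X-Requested-With": "XMLHttpRequest"}
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return response.read().decode("utf-8", errors="replace")


def collect_links(endpoint, fetch=fetch_fragment, max_pages=50):
    """Page through the endpoint until a batch yields no new links."""
    seen = {}  # dict keeps insertion order and removes duplicates
    for page in range(1, max_pages + 1):
        parser = LinkCollector()
        parser.feed(fetch(endpoint, page))
        parser.close()
        before = len(seen)
        for link in parser.links:
            seen.setdefault(link, None)
        if len(seen) == before:
            break  # nothing new, so assume the archive is exhausted
    return list(seen)
```

You would call collect_links() with whatever endpoint URL you find the
page requesting. The fetch argument is there so the paging logic can be
tried out on canned HTML before pointing it at a live site.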
--
Angus B. Grieve-Smith
grvsmth at panix.com