eBay Product Scraping, Manta Data Scraping, Website Screen Scraping, Website Screen Scraping, Website Scraper, Scraping Data from Websites, Website Information Scraping, Web Scraping Services, Scraping Data from Websites, Website Information Scraping

Friday 22 August 2014

Scraping dynamic data


I am scraping profiles on ask.fm for a research question. The problem is that only the top most recent questions are viewable and I have to click "view more" to see the next 15.

The source code for clicking view more looks like this:

<input class="submit-button-more submit-button-more-active" name="commit" onclick="return Forms.More.allowSubmit(this)" type="submit" value="View more" />

What is an easy way of calling this 4 times before scraping it. I want the most recent 60 posts on the site. Python is preferable.

You could probably use selenium to browse to the website and click on the button/link a few times. You can get that here:

    https://pypi.python.org/pypi/selenium

Or you might be able to do it with mechanize:

    http://wwwsearch.sourceforge.net/mechanize/

I have also heard good things about twill, but never used it myself:

    http://twill.idyll.org/



Source: http://stackoverflow.com/questions/19437782/scraping-dynamic-data

No comments:

Post a Comment