Second Edition Coming this Fall!

As much as I like books, they do have one major problem: print doesn't update automatically. The good news is, I can update it manually! The second edition of Web Scraping with Python will be coming out this Fall. I'm currently working on the following major changes:

  • Updates of all libraries. This is especially important for BeautifulSoup, which now expects you to pass in an explicit HTML parser.
  • Less reliance on external websites (they tend to update, move, or go away). Examples will use when at all possible.
  • New chapters for the following topics:
    • Scrapy (updated to use Python 3! It had just a section in the first edition, but now it gets its own chapter)
    • Distributed web scraping
    • Advanced web crawling and scraping patterns: designing large-scale scrapers from basic principles.
  • Moving code examples to IPython notebooks.
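
As a rough illustration of the BeautifulSoup change in the first bullet, here is a minimal sketch of naming the parser explicitly. The HTML snippet and variable names are invented for this example and are not taken from the book; "html.parser" is the parser bundled with Python's standard library, chosen here only because it needs no extra install.

```python
from bs4 import BeautifulSoup

# A small inline document (made up for illustration), so the example
# does not depend on any external website.
html = "<html><body><h1>Example title</h1><p class='intro'>Hello</p></body></html>"

# Pass the parser name as the second argument instead of letting
# BeautifulSoup guess which installed parser to use.
soup = BeautifulSoup(html, "html.parser")

title = soup.h1.get_text()
intro = soup.find("p", class_="intro").get_text()

print(title)
print(intro)
```

Other parsers such as "lxml" or "html5lib" can be named the same way if they are installed; they can build slightly different trees from malformed HTML, which is one reason to choose a parser deliberately.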

In addition, I'm making a more philosophical change with this edition. Previously, I tried to keep all the code samples in line with what was actually printed in the text, even if the code samples relied on old websites or out-of-date versions of libraries. This was done to avoid confusing readers; I wanted them to see the same code on the screen that they were reading in the book. However, this time around I'll be keeping code samples as updated and working as possible (and will be accepting pull requests!) regardless of how they might diverge from the code written in the book.

Although the specific topics and outline are fairly set in stone at this point, I'm always willing to accept requests and feedback! Hope you enjoy.
