Thursday, August 25, 2011
Diffbot: A Modern Screenscraper
There are services out there that do screenscraping, like Kapow. Most of them work by looking at a page's HTML code and targeting the exact elements to extract. Diffbot, on the other hand, supposedly draws on more than the raw HTML to extract elements from any page on a website more accurately. One technique is to classify every webpage into one of twenty possible categories. It also uses bots, algorithms, computer vision and artificial intelligence to process content on the Web the way a human reader would. Pretty incredible. So why would anyone need a screenscraper? Well, the new services built on Diffbot are only now starting to bloom.
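For contrast, here is a minimal sketch of the conventional approach mentioned above, where a scraper targets exact elements in a page's HTML. It uses Python's standard-library `html.parser`. This is only an illustration, not how Kapow or Diffbot actually work. The `ElementExtractor` class and the `div`/`price` rule are hypothetical examples.

```python
from html.parser import HTMLParser

class ElementExtractor(HTMLParser):
    """Hypothetical rule-based scraper: collect the text of every
    <tag class="css_class"> element, the way selector-driven tools do."""

    def __init__(self, tag, css_class):
        super().__init__()
        self.tag = tag
        self.css_class = css_class
        self.depth = 0        # > 0 while inside a matching element
        self.results = []
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if self.depth:
            # Track nested elements of the same tag so we close at the right one.
            if tag == self.tag:
                self.depth += 1
            return
        classes = (dict(attrs).get("class") or "").split()
        if tag == self.tag and self.css_class in classes:
            self.depth = 1
            self._buffer = []

    def handle_endtag(self, tag):
        if self.depth and tag == self.tag:
            self.depth -= 1
            if self.depth == 0:
                self.results.append("".join(self._buffer).strip())

    def handle_data(self, data):
        if self.depth:
            self._buffer.append(data)

# Example page with hypothetical markup.
page = """
<html><body>
  <div class="nav">Home | About</div>
  <div class="price">$19.99</div>
  <div class="price">$5.00</div>
</body></html>
"""

parser = ElementExtractor("div", "price")
parser.feed(page)
print(parser.results)  # ['$19.99', '$5.00']
```

The weakness shows up in the rule itself: if the site renames the `price` class or moves the value into a `span`, this scraper silently returns nothing. That kind of brittleness is what motivates approaches that look beyond the raw HTML.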