Thursday, August 25, 2011
Diffbot: A Modern Screenscraper
There are services out there that do screenscraping, like Kapow. Most work by inspecting a page's HTML code to target the exact elements to extract. Diffbot, on the other hand, reportedly draws on more signals to extract elements more accurately across an entire website. One way is by classifying every webpage into one of twenty possible categories. It also uses bots, algorithms, computer vision and artificial intelligence to process content on the Web the way a human being would. Pretty incredible. So why does anyone need a screenscraper? Well, new services built on Diffbot are just starting to bloom.
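To give a feel for the HTML-targeting approach that most screenscrapers take, here is a minimal Python sketch using only the standard library. It is not Kapow's or Diffbot's code. The class name HeadlineExtractor, the function extract_headlines and the sample page are all made up for illustration, and it assumes the element we want is an <h1>.

```python
from html.parser import HTMLParser


class HeadlineExtractor(HTMLParser):
    """Hypothetical helper: collects the text inside every <h1> element."""

    def __init__(self):
        super().__init__()
        self._in_h1 = False   # True while the parser is inside an <h1>
        self.headlines = []   # one string per <h1> found

    def handle_starttag(self, tag, attrs):
        if tag == "h1":
            self._in_h1 = True
            self.headlines.append("")

    def handle_endtag(self, tag):
        if tag == "h1":
            self._in_h1 = False

    def handle_data(self, data):
        # Only keep text that appears between <h1> and </h1>.
        if self._in_h1:
            self.headlines[-1] += data


def extract_headlines(html):
    """Return the stripped text of each <h1> in the given HTML string."""
    parser = HeadlineExtractor()
    parser.feed(html)
    parser.close()
    return [h.strip() for h in parser.headlines]


# Made-up page used only for this example.
SAMPLE_PAGE = """
<html><body>
  <div class="nav">Home | About</div>
  <h1>Diffbot: A Modern Screenscraper</h1>
  <p>Body text goes here.</p>
</body></html>
"""

if __name__ == "__main__":
    print(extract_headlines(SAMPLE_PAGE))
```

The weakness is easy to see: the scraper only works while the site keeps its headline in an <h1>. Change the markup and the rule has to be rewritten, which is the kind of brittleness a visual, category-based approach is meant to avoid.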
I have started this blog to keep track of all the interesting stuff I read about. In some cases, friends of mine have shared their cool stuff with me. I try to keep the headlines and descriptions short, so you must follow the links to read and decide for yourself if a topic is worth your time. Want to contribute? Just comment on any of the stories and I will be notified instantly.