Friday, January 18, 2013

Headless Browsers

If you're reading this article, you most likely know what a browser is. Now take away the GUI, and you have what's called a headless browser. A headless browser can do nearly everything a normal browser does, including loading pages, running JavaScript, and submitting forms, and it is often faster because nothing has to be drawn on a screen. That makes headless browsers great for automating and testing web pages programmatically. There are a number of headless browsers in existence, and in my opinion PhantomJS is the best.

In other words, the server can now act as the web user. Server-side code can interact with any website from the point of view of a site visitor: it can submit data, click links, wait for results, and process those results. Very powerful stuff indeed. This is a step beyond plain web scraping, because the page's scripts actually run and your code can react to what they produce.
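To make the "submit data, wait for results, process the results" loop concrete, here is a minimal sketch in Python. It is not PhantomJS: it assumes the Selenium package driving Chrome in headless mode, which is simply one way to do it. The helper names (make_headless_driver, wait_for_elements, search_and_collect), the form field name, and the CSS selector are all hypothetical and would need to match whatever site you are automating.

```python
import time


def make_headless_driver():
    """Start Chrome with no GUI. Assumes `pip install selenium` and a local Chrome."""
    # Imported lazily so the helpers below carry no hard dependency on Selenium.
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options

    options = Options()
    options.add_argument("--headless=new")  # run Chrome without opening a window
    return webdriver.Chrome(options=options)


def wait_for_elements(driver, css_selector, timeout=10.0, poll=0.25):
    """Poll the page until at least one element matches, or raise TimeoutError."""
    deadline = time.monotonic() + timeout
    while True:
        found = driver.find_elements("css selector", css_selector)
        if found:
            return found
        if time.monotonic() >= deadline:
            raise TimeoutError(f"nothing matched {css_selector!r} within {timeout}s")
        time.sleep(poll)


def search_and_collect(driver, url, field_name, query, result_selector,
                       timeout=10.0, poll=0.25):
    """Act like a visitor: open a page, fill in a form, submit it, read the results."""
    driver.get(url)                                # load the page
    box = driver.find_element("name", field_name)  # find the form field by its name
    box.send_keys(query)                           # type the data
    box.submit()                                   # submit the enclosing form
    results = wait_for_elements(driver, result_selector, timeout, poll)
    return [element.text for element in results]   # process the results
```

With Selenium installed, you would create a driver with make_headless_driver(), pass it to search_and_collect along with your own URL, field name, and selector, and call driver.quit() when you are done. The waiting step is the part a plain scraper cannot do: the results may only appear after the page's scripts have run.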

Here is an excellent example of what you can do with a headless browser: UpstreamCommerce.com crawls your competitors' websites to compare your prices to theirs. They may be using some other technology, but a headless browser would certainly be one way to do it.
