The Problem of JavaScript and Search Engines, and Introducing SnapSearch

Every application I've built over the last two years has been a single page application: the server side is a REST API, and the client side is rich and handles the view and web page generation. I've been doing it so much that I sometimes forget how to build web applications the old static way. My prediction is that soon every website and web application will use some form of JavaScript, even mostly static content sites, because advancements in JavaScript make possible all sorts of new user interfaces and interactivity in producing and consuming content on the web.

There is one big problem, however. Search engines, social network robots, and friendly scrapers do not execute JavaScript. This means that Google Search, Google Adverts, Bing, Facebook, Twitter, LinkedIn and others either don't work at all, or don't work very well, with JavaScript-enhanced rich client-side applications. The social networks also use bots to embed content, such as when you share a link.

This has forced developers to choose between a rich interactive application that provides a new and exciting user experience, and static but reliable web pages. Most businesses end up compromising: static public-facing pages, with a rich control panel internally. Or, if they are starting a whole new content-based site, they stick with static server-side page generation.

So why don't crawlers and robots support JavaScript execution? For one thing, it's difficult: you need a whole JavaScript runtime, and executing it is neither cheap nor fast. One of the main issues is that JavaScript is used to bring in asynchronous content. The static content of a single page application is just a skeleton; JavaScript is executed to fill it with dynamic content. This means any robot or crawler has to stay on the page for a while, waiting for the asynchronous content to load. A static HTTP request might complete in a second, but waiting for asynchronous content can triple or quadruple the time required. It's a complex and expensive thing to do.
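
To make that concrete, here's a minimal sketch of what a crawler sees versus what the browser builds. The endpoint and element names are made up for illustration; the point is that the meaningful content only exists after the script runs.

```javascript
// What the crawler downloads: a near-empty skeleton.
// <body><div id="app">Loading...</div></body>

// What the browser then executes to fill the skeleton
// (the /api/articles/42 endpoint is a made-up example):
var xhr = new XMLHttpRequest();
xhr.open('GET', '/api/articles/42');
xhr.onload = function () {
  var article = JSON.parse(xhr.responseText);
  // The content a crawler actually wants only exists after this runs.
  document.getElementById('app').innerHTML =
    '<h1>' + article.title + '</h1>' +
    '<p>' + article.body + '</p>';
};
xhr.send();
```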

Mustafa (a student at Polycademy) and I set out to solve this problem. After about nine months of development, let me introduce you to SnapSearch.

SnapSearch logo

SnapSearch is Search Engine Optimisation (SEO) for JavaScript, HTML5 and single page applications. It makes your JavaScript applications crawlable by search engines and scrapable by social network robots. It works by intercepting requests sent to your web application, detecting whether they come from a search engine or social network bot, and, if so, returning a cached snapshot of your JavaScript web page, acquired by rendering the page beforehand or on the fly with automated Firefox instances.
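
Conceptually, the interception works something like the sketch below. This is not SnapSearch's actual implementation, just the idea, assuming an Express-style Node.js app and a hypothetical getSnapshot() helper standing in for the call to the service.

```javascript
var express = require('express');
var app = express();

// Simplified robot detection; the real middlewares match a much
// larger, configurable list of user agents.
var BOT_PATTERN = /googlebot|bingbot|facebookexternalhit|twitterbot|linkedinbot/i;

app.use(function (req, res, next) {
  if (!BOT_PATTERN.test(req.headers['user-agent'] || '')) {
    return next(); // normal browsers get the JavaScript application as usual
  }
  // Robots get a pre-rendered snapshot instead.
  // getSnapshot() is a hypothetical helper standing in for the API call.
  getSnapshot('http://' + req.headers.host + req.url, function (err, html) {
    if (err) return next(); // fall back to the normal response on failure
    res.send(html);
  });
});
```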

SnapSearch provides a number of officially supported, professionally developed middlewares that can be easily integrated into your current web application stack. PHP, Node.js, Ruby and Python middlewares are currently available. They are very flexible, allowing you to specify which robots you want to intercept for and which you want to ignore.
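
As an illustration of that flexibility, a middleware configuration might look roughly like this. The option names here are assumptions for illustration, not the middlewares' real interface; check the documentation for the actual parameters.

```javascript
// Hypothetical configuration shape; option names are illustrative only.
var snapSearchOptions = {
  email: 'you@example.com',      // your SnapSearch account credentials
  key: 'YOUR_API_KEY',
  matchedRobots: ['Googlebot', 'Bingbot', 'Twitterbot'], // intercept these
  ignoredRobots: ['SomeAggressiveBot'],                  // let these through
  ignoredRoutes: [/^\/admin/]    // never snapshot private routes
};
```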

It's free to get started, with a cap of 1,000 free usages per month. You get analytics that show which search engines are accessing your site, and there's comprehensive documentation for both the API and the middlewares. Note that if you're into web scraping and looking for a way to process JavaScript, you can use SnapSearch for that too. It's fast: most on-the-fly renderings complete in under 4 seconds, while cached snapshots are returned in under 1 second. SnapSearch is load balanced, so it will scale linearly.
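
If you do use it for scraping, the gist is a single authenticated API call that returns the rendered page. The endpoint path, authentication scheme and payload below are my assumptions for illustration; verify them against the API documentation.

```javascript
// Hedged sketch of calling the API directly from Node.js;
// the endpoint and body fields are assumptions, check the API docs.
var https = require('https');

var body = JSON.stringify({ url: 'http://dreamitapp.com/' });
var req = https.request({
  hostname: 'snapsearch.io',
  path: '/api/v1/robot',                  // assumed endpoint
  method: 'POST',
  auth: 'you@example.com:YOUR_API_KEY',   // assumed HTTP basic auth
  headers: {
    'Content-Type': 'application/json',
    'Content-Length': Buffer.byteLength(body)
  }
}, function (res) {
  var data = '';
  res.on('data', function (chunk) { data += chunk; });
  res.on('end', function () {
    // The response should contain the rendered HTML snapshot.
    console.log(data);
  });
});
req.end(body);
```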

For a limited time, use "SNAP" as the code when you sign up and you'll get a cap of 2,000 free usages per month.

The project is under heavy development, and there's still lots more to do, but it's fully functional and production ready. I'm already using it on http://dreamitapp.com/ and https://snapsearch.io/ itself.

If you're looking for help setting it all up, feel free to hit our web chat on HipChat. I'll be able to answer any questions and help with deployment.

If you're writing a single page application, or web pages enhanced with JavaScript, you no longer need to compromise. Just build it, knowing that SEO will always be available to you. Try the demo on your own web page, and you'll see the difference in the content being served to search engines.

The tech-savvy among you might think this is PhantomJS as a service. It's not, because it doesn't use PhantomJS. Using real Firefox instances lets us take advantage of Mozilla's six-week Firefox release cycle, which means we can support the latest HTML5 technology sooner. The QtWebKit engine that PhantomJS relies on has a slower and more sporadic release cycle, and it doesn't support plugins/extensions. SnapSearch has the possibility of adding plugins/extensions such as Flash (this is under development).

If you're looking to get into single page application development and haven't jumped on the bandwagon yet, look no further than AngularJS. It's the best! Also try out Facebook's React, as it can easily be integrated through AngularJS's directive concept (and it can be faster at rendering).
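
As a rough sketch of what that integration can look like (the component here is made up, and this uses the renderComponent API from the React 0.x releases current at the time of writing):

```javascript
// A made-up React component, rendered from inside an AngularJS directive.
var Greeting = React.createClass({
  render: function () {
    return React.DOM.h1(null, 'Hello, ' + this.props.name);
  }
});

angular.module('app', []).directive('reactGreeting', function () {
  return {
    restrict: 'E',
    scope: { name: '=' },
    link: function (scope, element) {
      // Re-render the React component whenever the bound value changes;
      // React's virtual DOM keeps the repeated renders cheap.
      scope.$watch('name', function (name) {
        React.renderComponent(Greeting({ name: name }), element[0]);
      });
    }
  };
});
```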

The land of client-side development is changing very fast, and there is some serious engineering going on. I suggest you also check out Polymer and Brick, both of which AngularJS will take advantage of when they become more stable. Here's a comparison; you'll see that they're very similar to AngularJS directives.

Posted by CMCDragonkai on 2014-03-30 18:49:46
Tags: javascript angularjs coding seo snapsearch