The Problem of Javascript and Search Engines and introducing SnapSearch

Every single application I've been building since 2 years ago have been a single page application. The server side is a REST API, and the client side is rich and handles the view and web page generation. I've been doing it so much, that I sometimes forget how to build web applications in the old static way. It's my prediction that soon every single website/web application will be using some sort of javascript, and this even includes mostly static content websites, because the advancements in javascript makes possible all sorts of new user interfaces, and interactivity in producing and consuming content on the web.

There is one big problem however. Search engines, social network robots, and friendly scrapers do not execute javascript. This means things like Google Search, Google Adverts, Bing, Facebook, Twitter, LinkedIn and more all don't work or don't work very well with javascript enhanced rich client side applications. The social networks also use bots for embedding content, such as when you share a link.

This has resulted in developers having to choose between a rich interactive application that provides new and exciting user experience, or static reliable webpages. Most businesses end up compromising with static public facing pages, and then internally a rich control panel. Or if they are starting a whole new content based site, they stick with static server side web page generation.

Now why don't the crawlers and robots support javascript execution? Well one thing is that it's difficult. You need a whole javascript runtime, and executing that is not cheap nor can it be done very fast. One of the main issues is that javascript is used to bring in asynchronous content. So the static content on a single page application is just a skeleton, the javascript is executed to fill it up with dynamic content. This means any robot/crawler has to stay on the page for a while waiting for asynchronous content to be loaded. A static HTTP request might be completed in 1 second, but waiting for asynchronous content can triple, or quadruple the time required. So it's a complex and expensive thing to do.

I and Mustafa (a student in Polycademy) set out to solve this problem. After about 9 months of development, let me introduce you to SnapSearch.

SnapSearch is Search Engine Optimisation (SEO) for Javascript, HTML 5 and Single Page Applications. It makes your javascript applications crawlable by search engines and scrapable by social network robots. It works by intercepting requests that is sent to your web application, detects if its coming from a search engine bot or social network bot, and in turn returns a response containing a cached snapshot version of your javascript web page which it acquired by rendering your webpage previously or on the fly with automated Firefox instances.

SnapSearch provides a number of officially supported, professionally developed middleware that can be easily integrated into your current web application stack. It currently provides PHP, Node.js, Ruby and Python middlewares. They are very flexible, and allow you to specify which search engines you want to intercept for, and which you want to ignore.

It's free to get started, there's a 1000 free usages per month cap. You get nice analytics that allows you to figure out which search engines are accessing your site. There's comprehensive documentation for both the API and the middleware. Note that if you're into web scraping and you're looking for a solution to process javascript, you can use SnapSearch for this purpose too, it's very fast, in fact most on the fly renderings happen under 4 seconds, whereas cached snapshots gets returned under 1 second. SnapSearch is load balanced, so we'll scale linearly.

For a limited time offer, use "SNAP" as the code when you sign up and you'll get 2000 free usages per month cap.

The project is under heavy development, and there's still lots more to do. But it's fully functional and production ready. I'm already using it for http://dreamitapp.com/ and https://snapsearch.io/

If you're looking for help setting it all up. Feel free to hit our web chat at Hipchat. I'll be able to answer any questions and help with deployment.

If you're writing a single page application, or web pages enhanced with javascript, you don't need to make any more compromises. Just build it, and know that SEO will always be available to you. Try the demo on your web page, and you can see the difference in the content being served to search engines.

The tech savvy among you might realise this is similar to PhantomJS as a service. But it's not, because it doesn't use PhantomJS. Using real Firefox instances has the advantage of being able to take advantage of Mozilla's 6 week Firefox release cycles. This means we can support the latest in HTML5 technology quicker. The qtwebkit engine that PhantomJS relies on has a slower and sporadic release cycle. And it doesn't support plugins/extensions. SnapSearch has the possibility of adding in plugin/extensions such as Flash (this is under development).

If you're looking to get into single page application development, and you haven't jumped on the bandwagon yet, look no further than AngularJS. It's the best! Also try out Facebook's React as it can easily integrate into AngularJS's directive concept (it can be faster).

The land of client side development is changing very fast, and there is some serious engineering going on. I suggest you to also check out Polymer and Brick. Both of which AngularJS will be taking advantage of when they become more stable. Here's a comparison, you'll see that it's very similar to AngularJS directives.

Posted by CMCDragonkai on 2014-03-30 18:49:46 Tags: javascript angularjs coding seo snapsearch Want to Comment?

Preloading page content like Youtube using AngularJS

I want to start blogging more about my technical journeys through Polycademy. While I was building Dream it App (http://dreamitapp.com), I noticed that there would FOUCs (flash of unstyled content) in between pages on the same application. Dream it App is a single page application, so state transitions are meant to be more fluid and closer to a desktop experience. In a traditional static page, you get FOUCs/FONCs (flash of no content) all the time between changing pages or changing sites, usually it's a white background. You'll notice this when you're on Google and you navigate to a particular new site you have been to before. But in way SPAs should be able to avoid this problem. Unfortunately for most cases, you trade the page transition white background FONC for a FOUC that has the template and styles loaded, but the data/content not loaded. This is happening because SPAs are asynchronously loading resources, so things don't come in one at a time. But then I saw Youtube's new loading mechanism.

If you've been to Youtube recently, you may have noticed a loading mechanism. Go any Youtube video, and then click on one of the related videos without opening a new tab or window. You should see this:

It looks like a loading bar. But this is not what this post will be talking about. A loading bar is fairly easy to implement in AngularJS, see the ngProgress module. What Youtube is doing is preloading all the data and content of the next video before it transitions the view state to that new video page.

So I began thinking overnight on how to achieve this. This is a difficult thing to implement. When writing an AngularJS application, you have controllers that associate themselves to various view states. A basic example would be that every single page (delineated by the URL) has a controller. Then you put all the commands that load data asynchronously from the server, and all the modifications of data into the controller. You might even abstract the process and modularise but in the end it's all activated from the controller. In most use cases, only when the page has transitioned to the new page will the controller's code run and hence bring in all of the content for the new page. So there has to be some way to load data asynchronously before activating the controller. One couldn't just run the controller before changing page, because the controller may be relying on the DOM as a dependency to run other things.

The first obvious solution would the preload resources upon the first load. In some sites, they preload images and cache them in the browser before you even load a page that needs those images. This could work as well for a small site that has fairly static content. However when you get to scaling up dynamic content it doesn't work. It does work well for templates. In fact Polycademy's current Angular architecture is to preload all templates and cache them upon the first page load. This is because templates are usually quite small when they are all combined. But this doesn't solve the problem since the content is still being loaded asynchronously and being added to the template. So when you transition state, you might see the template, but no data. What we need is a method to separate the activation of loading asynchronous content that is on demand and relevant to the specific page from the controller's code. So that upon running the controller's code (and transitioning the view), the data is there already. But this all needs to be done JIT (just in time) for the page, not for all pages.

So I did some googling. And there is a method to do this. It's called the resolve property on the routes. It is possible to configure AngularJS's routes to each have a resolve property, which calls functions that return a promise, which all have to be resolved, before finally dependency injecting them into the controller.

The methods are listed here: Delaying AngularJS route change until model loaded to prevent flicker and AngularJS - Behave like Gmail using Routes with Resolve and AngularJS Video Tutorial: Route Life Cycle

I haven't got around to testing those methods in production. But I hope it also works with the back button. Youtube does its preloading mechanism on the back button as well. At any case, this is a boon to user experience. One of the major factors of using SPAs to is to have a more responsive experience, and seeing FOUCs as you transition between views deteriorates that experience. You would never see a well designed desktop app with FOUCs in between transitions.

I also use ui-router for a more sophisticated state machine routing. And ui-router also supports resolve. So I'll be implementing this in Polycademy's next AngularJS projects.

Also for images to be resolved prior to the state change, you can use AngularJS to HTTP preload them. See this Stackoverflow post: http://stackoverflow.com/questions/18627609/angularjs-resolve-to-wait-until-external-images-have-loaded This form of eager loading for each state change is required when the layout of the next page depends on the image's size which may not be determined until is loaded. This obviously refers to dynamic or user uploaded images, not image assets of your site, which you do know what the size is beforehand.

Posted by CMCDragonkai on 2013-11-02 15:10:07 Tags: angularjs youtube javascript fouc coding Want to Comment?

Showing articles with the tag(s): 'javascript'

The Problem of Javascript and Search Engines and introducing SnapSearch

Preloading page content like Youtube using AngularJS