The future of the web - Node.js
In 2009, Ryan Dahl created a thing called Node.js. Fast forward 3 years and it has taken the web development world by storm. Its GitHub repository is the second most watched repository, just behind Twitter Bootstrap, as of December 2012. For those who aren't technical, GitHub is like an online library of (almost) all coding projects; developers use it to coordinate and collaborate with their teams, and to show off their work to potential employers and partners.
Node.js is basically Javascript written and executed on the server side. This is the backend, where it competes with the likes of PHP, Python and Ruby on Rails. Here is why I think Node is so popular (warning: this blog post is going to get quite technical!):
Real Time Web or Single Page Applications
This is the one I'm most excited about. Web applications have a number of advantages over desktop applications, one of which is the ability to write once and run everywhere. One major disadvantage is that desktop applications, which are downloaded and executed on your own computer, have always seemed snappier, and more events can happen at the same time. This is because the web as it stands runs on the HTTP request-response model.
There is no state kept between requests and responses. Your computer and browser (the client) send a request to the server (my website); the server processes it and outputs some content, which it sends back in a response, and then it forgets you ever existed, until of course you send in another request. The connection between my server and your computer is not persistent, there is no constant communication, and hence the web is not real time and is "stateless".
Web developers innovated around this limitation by using sessions/cookies, iframes, AJAX and other interesting hacks in order to make applications seem instant and realtime. That's what you see on Facebook, where your friends' status updates are constantly pushed to your browser and you can instantly chat with your friends. 6 years ago that would have been impossible at Facebook's 1 billion user scale!
For the last few years, several innovations came together to deal with and ultimately remove this disadvantage from web applications:
Firstly, AJAX improved, and jQuery smoothed over the differences between browsers' Javascript implementations of it, making cross-browser AJAX practically standard. This was the answer to creating "Rich Internet Applications" (RIA), otherwise known as Web 2.0.
Secondly, tech entrepreneurs wanted to integrate various applications or services together, but the data used by these RIAs lived in a closed loop; they didn't share their data. So people created RESTful APIs (representational state transfer application programming interfaces) to kick off the web services revolution. (There was a brief stint with SOAP, but SOAP sucks, so most people don't use it.) APIs have existed in the past to allow machine to machine communication over the web, but it took the RESTful design model to unify the protocols and data structures through which web applications communicated with each other, making them truly useful. This is what gives us the concept of the cloud, "mashups", and software/infrastructure/platform as a service. REST also supports non-XML (extensible markup language) messages, specifically JSON (Javascript Object Notation). JSON is more lightweight than XML, so it has become the popular data format for communicating through RESTful web services to clients (such as your browser) and other servers. This popularity of JSON becomes quite important to Node.js later on.
Going into the construction of RESTful architecture is a bit too much for this blog post, but you can read this excellent tutorial about a simple RESTful application. All you have to know is that RESTful web services use the standard HTTP protocol to Create, Read, Update and Delete data (CRUD). Web applications simply send a request with an accompanying HTTP method (GET/POST/PUT/DELETE) to a unique URL in another web application or service. That URL then returns some data or performs some operation if your request was authenticated. This data is usually formatted in JSON, and it is just raw data. There's no HTML, CSS or Javascript to style it; if your browser visited that URL, it wouldn't know what to do with it. But that's the beauty of non-styled data: machines and applications can read it and manipulate it for whatever purposes they like. That's why you have so many different Twitter & Facebook mashups: they all use the same API (same URLs), but they can use that data for different things. Some will combine it into a social media control panel, others will use it for analytics. The possibilities for web service mashups are endless, and many companies have been built on top of this sharable data. Just take a look at Programmable Web, a news portal of new mashups. Or you can take a look at Code for Australia's example section.
Thirdly, the formerly omnipresent Flash died. Some may blame Apple, but I think it was just a matter of time before this proprietary multimedia platform was replaced by standards from the W3C (HTML 5). Ok, I jest, Flash is not dead yet. But any company looking to the future of web applications won't be building on Flash. Flash is still used on Youtube, mainly because HTML 5 video hasn't caught up yet, but in order to satisfy iOS users, Youtube provides those videos through HTML 5 or through Apple's app. HTML 5 is not just some new markup syntax; it is a set of protocols and application programming interfaces integrated into modern browsers (Firefox/Chrome/Opera) that allow us to create complex web applications and cross platform mobile applications like PhoneGap. The two APIs that are quite exciting and relevant to the real time web are the release of Web Sockets and the development of WebRTC.
Web Sockets allow full duplex, persistent connections between the browser and the server. Previously, in order to attain a semblance of real time, AJAX RIAs would long poll the server to receive updates. This kind of "comet programming" was still based on the HTTP request-response model. The server could not push updates to the browser, so instead the browser constantly sent requests to the server asking if there were any new updates, and when there were, the browser retrieved them. This was understandably inefficient and complex to maintain; when there were many users on the application, it was quite easy to overload the server. There were other methods mentioned here. The point is, these methods were not truly realtime, and since they still used the request-response model, they introduced unnecessary latency and useless data transferred in the HTTP header information. If only we could have a single persistent duplex (two way between client and server) protocol available... Well, we do now. It's called Web Sockets, and it provides true real time communication between the server and the client. This allows us to build chat applications very easily, and can enable multiplayer games or collaborative environments such as Google Docs or Cloud 9.
A similar technology is being developed as we speak: WebRTC, which stands for real time communication. WebRTC is designed for audio and video content, whereas Web Sockets are used for messages. When it is released in modern browsers, we'll be able to do peer to peer (your browser to another person's browser) real time communication. For example, you will no longer need Skype: your browser could just hook up to your web cam and connect to another person's browser. Other applications involve intensive graphic streaming like video, or instant live streaming without going through a third party service. Your radio could be replaced with just a browser. You could share files with a friend just by launching your browser. Now why can't we use Web Sockets for this? Well, Web Sockets operate over TCP, which emphasises reliability over timeliness. When you're streaming video or audio, you won't notice a missing pixel, but you will notice lag, so WebRTC streams media over RTP on top of UDP. This tolerates packet loss, so your stream won't stall just because a pixel disappeared somewhere in the vast internet. WebRTC is still very experimental and at an early stage; perhaps one day it may supersede Web Sockets, but it shows the direction the web is going: more distributed and more interconnected.
Fourthly, the rise of the single page application. This is quite well explained by Nodejitsu's blog post. The above mentioned innovations have convinced some companies and startups to create applications that operate on a client heavy architecture. Essentially, the server is now just a proxy or REST sink for some real time data storage, while the user interface and user experience are handled on the client side. This would not be possible without the heavy competition between browser vendors in improving Javascript execution speed. It works like this: your browser visits a web application's site and downloads the Javascript, which contains all the necessary view templates and everything that implements your user interface and manipulation of the DOM (document object model). It downloads this once. The web application essentially lives inside your browser. As you interact with the application, it simply sends HTTP requests or socket requests (or, in the future, RTC requests) to the server. The server is just a REST sink or socket sink. It does not serve up any HTML; it doesn't serve up anything that would be useful to the browser alone; it simply outputs data. This data could be wrapped in JSON if it was a RESTful architecture, or it could just be socket data. The web application in your browser interprets the data and uses it to manipulate your user interface. This cuts down on unnecessary downloads: everything you need is already on your computer, and the only thing that changes is the data, such as your friends' status updates or real time stock market information. There is no browser reload; there was only ever one page you downloaded, hence the single page app.
Welcome to the real time web. The advantage of a client heavy architecture is that your entire server is just an API, and you can create clients to suit different platforms while still using the same server APIs. Your mobile app, your web app, your iPad app, your desktop app and your mashups could all be designed differently and have different styles, but your server doesn't care. It doesn't need to worry about any of that; it just needs to perform CRUD on the RESTful requests or socket requests. The end user, your customer, doesn't care about the magic being performed on the back end: the user experience on the front end, in his/her browser, is what matters. The philosophy with single page apps is to make the front end beautiful and awesome, and to keep the data and any heavy CPU processing on the back end, invisible and flexible.
So coming back after that whirlwind tour of the development of the web, what does this all mean for Node.js and why is it relevant?
All of the above technologies require a new paradigm in programming. They require vast numbers of scalable concurrent connections with low CPU usage. There are a number of ways to develop this model. You may have recently heard about how Twitter changed from Ruby on Rails to Scala and is now using Jetty. Scala, Erlang, Clojure, Haskell, Go, and of course Node.js with Javascript all emphasise concurrency and asynchronous programming. But what makes Node.js so compelling is that the barrier to entry is low: it's Javascript, the most popular web programming language in existence. It runs everywhere. It runs in the browser, which is a fundamental asset in the web space. People know Javascript and the community is large, so Node.js is accessible, it's fast, it's concurrent and it's scalable. Now there is one downside to Node.js: it runs on a single thread, unlike those other languages. This makes it somewhat difficult to scale to multi-core computers and run CPU intensive activities; however, developers are working on extending Node.js to multiple cores as we speak! In any case, you should be forking any CPU intensive activities out to another web service or programming language, not running them on a single Node.js thread.
Wait! Let's analyse why Node.js is so suited to the real time web. Node.js works on event driven asynchronous programming. Most programming languages are procedural and imperative: code is evaluated in a synchronous manner, one statement at a time, line by line, with each operation finishing before the next one starts. In order to handle concurrent connections, you need to be able to write asynchronous code, where operations are started and their results handled whenever they arrive. For example, imagine a web application that is connected to 5 separate, independent web services. Perhaps a social mashup pulling in updates from multiple Facebook and Twitter clones. In synchronous programming, you would need to wait for each service to return its response before moving on to the next. What happens if one of the services lags? Well, your entire application hangs! Asynchronous programming allows all the queries to be fired off at the same time. Your application doesn't care when they return data, but when they do, each one runs a callback which handles the event. So you can skip slow services and keep moving; when the slow service finally returns, it catches up with a callback. Your response time for the web app mashup is reduced to the slowest query rather than the combined time of all queries. This blog post explains it a bit better. When this is combined with web sockets or long polling, the end user receives the formatted data from the other 4 web services even if the 5th one is slow, whereas synchronously they would have to wait for all of them to finish before receiving the final output, and there goes your customer!
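Here's that scenario in miniature: three imaginary services with different response times, simulated with timers (the service names and delays are made up). All three calls fire at once, so the total wait is roughly the slowest one (300ms) rather than the sum of all three (600ms).

```javascript
// Simulate a web service that takes delayMs to respond
function fakeService(name, delayMs, callback) {
  setTimeout(function () {
    callback(null, name + ' data');
  }, delayMs);
}

var results = [];
var pending = 3;

function onResponse(err, data) {
  results.push(data);
  pending--;
  if (pending === 0) {
    // Fires after ~300ms (the slowest service), not ~600ms
    console.log('all services answered:', results);
  }
}

// All three requests are in flight simultaneously
fakeService('twitter', 100, onResponse);
fakeService('facebook', 300, onResponse);
fakeService('eventbrite', 200, onResponse);
```

Notice that no code ever waits: each callback simply runs when its service answers, in whatever order the answers arrive.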
Since JSON is the premier data format for the real time web, Node.js integrates with it seamlessly: JSON is native to Javascript, being the "Javascript Object Notation". There is no need to serialise and deserialise as with other languages. And because Javascript works in both the browser and now on the server, this unified front end & back end makes it very easy to share code and to have back end and front end developers working together.
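For example (the object here is made up), a Javascript object goes straight to the JSON wire format and back with the built-in JSON functions, with no mapping layer in between:

```javascript
// A plain Javascript object...
var event = { name: 'Polyhack Launch', attendees: 42 };

// ...serialises straight into the JSON wire format...
var wire = JSON.stringify(event);
console.log(wire); // {"name":"Polyhack Launch","attendees":42}

// ...and parses straight back into a Javascript object
var parsed = JSON.parse(wire);
console.log(parsed.attendees); // 42
```

In PHP or Python you would be mapping between JSON and the language's own arrays, dictionaries or objects; in Javascript the data is already in its native shape on both ends of the wire.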
The web is becoming more and more real time. For this to happen it has to become more interconnected and social. I don't mean social in the human sense, but social machines. Machines and applications will become more social and more distributed. Web applications and programming languages cannot be siloed in their respective communities; they have to communicate and be social with each other. Perhaps this will lead to the rise of the "Polyglot" programmer (see the resemblance?), or perhaps it will lead to Skynet. Who knows? All I can say is that this field is going to get really interesting, and software is definitely eating the world.
Why am I blogging about this?
Mainly because I'm planning a major application called Polyhack, a destination for those wanting to organise hackathons, designathons and gamathons (lan parties) or any live event that enjoys participation from the audience. I was originally just going to build something simple to list all of Polycademy's events, but the more I thought about it, the more I realised that Polyhack would be useful to other people.
Polyhack will be a mashup and is intended to integrate with multiple event and location services: for example Eventbrite, Facebook, Twitter, real time chat, live streaming, location search, fundraising & promotion, gamification for hackathon attendees and prize winners, showcasing... etc. The point is, I realised that Node.js fits this purpose, as it involves mashing up a lot of web services, but it will also have real time chat and eventually other features for when the event runs. There must be real time pushing of data and social content (videos/photos/chat/twitter messages) to all the attendees and online lurkers. It would have an open API to allow further integration into the hackathon organiser's own website or application. Usually when one wants to organise an event like this, one has to invest a lot of resources into making connections, marketing and logistics. I think Polyhack will help organisers make their organising simpler, and perhaps those one-off events won't need to invest in their own web application to make the event real time and connected. But more importantly, I felt it would be an interesting challenge to learn pure Javascript and Node.js instead of just writing it in PHP. So I'll be documenting my efforts in learning Node.js and making Polyhack.
So what does this all mean for Polycademy's courses? Have I convinced you that Node.js is the future of web development? Shouldn't you be learning Node.js instead? Well, I think Node.js sits at a higher level of web development: not so much the Javascript part (since that's part of basic browser jQuery stuff), but all the associated theory and concepts one has to understand. So yes, in the future Node.js will be part of the courses, but it will require previous experience in a server side language such as PHP/Rails/Python, or Javascript front end experience. And even further down the track, one could take on even more advanced, truly concurrent languages such as Haskell, Clojure or Go. But for 99% of people and 99% of web applications, we just don't need them yet.