Apart from browsing websites and sending and receiving emails, what do you do most often on the Internet? Most of you probably answered instant messaging. The ability to have a text conversation with anyone across the globe, and that too for free, is one of the biggest appeals of instant messaging or chatting. From being dinosaurs that could only speak standard English letters, chat applications have come a long way, just like the rest of computing and the Internet. Today, chat applications cover free voice calls, free video calls, emoticons, stickers and the ability to converse in text in any computer-supported written language in the world. In fact, chat applications have become so ubiquitous that not just humans but even machines use chat to talk to each other.
Have you ever paused for a moment and thought about how this seemingly simple technology works? Let us dive deep in this post.
The components of a chat application
A chat application has the following components: a messaging application, a server and a persistent connection.
The messaging application is the part that you see. This is the part of the system that resides on your phone, laptop or personal computer as a small app. There is a text box where you type messages and another area where all the previous messages in the conversation are shown along with their timestamps. There is a button to send messages, and on a desktop or laptop, pressing the Enter / Return key does the same job. There is a trigger to open a set of emoticons that you can use. You can achieve the same result by typing a combination of symbols that looks like the emoticon when viewed sideways. Finally, in more sophisticated apps, there are voice and video calls.
Before being ready for use, the messaging app connects to a central server. It is only because of this connection that you are able to send messages to others who are connected to the same server and others are able to see you online and send messages to you. To establish a connection, the user must first authenticate with his/her credentials.
On the server side of the equation, there is server-side software that listens for connections from instant messaging clients. The server maintains a map of which connection belongs to which user, so that messages can be reliably relayed to the correct recipient and marked as being sent by a certain sender.
Finally, the third component is a persistent connection. This piece of jargon simply means that the instant messaging app must remain connected to the server ALL the time. A user is seen online and is able to send and receive messages ONLY as long as the connection stays up. Internet outages, unreliable connections, company firewall rules and Internet provider restrictions will often cause the connection to fail. When that happens, the instant messaging app either stops working or drops messages.
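Messaging apps typically try to get the connection back on their own after it fails. Here is a minimal sketch of one common approach, retrying with a growing wait between attempts. This is an illustration, not how any particular messenger does it: the function name keep_connected, its parameters and the connect callable passed in are all made up for this example.

```python
import time


def keep_connected(connect, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call connect() until it succeeds, waiting longer after each failure.

    connect is a hypothetical callable that returns a connection object or
    raises ConnectionError when the server cannot be reached.
    """
    for attempt in range(max_attempts):
        try:
            return connect()
        except ConnectionError:
            if attempt < max_attempts - 1:
                # Wait 1s, 2s, 4s, ... before the next try (exponential backoff).
                sleep(base_delay * 2 ** attempt)
    raise ConnectionError("could not reach the chat server")
```

The growing delay keeps a flood of apps from hammering the server at the same moment when it comes back after an outage.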
Can I use one messenger to talk to another?
An important point to note is that an instant messenger can only connect to the server of the same company. You usually cannot set up a chat between instant messengers from different companies. A Google Hangouts user can only chat with another Hangouts user. A WhatsApp user cannot chat with a Facebook Messenger user. This is because the way Hangouts encodes and encrypts its messages on the Internet cannot be understood by WhatsApp or Facebook Messenger. This is true for all company-specific chat protocols.
However, there IS one type of chat system that allows chatting among multiple companies as long as the servers in the two companies agree to forward messages. This system is called Jabber. As long as both the companies use Jabber as the method to facilitate chats, they can configure their servers to allow messaging between their respective users.
So, while Google, Facebook, WhatsApp or Slack do not allow their users to cross-communicate with each other, you can get in touch with your business partners, have each company set up its own chat server using Jabber, and then all of you can start using Jabber-based messengers. This makes it possible for a user at abc.com to talk to a user at xyz.com. Chats among employees of abc.com are made possible by abc.com’s server. Ditto for xyz.com’s employees. But since both servers are based on Jabber and both companies have set up their servers to accept messages from each other’s networks, employees of abc.com and xyz.com can cross-communicate without having to create usernames on the other company’s server.
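To make the forwarding idea concrete, here is a toy sketch of two company servers that agree to pass messages to each other. It is not a real Jabber (XMPP) implementation: the class JabberServer, its methods and the usernames alice and bob are invented for illustration, and real servers talk over the network rather than calling each other directly.

```python
class JabberServer:
    """Toy stand-in for one company's chat server."""

    def __init__(self, domain):
        self.domain = domain
        self.inboxes = {}        # local username -> list of (sender, text)
        self.trusted_peers = {}  # remote domain -> server we agreed to talk to

    def add_user(self, username):
        self.inboxes[username] = []

    def federate_with(self, other):
        # Both companies must agree, so the link is recorded on both servers.
        self.trusted_peers[other.domain] = other
        other.trusted_peers[self.domain] = self

    def send(self, sender, recipient, text):
        username, _, domain = recipient.partition("@")
        if domain == self.domain:
            return self._deliver(sender, username, text)  # same company
        peer = self.trusted_peers.get(domain)
        if peer is None:
            return False  # no agreement with that company's server
        return peer._deliver(sender, username, text)      # forward across

    def _deliver(self, sender, username, text):
        if username not in self.inboxes:
            return False
        self.inboxes[username].append((sender, text))
        return True


abc = JabberServer("abc.com")
xyz = JabberServer("xyz.com")
abc.add_user("alice")
xyz.add_user("bob")

before = abc.send("alice@abc.com", "bob@xyz.com", "Hello!")  # refused
abc.federate_with(xyz)
after = abc.send("alice@abc.com", "bob@xyz.com", "Hello!")   # forwarded
```

Notice that alice never needs an account on xyz.com. The part of the address after the @ tells her own server where to forward the message.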
Behind the scenes
As the first step, the instant messaging software establishes a connection with the corresponding chat server. The server notes the properties of the new connection in a map. At this point, the server has identified this new connection as an anonymous user.
An anonymous connection is usually enough for situations like Internet chat rooms, where broad topics are being discussed among a group of people. Everyone can see messages from everyone else and it is normal for messages to be tagged as anonymous. However, anonymous connections are insufficient for sending direct person-to-person messages, since the server wouldn’t know which connection to send a given message on.
So, the next step is to ask the user to identify himself/herself. Usually, the user has already created an account during the sign-up process and now uses those credentials, i.e. username and password, to authenticate. While several applications do it nowadays, WhatsApp was an early bird at using your phone number as your identity. By sending a One Time Password through SMS and asking you to enter what was sent, WhatsApp confirms that you are indeed the owner of the phone number. So the time-consuming task of choosing a username and then having to remember it is done away with.
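The One Time Password check can be sketched in a few lines. This is only an illustration of the general idea, not WhatsApp's actual implementation: the class PhoneVerifier and its methods are hypothetical, and a real service would hand the code to an SMS gateway and expire it after a short time instead of returning it to the caller.

```python
import hmac
import secrets


class PhoneVerifier:
    """Toy one-time-password check for phone numbers."""

    def __init__(self):
        self._pending = {}  # phone number -> code we "sent"

    def start(self, phone_number):
        # Six random digits, zero-padded, e.g. "004217".
        code = f"{secrets.randbelow(10**6):06d}"
        self._pending[phone_number] = code
        return code  # stand-in for sending the code by SMS

    def confirm(self, phone_number, submitted_code):
        expected = self._pending.get(phone_number)
        if expected is None:
            return False  # no code was requested for this number
        # compare_digest avoids leaking how many digits matched via timing.
        if not hmac.compare_digest(expected, submitted_code):
            return False
        del self._pending[phone_number]  # one-time: the code cannot be reused
        return True
```

Only someone who can read the SMS knows the code, so a correct answer is taken as proof that the user owns the phone number.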
Once you have identified yourself, the server attaches your identity to the map of connections. As more users log in, each user’s identity information is mapped to a particular connection between a messenger and the server. It is interesting to note that the part of the server that maintains connections has no idea what a user actually is. It could be a human or another machine. All it knows is that there is a map that associates an identity with a connection. If there is a message intended for a certain identity, the server looks up the map and sends the message to the corresponding connection. If this map collapses, the entire chat system comes crashing down and no one can chat with anyone else even though they are all logged in. So companies spend a lot of money on acquiring both hardware and software resources that can keep the map running 24 x 7 x 365 without fail. Sure, if you frequently log out and log in, then the pressure on the server eases every now and then. But applications like WhatsApp never log out, and the folks at WhatsApp have to work really hard to ensure that their map of connections is in shape all the time.
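The map itself can be pictured as a pair of dictionaries. The sketch below is a simplified, in-memory illustration. The class ConnectionRegistry and its method names are made up for this example, and a production server would spread this state across many machines.

```python
class ConnectionRegistry:
    """Toy version of the server's map between connections and identities."""

    def __init__(self):
        self._identity_by_connection = {}  # connection id -> identity or None
        self._connection_by_identity = {}  # identity -> connection id

    def open(self, connection_id):
        # A brand-new connection is anonymous until the user authenticates.
        self._identity_by_connection[connection_id] = None

    def authenticate(self, connection_id, identity):
        if connection_id not in self._identity_by_connection:
            raise KeyError(f"unknown connection {connection_id}")
        self._identity_by_connection[connection_id] = identity
        self._connection_by_identity[identity] = connection_id

    def close(self, connection_id):
        identity = self._identity_by_connection.pop(connection_id, None)
        # Only forget the identity if it still points at this connection.
        if self._connection_by_identity.get(identity) == connection_id:
            del self._connection_by_identity[identity]

    def connection_for(self, identity):
        # Returns None when the identity is not online.
        return self._connection_by_identity.get(identity)
```

The registry never inspects what an identity is. It stores whatever label it is given, which is why humans and machines can share the same chat system.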
Exchange of messages
If user A sends a message intended for user B, the message first leaves A’s instant messenger, travels through the connection between A and the server (let’s call this connection 1234), and reaches the server. The server notes that the message has arrived from connection 1234. Since the map associates connection 1234 with user A, the server tags the sender of this message as A. The intended recipient of the message is B, so the server looks up the connection corresponding to user B. It happens to be connection 5678. The server sends the message into connection 5678 and the message finds its way to B’s instant messenger. B’s messenger notifies B through a buzz. When B checks the messenger, she can see A’s message and also the fact that it was sent by A.
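The same walk-through can be written as a short function. This is a simplified sketch using the connection numbers from the example above. The function relay, the connections dictionary and the outboxes dictionary are names chosen for this illustration, and lists stand in for real network connections.

```python
def relay(connections, outboxes, from_connection, recipient, text):
    """Tag a message with its sender and push it to the recipient's connection.

    connections: connection id -> user who authenticated on it
    outboxes:    connection id -> list of messages sent down that connection
    """
    # The sender is taken from the map, not from anything the client claims.
    sender = connections[from_connection]
    # Find the connection that belongs to the intended recipient.
    target = next(
        (cid for cid, user in connections.items() if user == recipient), None
    )
    if target is None:
        return False  # recipient is not connected
    outboxes.setdefault(target, []).append({"from": sender, "text": text})
    return True


connections = {1234: "A", 5678: "B"}
outboxes = {}
delivered = relay(connections, outboxes, 1234, "B", "Hello B")
```

After this runs, the list for connection 5678 holds one message with the text "Hello B" tagged as coming from A, which is what B’s messenger would display.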
What about voice calls and video?
Voice calls and video are not handled directly by the chat server itself. They are handled by much more powerful servers with software capable of streaming and compressing audio and video. The chat server does, however, handle the messages that lead to the video / audio stream being set up. The intent of users to connect through voice or video is handled by the chat server itself. That is why you will see messages like “A has started a video call” or “B has accepted the voice call” as part of your chat messages. A call initiation and acceptance lead to the generation of a session ID. The messenger apps of both parties take note of the session ID. This session ID is used by the video or audio server to create a video / audio stream with the same ID. Two channels of communication are created, each channel handling one direction of communication. The messengers have the software module in place to send / receive on the audio / video stream based on the session ID.
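Here is a rough sketch of that hand-off between the chat server and the media server. Everything in it is illustrative: the class MediaServer, the function start_call and the use of a random UUID as the session ID are assumptions made for this example, and plain lists stand in for real audio / video streams.

```python
import uuid


class MediaServer:
    """Toy audio / video server that keeps one stream per session ID."""

    def __init__(self):
        self.streams = {}  # session id -> two one-way channels

    def open_stream(self, session_id, caller, callee):
        # One channel per direction of communication.
        self.streams[session_id] = {
            (caller, callee): [],
            (callee, caller): [],
        }

    def send(self, session_id, sender, receiver, chunk):
        self.streams[session_id][(sender, receiver)].append(chunk)


def start_call(chat_log, media_server, caller, callee):
    """Record the call set-up in the chat and create the stream."""
    # The chat server carries the intent to call as ordinary messages.
    chat_log.append(f"{caller} has started a video call")
    chat_log.append(f"{callee} has accepted the video call")
    # Both messengers and the media server refer to the call by this ID.
    session_id = uuid.uuid4().hex
    media_server.open_stream(session_id, caller, callee)
    return session_id
```

A messenger that knows the session ID can then call send with its own name as the sender to push audio or video chunks into its direction of the stream.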
As you can see, a lot of work goes on behind what feels intuitive to you. Every time your instant message is successfully read by your friends or colleagues, and every time you happily speak over a video call with your near and dear ones halfway across the globe, you have the pioneers of instant messaging and chatting to thank for it.