Journey with the World Wide Web: Part 1

Every day, we type website addresses into our browsers without a second thought and, within seconds, get an informative, well-designed webpage. With the power of the World Wide Web, we learn more, achieve more and get better at our lives.

But what exactly happens between the moment you type an address into your browser and the moment the browser shows you the contents of a webpage? It turns out that a large number of machines work together with clockwork co-ordination to understand what you requested, bring back the relevant information and show it to you on a page. The World Wide Web is a complex ecosystem of machines of different types that run around the clock to ensure that the website you love so much is available to you 24/7.

Let us hop on a journey that takes you from the moment you request the URL of this page (http://www.tech101.in/journey-with-the-world-wide-web-part-1) in your browser to the moment this blog post shows up in your browser.

Step 1: The browser breaks down your request URL

The browser is your window to the World Wide Web. This is where you type in your request for a website and are shown the contents. To request this blog post, you type the following URL into the browser:
http://www.tech101.in/journey-with-the-world-wide-web-part-1.
A URL is like a postal address. The components of the World Wide Web are able to understand a URL just like your city or country’s postal service is able to recognise your home’s postal address. The browser is one of the components in the system and it breaks down the address in the following manner.

  • http – The user wants to request a web page using the HTTP protocol. HTTP is the method used for requesting and serving websites. There are other methods of communication over the Internet, such as email, FTP for uploading and downloading, streaming radio, streaming video, chat and several others. Many of them are outdated and many have just sprung up. But HTTP is the biggest workhorse of the Internet, since websites use that method.
  • tech101.in – The user wishes to request a web page from the tech101.in website. This part of the URL is also called the domain name. A domain name belongs to a specific company and is registered with the help of a domain name registrar like GoDaddy.
  • /journey-with-the-world-wide-web-part-1 – The user wants to get to this specific page from the website.
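
The breakdown above can be tried out with the urllib.parse module from the Python standard library. This is only a rough sketch of the idea, not a description of how any particular browser is implemented, and the variable names are my own. Note that the host part comes back as www.tech101.in, i.e. the domain name with its www prefix.

```python
from urllib.parse import urlsplit

# The URL of this blog post, as typed into the browser.
url = "http://www.tech101.in/journey-with-the-world-wide-web-part-1"

parts = urlsplit(url)

print(parts.scheme)    # the protocol part of the URL
print(parts.hostname)  # the host / domain name part
print(parts.path)      # the specific page on the website
```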

Step 2: The browser attempts to find which remote machine to connect to

Once the browser has found out the components of the URL, it now turns its attention to the domain name, which in our case is tech101.in.

What is an IP address?

The name tech101.in is perfectly readable for a human, but computers do not understand names and their meanings. From their point of view, tech101.in is a domain name that must point to a unique numerical address on the Internet, which finally leads to a single machine at the other end of the communication. This numerical address follows a certain pattern and is called an IP address. An IP address is a sequence of numbers that points to a single computer or a system of computers somewhere on the Internet. It can be thought of as the equivalent of a postal code that identifies regions inside the cities of a country. While consumers of the postal system, i.e. us, understand addresses like Vile Parle, Mumbai, India or Perai, Penang, Malaysia, postal systems find it easier to address these areas by numbers such as 400056 or 13600.

While each country has its own postal code system and format, the Internet follows a single system called the Internet Protocol (IP for short) version 4. Version 6 has been adopted by a large section of the Internet, but version 4 continues to be the more widely used. An IPv4 address is a sequence of 4 numbers separated by dots, each of which can be between 0 and 255. A lot of IP addresses are reserved for special purposes (e.g. 127.0.0.1 addresses the computer itself and is called the loopback address), but the remainder are available for web services and companies to address their machines. To map a domain name to an IP address, we need a special service called the Domain Name System, or DNS for short.
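
To make the format concrete, here is a small sketch that uses the ipaddress module from the Python standard library to take an IPv4 address apart. The helper name describe_ipv4 is made up for this illustration.

```python
import ipaddress

def describe_ipv4(text):
    """Parse a dotted IPv4 address and report a few basic facts about it."""
    address = ipaddress.IPv4Address(text)  # raises ValueError if malformed
    octets = [int(part) for part in str(address).split(".")]
    return {
        "octets": octets,                    # the 4 numbers, each 0 to 255
        "is_loopback": address.is_loopback,  # True for 127.x.x.x addresses
        "as_integer": int(address),          # the same address as one 32-bit number
    }

info = describe_ipv4("127.0.0.1")
print(info)
```

Passing something like 300.1.1.1 raises an error, because 300 does not fit in the 0 to 255 range of a single number.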

What is DNS?

DNS is a system of services that maintains a list of domain names from all around the world and the IP address to which each domain name points. This is like having an exhaustive list of area names and their postal codes. The DNS list is maintained simultaneously by many levels of organisations. It starts with registrars, the organisations responsible for facilitating the registration of domain names and mapping them to IP addresses, e.g. GoDaddy and Dotandco. The registrars make their entries available to second-level DNS services, mostly maintained by Internet Service Providers such as broadband providers for desktop computers and 3G / 4G providers for mobile phones. The names are then made available to our computers and mobile phones, whose browsers are able to look up the mapping.

How does the browser know whom to ask for domain name to IP address mappings?

As soon as you connect to your Internet service, the service provider provides the address of a machine (DNS server) that can be used to make DNS requests to resolve a domain name into an IP address. This is typically a machine maintained by your Internet Service Provider. Alternatively, it might be the address of your home’s WiFi router which passes on DNS requests to your ISP’s DNS server. Either way, the browser makes a request to the DNS server to find out the domain name to address mapping of tech101.in, which is 45.79.135.212.
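
A program can make the same kind of lookup through the operating system, which forwards the question to the configured DNS server. The sketch below uses socket.getaddrinfo from the Python standard library; the function name resolve_ipv4 is my own. The answer you get for a real domain depends on the DNS records at the time you ask, so it may differ from the address quoted above.

```python
import socket

def resolve_ipv4(hostname):
    """Ask the system's configured resolver for the IPv4 addresses of a host."""
    results = socket.getaddrinfo(
        hostname, None, family=socket.AF_INET, type=socket.SOCK_STREAM
    )
    # Each result is (family, type, proto, canonname, sockaddr),
    # and for IPv4 the sockaddr is an (ip_address, port) pair.
    return sorted({sockaddr[0] for *_, sockaddr in results})

# Example (needs a working Internet connection):
# print(resolve_ipv4("www.tech101.in"))
```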

The mappings from domain names to IP addresses are stored in a database in the form of DNS records. They are generally created at the registrar’s end to start with and are then propagated across the Internet. Here is a screenshot that shows my tech101.in record as stored on GoDaddy.

002-dns-record-godaddy

Step 3: The browser seeks a connection to the remote machine

Your browser has found out the IP address for tech101.in, which is 45.79.135.212. Worldwide there is exactly ONE system of machines that maps to this IP address. But how does one find out where exactly this computer is? This is exactly where one of the most common words used in networking works its magic: Routing. Using a system of devices called routers, the browser’s connection request packet makes its way to the target machine. Let us look at routing in more detail.

What is routing?

When going from point A to point B, we take a certain path. The path will have intermediate junctions, where decisions must be taken on which fork to take next. Routing is the method by which data is led on its way from a source machine to a destination machine using a series of paths that are controlled by routers.

How does routing work?

Let us talk about a real life analogy of routing. I live in the Thane area of Mumbai in India and let’s say I need to travel to Connaught Place in New Delhi. Once I leave home, I reach the bus stop next to my home. I take bus route 46 and set out to the next transport hub, which is the Thane railway station. Using a system of suburban / commuter trains, I make my way to Mumbai Central railway station, which is the hub that services inter-city trains. I travel by a fast express train with very few stops and reach New Delhi railway station. Alongside inter-city trains, this station is connected by light rail / metro / subway train, using which I reach Connaught Place. This is only ONE of the several ways to commute from Mumbai to Delhi. I could have skipped the cheaper mass transit systems like buses and trains and chosen faster but costlier methods, like a combination of aeroplane and Uber.

Please take a moment to note that I never had to care about how, or in which direction, to carry myself, as long as I knew the hubs to go to and caught the right vehicle to the next hub. The vehicles did it for me.

This is how it works with network routing. The data that needs to make its way from one machine to another carries built-in tags that let routers know where the data wants to go. Each router then points the data to the next router along the path, until the data reaches the destination. Just like a system of local and fast inter-city trains, the travelling data is carried over a mix of links, from medium-speed copper cables to high-speed undersea fibre-optic cables. The routers have to strike a balance between speed and cost.
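
The hop-by-hop idea can be sketched as a toy simulation. Everything here is an assumption made for illustration: the router names, the address ranges in the tables and the rule of preferring the most specific matching range (real routers use a similar longest-prefix rule, but their tables are far larger and are built automatically). Each router only knows the next hop, just like each vehicle in the journey above only takes me to the next hub.

```python
import ipaddress

# Hypothetical routers. Each table maps a range of addresses to the next hop.
# "deliver" means the destination machine is directly attached to that router.
ROUTING_TABLES = {
    "home-router": {"0.0.0.0/0": "isp-router"},
    "isp-router": {"0.0.0.0/0": "backbone-router"},
    "backbone-router": {
        "45.79.0.0/16": "datacentre-router",
        "0.0.0.0/0": "isp-router",
    },
    "datacentre-router": {
        "45.79.135.0/24": "deliver",
        "0.0.0.0/0": "backbone-router",
    },
}

def next_hop(router, destination):
    """Pick the most specific range in this router's table that contains the destination."""
    dest = ipaddress.ip_address(destination)
    matches = [
        (ipaddress.ip_network(prefix), hop)
        for prefix, hop in ROUTING_TABLES[router].items()
        if dest in ipaddress.ip_network(prefix)
    ]
    _, hop = max(matches, key=lambda match: match[0].prefixlen)
    return hop

def trace(start, destination, max_hops=16):
    """Follow next hops from the start router until the data is delivered."""
    path = [start]
    router = start
    for _ in range(max_hops):
        hop = next_hop(router, destination)
        if hop == "deliver":
            return path
        path.append(hop)
        router = hop
    raise RuntimeError("too many hops, giving up")

print(" -> ".join(trace("home-router", "45.79.135.212")))
```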

003-routing

Eventually the browser’s request makes its way to the tech101 web server which hosts this blog post.

Step 4: A connection is established between the browser and the server

To understand how a connection is established, let us understand some terminology about connections first.

What is a port?

Think of a large performance hall or movie theatre with multiple levels of seating. Based on the ticket pricing, you are entitled to sit in an area close to the stage / screen or at a level high above it. You may even be entitled to sit in a private booth with a special ticket. These areas are designated with names like hallway, balcony, booth, etc. Depending on which area of the hall you have purchased a ticket for, you may be led to a different entry door. Typically one door is reserved for exactly one type of ticket, and other ticket holders are politely refused and requested to use the door designated for their type of ticket.

This is how ports work in networking. A server may host different types of services such as email, web service, streaming video, FTP, etc. But each type of service is allowed entry through a certain port. The ports are numbered and, by convention, certain port numbers are reserved for certain services, e.g. port 80 for websites (also called HTTP communication), port 21 for FTP uploads and downloads, port 25 for email, port 53 for DNS communication and so on.

When the browser realises from the URL that it is a request to a website based on HTTP, it automatically attempts to connect to the server via port 80, which is analogous to trying to enter the server through door 80.
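
As a small sketch of that decision, the function below picks a port from a URL. The table DEFAULT_PORTS and the function port_for are names invented for this example, and the table lists only a few conventional defaults. A URL may also name a port explicitly after the host, in which case that port wins.

```python
from urllib.parse import urlsplit

# Conventional default ports for a few URL schemes (not an exhaustive list).
DEFAULT_PORTS = {"http": 80, "https": 443, "ftp": 21}

def port_for(url):
    """Return the port written in the URL, or the default port for its scheme."""
    parts = urlsplit(url)
    if parts.port is not None:
        return parts.port
    return DEFAULT_PORTS[parts.scheme]

print(port_for("http://www.tech101.in/journey-with-the-world-wide-web-part-1"))
print(port_for("http://localhost:8080/"))
```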

How a connection is established

Once the browser requests a connection to port 80, the server accepts the request and sends back an acknowledgement, to which the browser responds with an acknowledgement to the acknowledgement! This three-way process is called a handshake. It belongs to TCP, the transport protocol on which HTTP is built. Once complete, the handshake clears a path for the browser and the server to hold a more meaningful conversation. This path is called a TCP connection.
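
Programs never perform the handshake by hand; the operating system does it whenever a program asks for a TCP connection. The sketch below shows this on a single machine, with a stand-in server and a client both on the loopback address, so that it does not depend on any real website. The function name and the choice of letting the system pick a free port are my own.

```python
import socket

def open_loopback_connection():
    """Open a TCP connection between a client and a server on this machine."""
    # The "server" listens on a free port chosen by the operating system.
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))
    listener.listen(1)
    port = listener.getsockname()[1]

    # The "browser" asks for a connection; the handshake happens inside this call.
    client = socket.create_connection(("127.0.0.1", port), timeout=5)

    # The server picks up the connection that the handshake has established.
    server_side, _client_address = listener.accept()
    listener.close()
    return client, server_side

client, server_side = open_loopback_connection()
client.sendall(b"hello")     # the path is now open for real conversation
print(server_side.recv(5))
client.close()
server_side.close()
```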

004-tcp-handshake

Step 5: The browser shoots its request

With the introduction and small talk out of the way, the browser gets straight to the point. Take a moment to refer to the broken-down components of the URL from Step 1. The browser has used up the domain name tech101.in to find out which remote machine to talk to. It has likewise used up the protocol name http to find out which port to use on the server. The last part is the name of the blog post itself, i.e. journey-with-the-world-wide-web-part-1. Using a series of standard HTTP instructions, the browser requests the blog post from the server.
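
Those instructions are plain lines of text. The sketch below builds a minimal HTTP/1.1 request for this blog post; the helper name build_get_request is made up, and a real browser sends many more header lines than the two shown here.

```python
from urllib.parse import urlsplit

def build_get_request(url):
    """Build the text of a minimal HTTP/1.1 GET request for the given URL."""
    parts = urlsplit(url)
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    lines = [
        f"GET {path} HTTP/1.1",     # what we want: this page
        f"Host: {parts.hostname}",  # which website on the server we are asking
        "Connection: close",        # ask the server to close the connection afterwards
        "",                         # a blank line marks the end of the request
        "",
    ]
    return "\r\n".join(lines)

request = build_get_request(
    "http://www.tech101.in/journey-with-the-world-wide-web-part-1"
)
print(request)
```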

Conclusion

Well, let’s call it a wrap for Part 1 of this blog post. In this part, I have made an effort to explain how the request side of the equation works. In Part 2, we shall see what goes on in the server once it receives the request, what kind of data it sends back, and how the browser interprets it and renders it on the screen for you to see.
