Back to squares 0 & 1: How a computer stores different types of data

Let us begin with the meaning of the term ‘compute’. The word means ‘to calculate’. True to that meaning, the early computers were used to break down really complex calculations and solve them in a fraction of the time. With developments in processing power and the shrinking size of a basic computing unit, the term ‘computing’ has built upon its humble roots and now means a lot of things. ‘Computers’ are now everywhere, from the big monsters in the back offices of NASA to the sleek mobiles in your pocket to the tiny RFID readers inside your home’s door lock and railway station turnstiles. We even have devices with fantastic brains built on artificial intelligence, like face recognisers, handwriting readers and trip & timetable planners.

Despite decades of advancement, some basics remain exactly the same. The oldest and most time-tested of them is the way that a computer perceives and stores data. Today we may be able to calculate the distance to the sun (number), decide whether Donald Trump or Narendra Modi will win an election or not (yes/no), read and store e-books (text data), remember and get reminders for birthdays (date), set alarms (time) and enjoy sounds, photos and videos, but at the bottom of it all, the computer sees all of these data formats in only ONE universal format: a sequence of numbers.

Electronics: the building blocks of digital data

Since a computer is an electronic device, at the very, very basic level, it can understand ONLY two states: ON or OFF. When a computer stores data, it goes through a sequence of circuits and turns each of them ON or OFF. While reading the data back, it gathers the state of the same sequence of circuits, thus getting a sequence of ONs and OFFs. We represent these numerically as 1 and 0 respectively. Typically, an ON state is the presence of a certain voltage, such as 3.3 volts, 5 volts, 9 volts or 12 volts, which are some standards used in electronics. Low power circuits such as LEDs can get away with 3.3 volts, whereas higher power devices, such as those that need to spin a motor, need higher voltages. Applying the voltage difference across a circuit thus leads to 1 and equalising the voltage leads to 0.

Electronics and numbers

So, how exactly do the electronic states ON and OFF lead to numbers, you ask. And how does that help us synthesize data? I will leave the second question for a later paragraph, but on to the first question. As mentioned before, an ON represents number 1 and OFF represents number 0. These two digits (let’s call them symbols from now on) constitute what is called the binary number system. However, two symbols by themselves are pretty useless to represent data, because, well, they are only two symbols and that is too few. So how can we represent meaningful data using just two symbols?

The importance of place value systems

Well, consider for a moment how useful the 10 symbols (0-9) really are in their own right. Not much, are they? But by stringing together these symbols in a sequence from left to right, we increase the number of states/values that these symbols represent together. E.g. just 3 symbols put together can represent 1000 different values (i.e. all the numbers from 0 to 999). Each of the 3 places can contain one of ten different symbols, leading to 10 x 10 x 10 = 1000 combinations. This system of stringing symbols together to represent values is called the place value system and is commonly credited to Indian mathematics, with the mathematician & astronomer Aryabhata among its best-known early users. This powerful method, in use around 500 AD, is also the basis for data & computation currently in 2016 AD, one and a half millennia later!
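To make the place value idea concrete, here is a minimal sketch in Python. The function name place_value is my own invention for illustration. It simply walks through the symbols from left to right, and each step to the right multiplies what we have so far by the number of available symbols (the base).

```python
def place_value(digits, base=10):
    """Combine a sequence of digit symbols into one value, left to right.

    `place_value` is a hypothetical helper written for this article.
    """
    value = 0
    for digit in digits:
        # Moving one place to the right makes everything so far
        # worth `base` times more, then the new digit is added.
        value = value * base + digit
    return value


# The largest value that three decimal symbols can represent.
print(place_value([9, 9, 9]))

# Three places with ten possible symbols each give 10 * 10 * 10 combinations.
print(10 ** 3)
```

The same function works for any number of symbols, which is exactly the property that lets computers get by with only two of them.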

Using number systems in computation

In computing, instead of 10 different symbols, we use only 2 and we call them 0 & 1, representing a circuit which is OFF and one which is ON. By stringing them together, i.e. by building circuitry which can be written to and read from as a string of OFFs and ONs, we can have multiple combinations of 0s and 1s, e.g. 001110, 101110, thus giving us the binary number system. Each digit representing either a 0 or a 1 is called a bit. However, computers generally do not read single bits, but rather a chunk of bits together. The common sizes are a byte, short word, word or long word, which are 8, 16, 32 or 64 bits respectively. Many older computers (circa 1980s) could read only a byte at a time. However, we are in the era of 64-bit computing, where the processor can process 64 bits at a time, leading to more possibilities of data representation in a single computation.

Let us break down that somewhat complex concept. Just like symbols 0-9 (10 possible symbols) written three times can represent 1000 combinations (10^3) in our familiar decimal system, a series of 0s and 1s (2 possible symbols) written together 8 times can give us 256 combinations (2^8). A string of 16 bits can give us 65536 combinations (2^16). So if we were to represent a piece of data in 8 bits, we can only have 256 different values, whereas a piece containing 16 bits can have over 65000 different values. That is why text data in the older days supported only English and English-like languages with alphabets, whereas today’s computers support languages and symbols from across the world. With computers being able to read and write more bits per computation, the possibilities have gone up rapidly.
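The counting above can be checked with a few lines of Python. This is only a sketch, and count_combinations is a name I made up for the purpose. It also lists every 3-bit pattern so that you can see the combinations with your own eyes.

```python
from itertools import product


def count_combinations(symbols, places):
    """Number of distinct strings of length `places` built from `symbols`.

    Hypothetical helper for illustration: each place can hold any of the
    symbols, so the counts multiply.
    """
    return len(symbols) ** places


# Every possible pattern of 3 bits, from 000 to 111.
patterns = ["".join(bits) for bits in product("01", repeat=3)]
print(patterns)

print(count_combinations("0123456789", 3))  # three decimal digits
print(count_combinations("01", 8))          # one byte
print(count_combinations("01", 16))         # two bytes
```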

How a computer represents different types of data as numbers

As mentioned, a computer can only understand ONs and OFFs and consequently 0s and 1s. But let me simplify today’s subject by clarifying that it is possible to convert the binary number system to the decimal number system and back. There are plenty of methods in mathematics and electronics to achieve that. For the sake of convenience, we will assume that computers understand decimal numbers (symbols 0-9) just like we do, even though computers inherently convert all decimal numbers into binary (0 & 1) and back interchangeably.
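For the curious, here is one of those methods as a minimal Python sketch. The helper names to_binary and from_binary are mine and purely illustrative. Going to binary uses repeated division by 2, and coming back uses the place value idea with 2 symbols instead of 10.

```python
def to_binary(number, width=8):
    """Convert a non-negative whole number to a string of 0s and 1s.

    Illustrative helper: repeatedly divide by 2 and collect the remainders.
    The result is padded with leading 0s up to `width` bits.
    """
    bits = ""
    while number > 0:
        bits = str(number % 2) + bits  # the remainder is the next bit
        number //= 2
    return bits.rjust(width, "0")


def from_binary(bits):
    """Convert a string of 0s and 1s back to a decimal number."""
    value = 0
    for bit in bits:
        value = value * 2 + int(bit)
    return value


print(to_binary(65))
print(from_binary("01000001"))

# Python's built-ins do the same job and can be used to cross-check.
print(format(65, "08b"), int("01000001", 2))
```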

Here is how a computer represents all the common data types as numbers.

Text data

To a computer, every character in textual data is a number. E.g. capital ‘A’ is number 65 and small ‘a’ is number 97. Who standardises these mappings between numbers and symbols? It was ASCII during the days when computers supported only the English alphabet and a few special characters. ASCII itself used 7 bits for 128 symbols, and its 8-bit extensions stored one text symbol in a byte, leading to 256 possible symbols. However, the world has now moved on to Unicode, which started out as a 16-bit standard with 65000+ possible symbols and has since grown to make room for over a million. These include languages like Hindi, Tamil, Korean, Arabic and the lot. Smileys, which used to be bitmap images in the 1990s and 2000s, have also been text characters for the last 6 years or so.
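You can peek at these numbers yourself. The sketch below uses Python's built-in ord and chr functions, which convert between a character and its number. The sample characters are my own picks. The last part shows how many bytes each character takes when stored in UTF-8, one common way of writing Unicode text to disk.

```python
# From character to number and back.
print(ord("A"), ord("a"))
print(chr(65), chr(97))

# A whole word is just a list of numbers.
word = "Hi"
codes = [ord(character) for character in word]
print(codes)

# Characters from other scripts, and even smileys, have numbers too.
samples = ["A", "अ", "😀"]
for character in samples:
    stored = character.encode("utf-8")  # the bytes actually written to disk
    print(character, ord(character), len(stored), "byte(s)")
```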

Date

We humans describe an instant of time using calendars. Most of the world uses the standard Gregorian calendar combined with the 24-hour clock. E.g. September 21, 2016 19:30:32. There are other important calendars as per different cultures. In India, the alternative calendar that I am used to is the Hindu lunar calendar and the Prahar-based time. E.g. 3rd Prahar of the Chaturdashi of Shravan year 2074, the period of Kaliyug.

However, the time that a computer cares about is the number of seconds that have elapsed since midnight of the first of January 1970 in Greenwich Mean Time (GMT). E.g. 1474461396 seconds from that point of time represents Sep 21, 2016 18:06:36 Indian Standard Time. The same number represents 12:36:36 of the same date in GMT. The 1-1-1970 00:00 starting point is called the epoch. For computing, that instant of time is the beginning of time, the big bang equivalent of computing! This representation of time is then converted by various software packages into the user’s desired calendar.
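Here is a small sketch of that conversion using Python's standard datetime module. The IST time zone object is something I define by hand as a fixed offset of 5 hours 30 minutes from GMT. The stamp is the same 1474461396 used as the example.

```python
from datetime import datetime, timedelta, timezone

# Indian Standard Time as a fixed offset from GMT/UTC (defined here by hand).
IST = timezone(timedelta(hours=5, minutes=30), "IST")

stamp = 1474461396  # seconds elapsed since the epoch

# The epoch itself is simply second number 0.
epoch = datetime.fromtimestamp(0, tz=timezone.utc)
print(epoch)

# The same number of seconds, shown on two different clocks.
utc_time = datetime.fromtimestamp(stamp, tz=timezone.utc)
ist_time = utc_time.astimezone(IST)
print(utc_time)
print(ist_time)

# Going back from a calendar date to the number the computer stores.
print(int(ist_time.timestamp()))
```

Note that the stored number never changes. Only the calendar and clock that we choose to display it in do.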

Yes / No or True / False

This is the easiest for a computer. As mentioned earlier, a computer only recognises ON/OFF. That can exactly correspond to a YES/NO, with a 1 meaning a YES and 0 meaning a NO.
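In Python, for instance, True and False behave exactly like the numbers 1 and 0. The sketch below shows that, and also how several yes/no answers can be squeezed into a single number, one bit per answer. The list of answers is made up for illustration.

```python
# A yes/no value is just a 1 or a 0 underneath.
print(int(True), int(False))
print(bool(1), bool(0))

# Four made-up yes/no answers packed into one number, one bit each.
answers = [True, False, True, True]
packed = 0
for answer in answers:
    # Shift the existing bits one place left and put the new bit at the end.
    packed = (packed << 1) | int(answer)

print(format(packed, "04b"), packed)
```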

Colours

Computers break down colours into 4 components. Typically, colours in optics are made up of a combination of three primary colours of light, viz. red, green and blue. A complete absence of all three shows black and all three combined at their highest intensity show white. By setting the values of red, green and blue to different levels between 0 and 255, we can obtain over 16 million different colours (16,777,216 to be precise). That’s not all. There is a 4th component called transparency / opacity. Setting its value to 0 leads to full transparency, i.e. the components behind this colour can be seen through completely or, in other words, this colour will not show up at all. A value of 255 means that only this colour is seen. A value anywhere in between leads to translucency, wherein a combination of this colour and the colour behind it can be seen, similar to wearing tinted glasses.
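As a rough sketch, the four components can be packed into one 32-bit number, 8 bits per component. The function names and the red, green, blue, opacity order used here are my assumptions for illustration. Real image formats and graphics libraries differ in the order they use.

```python
def pack_rgba(red, green, blue, opacity=255):
    """Pack four 0-255 components into a single 32-bit number.

    Hypothetical helper; the component order is an assumption.
    """
    for component in (red, green, blue, opacity):
        if not 0 <= component <= 255:
            raise ValueError("each component must be between 0 and 255")
    return (red << 24) | (green << 16) | (blue << 8) | opacity


def unpack_rgba(value):
    """Split a packed 32-bit colour back into its four components."""
    return (
        (value >> 24) & 0xFF,
        (value >> 16) & 0xFF,
        (value >> 8) & 0xFF,
        value & 0xFF,
    )


orange = pack_rgba(255, 128, 0, 127)  # a half see-through orange
print(orange, format(orange, "032b"))
print(unpack_rgba(orange))

# 256 levels each of red, green and blue.
print(256 ** 3)
```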

Image

An image is a combination of dots of colours. The tiniest unit of display on a screen is called a pixel. Its physical size depends on the screen, with 1/72 of an inch being a traditional figure. Each pixel is described by a colour as described in the previous section. Thus, to represent an image, a matrix of pixels is used, having as many rows as the image’s height and as many columns as its width.
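A tiny image can be sketched in Python as a list of rows, where each row is a list of pixels and each pixel is a (red, green, blue, opacity) tuple. The 4 x 3 size and the colours are made up for illustration, and real image files add compression and a header on top of this.

```python
WHITE = (255, 255, 255, 255)
RED = (255, 0, 0, 255)

width, height = 4, 3

# One row per unit of height, one pixel per unit of width.
image = [[WHITE for _ in range(width)] for _ in range(height)]

# Paint a single red dot at row 1, column 2 (counting from 0).
image[1][2] = RED

for row in image:
    print(["R" if pixel == RED else "." for pixel in row])

# Four numbers of one byte each per pixel.
print(width * height * 4, "bytes for the raw pixels")
```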

Video

Video is just a sequence of images shown in rapid succession. Typically anywhere between 25 and 60 images are shown per second, and our eyes see varying levels of smoothness depending on the number of images per second.
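Some back-of-the-envelope arithmetic shows how quickly these images add up. The sketch below assumes 4 bytes per pixel, as in the colours section, and a 1920 x 1080 picture, both of which are my own choices for the example. The function names are hypothetical. Real video files are much smaller than this because they are compressed.

```python
def frames_needed(images_per_second, seconds):
    """How many images make up a clip of the given length."""
    return images_per_second * seconds


def raw_video_bytes(width, height, images_per_second, seconds, bytes_per_pixel=4):
    """Size of a clip if every image were stored pixel by pixel.

    Illustrative helper; ignores compression and sound entirely.
    """
    bytes_per_image = width * height * bytes_per_pixel
    return bytes_per_image * frames_needed(images_per_second, seconds)


print(frames_needed(25, 60), "images in one minute at 25 images per second")
print(raw_video_bytes(1920, 1080, 25, 1), "bytes for one second, uncompressed")
```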

Sound and music

Sound is basically described by 3 variables in digital data, i.e. time, frequency and volume. Time is exactly what it says it is and represents the running length of a sound track. We also know what volume is, i.e. loudness of sound. However, let’s look at frequencies. Frequencies represent the pitch of a sound. E.g. a flute sounds really shrill, so it is a high pitched sound, whereas a drum is a low pitched sound. Male voices are typically at a lower pitch than female voices. At any given moment, different frequencies of sound are playing at different volumes, e.g. a background drum and flute accompanying a human voice. These three frequencies and their volumes are merged together and the result at that instant is represented by a number. 44100 such numbers are stored for one second of sound data and played out. That is a good enough number for our ears to hear bits of sound as a continuous melody.
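The merging can be sketched in a few lines of Python. Here each instrument is simplified to a pure tone (a sine wave) with a frequency and a volume, which is my assumption for the sake of a short example, and mix_tones is a made-up name. Real instruments and voices are far richer than a single tone.

```python
import math

SAMPLE_RATE = 44100  # numbers stored per second of sound


def mix_tones(tones, seconds):
    """Merge pure tones into one list of numbers, one per instant.

    `tones` is a list of (frequency in Hz, volume between 0 and 1) pairs.
    Illustrative helper: each number is the sum of all tones at that instant.
    """
    total = int(SAMPLE_RATE * seconds)
    samples = []
    for n in range(total):
        t = n / SAMPLE_RATE  # the instant of time, in seconds
        level = sum(
            volume * math.sin(2 * math.pi * frequency * t)
            for frequency, volume in tones
        )
        samples.append(level)
    return samples


# A low, louder tone and a high, quieter tone playing together for one second.
samples = mix_tones([(110, 0.5), (880, 0.3)], seconds=1)
print(len(samples))
print(samples[:5])
```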

Conclusion

From the days of white, blocky-looking text on black screens to today’s rich multimedia experience, computing has come a long way, but has stuck to its core. It is a wonder to see how just a combination of numbers can be perceived as an immersive experience. Studying them deeper and understanding how computers work with numbers can give us a lot of insight and help us be ‘Neo’, the chosen one from The Matrix, who understands the patterns in the numbers and can even fight the system blindfolded.
