October 2018 - Tech 101

Khushi loves to write children’s stories. She has written several stories in English in her country India. Muskan, a friend of Khushi, works for Samsung. She goes on a business trip to South Korea and takes her young daughter along. She has some children’s books in one of her bags. A local colleague loves the book and asks Muskan if her friend can make the books available in South Korea in Korean.

Khushi is delighted at the opportunity and is ready for a trial book tour in Seoul, where she is to showcase three trial stories to South Korean children. If they love it, there’ll be a deal with a publisher. However she has a problem. She doesn’t speak a word of Korean. The children of South Korea aren’t comfortable with English. So Khushi has two choices. She can use help, translate the three stories into Korean and take the Korean versions with her so that the children can read them on their own. Otherwise she can seek someone in Seoul who speaks both the languages fluently. Then she can read the story line-by-line in English and the interpreter can translate each line into Korean for the children at the same time.

What does children’s books have to do with our topic today? There are two types of programming languages in software. Some of them use a compiler and some use an interpreter. Using the children’s books as an example, we will learn how the two work.

Compiler: When Khushi takes a translated book to South Korea

Khushi decides to do the hard work upfront. She hires a translator on Upwork and gets the three stories translated to Korean. She can then take the Korean books to South Korea and hand them to the children, who can read the books without help. She can take the Korean books to as many Korean communities as she likes, without having to look for a translator every time.

However, Khushi faces the following issues. On one hand, she needs to keep the English versions as her source products, whereas the Korean versions are her by-products. Instead of just 3 products (the original 3 stories), Khushi needs to maintain 6 products (including the 3 translations).

If Khushi finds the need to rewrite several parts of her stories, she must discard the Korean translations and have them done again. Thus, the process of translating beforehand is too sensitive to changes.

If someone invites Khushi to Netherlands, then the Korean translations are useless. She has to find a Dutch translator who will translate her books to Dutch. That will add 3 more products to her collection, taking the total to 9.

Finally, translation is a lot of work upfront. The entire book has to be translated from start to finish so that it can be used in South Korea. The time taken for translation will depend on the length of the stories and the complexity of the sentences used.

This is how a program using a compiler behaves. Let’s assume that you write your computer program in C or C++. A computer cannot understand either of those two languages. It can understand only something called machine language. Machine languages are different for different types of computers, just like Korean is to South Korea and Dutch to Netherlands. E.g. the machine language understood by a desktop computer is different from that of a mobile phone and that of a microwave oven. The microwave oven cannot understand the machine language meant for a desktop computer, the same way a Dutchman cannot understand Korean.

Once you write your program, you need to run a compiler to translate the program into the machine language of the target machine. You need to compile one copy for each type of computing device you wish to run the program on. You need to maintain each compiled copy along with your original program in C/C++. The original one is for you to develop and grow your program in the future, whereas the compiled versions are for distribution. As long as two machines understand the same machine language, you can use the same compiled version on both. This is similar to the way that Khushi can use the Korean translation at Seoul as well as Incheon, but not at Amsterdam.

Every time you make changes to your program, you need to re-compile to all the target machine language versions, discard the older compilations and distribute the new ones to the target machines.

Interpreter: When Khushi hires an accompanying translator within South Korea

In this scenario, Khushi does not get her book translated upfront. She takes the English book with her to South Korea with the promise that she will have an interpreter accompanying her all the time. She reaches this agreement with Netherlands too. As long as the two countries keep their end of the promise, Khushi does not need to get any translation done upfront. Her luggage to both the countries includes the three stories that she wrote in English. When Khushi reaches the target community in South Korea, she pulls out her stories and starts reading them line-by-line. As soon as she finishes reading a line, the interpreter speaks the whole line in Korean, thus reaching Khushi’s audience.

Khushi can makes as many changes to the stories as she likes. She is guaranteed that her latest version will be translated line-by-line during her next story-reading session.

But life isn’t all rosy with this approach too. Khushi needs to make absolutely sure that the target country has a translator available to her. Without the accompanying translator, her English books are just useless.

Also, Khushi needs a translator every time she wants to showcase her stories to the same community or different communities inside South Korea. Just because her translator translated her story orally line-by-line doesn’t mean that anything was recorded in writing. The line-by-line translation needs to be done all over again.

Finally, take the case of a country like Kenya or Congo. These countries are poor and translation is not a lucrative job. The quality of translators is usually mediocre, with limited vocabulary. The result may not do justice to Khushi’s hard work.

This is how interpreted languages like Javascript, Python and Java work. The program in the original language is directly copied to as many target machines as required without undergoing any compilation, with the promise that the target machines have the interpreter of the required programming language installed. The programmer can make as many changes as required and immediately copy the new version to the targets, which will use the new version during the next run.

If the target device does not have an interpreter for the programming language, then the whole program is useless. There is no way to run it in that device.

The program interpreter behaves similar to Khushi’s language interpreter. It converts the program instructions to machine language line-by-line, but doesn’t note down the translations. As a result, the conversion is done every time the program is run.

A hybrid approach

What if Khushi carries the English versions to South Korea, but the interpreter also notes down the Korean version at the same that she translates line-by-line. By the time the interpreter is finished translating to the first community, there will be a newly written Korean version that Khushi can simply distribute to the other communities in the same country. Khushi doesn’t have to get everything translated upfront before leaving for South Korea. Nor does she have to take her interpreter with her everywhere. Khushi needs to remember one thing. She needs to take her interpreter along when changes are made to a story. The interpreter can then note down the Korean translation of the new version of the story when reading it for the first time to a Korean community.

Modern interpreters behave this way. They read programs line-by-line and issue the converted machine language instruction to the processor. But they also note down the conversion into a special area called the interpreter cache. As long as the source program hasn’t changed, the interpreter will take instructions from the cache rather than reading from the source language and translating yet again.

Comparison of compilers and interpreters

Here is a tabular comparison of compilers and interpreters.

A compiler needs to be installed on the machine where the program is developed.

An interpreter must be installed on every machine that the program needs to run on.

The program in the source language, e,g. C/C++, must also be maintained only in the development machine.

The program in the source language must be present on every device where it is to be run.

The source program must be converted into a machine language translation, which must be copied to every device. This must be done for every type of computing device.

The source program is copied as it is to every device where it should run. The same source is copied to all types of computing devices.

The entire program must be converted to machine code up-front before copying to the target machine and running. For a huge program such as an operating system, this may take several hours.

The interpreter picks up one line from the program and converts only that line to machine code. This is done for every line until the entire program is interpreted.

Once a program is converted to machine code and copied to the target device, no more translation takes place.

Since the program exists in source form, the interpreter must translate every line every time. But if an interpreter has a cache, then it only needs to translate once.

Which approach should I use?

There is a reason why 99% of the programming languages today are interpreter-based. The developer needs to maintain only the program in the source code and does not need to learn the tools to convert it to machine language form. The onus of converting lies with the company that writes the interpreter for the various computing platforms.

There are two reasons why a developer would want to look at writing some modules in compiled languages.

If you are running your program on a device that has very little storage and memory, it may not be possible to fit the interpreter into a low-capacity disk and then load it into limited memory. A compiled program, which is already in machine code for that device, would not need the interpreter at all. This is similar to Khushi’s situation in Kenya where good interpreters aren’t available.
If you are running a program that needs to be really, really fast, then the cost of translating every line before it is executed will catch up to you and make the experience bad. You will never see video editors or 3-D modelling apps made in Java or Python. The delay in interpretation will visible cause the frame rates to drop below 25 frames per second and you will notice what is called ‘jank’, i.e. video frames being delivered slower than what the eyes perceive as motion. As a result, video applications or applications that need real-time response are always written in compiled languages. Directly reading machine code is much faster than interpreted code.

Conclusion

Compiler and interpreters follow two different approaches, but their goal is the same. Both convert a program from it source language to something that a computing device will understand. It’s just that a compiler does it with the approach of someone setting curd from milk, i.e. starting well in advance before the product is used, whereas the interpreter follows the approach of someone who chops herbs right before tossing them into a pan, just in time, so that they don’t go stale.

Month: October 2018

Unit testing and integration testing: the X-ray scan for your software

Elements of 3D computer graphics

Compilers and Interpreters