Say NO to SQL: Part 3

In parts 1 and 2, we saw how relational databases require us to break data up and store it in little pieces. In part 1 specifically, we split the data across different tables. In this part, we will see how to bring those different pieces of data back together from the different tables, and what kind of problems this merging causes.

Recall

In part 1 of this series, we ended up with two tables, which look like the following.

Contact serial number | Own name     | Surname
1                     | Sanjaya      | Chowdhury
2                     | Purushottama | Kulkarni

Contact serial number | Phone type | Phone number
1                     | Personal   | 9422515373
1                     | Work       | 9758085846
2                     | Personal   | 9169362975
2                     | Backup     | 9707818584

The rule that relational databases may store only data which cannot be broken up further forced us to create the above two tables. The first table contains a list of contact names and assigns each one an ID. The second table contains the list of phone numbers corresponding to each contact. This ensures that each table serves exactly one purpose, as per relational database rules.

Bringing the data together

It is possible to create a relational table which brings the data from the two tables together. To find corresponding records, the procedure matches data from columns which are common to both tables. In our case, the ‘contact serial number’ acts as the glue that holds the records in the two tables together. The serial numbers are matched and an entire row of information is built. This procedure is called ‘JOIN’ing tables. Here is how the joined table looks.

Contact serial number | Own name     | Surname   | Phone type | Phone number
1                     | Sanjaya      | Chowdhury | Personal   | 9422515373
1                     | Sanjaya      | Chowdhury | Work       | 9758085846
2                     | Purushottama | Kulkarni  | Personal   | 9169362975
2                     | Purushottama | Kulkarni  | Backup     | 9707818584
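
If you would like to see this in code, here is a minimal sketch of the same JOIN using Python's built-in sqlite3 module; the table and column names are just illustrative choices for this example, not anything prescribed earlier in the series.

```python
import sqlite3

# In-memory database purely for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE contacts (serial INTEGER, own_name TEXT, surname TEXT)")
conn.execute("CREATE TABLE phones (serial INTEGER, phone_type TEXT, phone_number TEXT)")

conn.executemany("INSERT INTO contacts VALUES (?, ?, ?)", [
    (1, "Sanjaya", "Chowdhury"),
    (2, "Purushottama", "Kulkarni"),
])
conn.executemany("INSERT INTO phones VALUES (?, ?, ?)", [
    (1, "Personal", "9422515373"),
    (1, "Work", "9758085846"),
    (2, "Personal", "9169362975"),
    (2, "Backup", "9707818584"),
])

# The JOIN matches rows on the common 'serial' column and builds wider rows.
rows = conn.execute("""
    SELECT c.serial, c.own_name, c.surname, p.phone_type, p.phone_number
    FROM contacts c
    JOIN phones p ON p.serial = c.serial
""").fetchall()

for row in rows:
    print(row)
# Each contact's name comes back once per phone number -- the repetition
# discussed below.
```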

What just happened!

You can see the problem with ‘join’ing straight away. The contact serial number was used as the common ground between the two tables, and it appears more than once in the second table. As a result, the name data repeats in the joined table, and the number of repetitions is dictated by the table in which the same value of the common column appears multiple times. But there’s worse!

The problem of combinatorics

Let’s say that we are joining two tables, where table 1 has 3 rows with the same value in the common field and table 2 has 2 rows with that same value. The number of resulting rows is the product: 3 x 2. We will see 6 rows of nearly identical data, one for every possible combination of the matching rows. Let’s see an example (with a small code sketch after the joined table below).

Table 1: Contains a list of customer orders

Customer ID | Order date         | Item                | Quantity
C00001      | 1st December 2016  | Farm Fresh Pizza    | 2
C00001      | 7th December 2016  | Garlic bread sticks | 5
C00001      | 15th December 2016 | Choco Lava cake     | 2

Table 2: Contains a list of all possible customer delivery addresses

Customer ID | Delivery address
C00001      | Somewhere in Ghatkopar, Mumbai
C00001      | Somewhere in Vashi, Mumbai

Joined:

Customer ID | Order date         | Item                | Quantity | Delivery address
C00001      | 1st December 2016  | Farm Fresh Pizza    | 2        | Somewhere in Ghatkopar, Mumbai
C00001      | 1st December 2016  | Farm Fresh Pizza    | 2        | Somewhere in Vashi, Mumbai
C00001      | 7th December 2016  | Garlic bread sticks | 5        | Somewhere in Ghatkopar, Mumbai
C00001      | 7th December 2016  | Garlic bread sticks | 5        | Somewhere in Vashi, Mumbai
C00001      | 15th December 2016 | Choco Lava cake     | 2        | Somewhere in Ghatkopar, Mumbai
C00001      | 15th December 2016 | Choco Lava cake     | 2        | Somewhere in Vashi, Mumbai
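
Conceptually, the database is doing something like the following sketch: every matching order row is paired with every matching address row, so 3 orders x 2 addresses gives 6 combined rows. This is plain Python with made-up variable names, purely to illustrate the blow-up.

```python
# Three order rows and two address rows for the same customer ID.
orders = [
    ("C00001", "1st December 2016", "Farm Fresh Pizza", 2),
    ("C00001", "7th December 2016", "Garlic bread sticks", 5),
    ("C00001", "15th December 2016", "Choco Lava cake", 2),
]
addresses = [
    ("C00001", "Somewhere in Ghatkopar, Mumbai"),
    ("C00001", "Somewhere in Vashi, Mumbai"),
]

# A join on customer ID pairs every matching order with every matching address.
joined = [
    order + (address[1],)
    for order in orders
    for address in addresses
    if order[0] == address[0]
]

print(len(joined))  # 6 rows: 3 orders x 2 addresses
for row in joined:
    print(row)
```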

Well, all that an innocent data analyst wanted was to generate a report page listing all of a customer's orders along with all the possible delivery addresses where the customer wants orders delivered. However, the joined table, which reads like a lesson in mathematical combinations, makes no sense!

Also note that this confused set of records was for ONLY ONE customer, C00001. Imagine what would happen with data spanning 1,000 customers!

Problem of computation and memory

Joining is an intensive process: it matches rows from multiple tables and builds new, bigger rows, which consumes a lot of processor time and memory. Many reports that depend on joining tables are set up to be generated overnight or over the weekend. Real-time reporting is nearly impossible unless you have a supercomputer.

Problem of fragmented solutions

We saw how the joining process gets confused when both tables have the common data appearing multiple times. Because of this, programmers often avoid joining tables altogether and instead query the database multiple times in succession, each time extracting a different piece of data from a different table. Whenever we query a database, each query is held in a queue until the database is ready to get to it. Querying the database multiple times for fragments of data is tedious and can leave programs parked in the queue for too long, causing delays. We are better off extracting larger chunks of data with fewer queries, whenever possible, to avoid standing in the queue all day.
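
As a rough, illustrative sketch of that query-per-fragment style, here is the same contacts example again using Python's sqlite3 with the made-up table names from the earlier sketch: one query for the contacts, then one extra round trip per contact for its phone numbers.

```python
import sqlite3

# Same illustrative tables as in the earlier sketch.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE contacts (serial INTEGER, own_name TEXT, surname TEXT)")
conn.execute("CREATE TABLE phones (serial INTEGER, phone_type TEXT, phone_number TEXT)")
conn.executemany("INSERT INTO contacts VALUES (?, ?, ?)",
                 [(1, "Sanjaya", "Chowdhury"), (2, "Purushottama", "Kulkarni")])
conn.executemany("INSERT INTO phones VALUES (?, ?, ?)",
                 [(1, "Personal", "9422515373"), (1, "Work", "9758085846"),
                  (2, "Personal", "9169362975"), (2, "Backup", "9707818584")])

# Query-per-fragment style: one query for contacts, then one query per contact
# for its phone numbers. Each round trip waits its turn in the database's queue.
phone_book = {}
for serial, own_name, surname in conn.execute(
        "SELECT serial, own_name, surname FROM contacts"):
    phones = conn.execute(
        "SELECT phone_type, phone_number FROM phones WHERE serial = ?",
        (serial,)).fetchall()
    phone_book[f"{own_name} {surname}"] = phones

print(phone_book)
```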

The elegance of NoSQL

In part 1, we already saw how tight and neat the non-relational layout is. Here is the non-relational version of the contacts database.

Serial number | Name                  | Phones
1             | Sanjaya Chowdhury     | Personal: 9422515373, Work: 9758085846
2             | Purushottama Kulkarni | Personal: 9169362975, Backup: 9707818584
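
As a sketch, here is roughly what those records look like as nested documents, written as plain Python dictionaries with illustrative field names; a document store such as MongoDB would hold essentially the same structure as JSON.

```python
# The contacts collection as nested documents (plain Python dicts here;
# a document store would hold essentially the same structure as JSON).
contacts = [
    {
        "serial_number": 1,
        "name": "Sanjaya Chowdhury",
        "phones": {"Personal": "9422515373", "Work": "9758085846"},
    },
    {
        "serial_number": 2,
        "name": "Purushottama Kulkarni",
        "phones": {"Personal": "9169362975", "Backup": "9707818584"},
    },
]

# Everything about a contact comes back in one piece -- no join needed.
print(contacts[0]["phones"]["Work"])  # 9758085846
```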

Now let us look at how we solve the problem of customer orders and delivery addresses.

Customer ID: C00001

Orders:
(
  Date: 1st December 2016: (Item: Farm Fresh Pizza, Quantity: 2),
  Date: 7th December 2016: (Item: Garlic Bread sticks, Quantity: 5),
  Date: 15th December 2016: (Item: Choco Lava cake, Quantity: 2)
)

Addresses:
(
  Somewhere in Ghatkopar, Mumbai,
  Somewhere in Vashi, Mumbai
)

Neither is data repeated, nor do we need to query separately for separate fragments of information.
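
Here is the same idea for the customer orders example, again as an illustrative Python dictionary: one lookup returns the whole customer record, with orders and addresses nested inside it.

```python
# One document per customer, with orders and addresses nested inside.
customers = {
    "C00001": {
        "orders": [
            {"date": "1st December 2016", "item": "Farm Fresh Pizza", "quantity": 2},
            {"date": "7th December 2016", "item": "Garlic Bread sticks", "quantity": 5},
            {"date": "15th December 2016", "item": "Choco Lava cake", "quantity": 2},
        ],
        "addresses": [
            "Somewhere in Ghatkopar, Mumbai",
            "Somewhere in Vashi, Mumbai",
        ],
    },
}

# One lookup brings back everything the report needs: no joins, no repetition,
# no extra queries for separate fragments.
record = customers["C00001"]
for order in record["orders"]:
    print(order["date"], order["item"], order["quantity"])
print(record["addresses"])
```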

Conclusion

We saw how joins are useful in bringing fragments of data together in relational tables, but also how they cause data repetition and confusing results. We also saw how the flexibility of non-relational databases allows us to arrange information in new ways by nesting data within data, and how this avoids confusing results and repetitive querying.

 
