In part 1, we saw how relational and non-relational databases differ in structure. Tables must have a singular, ‘unbreakable ‘ piece of information per row and column in a relational database, while non-relational databases can even contain tables inside tables (called nested data).
In this part, we will see how a relational database wreaks havoc with a piece of information so fundamental, that by the end of this post, you will start wondering why relational databases have been in use for so long! We are talking about human names today. We are talking about how a term I call ‘structural rigidity’ of relational databases caused an unnecessary and unwilling ‘standardisation’ in the way we look at human names today.
How did relational databases ‘wreak havoc’ with human names? Well, look at thousands or even tens of thousands of passports, whose owners are not happy with the way their names appear on them. The owners will say that they are now ‘stuck’ with that ‘adopted name’ for the purposes of smooth documentation and clearance. My passport happens to be one of those. This is because relational databases are rigid. But admittedly, if we try to break that rigidity, the data would be fragmented and repetitive. I will explain what I am talking about.
First, let’s look at some names and analyse them as a data engineer. Then we’ll create a relational database table to store them.
- John Scott Doe
- Rahul Madan Sharma
- Samantha Pearson Doe, who was Samantha Nick Pearson before marriage.
- Harikrishna Natrajan
- Priya Krishnamoorthy
- Lalitha Natarajan
- Lee Chun Yeap
- Jose Vicario Sanchez
- Mohammed Al Sukouri Al Wahabi, who was Mohammed Al Sukouri before marriage.
- Ananta Ramajeyanalu Guntur Krishna Vidyasagara Reddy
Still with me? Okay. In part 1, I described how storing these names directly into a table as they are is not a good idea, because there should be nothing unbreakable inside any cell in the table. So what do we do? Let’s analyse each of these names one by one.
- The first name in the list is a typical first world English speaking country male name. The United States, Australia and the UK come to mind. There is a first name, a middle name and a last name. We all know this type of name, as we have seen so many paper and online forms which request us to fill up names in this manner. Name layout: <First Name> <Middle Name> <Last Name>.
- Ditto for Rahul, who is a typical north Indian. His name is similar in layout to the first-middle-last. But being Indian, Rahul calls them ‘own name’, ‘father’s name’ and ‘surname’ respectively. But the Americanised concepts will do for him as it doesn’t affect his name’s layout in anyway. Name layout: <Own Name> <Father’s Name> <Surname>.
- Things start to get fuzzy for Samantha. Before marriage, she fit within the mould of names 1 & 2. Now married to person 1, John Doe, she has willingly changed her last name to Doe. However she also wants to keep her pre-marital last name intact and wants to call herself Samantha Pearson Doe. In effect, she has two last names, her father’s last name and her husband’s last name. Name layout: <First Name> <Father’s Last Name> <Husband’s Last Name>.
- The fourth character in the story is myself. Hailing from Tamil Nadu in India, my name consists of only my own name and my father’s name. However it did cause me a lot of trouble with forms which make last name / family name / surname mandatory, my passport application being one of them. I had to fall back to using my father’s name (Natrajan) as my last name. Not ideal, because in cultures where people are referred to by their last names, which people assume in my case to be Natrajan, they are unknowingly referring to my father and not to me! I took time getting used to that. Name layout: <Own Name> <Father’s Name>.
- The situation for Priya, my wife, is similar. She too has her own name and her father’s name (Krishnamoorthy). With no surname, she is stuck with her father’s name as her surname. However, she did herself.. and everyone a favour, by NOT changing her name after marriage. Name layout: <Own Name> <Father’s Name>.
- Not the case with my mom. She did the traditional Tamil thing by changing her name from Lalitha Krishnamurthy to Lalitha Natarajan, the second part being my father’s or her husband’s name. Name layout: <Own Name> <Husband’s Name>.
- Let’s move out of the first world and out of India into the Orient. Think China, Malaysia, Singapore, Taiwan or Korea. It is VERY important that their family names come FIRST. Lee Chun Yeap is a person whose family name is Lee and not Yeap. However, when they travel abroad, the order of the words in their names get muddled or they get called by the wrong name. So Mr Lee would wrongly be called Mr Yeap. Their names have three components, but the order is reversed. Name layout: <Family Name> <Given Name which has two parts>.
- We travel to Spain and meet Jose. Jose has two last names: Vicario and Sanchez. Vicario is his paternal family name and Sanchez is his maternal family name. That’s how all Spanish speaking countries like it, from Mexico to Argentina, from Chile to Puerto Rico. Name layout: <Own name> <Father’s family name> <Mother’s family name>.
- We travel on camels to the Arabian deserts, where we are greeted by Mohammed. His family name comes directly from the name of the nomadic tribe he was born into. However after marriage, men add the tribe name of their wives. And I mean ‘wives’ for a single man too. Polygamy is common and all those additional tribe names keep attaching themselves to the name of the man. Mohammed has ‘Al Wahabi’ from his wife’s tribe’s name. Name layout: <Own name> <Birth tribe name> <List of wives’ tribe names>.
- It is fairly difficult to analyse the name of the last character who is from Andhra Pradesh in India. His name starts with two names (Ananta and Ramajeyanalu), each bestowed by his two grandfathers (paternal and maternal). These are what they will call him! (Yes, I know. One person called multiple names. The child’ll be confused). This is followed by the name of the village that his family hails from (Guntur). Then comes the name that his parents, friends and everyone else call him by (Krishna). His father’s name is Vidyasagara and his family name is Reddy.
Our analysis of the names is complete and now as a DBA, it is our job to create a database table to hold the above info. We have two approaches: an autocratic one and an accommodative one.
An autocratic DBA (DataBase Analyst) would go by his/her rigid approach and stand his/her ground about all names having exactly 3 parts, no less, no more, displayed in the following order: <First name> <Middle name> <Last name>. This means that Priya, my mom and I will have to relegate our father’s / husband’s name as the family name and figure out some name to be a middle name. Samantha may have to pick her dad’s last name as the middle name. Mohammad will have to drop one of his tribe names. Lee will have to see his family name appear at the end. And so on. This is the state of the passports in the world today.
First all-accommodating solution
An accommodative DBA will please everyone, but taking different name components and their order of appearance into account will lead him/her to create the following relational table.
|Person ID||First Name||Order||Middle Name||Order||Husband's name||Order||Paternal Last Name||Order||Maternal Last Name||Order||Marital Last Name||Order||Given name||Order||Paternal grandfather's given name||Order||Maternal grandfather's given name||Order||Town name||Order|
|P00009||Mohammed||1||Al Sukouri||2||Al Wahabi||3|
As you can see, this table hardly looks like a well-designed table anymore. It has too many columns and is unwieldly to look at and maintain. More name components in the future will lead to more columns. Clearly this is not a smart way to handle human names. This kind of table makes us want to go back to being the autocratic DBA (described in the section above).
Let’s try again. This time we take out all the columns and put all the information about the names in different rows. Here is the result.
|Person ID||Name placeholder||Name value|
|P00003||Paternal family name||Pearson|
|P00003||Marital family name||Doe|
|P00007||Given name||Chun Yeap|
|P00008||Paternal family name||Vicario|
|P00008||Maternal family name||Sanchez|
|P00009||Birth family name||Al Sukouri|
|P00009||Wife's family name||Al Wahabi|
|P00010||Paternal grandfather's bestowed name||Ananta|
|P00010||Maternal grandfather's bestowed name||Ramajeyanalu|
Much neater. But the data looks fragmented all over the place.
- We have two clunkily named colums called ‘name placeholder’ and ‘name value’. These names look rather technical and exist only as headings for the data they represent. You can think about these columns and quickly understand that it is hard to name them in a clean human-friendly way.
- The person ID repeats on every line, increasing the number of bytes of storage required per person. The first person has the field repeated thrice, whereas the last one requires 6 repetitions. In a 1000-person table, this can quickly add up and make the table hungry for space. There will be close to 2000 ID repetitions for such a table.
The elegance of non-relational tables
In a non-relational table, we end up with:
|P00001||First name: John
Middle name: Scott
Last name: Doe
|P00002||Own name: Rahul
Father's name: Madan
|P00003||First name: Samantha
Paternal last name: Pearson
Marital last name: Doe
|P00004||Own name: Harikrishna
Father's name: Natrajan
|P00005||Own name: Priya
Father's name: Krishnamoorthy
|P00006||Own name: Lalitha
Husband's name: Natarajan
|P00007||Own name: Jose
Paternal family name: Vicario
Maternal family name: Sanchez
|P00008||Family name: Lee
Given name: Chun Yeap
|P00009||Own name: Mohammed
Birth family name: Al Sukouri
Marital family name: Al Wahabi
|P00010||Paternal grandfather's bestowed name: Ananta
Maternal grandfather's bestowed name: Ramajeyanalu
Town name: Guntur
Own name: Krishna
Father's name: Vidyasagara
Family name: Reddy
This format answers all our needs in a compact and elegant way. The name components are all grouped into a single row under the same person and there are no irrelevant column names.
As you can see, non-relational databases fit the needs of the real world more closely than relation databases do. No wonder that with modern global systems like global search engines, global social networks and global professional networks, the Information Technology world is rapidly moving away from the conditions imposed by relational databases and adopting non-relational databases.