Over the last few years, you may have noticed that Gmail has become smarter. If an e-mail is about a flight ticket, Gmail shows a short summary with the flight number, destination, timing and seat information. It also offers to create a reminder for when you should leave, based on your location relative to the airport. For an e-commerce purchase, there is a summary in an invoice format, showing the item particulars and the price. How does Google know that a particular mail is about a certain topic? Has Google become so good at language processing that it understands the context from the mail content? It is not only Gmail: Google Search also does smart things, like automatically showing nutrition info when you type in the name of a food item.
It turns out that Google is smart, but not because it built a powerful language processor. Rather, it encouraged content creators to add tiny bits of metadata to their data. Google rewards such data by showing it in special formats that make more sense to the user, which in turn helps with SEO. But hold on, you cannot insert just any tidbit of metadata. The data has to conform to a specific standard so that Google’s parser can see it for what it is and can also find relations between this data and other data on the Internet. Google is using what many big companies use to identify pieces of data as specific, tangible objects from the real world. The metadata format that helps do this is called JSON-LD, short for JSON for Linking Data. Adding JSON-LD to your data helps Google and other search engines know that a piece of data represents an object from the real world, such as a person, a city or a ticket.
How JSON-LD came to be
When British computer scientist Tim Berners-Lee invented HTTP and the World Wide Web, he created the means for publishing digital pages of information online and connecting them with hyperlinks. A hyperlink tells your browser to jump to another document. In this way, you start at a page and follow a chain of hyperlinks until you arrive at the page that has the information you want. The concept worked well, but a page that was not linked to from other pages could not be discovered. Then search engines came along, providing a way for pages to be discovered based on their content. However, it turned out that discovering pages this way could lead to ambiguity. For instance, searching for the phrase ‘white beach’ could return results for a white sandy beach or for a person named White Beach.
Tim Berners-Lee later used a TED talk named “The Next Web” to promote Linked Data, an approach for documenting data as real-world objects and linking those objects to one another. JSON-LD, which was standardised later by the W3C, applies that approach using JSON: pieces of metadata in JSON format are embedded in a web page so that the page can represent an object from the real world, alongside the page’s own contents.
Participating search engines or data parsers can use the metadata to determine the type of real-world object and to rank the relevance of the result. What’s more, the search engine can then show a small tidbit of information about the object itself, as described by the highest-ranked page. Google does this effectively nowadays, as in the nutrition example mentioned earlier.
Samples for JSON-LD
Let us look at a JSON-LD sample. This example describes the standard format for meta-data about a person.
"name": "John Smith",
The first line tells the parser that this JSON-LD object points to a person in the real world. The id field (written ‘@id’ in JSON-LD) contains an ID in the form of a URL. The particular URL you choose is not important, but it must be unique: no other object should ever have the same ID. The name of the person is in the ‘name’ field. ‘born’ contains the date of birth in the YYYY-MM-DD format. Finally, the ‘spouse’ field points to the URL of the JSON-LD object that corresponds to Alice. These links between objects are important, since crawlers can follow them to gather information about related data and search engines can use that to offer further suggestions.
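As a rough sketch of the structure described above, the snippet below builds such an object as a Python dictionary and serialises it to JSON. Only the name comes from the sample; the '@context' line, the example.org URLs and the date of birth are illustrative assumptions, not values from the original.

```python
import json

# Hypothetical person record. Every URL and the date are placeholders.
person = {
    # Assumed first line: a context telling parsers that the keys describe a person.
    "@context": "https://json-ld.org/contexts/person.jsonld",
    # Unique identifier for this object, written as a URL.
    "@id": "https://example.org/people/john-smith",
    "name": "John Smith",
    # Date of birth in YYYY-MM-DD format.
    "born": "1980-01-15",
    # Link to another JSON-LD object (Alice) by its identifier.
    "spouse": "https://example.org/people/alice",
}

# Serialise to the JSON text that would go into the page.
person_json = json.dumps(person, indent=2)
print(person_json)
```

Keeping the metadata as a dictionary until the last step means the serialiser produces the quotes and commas, which avoids hand-typed syntax slips.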
Adding JSON-LD to your webpage and verifying
You should add JSON-LD data inside a <script> tag, typically placed inside the <head> tag. The script tag should have the following format:
<script type="application/ld+json"> .. your JSON-LD data .. </script>
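If your pages are generated by a program, the tag can be produced from the metadata directly. The following is a minimal sketch, assuming Python; the helper name jsonld_script_tag and the example.org URL are hypothetical.

```python
import json


def jsonld_script_tag(data: dict) -> str:
    """Serialise `data` and wrap it in a JSON-LD <script> element."""
    payload = json.dumps(data, indent=2)
    # Escape "</" so that a string value containing "</script>" cannot
    # close the tag early. "\/" is a valid JSON escape for "/".
    payload = payload.replace("</", "<\\/")
    return f'<script type="application/ld+json">\n{payload}\n</script>'


# Hypothetical metadata; the URL is a placeholder.
tag = jsonld_script_tag(
    {"@id": "https://example.org/people/john-smith", "name": "John Smith"}
)
print(tag)
```

The returned string can then be placed wherever your template emits the contents of <head>.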
The JSON-LD playground is a good place to check whether your metadata is formed and recognised correctly. You can paste your metadata into the textbox and the playground lets you know if there are any mistakes. To see how Google’s own crawler reads your JSON-LD data, Google offered its Structured Data Testing Tool, which accepts the URL of your webpage; Google has since retired that tool in favour of the Rich Results Test.
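Before reaching for the online tools, a quick local check can catch plain syntax slips. The sketch below, which uses only Python's standard library, pulls every JSON-LD script block out of a page and reports whether it parses as JSON. It does not validate JSON-LD semantics, so it complements the tools above rather than replacing them. The names JsonLdExtractor and check_jsonld and the sample page are hypothetical.

```python
import json
from html.parser import HTMLParser


class JsonLdExtractor(HTMLParser):
    """Collects the text of every <script type="application/ld+json"> element."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        # Script content may arrive in several chunks, so buffer it.
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self.blocks.append("".join(self._buffer))
            self._in_jsonld = False


def check_jsonld(html: str) -> list:
    """Return one (parsed_object, error_message) pair per JSON-LD block."""
    extractor = JsonLdExtractor()
    extractor.feed(html)
    extractor.close()
    results = []
    for raw in extractor.blocks:
        try:
            results.append((json.loads(raw), None))
        except json.JSONDecodeError as exc:
            results.append((None, str(exc)))
    return results


# Hypothetical page: the first block is valid, the second has a trailing comma.
page = """<html><head>
<script type="application/ld+json">
{"@id": "https://example.org/people/john-smith", "name": "John Smith"}
</script>
<script type="application/ld+json">{"name": "broken",}</script>
</head><body></body></html>"""

for parsed, error in check_jsonld(page):
    print("OK" if error is None else f"Invalid JSON: {error}")
```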
Learning deeply about JSON-LD
The Linked Data Tools website is a good source for learning all about JSON-LD and how to use it. The official reference for the different types of real-world objects recognised by JSON-LD, and the properties of each, is on JSON-LD’s official website.
So what are you waiting for? You already have your website packed with information. Go ahead and identify which real-world object each of your webpages represents. Learn all about JSON-LD, add that precious metadata to your pages, and see how search engines eventually give you brownie points.
2 thoughts on “Make your data more context-aware with JSON-LD”
Nice recap of how Gmail and search engines connect the dots. I don’t typically use the date of birth in the YYYY-MM-DD format, as some clients don’t wish to have their age so publicised. It is great to see it used here. Thank you.
Thanks. That’s true, plenty of people do wish to keep their DoB private.