Structuring and linking data in webpages: why care and how-to

TL;DR: turns out that, even with all the machine learning hype, computers are dense when it comes to understanding what a specific piece of data means and how it's related to the rest of the web... but you can help them - and help yourself with better SEO in the process - by structuring and linking your data with JSON-LD, RDFa and Microdata!

Prelude: computers need clarification on content

Manu Sporny does a great job of explaining the concept of linked data in his video "What is Linked Data?". Check it out if you want to go deeper ;) If you don't, then all you need to know is that with so much content on the web, it's difficult for bots to understand what each page is talking about, let alone the relationships between pages.

For example, if a blog post cites Jair Bolsonaro and links it to a URL, the crawler will have a hard time figuring out if the link refers to the personal website of this person or, say, to a news article on what a monster this manipulator is (not related, but great read, don't let fascism win!). To help the computer, you can use one of the available syntaxes to tell it what exactly the link is talking about, like so:

<!--
  Example done with a mixture of RDFa (vocab, typeof and property)
  and Microdata (itemscope, itemtype)
-->
<p vocab="http://schema.org/" typeof="Person">
  <span property="name">Jair Bolsonaro</span>,
  member of the <span property="memberOf">PSL party</span>,
  is a <span property="jobTitle">monstrous politician</span> with
  <a href="https://blog.post" property="url"
    itemscope itemtype="https://schema.org/NewsArticle"
  >very controversial opinions</a>.
</p>

This makes it clear that the "person" Jair Bolsonaro is a monstrous politician from the PSL party, whose URL is a news article. In this guide, I intend to show how this is relevant and how to use it, but it's far from a complete and proven guide, so feel free to give feedback and read more from this article's references ;)
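
To get a feel for why this helps machines, here is a deliberately naive Python sketch that reads the `property` attributes out of markup like the example above. It is not a real RDFa processor (it ignores `vocab`, `typeof`, nesting and void elements such as `<meta>`), the `PropertyCollector` class is invented for this illustration, and the person in the sample markup is a placeholder.

```python
from html.parser import HTMLParser


class PropertyCollector(HTMLParser):
    """Naive reader: maps each `property` attribute to its link or text."""

    def __init__(self):
        super().__init__()
        self.found = {}
        self._open = []  # `property` value (or None) of each open element

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        prop = attrs.get("property")
        self._open.append(prop)
        # For links, the value of the property is the target URL.
        if prop and "href" in attrs:
            self.found[prop] = attrs["href"]

    def handle_endtag(self, tag):
        if self._open:
            self._open.pop()

    def handle_data(self, data):
        prop = self._open[-1] if self._open else None
        text = data.strip()
        # Otherwise the value is the text inside the annotated element.
        if prop and text and prop not in self.found:
            self.found[prop] = text


# Placeholder markup, shaped like the example above.
SAMPLE = """
<p vocab="http://schema.org/" typeof="Person">
  <span property="name">Jane Doe</span>,
  member of the <span property="memberOf">Example Party</span>,
  is a <span property="jobTitle">politician</span> with
  <a href="https://blog.post" property="url">a personal blog</a>.
</p>
"""

collector = PropertyCollector()
collector.feed(SAMPLE)
collector.close()
print(collector.found)
```

Without the annotations, the same parser would only see a paragraph of text and an anonymous link, which is roughly the position a crawler is in.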

Why care

Well, besides contributing to a better web by properly linking relevant content together, we have a very pressing and rewarding reason for structuring and linking data: SEO and social media.

Google's Knowledge Graph is based on these ideas, meaning you can follow their guidelines to better position your website in search results and better display your data. Want to display your products' prices? Highlight specific links in your website? Then you'll probably have to use Rich Snippets for that!

I'm not sure about the exact benefits to SEO ranking, but by providing a more structured and complete result for users, you'll probably see an increase in click-through rate. I, for one, always click on the quick links haha

The Open Graph protocol (OGP), Facebook's way of understanding the relevant content of a webpage to display on its social media platforms, is also used by Twitter and LinkedIn, and is a must-use if your links are ever going to be shared on social media, unless you want a boring link display with just a title ;)
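
To make the Open Graph part concrete, here is a minimal Python sketch that renders the four basic properties the protocol asks for (`og:title`, `og:type`, `og:image` and `og:url`) as meta tags. The helper name `open_graph_tags` and the example values are made up for illustration; in practice your template engine or CMS would probably do this for you.

```python
from html import escape


def open_graph_tags(title: str, page_type: str, image: str, url: str) -> str:
    """Render the four basic Open Graph properties as <meta> tags.

    Hypothetical helper for illustration only.
    """
    properties = {
        "og:title": title,
        "og:type": page_type,
        "og:image": image,
        "og:url": url,
    }
    # Escape attribute values so quotes or ampersands can't break the markup.
    return "\n".join(
        f'<meta property="{name}" content="{escape(value, quote=True)}">'
        for name, value in properties.items()
    )


# Placeholder values; swap in your own page's data.
print(open_graph_tags(
    title="Structuring and linking data in webpages",
    page_type="article",
    image="https://example.com/cover.png",
    url="https://example.com/structured-data",
))
```

The resulting tags belong in the page's `<head>`.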

JSON-LD vs. RDFa vs. Microdata

You can use JSON-LD, RDFa or Microdata (https://www.w3.org/TR/microdata/) to structure / link your data, and these can be used interchangeably in the same document. Feel free to explore Schema.org examples (scroll to the end of the page) and read each syntax's documentation to figure out which is best for you. Below are some considerations I have for each:

JSON-LD

JSON-LD is a JSON-based syntax for the Resource Description Framework (read more about RDF on Wikipedia) that:

  • Can be added anywhere in the document and is not tied to the markup - it fits even outside of HTML5;
  • Works great with JS and even has a JS library to build it;
  • Might make your pages a bit heavy as the syntax is quite extensive, but with gzip this is almost irrelevant;
  • I'm not sure if it's supported by all search engines;
  • You can get ready-to-use snippets at Steal Our JSON-LD;
  • By not being directly tied to the markup, it could be that search engines give less credit to it, but that's just speculation I saw on Stack Overflow.

Example JSON-LD code:

{
  "@context": "http://schema.org",
  "@type": "BreadcrumbList",
  "itemListElement": [
    {
      "@type": "ListItem",
      "position": 1,
      "item": {
        "@id": "https://example.com/dresses",
        "name": "Dresses"
      }
    },
    {
      "@type": "ListItem",
      "position": 2,
      "item": {
        "@id": "https://example.com/dresses/real",
        "name": "Real Dresses"
      }
    }
  ]
}
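
Since JSON-LD is just JSON, it can be generated from data you already have and dropped into the page inside a `<script type="application/ld+json">` tag. The sketch below rebuilds the breadcrumb above in Python; `breadcrumb_jsonld` and `jsonld_script_tag` are hypothetical helper names, not part of any library.

```python
import json


def breadcrumb_jsonld(crumbs: list[tuple[str, str]]) -> dict:
    """Build a schema.org BreadcrumbList from (name, url) pairs."""
    return {
        "@context": "http://schema.org",
        "@type": "BreadcrumbList",
        "itemListElement": [
            {
                "@type": "ListItem",
                "position": position,  # positions are 1-based
                "item": {"@id": url, "name": name},
            }
            for position, (name, url) in enumerate(crumbs, start=1)
        ],
    }


def jsonld_script_tag(data: dict) -> str:
    """Wrap JSON-LD in the script tag used to embed it in HTML."""
    payload = json.dumps(data, indent=2)
    # "</" inside the JSON could end the script element early, so escape it.
    payload = payload.replace("</", "<\\/")
    return f'<script type="application/ld+json">\n{payload}\n</script>'


print(jsonld_script_tag(breadcrumb_jsonld([
    ("Dresses", "https://example.com/dresses"),
    ("Real Dresses", "https://example.com/dresses/real"),
])))
```

Because the output is a self-contained tag, it can be placed anywhere in the document without touching the visible markup.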

RDFa

RDFa: another syntax for RDF, but this one is tied to the markup:

  • Powers Facebook's OGP, has been a W3C recommendation since 2008 (the current RDFa 1.1 edition dates from 2015) and has been accepted by Google for a long time, so it's quite a robust syntax;
  • Personally, I find it the most pleasing to write and easy to learn - RDFa in HTML is the lite implementation, which is quite nifty;
  • Doesn't seem as flexible as JSON-LD in terms of environments it can be added to, but unlike Microdata, which only works with HTML5, RDFa can be included in SVG and XML documents.

Example RDFa code:

<ol vocab="http://schema.org/" typeof="BreadcrumbList">
  <li property="itemListElement" typeof="ListItem">
    <a property="item" typeof="WebPage"
        href="https://example.com/dresses">
      <span property="name">Dresses</span></a>
    <meta property="position" content="1">
  </li><li property="itemListElement" typeof="ListItem">
    <a property="item" typeof="WebPage"
        href="https://example.com/dresses/real">
      <span property="name">Real Dresses</span></a>
    <meta property="position" content="2">
  </li>
</ol>

Microdata

Being the only one that is not based on RDF, the microdata way of doing things seems to be the least solid of the three. It looks like an implementation geared towards less technical people - which is great, as it easily reaches marketers and designers - but the fact that it froze for a while before becoming a W3C recommendation makes me a bit uncomfortable using it. That said, it definitely is a trustworthy spec - even Neil Patel uses it, so feel free to use it if you vibe with the syntax.

Example microdata code:

<ol itemscope itemtype="http://schema.org/BreadcrumbList">
  <li itemprop="itemListElement" itemscope
      itemtype="http://schema.org/ListItem">
    <!-- itemtype is only valid together with itemscope; itemid names the item -->
    <a itemscope itemtype="http://schema.org/WebPage"
       itemprop="item" itemid="https://example.com/dresses"
       href="https://example.com/dresses">
      <span itemprop="name">Dresses</span></a>
    <meta itemprop="position" content="1" />
  </li><li itemprop="itemListElement" itemscope
      itemtype="http://schema.org/ListItem">
    <a itemscope itemtype="http://schema.org/WebPage"
       itemprop="item" itemid="https://example.com/dresses/real"
       href="https://example.com/dresses/real">
      <span itemprop="name">Real Dresses</span></a>
    <meta itemprop="position" content="2" />
  </li>
</ol>

Choosing which to use

As you can see, in terms of syntax there's not much difference between RDFa and Microdata, and JSON-LD is decoupled from the markup, so my framework of thought for choosing which to use is simple:

  • Can I change the markup?
    • No: then go for JSON-LD
    • Yes: then go for RDFa. Microdata doesn't work with SVG, I find its syntax a bit less attractive and it's less mature as an official recommendation, so RDFa seems like the more solid option.

How to incorporate this in your workflow

Well, honestly, I'm writing this article mainly to get used to these ideas and become comfortable using them, so I don't have a simple-yet-effective solution for you... What I can advise is to slowly start incorporating schema into your HTML code until you get the hang of it. The Schema vocabulary is super extensive and hard to understand (for example, wtf is the difference between BlogPost and BlogPosting?), and I personally still spend hours trying to figure it out every time I need to mark up a simple thing, but eventually it becomes part of your work.

If you're developing reusable components for your apps and websites, you can afford the luxury of creating perfect markup, as it's going to pay off in other projects. Otherwise, you don't have to be perfect, but at least mark your page with page-wide schema annotations - such as https://schema.org/Blog for your blog's homepage - to boost your SEO at least a little bit.
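
As a rough idea of how little a page-wide annotation needs, the following Python sketch emits a JSON-LD description of a blog homepage using the https://schema.org/Blog type mentioned above. The blog name and URL are placeholders, and `page_annotation` is a made-up helper, not an established API.

```python
import json


def page_annotation(schema_type: str, name: str, url: str) -> str:
    """Return a JSON-LD string describing a whole page with one schema.org type."""
    return json.dumps(
        {
            "@context": "https://schema.org",
            "@type": schema_type,
            "name": name,
            "url": url,
        },
        indent=2,
    )


# Placeholder values; swap in your own blog's name and address.
print(page_annotation("Blog", "My Blog", "https://example.com/blog"))
```

The printed JSON would go inside a `<script type="application/ld+json">` tag on the homepage; more specific properties can be added later, once the vocabulary feels less intimidating.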

References