What I learned from reading Google Search's documentation

Reading through Google Search's developer documentation really shed light on topics that were a bit fuzzy in my head. It also equipped me with better technical SEO skills 😉

This is a compilation of information and thoughts I gathered from reading the Google Search guides, from the perspective of a web developer / designer. At the end of this article, I've prepared a TODO list to go through when delivering a website, which might be helpful to you 😉

⚠ Please note: this is mostly written for myself, so it's by no means an introductory or comprehensive guide. If you're new to SEO or want something more exhaustive, you should read the real deal!

Annotations on Google's documentation on SEO

General notes

  • By putting your users first and caring about the accessibility of the content, you're already checking most of the boxes
  • They don't talk much about speed here, which I found very weird 🤔
  • Google indexing is becoming mobile-first, starting July 1st 2019. This means it's more important than ever to optimize your site for mobile if you want to up your SEO game.
  • You can ping Google's servers with a sitemap URL for re-indexing
    • Example: http://www.google.com/ping?sitemap=https://hdoro.dev/sitemap.xml
  • Google is launching "miniapps", a new set of rich results that we can create and connect to our own API endpoints. Seems really cool!

Stuff that might hurt SEO

  • Doorway pages might hurt SEO. Gotta talk with Thiago about that
  • Cloaking (presenting a different version of the page to users and search engines) is an absolute no-no
  • Hiding text and links (like a hidden sidebar) can be tricky
    • TODO to myself: search and write about accessible and SEO-friendly off-screen menus
  • If working with affiliate networks, make sure to read this article on it, as having duplicate content across the network could really hurt your ranking
  • Based on the irrelevant keywords article, it seems like Google is smart enough to detect when people are doing "keyword stuffing". So focus on a natural flow of content instead of forcing keywords into every sentence
  • There are others, of course; be sure to check the Quality Guidelines

The AMP problem

Unfortunately, Google is really pushing for AMP, and there are huge chunks dedicated solely to it 😔 (tiny rant below)

  • I see the AMP project as Google's effort to further its hegemony over the web, so I'm definitely worried about it, as publishers feel obliged to comply with Google's specifications in order to stay relevant in search;
  • There's also the verified look it gives to fake news, the loss of your brand's signature, handing content hosting over to Google, etc.
  • In fact, there are so many problems that you can dive deep into articles such as Kill AMP before it kills the web, The problem with AMP and even A report from the AMP Advisory Committee Meeting.
  • I do like the idea of AMP Story, the richness of AMP Ads and the possibilities of AMP email, but I'd love it even more if the project were scrapped!

Javascript and Googlebot

If you use a lot of JavaScript on your site(s), you should learn more about its relationship with Googlebot:

  • Googlebot always declines user permission requests, so never rely on them for critical content
  • It doesn't persist data: localStorage, sessionStorage and cookies are cleared across page loads
  • Do some sort of feature detection and polyfill the features it doesn't support
  • Lazy-loading components should load when scrolled into the viewport with the Intersection Observer API and a polyfill (see the sketch after this list)
  • Be careful with infinite-scrolling sections, as they should have some sort of history and pagination to allow for linking and better crawling. If you can't do that, you'd better just provide regular pagination, or you'll miss out on a lot of SEO. There are probably ready-made components that handle this, though 🤔
  • There's a concept of dynamic rendering for websites that rely on JS features that crawlers don't support. However, with Googlebot using the latest version of Chrome, I don't feel this is relevant for me. If other search engines are very important to you, you might want to check this out!
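
Here's a minimal sketch of the feature detection + lazy-loading idea from the list above. The `lazy` class, the `data-src` attribute and the image path are conventions I made up for this example:

```html
<!-- The image only carries its real URL in a data-src attribute,
     so it doesn't load until scrolled into view -->
<img class="lazy" data-src="/images/cover.jpg" alt="Post cover" />

<script>
  const lazyImages = document.querySelectorAll('img.lazy')

  // Feature-detect IntersectionObserver before relying on it
  if ('IntersectionObserver' in window) {
    const observer = new IntersectionObserver((entries) => {
      entries.forEach((entry) => {
        if (!entry.isIntersecting) return
        const img = entry.target
        img.src = img.dataset.src // swap in the real URL
        observer.unobserve(img)
      })
    })
    lazyImages.forEach((img) => observer.observe(img))
  } else {
    // No support and no polyfill loaded: load everything eagerly
    // so no content is hidden from users or crawlers
    lazyImages.forEach((img) => { img.src = img.dataset.src })
  }
</script>
```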

Structured data and rich results

  • They give a BIG emphasis to structured data
    • This makes sense, as Schema really helps linking and interpreting content across the web, which is super important for search engines.
    • Specifically for Google, it's what allows for rich results such as recipe breakdowns.
  • If you want to learn more about it, I have an introductory article on "Structuring and linking data in webpages: why care and how-to"
  • "Google recommends using JSON-LD for structured data whenever possible." (source) → plus, using Microdata and RDFa is a bit messy
  • Google explicitly recommends using fewer, but more accurate, properties in your data over many incomplete or inconsistent ones.
  • Structured data must be in the content of the website, and not hidden from users, else Google might ignore it
  • For list pages, follow this reference on Carousels
  • "BlogPosting", "TechArticle" and others are more specific types of "Article", so I believe that, for Google, it doesn't seem to make a difference.
  • Choose the most specific types and properties possible from the Schema.org definitions
  • Be sure to check out the search gallery for rich result options and follow the ones you want to add. The definitions there may override schema.org's, and that's okay; go with Google 😉
  • "You can include multiple structured data objects on a page, as long as they describe user-visible page content. However, if you mark up one item in a list you must mark up all items" (source)
    • For multiple objects in JSON-LD, you can either use a @graph object in a single script tag or use multiple script tags (see more on Stack Overflow, and the sketch after this list).
  • The only way to create a knowledge graph card for your site/company is through Google My Business
  • There's an "Action" markup for structured data brewing that will allow users to take specific actions straight from search. This doesn't interest me as of now, so I skipped this section.
  • It's probably better to check Google's reference on structured data than to browse endlessly through Schema.org in search of what types and properties to use, as many of those listed in Schema aren't supported by Google. Oh, and the UX of the Schema site is horrible 😝
  • Some properties, like reviews, can make for really huge JSON objects, so I want to try adding them client-side with JS to see if it works and whether it's worth the trade-off.
    • However, this approach could increase computations and slow down the experience on mobile
  • image properties have a bunch of specificities:
    • must be at least 696px wide, in the formats .png, .jpg or .gif;
    • can be an array of strings, Google will pick accordingly;
    • must be crawlable and indexable;
    • image must represent the content. Not sure if this means it should be inside of the content or if Google analyzes its content 🤔
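
To make the JSON-LD notes above concrete, here's a sketch of a single script tag describing two objects via @graph, including the image array from the last bullet. Every value is a placeholder I invented for illustration:

```html
<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@graph": [
    {
      "@type": "BlogPosting",
      "headline": "A placeholder post title",
      "datePublished": "2019-06-01",
      "author": { "@type": "Person", "name": "Jane Doe" },
      "image": [
        "https://example.com/cover-1x1.png",
        "https://example.com/cover-16x9.png"
      ]
    },
    {
      "@type": "BreadcrumbList",
      "itemListElement": [
        { "@type": "ListItem", "position": 1, "name": "Blog", "item": "https://example.com/blog" },
        { "@type": "ListItem", "position": 2, "name": "A placeholder post title" }
      ]
    }
  ]
}
</script>
```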

Further resources

Duplicate content, canonical, and the likes

  • I finally understood what a canonical is: in case you have multiple variants of the same content at different addresses (say, you publish a post both on your blog and on Medium), the canonical is the main source, which Google will crawl more often. The rest are the duplicate URLs
  • Google picks a canonical automatically if the duplicates live on the same domain
  • Reasons for using a canonical include specifying which URL you want to show up in search, maximizing SEO and simplifying analytics
  • Even when you set a specific canonical URL, Google can ultimately choose another one based on performance, content or others.
  • For www / non-www versions, you should go to Google Search Console and set which one is preferred. There's no need to do this for http / https versions; Google favors https by default.
  • If using hreflang for multi-language sites, you must specify a canonical URL (not sure how, though)
  • You can tell Google to ignore dynamic parameters (ex: example.com/blog?order=ASC). If needed, read this piece on Block crawling of parameterized duplicate content
  • There are many other recommendations to follow and methods to apply it in Consolidate duplicate URLs (the link tag method is sketched below)
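
For reference, setting a canonical is just a link tag in the <head> of each duplicate; a minimal sketch with placeholder URLs:

```html
<!-- In the <head> of the duplicates (the Medium cross-post,
     a ?order=ASC variant, etc.), point to the main source: -->
<link rel="canonical" href="https://example.com/blog/my-post" />
```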

On multi-language / region websites

  • You can use <link rel="alternate" hreflang="..."> tags, but these can increase the size of the HTML and be hard to manage.
  • You can go with HTTP headers as well, but those tend to be hard to implement for people without much knowledge of servers (myself included).
  • So a better alternative seems to be creating language-specific sitemaps (sketched after this list)
  • See the article on "Tell Google about localized versions of your page" for implementation details
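
For reference, here's a sketch of the language-specific sitemap approach with placeholder URLs. Each <url> entry lists every variant of that page, including itself:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:xhtml="http://www.w3.org/1999/xhtml">
  <url>
    <loc>https://example.com/en/page</loc>
    <xhtml:link rel="alternate" hreflang="en" href="https://example.com/en/page"/>
    <xhtml:link rel="alternate" hreflang="pt-BR" href="https://example.com/pt/page"/>
  </url>
  <url>
    <loc>https://example.com/pt/page</loc>
    <xhtml:link rel="alternate" hreflang="en" href="https://example.com/en/page"/>
    <xhtml:link rel="alternate" hreflang="pt-BR" href="https://example.com/pt/page"/>
  </url>
</urlset>
```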

TODO when delivering an SEO-ready website