GROQ Academy

Learn GROQ in 45 minutes

Interactive guide to take you from beginner to confidently productive with the querying language in under 1 hour.

Why learn GROQ?

GROQ (Graph-Relational Object Queries) is a brilliant querying language. Once you get the hang of it, queries are intuitive yet extremely powerful. I know, it's yet another thing to learn! But in the next 20-40 minutes I hope to show you that it's a valuable one.

This guide will focus on how to use GROQ in the context of a Sanity.io dataset, as that's currently the most common use-case for it. With the language's open source specification, the groq-js library and projects like gatsby-plugin-groq & the Carbon static site generator, we're starting to see GROQ pick up momentum on other fronts. After using it for almost 3 years, I can't wait to see where it'll go next!

How this guide works

This is an interactive guide: you'll be running GROQ queries as you learn, to maximize your understanding. Play around with them and see what's possible. It'll be worth it to grok your GROQ 😉

The data we're using is a subset of Trefle.io, an excellent database of plants of all kinds. Queries run in Sanity's CDN, which returns JSON data you can explore alongside your code.

On execution times of queries

We'll attach the end-to-end execution time of each query, which is the sum of sending the query to Sanity, waiting for it to run in their back-end, then downloading the data back to your browser. Use that number to gauge the level of complexity of each query, keeping in mind that an excessively complex query can hurt the experience of your users.

If you run the exact same query twice, you'll notice the timing is drastically reduced. That happens when the response is cached in Sanity's CDN.

If my way of explaining concepts isn't to your liking, check out the excellent GROQ of Thrones guide by Abraham A. Agung.

If something is unclear, reach out at meet@hdoro.dev or hdorodev 😊

#0 What GROQ looks like

We'll cover GROQ's syntax over the course of this guide, building understanding on each portion of it as we go. That said, I know you're curious, so here's an example query that shows what it looks like:

*[_type == "family"] {
  _id,
  common_name,
  
  "species": *[
    _type == "species" &&
    references(^._id)
  ]{
    scientific_name,
    image_url,
    edible,
    vegetable,
  }
}
| order(count(species) desc)[0..9]

Getting every document of _type family
Selecting what data we want from it
Running a sub-query to get every species associated with this family
Ordering families by their species count
And limiting to only 10 results
Now, let's start from the simplest queries and re-build the above as we go!

#1 Get all documents in a dataset

The simplest query you can make is a single character: *. This star tells GROQ to pull everything from the database. As we don't want to pull all 6,000+ saved entries, let's start by getting only the first document in the database:

Query

// Get the first document of the entire database
*[0]
// Try changing this number to get a different result

// Oh, hey, I'm a JS-like comment 👋

Results

You should get a single object from the database. It's a good start, we're getting data that we could render in the front-end 🎉

Note

Notice how [0] is similar to how JavaScript accesses an item in an array (["item-1","item-2"][0] -> "item-1", see MDN's documentation). That's intentional: GROQ is closely tied to JSON (JavaScript Object Notation), so many of the concepts and syntaxes you use there will apply here.

You'll notice we have no control over what document we get back: it just happens to be a genus (_type: "genus") called Abatia (name: "Abatia"), but what if we wanted a plant species instead?

#2 Filtering documents

Let's ensure we get the document for a plant species instead of the seemingly random genus we first received. To do that, we'll use filters, GROQ's mechanism to limit results to documents matching a given condition.

A filter is enclosed in square brackets and must evaluate to a boolean (true or false) that determines whether the document should be included. If true, the document will show up; if false, it won't. Here's an example:

Query

// From the whole database
*
  
// Get only those with _type == species
// (notice how _type == "species" results in a true or false value)
[_type == "species"]

// And get the first element of that
[0]

// Notice all the whitespace we used here? This is okay!
// As long as you keep the important syntax intact, make sure your query is understandable to your future self & teammates 😉

Results

Now that you've got a plant species, let's take it one step further and be more specific about which plant we want. We are hungry, so we want only edible plants 🍴🤤:

Query

*[
  // Get every species
  _type == "species" &&
  // That is edible
  edible == true
][0] // Get only the first item

// Notice the && above: it's a JS-like AND operator 😉

Results

With the && operator alone we can start getting refined results. I'll show you more advanced filters later on, but for now try restricting the plants above to those that are also vegetables, using the vegetable property. Hint: just like edible, this property is a boolean and behaves the same way.

Note

🌱 Play around: try using the > (greater than), != (not equal to), <= (less than or equal to) and || (OR) operators to create novel filters and solidify your understanding.

Examples include getting only species catalogued after the year 1900, or species that are either edible or vegetables. For more inspiration, refer to GROQ's operators reference documentation.
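
If it helps to map filters onto something familiar, here's a rough JavaScript analogy. It assumes a tiny in-memory array of made-up documents (the docs sample and its values are hypothetical, not taken from the guide's dataset), and it's only a mental model of what a filter expresses, not how GROQ actually runs queries:

```javascript
// Hypothetical sample documents, invented for illustration only
const docs = [
  { _type: "genus", name: "Abatia" },
  { _type: "species", scientific_name: "Species A", edible: false, vegetable: false },
  { _type: "species", scientific_name: "Species B", edible: true, vegetable: false },
  { _type: "species", scientific_name: "Species C", edible: true, vegetable: true },
];

// Rough equivalent of: *[_type == "species" && edible == true][0]
const firstEdible = docs.filter(
  (doc) => doc._type === "species" && doc.edible === true
)[0];

// Rough equivalent of: *[_type == "species" && (edible == true || vegetable == true)]
const edibleOrVegetable = docs.filter(
  (doc) => doc._type === "species" && (doc.edible === true || doc.vegetable === true)
);

console.log(firstEdible.scientific_name); // Species B
console.log(edibleOrVegetable.length); // 2
```

Just like the callback passed to Array.prototype.filter, the expression inside the brackets is evaluated once per document and keeps only those for which it is true.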

#3 Ordering documents

The query above is precise in the type of document it should fetch, but it lacks a definition of which species takes the first position. When we don't specify an order to documents, GROQ will pick whichever is closest in the dataset, which is often arbitrary and irrelevant for end users.

Let's fix that with ordering! After a selection of documents (*[_type == "species" && edible == true], in this case), add a pipe (|) and an order function with the field you want to order by and the direction to follow (order(year desc)):

Query

*[_type == "species" && edible == true]
// Let's get the edible species that were catalogued most recently
| order(year desc)
// 👆 Try changing "desc" to "asc"

// From that ordered list, get only the first item
[0]

Results

Notice how we only select the first document after we order the collection? That's because GROQ runs sequentially - in the query above:

Pick every document *
Filter that selection with our filter [_type == "species" && edible == true]
Order this sub-selection by year | order(year desc)
And only then get the first element in this ordered list

We'll get into more complex sorting further down the line.
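
The sequential steps listed above can be sketched in plain JavaScript. This is only an analogy over a hypothetical in-memory array (the species names and years are invented), but it shows why picking [0] before ordering gives a different answer than picking it after:

```javascript
// Hypothetical sample documents, invented for illustration only
const species = [
  { scientific_name: "Species A", year: 1850 },
  { scientific_name: "Species B", year: 1990 },
  { scientific_name: "Species C", year: 1753 },
];

// Rough equivalent of: * | order(year desc) [0]
// Copy before sorting, since Array.prototype.sort mutates the array in place
const newestFirst = [...species].sort((a, b) => b.year - a.year);
const orderedThenPicked = newestFirst[0];

// Picking before ordering just returns whatever happens to come first
const pickedWithoutOrdering = species[0];

console.log(orderedThenPicked.scientific_name); // Species B
console.log(pickedWithoutOrdering.scientific_name); // Species A
```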

#4 Limit and paginate results with slices

The query above could already be used to power a front-end that covers "The most recent taxonomy of an edible plant". Realistically speaking, though, we want to go further and show many results.

Slices are GROQ's way to limit the number of results of a given query. This is useful for pagination, as well as for chunking data in a big dataset.

To perform a slice, wrap the start and end numbers between brackets and put a .. between them. For example, *[0..3] gets the entries 0, 1, 2 and 3, a total of 4 entries. Try it out:

Query

*[_type == "species" && edible == true]
| order(year desc)
[0..3] // get entries 0, 1, 2 and 3 (4 total)

Results

(Optional) 4.1 Going deeper: inclusive versus non-inclusive slices

Notice how, when we use ... instead of .. in the query above, we get 3 results instead of 4. That's the difference between inclusive (..) and non-inclusive (...) slices. The former includes the entry at the end index (entries[3] is included), while the latter stops right before it (entries[3] is not included).

Tip

As both can achieve the same results, I highly suggest you pick either inclusive or non-inclusive slices, whichever makes the most sense to you, and stick to it for simplicity and clarity.

Over the course of this guide, I'll use inclusive slices as that's the one I personally like the most.
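
For readers coming from JavaScript, the two slice flavors map onto Array.prototype.slice as sketched below. The helper names (inclusiveSlice, exclusiveSlice, pageOf) and the page parameters are hypothetical, made up for this illustration:

```javascript
const entries = ["a", "b", "c", "d", "e"];

// GROQ's [start..end] includes the entry at the end index
const inclusiveSlice = (list, start, end) => list.slice(start, end + 1);

// GROQ's [start...end] stops right before the end index,
// which is how Array.prototype.slice already behaves
const exclusiveSlice = (list, start, end) => list.slice(start, end);

console.log(inclusiveSlice(entries, 0, 3)); // [ 'a', 'b', 'c', 'd' ]
console.log(exclusiveSlice(entries, 0, 3)); // [ 'a', 'b', 'c' ]

// Pagination sketch: page is zero-based, pageSize is the number of entries per page
const pageOf = (list, page, pageSize) =>
  inclusiveSlice(list, page * pageSize, page * pageSize + pageSize - 1);

console.log(pageOf(entries, 1, 2)); // [ 'c', 'd' ]
```

With inclusive slices, page n of size s covers the indexes n * s through n * s + s - 1, which is the arithmetic you'd plug into a paginated GROQ query.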

#5 Get only the data you want with projections

You're probably wondering what the use of all these properties found in each species' document is (_rev, planting_spread_cm, bibliography...). Not every piece of information here is valuable in every situation.

Let's say we are making a gallery of yummy (or potentially disgusting?) edible plants. What info would we need from each, then?

common_name as a basic identifier
scientific_name to come across as smart
image_url to display its face
distributions to list where you can find it
url_wikipedia_en to learn more about it
family to categorize that plant

Let's get exactly this data and nothing else with GROQ's projections. To create one, list the properties you want, separate them with commas and wrap them in curly braces ({ common_name, image_url, distributions }):

Query

*[_type == "species" && edible == true]
| order(year desc)
[0..3]
// From these 4 entries, get specifically:
{
  common_name,
  scientific_name,
  image_url,
  distributions,
  url_wikipedia_en,
  family,
}

Results

#5.1 Creating new properties in projections

Besides picking what data we want from each document, projections also allow us to calculate new values by creating new properties. Each custom property name needs to be wrapped in quotes ("fullBibliography") and can be the result of any GROQ query, function or operator. As an example, let's get the full bibliography of each species:

Query

*[_type == "species" && edible == true]
| order(year desc)
[0..3]
{
  common_name,
  scientific_name,
  image_url,
  distributions,
  url_wikipedia_en,
  family,
  // Calculated on the fly from a concatenation of other string properties:
  "fullBibliography": bibliography + ", " + author,
}

Results

#5.2 Handling missing values

You see how some of the properties returned null for their values? If a given field requested in your projection doesn't exist in the document, GROQ will always return null.

To handle missing values, knowledge of our content model is important. It tells us which properties we can always expect and which ones may be absent and need a fallback. In the example above, we know that every plant includes distributions and scientific_name, but we aren't sure about any of the others.

As an example, a PlantCard React component could look like:

// PlantIcon is assumed to be a fallback icon component defined elsewhere
const PlantCard = (props) => {
  // Fall back to displaying the scientific name if no common_name is available
  const name = props.common_name || props.scientific_name;
  return (
    <div>
      {/* If no image, show an icon instead */}
      {props.image_url ? (
        <img src={props.image_url} alt={`${name}'s photo`} />
      ) : <PlantIcon />}
      
      <h2>{name}</h2>
      
      {/* Only show the scientific_name separately if a common name is available */}
      {props.common_name && <p>{props.scientific_name}</p>}
      
      {/* url_wikipedia_en is the field name used in the projection above */}
      {props.url_wikipedia_en && <a href={props.url_wikipedia_en}>Learn more</a>}
    </div>
  )
}

That said, GROQ can do a lot when it comes to conditionals. For example, let's move the generation of the title from the front-end to the back-end:

Query

*[_type == "species" && edible == true]
| order(year desc)
[0..3]
{
  // coalesce() function selects the first non-null value
  "title": coalesce(
    common_name,
    scientific_name,
    "Untitled species"
  ),
  image_url,
  distributions,
  url_wikipedia_en,
  family,
}

Results
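
As a rough JavaScript analogy for the two behaviors covered in this section (missing fields coming back as null, and coalesce() picking the first non-null value), consider the sketch below. The project and coalesce helpers are hypothetical stand-ins written for this illustration, and the document is invented:

```javascript
// Hypothetical document, invented for illustration only
const doc = {
  _type: "species",
  scientific_name: "Species A",
  image_url: "https://example.com/a.jpg",
  year: 1850,
};

// Picks the listed fields; missing ones come back as null, like in a projection
const project = (document, fields) =>
  Object.fromEntries(fields.map((field) => [field, document[field] ?? null]));

// Returns the first argument that is neither null nor undefined
const coalesce = (...values) => values.find((value) => value != null) ?? null;

const projected = {
  ...project(doc, ["image_url", "url_wikipedia_en"]),
  title: coalesce(doc.common_name, doc.scientific_name, "Untitled species"),
};

console.log(projected.url_wikipedia_en); // null
console.log(projected.title); // Species A
```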

Another issue you may have noticed is that family returns a weird object with _ref and _type, not exactly what we were looking for. This brings me to...

#6 References between documents

Relationships between your data are essential to build even the simplest content structures - for example, you can categorize articles in one or more tags, associate them with author(s) and add related articles. All of these are references.

The family field we got above is a reference field. Let's dissect its structure:

// Sample species file:
{
  "_type": "species",
  // ... (rest of the data)
  
  // The family reference 👇
  "family": {
    // _ref is the _id of the referenced document
    "_ref":"myrtaceae",
    // _type is a standardized property we need to act on references
    "_type":"reference"
  },
}

As you can see, the _ref property in a reference is the _id of the document it's pointing to. Here's what the myrtaceae document looks like:

{
  "_createdAt": "2021-04-12T19:17:37Z",
  "_id": "myrtaceae",
  "_rev": "1NSs9ylFOEDq2BuVxBXdBs",
  "_type": "family",
  "_updatedAt": "2021-04-12T19:17:37Z",
  "common_name": "Myrtle family",
  "name": "Myrtaceae"
}

So how do we get the family's name or common_name in the species data? We need to expand its reference, that is, get the document associated with that _ref together with the rest of the data referencing it. The simplest way to do that is with a right arrow (->). Think of it as "hey GROQ, follow this reference and fetch me this document, please":

Query

*[_type == "species" && edible == true]
| order(year desc)
[0..3]
{
  "title": coalesce(common_name, scientific_name, "Untitled species"),
  image_url,
  distributions,
  url_wikipedia_en,
  // 👇👇👇 See the arrow here? We'll get the full family document ✨
  family->,
}

Results

What is this magical arrow doing? It's a simplified way, or syntactic sugar, to get the one document that _ref points to. If we were to write that explicitly, here's what it would look like:

Query

*[_type == "species" && edible == true]
| order(year desc)
[0..3]
{
  "title": coalesce(common_name, scientific_name, "Untitled species"),
  image_url,
  distributions,
  url_wikipedia_en,
  "family-magic": family->,
  // "From the whole dataset, get the one document with the _id of family._ref"
  "family-manual": *[_id == ^.family._ref][0],
  // Notice the ^ above? It refers to the enclosing (parent) scope, in this case the species we're in
}

Results

This is where things can start to get confusing. I know it's tempting to stick to family-> and forget about the manual expansion of the reference, but here's why it's useful to learn how this works:

It shows how you can do nested queries in each level of your query. This is where GROQ's power really shines, and we'll get to an example in a bit.
It reminds us that every part of GROQ is modular: you can plug and play functions, filters, orders and more into every part of your query
And it's also a way to extend the default expansion behavior.
- For example, if we want to expand the family only if it has a common_name property, we'd need to reach for the manual approach. Hint, it'd look like this: "family": *[_id == ^.family._ref && defined(common_name)][0] - if the associated document has no common_name defined, family would return null
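
To make the manual expansion a bit more tangible, here's a minimal JavaScript sketch of the same lookup over a hypothetical in-memory dataset. The expand and expandIfNamed helpers, as well as every document except the myrtaceae _id, are invented for illustration; this is a mental model, not GROQ's implementation:

```javascript
// Hypothetical dataset, invented for illustration only
const dataset = [
  { _id: "myrtaceae", _type: "family", name: "Myrtaceae", common_name: "Myrtle family" },
  { _id: "family-x", _type: "family", name: "Family X" }, // no common_name
  { _id: "species-a", _type: "species", family: { _type: "reference", _ref: "myrtaceae" } },
  { _id: "species-b", _type: "species", family: { _type: "reference", _ref: "family-x" } },
];

// Rough equivalent of family-> or *[_id == ^.family._ref][0]
const expand = (reference) =>
  dataset.find((doc) => doc._id === reference?._ref) ?? null;

// Rough equivalent of *[_id == ^.family._ref && defined(common_name)][0]
const expandIfNamed = (reference) =>
  dataset.find((doc) => doc._id === reference?._ref && doc.common_name != null) ?? null;

const [speciesA, speciesB] = dataset.filter((doc) => doc._type === "species");

console.log(expand(speciesA.family).name); // Myrtaceae
console.log(expand(speciesB.family).name); // Family X
console.log(expandIfNamed(speciesB.family)); // null
```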

Exercise

Play around: try expanding the genus reference in species to get the associated genus document

Note

I'm not particularly confident in my explanations of this section, so please let me know if it could be improved upon o/

We got the family of each individual species. Now, how do we get every species for a given family? References flow both ways: we can use GROQ's references() function to get every document referencing the current one. Let's query data to display a list of 10 families:

Query

// Get every family
*[_type == "family"]{
  _id,
  "title": coalesce(common_name, name),
  // List species referencing each family
  "species": *[
    _type == "species" &&
    // ^ refers to the parent scope - in this case, the current family
    // So ^._id refers to _id of this family
    references(^._id)
  ]{
    // From each species, let's get only what we want
    scientific_name,
    image_url,
  }
}
// Get only 10 for now:
[0..9]

Results

See how we used a nested query, or a sub-query or inner-join, to get all species related to each family? This opens up all sorts of possibilities which we'll explore in the advanced queries section 🤩

Tip

The references() function searches for references in the whole document. If our species content model had references to families outside of the desired family property, we'd need another approach.

We'd be explicit about exactly where GROQ should look for the reference by doing:

*[_type == "family"]{
  // List species referencing each family
  "species": *[
    _type == "species" &&
    // add the species if its family's _ref is the same as the current family's _id
    family._ref == ^._id
  ]
}

💡 Bonus: as GROQ doesn't need to go through the whole document to find references, this method is also more performant. It may be a good optimization to try in your queries.
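
To make the difference between the two approaches concrete, here's a small JavaScript sketch. The references function below is a hypothetical re-creation of the idea (search the whole document for a matching _ref), and the similar_families field is invented purely to show the problem:

```javascript
// Returns true if `id` appears as a _ref anywhere inside `value`
const references = (value, id) => {
  if (value === null || typeof value !== "object") return false;
  if (value._ref === id) return true;
  return Object.values(value).some((child) => references(child, id));
};

// Hypothetical species with a second, unrelated reference to a family
const speciesDoc = {
  _type: "species",
  family: { _type: "reference", _ref: "family-a" },
  similar_families: [{ _type: "reference", _ref: "family-b" }],
};

// The whole-document search finds family-b even though it isn't this species' family
console.log(references(speciesDoc, "family-b")); // true

// The explicit comparison only looks at the family field
console.log(speciesDoc.family._ref === "family-b"); // false
console.log(speciesDoc.family._ref === "family-a"); // true
```

The explicit comparison also touches a single field instead of walking the entire document, which is the intuition behind the performance note above.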

#7 Chaining & execution order

Getting species associated with each family is nice, but what if we want to show the families with most species first? As we get species in a nested query, the simple order() functions we were doing won't cut it. Unless... we chain our operations!

Remember in section #3 above when we said GROQ queries run sequentially? We can use this to our advantage and only order families after we pick their data in a projection. This would then expose that data to the order function, which we could use to order by species before limiting the final results with our [0..9] slice. Confusing in text, clearer in code:

Query

*[_type == "family"]{
  _id,
  "title": coalesce(common_name, name),
  // List species referencing each family
  "species": *[
    _type == "species" &&
    references(^._id)
  ]{
    scientific_name,
    image_url,
  }
}
// Now that the projection above is concluded, we have access to "title" and "species" specified above
// Let's use that in our order 👇
| order(count(species) desc)

// Only after we're done ordering will we slice the end result
[0..9]

Results

Note

Notice the count() function above? It takes an array as input and returns the number of items in that list. That's how we're ranking families by their species count.

You can find all of GROQ's functions in the GROQ functions reference documentation.

Another use-case of chaining / sequential execution is simplifying projections' logic and speeding up execution times. Say we not only want the biggest families (most species), but also the yummiest ones - those with the most edible and vegetable species. Here's how I'd approach that:

*[_type == "family"] {
  _id,
  common_name,
  "species": *[
    _type == "species" &&
    family._ref == ^._id
  ]{
    scientific_name,
    image_url,
    edible,
    vegetable,
  }
}

// Second projection - enhance that data
{
  // "spread" the data we already got to add it to the final result
  ...,
  "numberOfEdible": count(species[edible == true]),
  "numberOfVegetable": count(species[vegetable == true]),
}

// Third projection
{
  ..., 
  "yummyScore": (
    // 2 points for each edible species
    (numberOfEdible * 2) +
    // 1 point for each vegetable species
    (numberOfVegetable * 1)
  )
}
| order(yummyScore desc, count(species) desc)
[0..9]



Start by getting the list of species for each family, as we did above
Also include the edible and vegetable properties as they'll come in handy
Then, in a second projection, we have access to the newly-created species list. Let's use that to calculate the number of edible and vegetable species
To get numberOfEdible, we start by filtering only the edible species: species[edible == true]
In a third and last projection, we'll use these counts to calculate yummyScore
The score will be calculated as numberOfEdible * 2 + numberOfVegetable * 1
With all projections concluded, let's order results. Here we're doing multi-field ordering, meaning we first order by yummyScore, then by count of species

Here are the main takeaways from this query:

Filters are available in every array, be it a list of documents or a nested array in a given entry - another masterpiece of GROQ's modularity
We can run as many projections as we like, each building on the data of the previous
You can do multi-level ordering, where additional fields break ties between entries that share the same value in the first ordering field
- In the example above (order(yummyScore desc, count(species) desc)): order families by their yummyScore and, if two or more have the same score, order by the count of species.
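
The takeaways above can be sketched as a chain of array operations in JavaScript. This is only an analogy, assuming a hypothetical families array where the first projection (attaching species to each family) has already happened; all values are invented:

```javascript
// Hypothetical families with their species already attached (first projection)
const families = [
  { _id: "family-a", species: [{ edible: true, vegetable: true }, { edible: false, vegetable: false }] },
  { _id: "family-b", species: [{ edible: true, vegetable: false }] },
  { _id: "family-c", species: [{ edible: true, vegetable: false }, { edible: false, vegetable: false }] },
];

const ranked = families
  // Second projection: keep everything (...) and add the counts
  .map((family) => ({
    ...family,
    numberOfEdible: family.species.filter((s) => s.edible === true).length,
    numberOfVegetable: family.species.filter((s) => s.vegetable === true).length,
  }))
  // Third projection: build on the counts from the previous step
  .map((family) => ({
    ...family,
    yummyScore: family.numberOfEdible * 2 + family.numberOfVegetable * 1,
  }))
  // order(yummyScore desc, count(species) desc): ties fall through to the second field
  .sort((a, b) => b.yummyScore - a.yummyScore || b.species.length - a.species.length)
  // [0..9]
  .slice(0, 10);

console.log(ranked.map((family) => family._id)); // [ 'family-a', 'family-c', 'family-b' ]
```

Here family-b and family-c tie on yummyScore (2 each), so the species count decides their order, which mirrors the multi-field ordering in the query.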

Play with chaining elements of a query below:

Query

*[_type == "family"] {
  _id,
  common_name,
  "species": *[
    _type == "species" &&
    family._ref == ^._id
  ]{
    scientific_name,
    image_url,
    edible,
    vegetable,
  }
}

// Second projection - enhance that data
{
  // "spread" the data we already got to add it to the final result
  ...,
  "numberOfEdible": count(species[edible == true]),
  "numberOfVegetable": count(species[vegetable == true]),
}

// Third projection
{
  ..., 
  "yummyScore": (
    // 2 points for each edible species
    (numberOfEdible * 2) +
    // 1 point for each vegetable species
    (numberOfVegetable * 1)
  )
}
| order(yummyScore desc, count(species) desc)
[0..9]

Results

As you can see, we're starting to get into the more advanced use cases of GROQ. In the section below, we'll explore interesting advanced queries that will broaden your perception of what is possible.

#8 Advanced Queries

#8.1 Conditional values in projections

If you want a property to return different values depending on other values in its document, you can use conditionals. They come in two flavors: select() and condition => { VALUES }.

Let's say we want to add a Unicode emoji to the name of each species, depending on its rank: a 🔁 if it's a variety (rank == "var") and a 🌿 if it's a species (rank == "species"). Here's how we'd express that in GROQ:

Query

*[_type == "species"][0..9]{
  // Inline conditional w/ select
  "name": select(
    rank == "var" => "🔁",
    rank == "species" => "🌿",
    "❓" // default value
  ) + " " + coalesce(
    common_name,
    scientific_name
  ),
  
  "wellDocumented": false,  
  // Block conditional -> syntactic sugar for select()
  count(distributions) > 0 &&
  defined(url_wikipedia_en) &&
  defined(url_usda) &&
  defined(author) &&
  defined(bibliography) => {
    // Overwrites the wellDocumented prop above
    "wellDocumented": true,
    "shouldRead": true,
  },
}

Results
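
If the two conditional flavors feel abstract, the JavaScript sketch below mirrors them loosely. The select helper is a hypothetical re-creation written for this illustration, the block conditional is modeled with an object spread, and the "well documented" check is shortened to two conditions to keep the example small:

```javascript
// Takes [condition, value] pairs and returns the value of the first true
// condition, or the fallback when none matches (like select()'s default value)
const select = (pairs, fallback = null) => {
  const match = pairs.find(([condition]) => condition === true);
  return match ? match[1] : fallback;
};

const describe = (doc) => {
  // Shortened version of the query's condition, for illustration only
  const isWellDocumented =
    (doc.distributions?.length ?? 0) > 0 && doc.url_wikipedia_en != null;
  return {
    name:
      select([[doc.rank === "var", "🔁"], [doc.rank === "species", "🌿"]], "❓") +
      " " + (doc.common_name ?? doc.scientific_name),
    wellDocumented: false,
    // Block conditional: these keys are only merged in when the condition holds
    ...(isWellDocumented ? { wellDocumented: true, shouldRead: true } : {}),
  };
};

// Hypothetical document, invented for illustration only
const result = describe({ rank: "species", scientific_name: "Species A", distributions: [] });
console.log(result.name); // 🌿 Species A
console.log(result.wellDocumented); // false
```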

#8.2 Full text search with score() and boost()

What if we want to allow users to search for a specific species by their name? We can use the match operator to check if a given species' common_name matches the text we're looking for:

Query

*[
  _type == "species" &&
  common_name match "apple"
][0..9]{
  _id,
  common_name,
}

Results

(Optional) match's caveats and details

You'll notice that in the query above we only get 2 results, not including the species with common_name "Pineapple guava, feijoa", even though it has "apple" in it. What is happening?

Match searches for words, meaning that by default it'll only return results that include the full word you're looking for. In order to search for a given text in any part of the word, we need to use wildcards (*) before and after it, telling GROQ to accept whatever characters come before or after "apple". It'd look like this: common_name match "*apple*". Try adding this in the query above 😉

This is useful if we want to only include names starting with a given string (match "pine*"), ending with one (match "*apple") or starting with a letter and ending with another (match "a*e").

It's also important to understand that match acts on the word level. If you want to provide full text search, you'll need to split the query users make into words as well (from "pine apple" to ["*pine*", "*apple*"]). Simply doing common_name match ["*pine*", "*apple*"] won't work, however, as GROQ only returns true for this match if common_name satisfies every pattern in the list, both *pine* and *apple*. "Pineapple guava, feijoa" would match, but "Wild custard-apple" wouldn't.

To surface both of these species in a "pine apple" search, we'd need OR conditionals:

*[
  _type == "species" &&
  (
    common_name match "*pine*" ||
    common_name match "*apple*"
  )
][0..9]{
  _id,
  common_name,
}
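
The word-level behavior described in this section can be approximated with a few lines of JavaScript. This is a simplified model under stated assumptions: the tokenize, toWordRegex and matches helpers are hypothetical, and splitting on anything that isn't a letter or digit is a guess at the tokenizer's behavior, which may differ in the real implementation:

```javascript
// Simplified assumption: split text into lowercase words on anything
// that isn't a letter or digit. GROQ's real tokenizer may differ.
const tokenize = (text) =>
  text.toLowerCase().split(/[^\p{L}\p{N}]+/u).filter(Boolean);

// Turns a pattern like "*apple*" into a regular expression for one whole word
const toWordRegex = (pattern) => {
  const escaped = pattern
    .toLowerCase()
    .split("*")
    .map((part) => part.replace(/[.+?^${}()|[\]\\]/g, "\\$&"))
    .join(".*");
  return new RegExp(`^${escaped}$`);
};

// True when at least one word of `text` fully matches `pattern`
const matches = (text, pattern) =>
  tokenize(text).some((word) => toWordRegex(pattern).test(word));

console.log(matches("Pineapple guava, feijoa", "apple")); // false
console.log(matches("Pineapple guava, feijoa", "*apple*")); // true
console.log(matches("Wild custard-apple", "apple")); // true
console.log(matches("Wild custard-apple", "*pine*")); // false
```

Under this model, the OR query above surfaces both names because each of them satisfies at least one of the two patterns.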

But what if we want to rank species by how well they match the given text query? We can use the score() function for that!

Query

*[_type == "species"]
// Add a _score to each species document based on how well they match "pine" in their common_name
| score(common_name match "*pine*")
// Order them by their _score
| order(_score desc)
{
  _score,
  common_name,
}
[0..9]

Results

Exercise

Play around: try removing species with a score <= 0 from showing up in the query above.

What if we also want to match scientific_name? We could join common_name and scientific_name together and run match against that: (common_name + scientific_name) match "*pine*".

This works fine, but what if we want to assign a different weight to each property? The species' common name is much more relevant to the majority of users, so let's boost its importance.

Similarly to order(), the score() function allows you to pass multiple properties to check against. We can use that to match multiple conditions for scores and boost them as we see fit:

Query

*[_type == "species"]
// Score on multiple conditions:
| score(
  scientific_name match "*pine*",
  // Give common_name 5x more relevancy
  boost(common_name match "*pine*", 5)
)
| order(_score desc)
{
  _score,
  common_name,
  scientific_name,  
}
[0..9]

Results

Try playing around with the score above, adding a boost to species with edible == true, running matches in the author property, etc.
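
To build intuition for how several conditions and boosts combine, here's a deliberately simplified toy model in JavaScript. It assumes each matching condition simply adds its weight to the total (1 when unboosted); the real _score values returned by score() are computed differently, so treat the numbers as illustrative only. The contains and score helpers, the rules list and the documents are all hypothetical:

```javascript
// Stand-in for `field match "*term*"`: a case-insensitive substring check
const contains = (text, term) => (text ?? "").toLowerCase().includes(term);

// Toy model: each rule that matches adds its weight to the total
const score = (doc, rules) =>
  rules.reduce((total, rule) => total + (rule.test(doc) ? rule.weight : 0), 0);

const rules = [
  { test: (doc) => contains(doc.scientific_name, "pine"), weight: 1 },
  // Rough counterpart of boost(common_name match "*pine*", 5)
  { test: (doc) => contains(doc.common_name, "pine"), weight: 5 },
];

// Hypothetical documents, invented for illustration only
const candidates = [
  { common_name: "Plain tree", scientific_name: "Genus alpineus" },
  { common_name: "Pineapple guava", scientific_name: "Genus guava" },
  { common_name: "Mountain pine", scientific_name: "Genus pineus" },
];

const ranked = candidates
  .map((doc) => ({ ...doc, _score: score(doc, rules) }))
  .sort((a, b) => b._score - a._score);

console.log(ranked.map((doc) => doc.common_name));
// [ 'Mountain pine', 'Pineapple guava', 'Plain tree' ]
```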

#8.3 Complex filters

The filters we used so far are fairly plain, but what if we need to do more complex, fine-grained filtering? Let's start with combining && and || operators for an AND and OR combination:

Query

*[
  // Get every species
  _type == "species" &&
  (
    // That is either edible
    edible == true ||
    // or a vegetable
    vegetable == true
  )  
][0..9]{
  "name": coalesce(common_name, scientific_name),
  edible,
  vegetable,
}

Results

See how we wrapped edible == true || vegetable == true in parentheses? That ensures the group evaluates to a single true or false, which we can then combine with _type == "species" to return only the species that are edible and/or vegetables.
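
To see why the grouping matters, here's a small JavaScript analogy run on made-up documents (the sample data is hypothetical, not from the dataset). In JavaScript, && binds tighter than ||, so dropping the parentheses lets any vegetable through regardless of its _type. Treat it as an illustration of the grouping rather than a statement about GROQ's internals.

```javascript
// Made-up documents for illustration only
const docs = [
  { _type: "species", name: "Apple", edible: true, vegetable: false },
  { _type: "species", name: "Oak", edible: false, vegetable: false },
  { _type: "recipe", name: "Carrot soup", edible: false, vegetable: true },
];

// Grouped: must be a species AND (edible OR vegetable)
const grouped = docs.filter(
  (d) => d._type === "species" && (d.edible === true || d.vegetable === true)
);

// Ungrouped: parsed as (species AND edible) OR vegetable
const ungrouped = docs.filter(
  (d) => d._type === "species" && d.edible === true || d.vegetable === true
);

console.log(grouped.map((d) => d.name)); // [ 'Apple' ]
console.log(ungrouped.map((d) => d.name)); // [ 'Apple', 'Carrot soup' ]
```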

Exercise

In the query above, try also ensuring that these edible/vegetable species have "Brazil South" in distributions (the in operator checks to see if a given value can be found in an array).

For more ideas on filters to play around with, refer to the filters cheat sheet.

Remember that in #6 references between documents we talked about nested queries inside any part of your GROQ query? Well, that also includes filters! Let's only include those species whose genus name starts with "a":

Query

*[
  _type == "species" &&
  (
    edible == true ||
    vegetable == true
  ) &&
  // Expand the genus reference,
  // Get its name
  // And ensure it starts with "a"
  genus->.name match "a*"
][0..9]{
  "name": coalesce(common_name, scientific_name),
  edible,
  vegetable,
  "genus": genus->.name,
}

Results

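The query above has one issue, though: we're expanding the genus reference twice - once in the filter (genus->.name match "a*") and once in the projection ("genus": genus->.name). A slightly more performant and DRYer way to do it is to chain a second filter after the projection, as we covered in section #7. Keep in mind that chained operations run in order: the [0..9] slice below runs before the second filter, so the genus check only applies to those first 10 species and the query can return fewer than 10 results.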

Query

*[
  _type == "species" &&
  (
    edible == true ||
    vegetable == true
  )
][0..9]{
  "name": coalesce(common_name, scientific_name),
  // Get the genus's name in the projection
  "genusName": genus->.name,
}[
  // 2nd filter using the genus name above
  genusName match "a*"
]

Results

Putting it all together, here's an example of a filter on strings, booleans, arrays, references and numbers, all in one:

Query

*[
  _type == "species" &&
  (
    edible == true ||
    vegetable == true
  ) &&
  "Brazil South" in distributions &&
  defined(url_wikipedia_en) &&
  year > 1900
][0..9]{
  "name": coalesce(common_name, scientific_name),
  "genusName": genus->.name,
  distributions,
  url_wikipedia_en,
  year,
  edible,
  vegetable,
}[
  genusName match "a*"
]

Results

#8.4 Picking a single outstanding value from the dataset

What if we want to know the year of the first species catalogued in this dataset? We could get all species, sort them by year in ascending order (lowest numbers first) and then get the first species:

Query

*[_type == "species"]
// Order species from oldest to newest
| order(year asc)
// Get only the first species in this list
[0]

Results

This will return the full object of the Ranunculus parnassifolius species, which we could then use to pick the earliest year itself. But we can go further with GROQ, picking exactly the year from the query above:

Query

*[_type == "species"]
| order(year asc)
// Get only the year from the first species in this order
[0].year

Results

We can use this approach for all sorts of interesting inquiries:

Query

{

  "firstCataloguedYear": *[_type == "species"]| order(year asc)[0].year,
  
  "largestDistributionCount": *[
    _type == "species" && defined(distributions)
  ]|order(count(distributions) desc)
  [0]{
    "distributionCount": count(distributions)                 
  }.distributionCount,
  
  "longestGibberish": string(*[
    _type == "species"
  ]{
    "scientificLength": length(scientific_name)
  }
  |order(scientificLength desc)
  [0].scientificLength) + " characters",
  
  "mostBloomingFlower": *[
    _type == "species" && defined(bloom_months)
  ]
  | order(length(bloom_months) desc)
  [0].bloom_months,
  
  // Picking a single property also works in arrays:
  "allFlowerColors": *[
    _type == "species" && defined(flower_color)
  ].flower_color
}

Results

Notice how in allFlowerColors above we also picked the value from an array, which returns an array of strings with all instances of a flower's color. At the API version used in this guide (v2021-03-25), there's no way to pick only unique/distinct values within the query itself.
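
One workaround is to de-duplicate after the data arrives. Here's a minimal sketch in JavaScript, assuming the query returns a flat array of strings (the sample values below are made up, and unique is a hypothetical helper name):

```javascript
// Hypothetical result of the "allFlowerColors" query above
const allFlowerColors = ["white", "yellow", "white", "red", "yellow"];

// A Set keeps only the first occurrence of each value,
// preserving the original order
function unique(values) {
  return [...new Set(values)];
}

console.log(unique(allFlowerColors)); // [ 'white', 'yellow', 'red' ]
```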

#8.5 Handling dates

This dataset of plants is not very time-bound - we only have a simple year property, with immutable values. In your work, however, you'll probably have articles with a publishDate that should be respected, events with multiple dates and times, etc.

As a starting point, keep in mind that GROQ handles dates as ISO strings (ex: 2021-04-23T10:54:00Z), which we can compare with operators like ==, < and >. For example, we could get all published articles with the filter *[_type == "article" && publishDate < now()], where now() is a GROQ function to get the current timestamp. It's also possible to order by dates, such as order(_updatedAt desc) for getting the most recently edited documents first - this is what the Sanity studio does under the hood to render your collections of documents in its structure builder.

If you need to convert a numeric value to a date, such as our year property, you can do that by converting the number to a string (string(number)) and adding the remaining parts of the ISO string. If it's just the year, such as 2000, we also need to add a date and time to its string (2000-01-01T00:00:00Z, assuming January 1st at midnight). We can also use the dateTime function to calculate the difference, in seconds, between now() and this year:

Query

{
  // Number being converted to an ISO-string
  // (using jan. 1st at 0am as the date & time)
  "2000-string": string(2000) + "-01-01T00:00:00Z",
  
  // Calculate how much time has passed since the year 2000 with the dateTime function
  "secondsSince2000": dateTime(now()) - dateTime(
    string(2000) + "-01-01T00:00:00Z"
  ),
}
// To avoid re-calculating secondsSince2000, let's run a second projection that already contains the secondsSince2000 variable
{
  ...,
  "daysSince2000": round(
    // divide seconds by 60 (1 minute) * 60 (1 hour) * 24 (1 day)
    secondsSince2000 / (60 * 60 * 24)
  )
}

Results
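
The same arithmetic can be mirrored in plain JavaScript, which is handy for double-checking what the query returns. This is only a sketch: yearToIso and daysBetween are hypothetical helper names, and January 1st at midnight UTC is the same assumption used in the query above.

```javascript
// Hypothetical helper: turns a numeric year into an ISO date-time string
// (January 1st at midnight UTC, as in the query above)
function yearToIso(year) {
  return String(year).padStart(4, "0") + "-01-01T00:00:00Z";
}

const SECONDS_PER_DAY = 60 * 60 * 24;

// Whole days between two ISO date-time strings
function daysBetween(fromIso, toIso) {
  const seconds = (Date.parse(toIso) - Date.parse(fromIso)) / 1000;
  return Math.round(seconds / SECONDS_PER_DAY);
}

console.log(yearToIso(2000)); // 2000-01-01T00:00:00Z
// 2000 was a leap year, hence 366 days
console.log(daysBetween(yearToIso(2000), yearToIso(2001))); // 366
```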

Keep up to date with GROQ & Sanity.io

If you've read this far, I imagine you're invested in GROQ. Why not keep up to date with my monthly newsletter on the best news, guides, tricks and inspiration on Sanity, then? 😬

#9 GROQ in the front-end

Knowing how the language works is only part of the equation - we also need to know how to display the data to users. I'll use examples of building a project with React, but it's worth keeping in mind that the concepts here apply to any other stack, from Ruby on Rails to Eleventy.

#9.1 Fetching the data

After you write the query you want, you need to send it to Sanity's backend to get the actual data. This is done by communicating with its HTTP API in requests like so:

const PROJECT_ID = "tvkfaenh"
const DATASET = "production"
// More on API versions in the annex
const API_VERSION = "v2021-03-25"

fetch(`https://${PROJECT_ID}.apicdn.sanity.io/${API_VERSION}/data/query/${DATASET}?query=*%5B0%5D`, {
    "method": "GET"
});

Try opening your browser's DevTools in the network tab and running one of the queries of this guide - you'll see one of these requests being made.
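
To make the encoding step concrete, here's a minimal sketch of building that URL by hand. buildQueryUrl is a hypothetical helper name; the project ID, dataset and API version are the ones from the request above.

```javascript
// Values taken from the request above
const PROJECT_ID = "tvkfaenh";
const DATASET = "production";
const API_VERSION = "v2021-03-25";

// Hypothetical helper: builds the query URL by hand
function buildQueryUrl(query) {
  const base = `https://${PROJECT_ID}.apicdn.sanity.io/${API_VERSION}/data/query/${DATASET}`;
  // encodeURIComponent escapes characters such as [ and ] that are unsafe in URLs
  return `${base}?query=${encodeURIComponent(query)}`;
}

console.log(buildQueryUrl("*[0]"));
// https://tvkfaenh.apicdn.sanity.io/v2021-03-25/data/query/production?query=*%5B0%5D
```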

Building these URLs is tedious, especially encoding the query to put it into the URL parameters (notice ?query=*%5B0%5D above). The alternative is to use one of the API clients (also called SDKs) to make it easier. Here's the same request as above using @sanity/client, the official JavaScript implementation:

import sanityClient from '@sanity/client'

const client = sanityClient({
  projectId: "tvkfaenh",
  dataset: "production",
  apiVersion: "v2021-03-25",
  useCdn: true
})

// The actual query we wrote, without encoding 👇
client.fetch("*[0]")

The nice thing about the above is that you can create the client once and re-use it across your whole site/app without having to worry about encoding queries or getting URLs right.

#9.2 client-side vs. server-side

Where you fetch this data matters. If you're building with a traditional server framework like Laravel, you'll be connecting with Sanity only in the server-side (in Laravel's case, with the PHP API client). With the data in hand, you'll use it to build templates which will then be sent to users, end of story.

If you're using a client-side framework like Vue, Svelte, React or Angular, though, you also have the option to fetch the data directly in the client-side. In this approach, users connect directly with Sanity's API, then their browser does the heavy lifting of rendering templates. If you're doing client-side, make sure you read browser security and CORS.

This choice depends on your business strategy and tech stack. Personally, I tend to go with server-side data for public content as that's better for SEO, and client-side for content hidden behind a login screen.

#9.3 Variables in queries with parameters

You can specify variables in your GROQ queries whose values can be inserted outside of the query itself - these are called parameters. For example, if we want to pick a document with a specific _id, we can add *[_id == $id] and pass the actual value of the $id parameter as a second argument to client.fetch:

client.fetch("*[_id == $id]", { id: "injected-doc-id" })

This is useful when you want to re-use a single query multiple times for different inputs. Going back to section #4 on slices, we talked about pagination. When we're paginating, what we essentially want is a different subset of the data based on the page we're currently on. Projections, filters and orders remain the same, only $pageNum changes. Here's a minimal React example of a paginated list of species (fetched in the client-side):

import React from "react";

// Assumes `client` is the @sanity/client instance created in section #9.1
// Don't worry if you aren't used to React, focus on fetchSpecies
const EdibleSpecies = () => {
  const [species, setSpecies] = React.useState();
  const [pageNum, setPageNum] = React.useState(1);

  React.useEffect(() => {
    // When the pageNum changes, let's re-fetch the data
    fetchSpecies(pageNum);
  }, [pageNum]);

  async function fetchSpecies(page) {
    const newSpecies = await client.fetch(
      // Notice how the query is static:
      `
      *[_type == "species" && edible == true]
      | order(year desc)
      [(($pageNum - 1) * 3)..($pageNum * 3)]
    `,
      {
        // The only thing we're changing is the pageNum param
        pageNum: page,
      }
    );
    // With the data, change the state of this component:
    setSpecies(newSpecies);
  }

  return (
    <>
      <ul>
        {species &&
          species.map((entry) => <li key={entry._id}>{entry.common_name}</li>)}
      </ul>
      <button onClick={() => setPageNum(pageNum + 1)}>Next page</button>
    </>
  );
};

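The code above also shows how you can manipulate the parameters received with the tricks we learned above - in this case, we're setting the start of the slice at ($pageNum - 1) * 3 (which returns 0 if pageNum = 1) and setting the end at $pageNum * 3. At pageNum = 1, we'd then get [0..3], which returns the first 4 species of our query. Because the .. range includes its end index, page 2 becomes [3..6] and repeats the last species of page 1; use the exclusive ... range instead if you want non-overlapping pages of 3.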
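
To check the slice arithmetic without hitting the API, it can be reproduced on a plain array. This is a sketch with hypothetical helper names (pageBounds, sliceInclusive, sliceExclusive) and made-up items; it relies on GROQ's .. range including its end index while ... excludes it.

```javascript
// Hypothetical helper mirroring the slice arithmetic in the query above
function pageBounds(pageNum, pageSize) {
  return {
    start: (pageNum - 1) * pageSize,
    end: pageNum * pageSize,
  };
}

// Emulate both GROQ range operators on a local array:
// a..b includes index b, a...b stops right before it
function sliceInclusive(items, { start, end }) {
  return items.slice(start, end + 1);
}
function sliceExclusive(items, { start, end }) {
  return items.slice(start, end);
}

const items = ["a", "b", "c", "d", "e", "f", "g"];
console.log(sliceInclusive(items, pageBounds(1, 3))); // [ 'a', 'b', 'c', 'd' ]
console.log(sliceInclusive(items, pageBounds(2, 3))); // [ 'd', 'e', 'f', 'g' ]
console.log(sliceExclusive(items, pageBounds(2, 3))); // [ 'd', 'e', 'f' ]
```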

#9.4 Writing complex queries

Some queries can get really complex and hard to reason about. Thankfully, as they're plain strings, we can build them from atomic, re-usable parts.

Let's say we're building a homepage for our botany website which is composed of all sorts of data - the 10 yummiest families (as specified in section #7), most recently discovered species (section #5.2), the species found in the most places and the first catalogued year (section #8.4). Here's the query that will power this homepage:

{
  "recentlyDiscoveredSpecies": *[_type == "species"] {
    "title": coalesce(
      common_name,
      scientific_name,
      "Untitled species"
    ),
    image_url,
    distributions,
    url_wikipedia_en,
    year,
    genus-> {
      name
    },
    family-> {
      name
    },
  } | order(year desc)[0..4],

  "yummiestFamilies": *[_type == "family"] {
    _id,
    "title": coalesce(
      common_name,
      name,
      "Untitled family"
    ),
    "species": *[
      _type == "species" &&
      family._ref == ^._id
    ]{
      "title": coalesce(
        common_name,
        scientific_name,
        "Untitled species"
      ),
      image_url,
      edible,
      vegetable,
    },
  }
  {
    ...,
    "numberOfEdible": count(species[edible == true]),
    "numberOfVegetable": count(species[vegetable == true]),
  }
  {
    ..., 
    "yummyScore": (
      (numberOfEdible * 2) +
      (numberOfVegetable * 1)
    )
  }
  | order(yummyScore desc, count(species) desc)
  [0..9],

  "largestDistributionCount": *[
    _type == "species" && defined(distributions)
  ]|order(count(distributions) desc)
  [0]{
    "distributionCount": count(distributions)                 
  }.distributionCount,

  "firstCataloguedYear": *[_type == "species"]| order(year asc)[0].year,
}

Gnarly, right? And this isn't the longest query I've written! Let's break this query down using JavaScript template strings:

// Re-usable property for every instance of species
const SPECIES_TITLE = `
"title": coalesce(
  common_name,
  scientific_name,
  "Untitled species"
),
`

// Used to get data for a species card
const SPECIES_PROJECTION = `{
  ${SPECIES_TITLE}
  image_url,
  distributions,
  url_wikipedia_en,
  year,
  genus-> {
    name
  },
  family-> {
    name
  },
}`

// Multi-step projection for getting a yummyScore of each family
const FAMILY_PROJECTION = `
{
  _id,
  "title": coalesce(
    common_name,
    name,
    "Untitled family"
  ),
  "species": *[
    _type == "species" &&
    family._ref == ^._id
  ]{
    ${SPECIES_TITLE}
    image_url,
    edible,
    vegetable,
  },
}
{
  ...,
  "numberOfEdible": count(species[edible == true]),
  "numberOfVegetable": count(species[vegetable == true]),
}
{
  ..., 
  "yummyScore": (
    (numberOfEdible * 2) +
    (numberOfVegetable * 1)
  )
}
`

// Pulling it all together
const finalQuery = /* groq */`{
  "recentlyDiscoveredSpecies": *[_type == "species"]
    ${SPECIES_PROJECTION}
  | order(year desc)[0..4],

  "yummiestFamilies": *[_type == "family"]
  ${FAMILY_PROJECTION}
  | order(yummyScore desc, count(species) desc)
  [0..9],

  "largestDistributionCount": *[
    _type == "species" && defined(distributions)
  ]|order(count(distributions) desc)
  [0]{
    "distributionCount": count(distributions)                 
  }.distributionCount,

  "firstCataloguedYear": *[_type == "species"]| order(year asc)[0].year,
}`

This has more lines of code, but it's significantly easier to parse. This specific code can definitely be improved, so let your imagination run wild!

As for a more interesting example of generating queries from code, we can think of a list of genera, families, species, authors and publications. Instead of manually writing the nested query for each, we could generate them with array.map:

const TYPES = ["genus", "family", "species", "author", "publication"]

const query = `{
  ${TYPES.map(type => `
    "${type}": *[_type == "${type}"]
    | order(coalesce(year, _createdAt) desc)
    [0..4]
    {
      "title": coalesce(
        common_name,
        name,
        scientific_name
      ),
      _id,
      _type == "species" => {
        image_url,
        year,
        family->,
      },
    }
  `).join(',\n')}
}`
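
The same idea can be wrapped in a tiny helper that stitches named sub-queries into a single GROQ object. This is just a sketch: combineQueries is a hypothetical name, and the two sub-queries are only there to show the output shape.

```javascript
// Hypothetical helper: combines named sub-queries into one GROQ object
function combineQueries(parts) {
  const entries = Object.entries(parts).map(
    // JSON.stringify wraps the key in double quotes and escapes it
    ([key, subQuery]) => `  ${JSON.stringify(key)}: ${subQuery.trim()}`
  );
  return `{\n${entries.join(",\n")}\n}`;
}

const homepageQuery = combineQueries({
  firstCataloguedYear: `*[_type == "species"] | order(year asc)[0].year`,
  speciesCount: `count(*[_type == "species"])`,
});

console.log(homepageQuery);
// {
//   "firstCataloguedYear": *[_type == "species"] | order(year asc)[0].year,
//   "speciesCount": count(*[_type == "species"])
// }
```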

Don't worry about this if your queries are simple and maintainable as-is, though. Let's not over-complicate our lives ;)

Conclusion

I hope this guide helped shine some light on the possibilities of GROQ. It's an extremely powerful language, and I can't wait to see what you build with it!

This is the accumulation of 3 years of my personal experience with GROQ - I focused on what I judge to be the most impactful and useful aspects of the language. For that reason, this guide isn't 100% exhaustive; look for pointers and comments in the annex below for where to go next.

Finally, if something is unclear, feel free to reach out to me at meet@hdoro.dev or hdorodev, or book a time for free 1-on-1 mentoring with me. I'm taking some time off client work to help you and others be more successful with Sanity & GROQ, it'd be a pleasure meeting you ;)

Annex

Updating data was intentionally left out

GraphQL popularized the notion of mutations, which some of you may expect GROQ to offer as well.

As GROQ is a querying language, specialized in fetching data, it has no concept of updating, deleting or creating data. For that, you can rely on sending plain JSON objects to Sanity via the HTTP API, one of their API clients or the Sanity studio itself.

Features I didn't cover

Given the constraints of the database I chose for these examples and my lack of expertise in some features, I didn't cover everything GROQ can do. Here's a list of pointers:

GROQ has exceptional geolocation capabilities. I encourage you to take a look at the Sanity community interactive map for inspiration on what's possible with it.
Portable Text, Sanity's rich text format, is a central pillar of building experiences with GROQ. In the future I plan to write a bit more on it, but for now it's worth taking a look at the pt::text function.
Sanity has very robust real-time capabilities. I decided not to cover them here as I don't have much experience with them and they don't impact the actual GROQ you end up writing - after all, you can simply listen to your GROQ query.

Beware of API versions

In March 2021, the Sanity team introduced API versioning to their backend to be able to evolve GROQ as a language without breaking existing code. This guide uses the latest version available at the time of this writing, v2021-03-25.

New features and improvements are planned, which I hope to cover here as they are released. I'll also update the code to run queries in the latest versions to ensure it's always up-to-date with the language specification.

Keep in mind that, at the time of writing this (April 2021), groq-js isn't caught up to the spec yet.

Acknowledgments

A big thank you to the following people who provided valuable feedback and encouragement (alphabetic order):

Your feedback is highly appreciated!

This is my first attempt at a more comprehensive, interactive guide. Having your take on how it could be better would help tremendously - your future self will thank you when I release something more polished ;)
