In short, an N+1 query problem occurs when for each row returned in an initial query (to a SQL database for example) your application needs to perform a second query to retrieve additional data from another table. Let's see what its implications are and how to avoid this problem...

Introduction

Let's think about a hypothetical ecommerce application, where you need to list all customers and their respective orders. A naive implementation (typically with an ORM) could look like this:

SELECT [columns...] FROM "customers"

For now we only have one query (this query is the 1 of N+1) that returns the Customers and so to obtain the order data, N additional queries will be necessary, with N being the number of items (Customers) returned in the first query.

-- For each customer
SELECT [columns...] FROM "orders" WHERE "orders"."customer_id" = 1
SELECT [columns...] FROM "orders" WHERE "orders"."customer_id" = 2
SELECT [columns...] FROM "orders" WHERE "orders"."customer_id" = 3
-- ... and so on ...
  • If the first query returns 100 customers, we will have 101 queries in total
  • If the first query returns 1000 customers, we will have 1001 queries in total And so on...

But of course it can always get worse, if for example we need to include the Items/Products of each order in the results...

ORM: Lazy vs Eager

A common cause for this problem is the fact that many ORMs use by default an approach called LAZY, where the query returns only the directly queried entity, without including related entities.

In the Java world, for example, the JPA specification defines these defaults:

Relationship TypeDefault Loading
@OneToOneEAGER
@ManyToOneEAGER
@OneToManyLAZY
@ManyToManyLAZY

However, you can change this behavior, making it adopt the strategy called EAGER, in this approach the generated SQL query can include a JOIN loading the additional data.
Example of Java code:

// ... imports
@Entity
public class Customer {

	@OneToMany(fetch = FetchType.EAGER)
	private List<Order> orders;
	
    //... rest of the code
}

In our Customers and Orders example, an EAGER ORM setting could result in a single query similar to this:

SELECT [columns...] FROM "customers" as c
   LEFT JOIN "orders" ON "orders"."costumer_id" = c.id

Yes, this would return the data we need in a single query. However, it is necessary to be careful with this approach, as it can be very expensive.
Configuring the ORM to have this EAGER approach can cause it to always return the complete data by performing joins, even in other parts of the application where we do not need it.
In this case, it is preferable to check whether the ORM in question has mechanisms that allow us to enable/disable this EAGER behavior on demand in each query. Still using Java/JPA as an example, it would be possible (without adding @OneToMany(fetch = FetchType.EAGER) in the entity class) to create a custom query method in the Repository, in this custom query you could use JPQL with the FETCH option to add the EAGER behavior only in this specific query, without affecting the others:

// imports
public interface CustomerRepository extends Repository<Customer, Long> {

    @Query("select c FROM Customer c JOIN FETCH c.orders")
    List<Customer> retrieveAllCustomerWithOrdersEager();
	
}

A simple approach (leaving the ORM universe a little)

A perhaps simpler approach that will solve this problem very well is to perform two queries as follows:

-- We still have our initial query that retrieves the Customers
SELECT [columns...] FROM "customers"
-- And then we perform a second query, passing the IDs obtained in the first,
-- which will return all related orders
SELECT [columns...] FROM "orders"
WHERE "orders"."customer_id" IN ( 1, 2, 3 /* ... each customer_id */)

With this approach, you will have in memory after the queries are returned, two lists, one of Customers and the other of Orders. You will probably need to perform some additional actions such as a "groupBy" that groups the Orders by Customer.id. This will depend a lot on what you intend to do with the data.

API Queries

You may already be thinking that this could also happen with APIs. Yes, indeed, although this post has focused more on database queries. The same behavior could occur if your application queries API data with a poorly optimized model. There are also strategies to solve this type of problem in API queries, but this topic will be discussed in another article.

In conclusion

There are many advanced ways of optimization, but before thinking about anything more extreme, it is essential to check if we are not falling into primary errors such as N+1 queries.

Here I focused on relational database queries, but it is worth considering that the same problems can apply to distributed API scenarios, for example, such as microservices where we need to call secondary APIs to "enrich" the data in our return object.


That's all folks...

Artus