In short, an N+1 query problem occurs when for each row returned in an initial query (to a sql database for example) your application needs to perform a second query to retrieve additional data from another table. Let's see what its implications are and how to avoid this problem...


Let's think about a hypothetical ecommerce application, where you need to list all customers and their respective orders. A naive implementation (typically with an ORM) could look like this:

SELECT [colunas...] FROM customers

For now we only have one query (this query is the 1 of N+1) that returns the Customers and so to get the order data, N additional queries will be needed, where N is the number of items (Customers) returned in the first query. ```sql -- For each customer SELECT [columns...] FROM "orders" WHERE "orders"."customer_id" = 1 SELECT [columns...] FROM "orders" WHERE "orders"."customer_id" = 2 SELECT [columns...] FROM "orders" WHERE "orders"."customer_id" = 3 -- ... and so on...

- If the first query returns 100 customers, we will have 101 queries in total
- If the first query returns 1000 customers, we will have 1001 queries in total
And so on...

But of course it can always get worse, if for example we need to include the Items/Products of each order in the results...

## ORM: Lazy vs Eager

Most ORMs use by default an approach called *lazy*, where the query returns only the entity directly queried, without including related entities.
However, most of these ORMs will also allow the configuration of an *eager* approach, in which the query can include a **JOIN**.
In our Customers and Orders example, an *eager* configuration of the ORM could result in a single query similar to this:
```sql
SELECT [columns...] FROM "customers" as c
LEFT JOIN "orders" ON "orders"."customer_id" = c.id

Yes, this would return the data we need in a single query. However, it is necessary to be careful with this approach, as it can be very expensive. Configuring the ORM to have this eager approach can cause it to always return the complete data by performing joins, even in other parts of the application where we do not need it. In this case, it is preferable to check whether the ORM in question has mechanisms that allow us to activate/deactivate this eager behavior on demand in each query. For example, in Java with JPA using jpql we have the FETCH option:

-- JPQL
SELECT [columns...] FROM "customers" JOIN FETCH "customers"."orders"

A simpler approach

Perhaps, a simpler approach that will solve this problem very well is to perform two queries as follows:

-- We still have our initial query that retrieves the Customers
SELECT [columns...] FROM "customers"
-- And then we perform a second query, passing the IDs obtained in the first,
-- which will return all related orders
SELECT [columns...] FROM "orders"
WHERE "orders"."customer_id" IN ( [ each customer_id ... ] )

Concluding

There are many advanced ways to optimize, but before thinking about anything more extreme, it is essential to check if we are not falling into primary errors such as N+1 queries.

Here I have focused on relational database queries, but it is worth considering that the same problems can apply to distributed API scenarios, for example, such as microservices where we need to call secondary APIs to "enrich" the data in our return object.


That's all folks...

Artus