When using Spring Data JPA, most developers are used to letting Spring handle almost all database operations. That’s especially the case for all update operations. Thanks to JPA’s entity mappings and the managed lifecycle of all entity objects, you only need to change an attribute of an entity object. Everything else happens automatically.
But having a good, automated solution for the most common use cases doesn’t mean it’s the ideal solution for all use cases. JPA and Spring Data JPA’s handling of update operations is a good example of that. The default handling is great if you only need to update a few entities. Your persistence provider automatically detects changes to all managed entity objects. For each changed object, it then executes an SQL UPDATE statement. Unfortunately, this is a very inefficient approach if you need to update a huge number of entities. It often causes the execution of several dozen or even hundreds of SQL UPDATE statements.
This is a general problem when using JPA and even a call of the saveAll method on the repository doesn’t avoid these statements. We can see that with the execution on the following code snippet:
In the log output, we can see that Spring Data JPA and Hibernate executed an SQL UPDATE statement for each entity object. If the database only contains a few records, this inefficient handling of the update operation will not cause a huge performance problem. But that drastically changes if you deploy this code to production and execute it using a much bigger database.
How to improve the handling of update operations
So, let’s say we have several update operations that affect quite a large part of the database (if it is a production database, that means that part is really HUGE). With the current approach, calling the saveAll() method, we will make as many requests as there are data rows in the database that we want to affect and thus there will be huge chaos in terms of execution time.
So how will we fix that? There are 2 options to improve the handling of update operations:
- Activate JDBC batching to execute the SQL UPDATE statements more efficiently or
- Provide your own update statement that performs the required changes in 1 step.
Both options are a valid solution to improve the performance of your write operations. Reducing the number of executed statements, of course, is the more efficient approach. But that’s not always possible and sometimes requires huge refactoring. In these cases, you should use JDBC batching to make the execution of these statements as efficient as possible.
1. Activate JDBC Batching
JDBC batching is a feature provided by the JDBC driver of your database. Your persistence provider, in most cases Hibernate, only makes it easier to use, and Spring Data JPA benefits from that. The general idea of JDBC batching is simple. Instead of executing one statement after the other, the JDBC driver groups multiple, identical statements with their bind parameter values into one batch and sends them to the database. This reduces the number of database requests. It also enables the database to process the statements more efficiently.
You can use JDBC batching for SQL INSERT, UPDATE and DELETE statements. JDBC batching is deactivated by default. You can activate it in your application.properties file by setting the property spring.jpa.properties.hibernate.jdbc.batch_size. This configures the maximum size of your JDBC batches.
And you should also set the property spring.jpa.properties.hibernate.order_updates to true. That tells Hibernate to order the statements before executing them. Ordering the statements ensures that Hibernate executes all identical update statements that only differ in the provided bind parameter values after each other. That’s important because the JDBC driver closes and executes the batch when you perform a different statement, e.g., an update on a different database table. By ordering the statements, you enable the JDBC driver to group them in a JDBC batch and use the configured batch size as efficiently as possible.
2.Provide your own update statement
If you can define an update statement that performs all the required changes, then it’s better to define a custom modifying query in your repository instead of activating JDBC batching. Using Spring Data JPA, you can do that by defining a method in your repository interface and annotating it with a @Query and a @Modifying annotation.
The @Query annotation enables you to define your own query. When working with JPA’s EntityManager, the required code to execute a modifying query slightly differs from the code that executes a SELECT statement. Because of that, you need to tell Spring Data JPA that you provided a modifying statement as the value of the @Query annotation. You can do that by annotating your repository method with a @Modifying annotation. Spring Data JPA then provides the required code to execute your INSERT, UPDATE or DELETE statement. The @Transactional annotation will revert all of the changes that have been made in the transaction if something fails.
JPA and Spring Data JPA’s detection and handling of changed entity objects make implementing your persistence layer very easy. But it can cause performance issues if one of your use cases needs to update a huge number of database records. In that case, generating and updating an SQL UPDATE statement for each record will slow down your application. You can reduce the performance impact of this inefficiency by activating JDBC batching. This groups multiple, identical statements and bind parameter values into a batch and sends them to the database. Or, if you can express all update operations in 1 statement, you can avoid the performance problem entirely by implementing a modifying query.
Author: Andrea Galevska, Software Engineer at Keitaro