Java Persistence/Locking

Locking

edit

Locking is perhaps the most ignored persistence consideration. Most applications tend to ignore thinking about concurrency issues during development, and then smush in a locking mechanism before going into production. Considering the large percentage of software projects that fail or are canceled, or never achieve a large user base, perhaps this is logical. However, locking and concurrency is a critical or at least a very important issue for most applications, so probably should be something considered earlier in the development cycle.

If the application will have concurrent writers to the same objects, then a locking strategy is critical so that data corruption can be prevented. There are two strategies for preventing concurrent modification of the same object/row; optimistic and pessimistic locking. Technically there is a third strategy, ostrich locking, or no locking, which means put your head in the sand and ignore the issue.

There are various ways to implement both optimistic and pessimistic locking. JPA has support for version optimistic locking, but some JPA providers support other methods of optimistic locking, as well as pessimistic locking.

Locking and concurrency can be a confusing thing to consider, and there are a lot of misconcepts out there. Correctly implementing locking in your application typically involves more than setting some JPA or database configuration option (although that is all many applications that think they are using locking do). Locking may also involve application level changes, and ensuring other applications accessing the database also do so correctly for the locking policy being used.

Optimistic Locking

edit

Optimistic locking assumes that the data will not be modified between when you read the data until you write the data. This is the most common style of locking used and recommended in today's persistence solutions. The strategy involves checking that one or more values from the original object read are still the same when updating it. This verifies that the object has not changed by another user in between the read and the write.

JPA supports using an optimistic locking version field that gets updated on each update. The field can either be numeric or a timestamp value. A numeric value is recommended as a numeric value is more precise, portable, performant and easier to deal with than a timestamp.

The @Version annotation or <version> element is used to define the optimistic lock version field. The annotation is defined on the version field or property for the object, similar to an Id mapping. The object must contain an attribute to store the version field.

The object's version attribute is automatically updated by the JPA provider, and should not normally be modified by the application. The one exception is if the application reads the object in one transaction, sends the object to a client, and updates/merges the object in another transaction. In this case the application must ensure that the original object version is used, otherwise any changes in between the read and write will not be detected. The EntityManager merge() API will always merge the version, so the application is only responsible for this if manually merging.

When a locking contention is detected an OptimisticLockException will be thrown. This could be wrapped inside a RollbackException, or other exceptions if using JTA, but it should be set as the cause of the exception. The application can handle the exception, but should normally report the error to the user, and let them determine what to do.

Example of Version annotation

edit
@Entity
public abstract class Employee{
  @Id
  private long id;
  @Version
  private long version;
  ...
}

Example of Version XML

edit
<entity name="Employee" class="org.acme.Employee" access="FIELD">
    <attributes>
        <id name="id"/>
        <version name="version"/>
        ...
    </attributes>
<entity/>

Common Locking Mistakes, Questions and Problems

edit

Not sending version to client, only locking on the server

edit
Probably the most common mistake in locking in general is locking the wrong section of code. This is true no matter what form of locking is used, whether it be optimistic or pessimistic. The basic scenario is:
  1. User requests some data, the server reads the data from the database and sends it to the user in their client (doesn't matter if the client is html, rmi, web service).
  2. The user edits the data in the client.
  3. The user submits the data back to the server.
  4. The server begins a transaction, reads the object, merges the data and commits the transaction.
The issues is that the original data was read in step 1, but the lock was not obtained until step 4, so any changes made to the object in between steps 1 and 4 would not result in a conflict. This means there is little point to using any locking.
A key point is that when using database pessimistic locking or database transaction isolation, this will always be the case, the database locks will only occur in step 4, and any conflicts will not be detected. This is the main reason why using database locking does not scale to web applications, for the locking to be valid, the database transaction must be started at step 1 and not committed until step 4. This means that a live database connection and transaction must be held open while waiting for the web client, as well as locks, since there is no guarantee that the web client will not sit on the data for hours, go to lunch, or disappear of the face of the earth, holding database resources and locking data for all other users can be very undesirable.
For optimistic locking the solution is relatively simple, the object's version must be sent to the client along with the data (or kept in the http session). When the user submits the data back, the original version must be merged into the object read from the database, to ensure that any changes made between step 1 and 4 will be detected.

Handling optimistic lock exceptions

edit
Unfortunately programmers can frequently be too clever for their own good. The first issue that comes up when using optimistic locking is what to do when an OptimisticLockException occurs. The typical response of the friendly neighborhood super programmer, is to automatically handle the exception. They will just create a new transaction, refresh the object to reset its version, and merge the data back into the object and re-commit it. Presto problem solved, or is it?
This actually defeats the whole point of locking in the first place. If this is what you desire, you may as well use no locking. Unfortunately, the OptimisticLockException should rarely be automatically handled, and you really need to bother the user about the issue. You should report the conflict to the user, and either say "your sorry but an edit conflict occurred and they are going to have to redo their work", or in the best case, refresh the object and present the user with the current data and the data that they submitted and help them merge the two if appropriate.
Some automated merge tools will compare the two conflicting versions of the data and if none of the individual fields conflict, then the data will just be automatically merged without the user's aid. This is what most software version control systems do. Unfortunately the user is typically better able to decide when something is a conflict than the program, just because two versions of the .java file did not change the same line of code does not mean there was no conflict, the first user could have deleted a method that the other user added a method to reference, and several other possible issues that cause the typically nightly build to break every so often.

Paranoid Delusionment

edit
Locking can prevent most concurrency issues, but be careful of going overboard in over analyzing to death every possible hypothetical occurrence. Sometimes in an concurrent application (or any software application) bad stuff can happen. Users are pretty used to this by now, and I don't think anyone out there thinks computers are perfect.
A good example is a source code control system. Allowing users to overwrite each other changes is a bad thing; so most systems avoid this through versioning the source files. If a user submits changes to a file that originated from a older version than the current version, the source code control system will raise a conflict and make the user merge the two files. This is essentially optimistic locking. But what if one user removes or renames a method in one file, then another user adds a new method or call in another file to that old method? No source code control system that I know of will detect this issue, it is a conflict and will cause the build to break. The solution to this is to start locking or checking the lock on every file in the system (or at least every possible related file). Similar to using optimistic read locking on every possible related object, or pessimistically locking every possible related object. This could be done, but would probably be very expensive, and more importantly would now raise possible conflicts every time a user checked in, so would be entirely useless.
So, in general be careful of being too paranoid, such that you sacrifice the usability of your system.

Other applications accessing same data

edit
Any form of locking that is going to work requires that all applications accessing the same data follow the same rules. If you use optimistic locking in one application, but no locking in another accessing the same data, they will still conflict. One fake solution is to configure an update trigger to always increment the version value (unless incremented in the update). This will allow the new application to avoid overwriting the old application's changes, but the old application will still be able to overwrite the new application's changes. This still may be better than no locking at all, and perhaps the old application will eventually go away.
One common misconception is that if you use pessimistic locking, instead of adding a version field, you will be ok. Again pessimistic locking requires that all applications accessing the same data use the same form of locking. The old application can still read data (without locking), then update the data after the new application reads, locks, and updates the same data, overwriting its changes.

Isn't database transaction isolation all I need?

edit
Possibly, but most likely not. Most databases default to read committed transaction isolation. This means that you will never see uncommitted data, but this does not prevent concurrent transactions from overwriting the same data.
  1. Transaction A reads row x.
  2. Transaction B reads row x.
  3. Transaction A writes row x.
  4. Transaction B writes row x (and overwrites A's changes).
  5. Both commit successfully.
This is the case with read committed, but with serializable this conflict would not occur. With serializable either Transaction B would lock on the select for B and wait (perhaps a long time) until Transaction A commits. In some databases Transaction A may not wait, but would fail on commit. However, even with serializable isolation the typical web application would still have a conflict. This is because each server request operates in a different database transaction. The web client reads the data in one transaction, then updates it in another transaction. So optimistic locking is really the only viable locking option for the typical web application. Even if the read and write occurs in the same transaction, serializable is normally not the solution because of concurrency implications and deadlock potential.
See Serializable Transaction Isolation

What happens if I merge an object that was deleted by another user?

edit
What should happen is the merge should trigger an OptimisticLockException because the object has a version that is not null and greater than 0, and the object does not exist. But this is probably JPA provider specific, some may re-insert the object (this would occur without locking), or throw a different exception.
If you called persist instead of merge, then the object would be re-inserted.

What if my table doesn't have a version column?

edit
The best solution is probably just to add one. Field locking is another solution, as well as pessimistic locking in some cases.
See Field Locking

What about relationships?

edit
See Cascaded Locking

Can I use a timestamp?

edit
See Timestamp Locking

Do I need a version in each table for inheritance or multiple tables?

edit
The short answer is no, only in the root table.
See Multiple Versions

Advanced

edit

Timestamp Locking

edit

Timestamp version locking is supported by JPA and is configured the same as numeric version locking, except the attribute type will be a java.sql.Timestamp or other date/time type. Be cautious in using timestamp locking as timestamps have different levels of precision in different databases, and some database do not store a timestamp's milliseconds, or do not store them precisely. In general timestamp locking is less efficient than numeric version locking, so numeric version locking is recommended.

Timestamp locking is frequently used if the table already has a last updated timestamp column, and is also a convenient way to auto update a last updated column. The timestamp version value can be more useful than a numeric version, as it includes the relevant information on when the object was last updated.

The timestamp value in timestamp version locking can either come from the database, or from Java (mid-tier). JPA does not allow this to be configured, however some JPA providers may provide this option. Using the database's current timestamp can be very expensive, as it requires a database call to the server.

Multiple Versions

edit

An object can only have one version in JPA. Even if the object maps to multiple tables, only the primary table will have the version. If any fields in any of the tables changes, the version will be updated. If you desire multiple versions, you may need to map multiple version attributes in your object and manually maintain the duplicate versions, perhaps through events. Technically there is nothing preventing you from annotating multiple attributes with @Version, and potentially some JPA providers may support this.

Cascaded Locking

edit

Locking objects is different than locking rows in the database. An object can be more complex than a simple row; an object can span multiple tables, have inheritance, have relationships, and have dependent objects. So determining when an object has changed and needs to update its version can be more difficult than determining when a row has changed.

JPA does define that when any of the object's tables changes the version is updated. However it is less clear on relationships. If Basic, Embedded, or a foreign key relationship (OneToOne, ManyToOne) changes, the version will be updated. But what about OneToMany, ManyToMany, and a target foreign key OneToOne? For changes to these relationships the update to the version may depend on the JPA provider.

What about changes made to dependent objects? JPA does not have a cascade option for locking, and has no direct concept of dependent objects, so this is not an option. Some JPA providers may support this. One way to simulate this is to use write locking. JPA defines the EntityManager lock() API. You can define a version only in your root parent objects, and when a child (or relationship) is changed, you can call the lock API with the parent to cause a WRITE lock. This will cause the parent version to be updated. You may also be able to automate this through persistence events.

Usage of cascaded locking depends on your application. If in your application you consider one user updating one dependent part of an object, and another user updating another part of the object to be a locking contention, then this is what you want. If your application does not consider this to be a problem, then you do not want cascaded locking. One of the advantages of cascaded locking is you have fewer version fields to maintain, and only the update to the root object needs to check the version. This can make a difference in optimizations such as batch writing, as the dependent objects may not be able to be batched if they have their own version that must be checked.

TopLink / EclipseLink : Support cascaded locking through their @OptimisticLocking and @PrivateOwned annotations and XML.

Field Locking

edit

If you do not have a version field in your table, optimistic field locking is another solution. Field locking involves comparing certain fields in the object when updating. If those fields have changed, then the update will fail. JPA does not support field locking, but some JPA providers do support it.

Field locking can also be used when a finer level of locking is desired. For example if one user changes the object's name and another changes the objects address, you may desire for these updates to not conflict, and only desire optimistic lock errors when users change the same fields. You may also only be concerned about conflicts in changes to certain fields, and not desire lock errors from conflicts in the other fields.

Field locking can also be used on legacy schemas, where you cannot add a version column, or to integrate with other applications accessing the same data which are not using optimistic locking (note if the other applications are not also using field locking, you can only detect conflicts in one direction).

There are several types of field locking:

  • All fields compared in the update - This can lead to a very big where clause, but will detect any conflicts.
  • Selected fields compared in the update - This is useful if conflicts in only certain fields are desired.
  • Changed fields compared in the update - This is useful if only changes to the same fields are considered to be conflicts.

If your JPA provider does not support field locking, it is difficult to simulate, as it requires changes to the update SQL. Your JPA provider may allow overriding the update SQL, in which case, All or Selected field locking may be possible (if you have access to the original values), but Changed field locking is more difficult because the update must be dynamic. Another way to simulate field locking is to flush you changes, then refresh the object using a separate EntityManager and connection and compare the current values with your original object.

When using field locking it is important to keep the original object that was read. If you read the object in one transaction and send it to a client, then update in another, you are not really locking. Any changes made between the read and write will not be detected. You must keep the original object read managed in an EntityManager for your locking to have any effect.

TopLink / EclipseLink : Support field locking through their @OptimisticLocking annotation and XML.

Read and Write Locking

edit

It is sometimes desirable to lock something that you did not change. Normally this is done when making a change to one object, that is based on the state of another object, and you wish to ensure that the other object represents the current state of the database at the point of the commit. This is what serializable transaction isolation gives you, but optimistic read and write locking allow this requirement to be met declaratively and optimistically (and without deadlock, concurrency, and open transaction issues).

JPA supports read and write locks through the EntityManager.lock() API. The LockModeType argument can either be READ or WRITE. A READ lock will ensure that the state of the object does not change on commit. A WRITE lock will ensure that this transaction conflicts with any other transaction changing or locking the object. Essentially the READ lock checks the optimistic version field, and the WRITE checks and increments it.

Example of Using the Lock API

edit
Employee employee = entityManager.find(Employee.class, id);
employee.setSalary(employee.getManager().getSalary() / 2);
entityManager.lock(employee.getManager(), LockModeType.READ);

Write locking can also be used to provide object-level locks. If you desire for a change to a dependent object to conflict with any change to the parent object, or any other of its dependent objects, this can be done through write locking. This can also be used to lock relationships, when you change a OneToMany or ManyToMany relationship you can also force the parent's version to be incremented.

Example of Using the Lock API for Cascaded Locks

edit
Employee employee = entityManager.find(Employee.class, id);
employee.getAddress().setCity("Ottawa");
entityManager.lock(employee, LockModeType.WRITE);

No Locking a.k.a Ostrich Locking

edit

Conceptually people may scoff and be alarmed at the thought of no locking, but it is probably the most common form of locking in use. Some call it Ostrich locking as the strategy is to stick your head in the sand and ignore the issue. Most prototypes or small applications frequently do not have the requirement or in most cases the need for locking, and handling what to do when a locking contention does occur is beyond the scope of the application, so best to just ignore the issue.

In general it is probably best in JPA to enable optimistic locking always, as it is fairly simple to do, at least in concept, but what does occur on a conflict without any form of locking? Essentially it is last in wins, so if two users edit the same object, at the same time, the last one to commit will have their changes reflected in the database. This is true, at least for users editing the same fields, but if two users edit different fields in the same object, it depends on the JPA implementation. Some JPA providers only update exactly the fields that were changed, where as other update all fields in the object. So in one case the first user's changes would be overridden, but in the second they would not.

Pessimistic Locking

edit

Pessimistic locking means acquiring a lock on the object before you begin to edit the object, to ensure that no other users are editing the object. Pessimistic locking is typically implemented through using database row locks, such as through the SELECT ... FOR UPDATE SQL syntax. The data is read and locked, the changes are made and the transaction is committed, releasing the locks.

JPA 1.0 did not support pessimistic locking, but some JPA 1.0 providers do. JPA 2.0 supports pessimistic locking. It is also possible to use JPA native SQL queries to issue SELECT ... FOR UPDATE and use pessimistic locking. When using pessimistic locking you must ensure that the object is refreshed when it is locked, locking a potentially stale object is of no use. The SQL syntax for pessimistic locking is database specific, and different databases have different syntax and levels of support, so ensure your database properly supports your locking requirements.

EclipseLink (as of 1.2) : Supports JPA 2.0 pessimistic locking.
TopLink / EclipseLink : Support pessimistic locking through the "eclipselink.pessimistic-lock" query hint.

The main issues with pessimistic locking is they use database resources, so require a database transaction and connection to be held open for the duration of the edit. This is typically not desirable for interactive web applications. Pessimistic locking can also have concurrency issues and cause deadlocks. The main advantages of pessimistic locking is that once the lock is obtained, it is fairly certain that the edit will be successful. This can be desirable in highly concurrent applications, where optimistic locking may cause too many optimistic locking errors.

There are other ways to implement pessimistic locking, it could be implemented at the application level, or through serializable transaction isolation.

Application level pessimistic locking can be implemented through adding a locked field to your object. Before an edit you must update the field to locked (and commit the change). Then you can edit the object, and set the locked field back to false. To avoid conflicts in acquiring the lock, you should also use optimistic locking, to ensure the lock field is not updated to true by another user at the same time.

JPA 2.0 Locking

edit

JPA 2.0 adds support for pessimistic locking, as well as other locking options. A lock can be acquired using the EntityManager.lock() API, or passing a LockModeType to an EntityManager find() or refresh() operation, or setting the lockMode of a Query or NamedQuery.

The JPA 2.0 lock modes are defined in the LockModeType enum:

  • OPTIMISTIC (was READ in JPA 1.0) - The Entity will have its optimistic lock version checked on commit, to ensure no other transaction updated the object.
  • OPTIMISTIC_FORCE_INCREMENT (was WRITE in JPA 1.0) - The Entity will have its optimistic lock version incremented on commit, to ensure no other transaction updated (or READ locked) the object.
  • PESSIMISTIC_READ - The Entity is locked on the database, prevents any other transaction from acquiring a PESSIMISTIC_WRITE lock.
  • PESSIMISTIC_WRITE - The Entity is locked on the database, prevents any other transaction from acquiring a PESSIMISTIC_READ or PESSIMISTIC_WRITE lock.
  • PESSIMISTIC_FORCE_INCREMENT - The Entity is locked on the database, prevents any other transaction from acquiring a PESSIMISTIC_READ or PESSIMISTIC_WRITE lock, and the Entity will have its optimistic lock version incremented on commit. This is unusual as it does both an optimistic and pessimistic lock, normally an application would only use one locking model.
  • NONE - No lock is acquired, this is the default to any find, refresh or query operation.

JPA 2.0 also adds two new standard query hints. These can be passed to any Query, NamedQuery, or find(), lock() or refresh() operation.

  • "javax.persistence.lock.timeout" - Number of milliseconds to wait on the lock before giving up and throwing a PessimisticLockException.
  • "javax.persistence.lock.scope" - The valid scopes are defined in PessimisticLockScope, either NORMAL or EXTENDED. EXTENDED will also lock the object's owned join tables and element collection tables.

Serializable Transaction Isolation

edit

Serializable transaction isolation guarantees that anything read in the transaction will not be updated by any other user. Through using serializable transaction isolation and ensuring the data being edited is read in the same transaction, you can achieve pessimistic locking. It is important to ensure the objects are refreshed from the database in the transaction, as editing cached or potentially stale data defeats the point of locking.

Serializable transaction isolation can typically be enabled on the database, some databases even have this as the default. It can also be set on the JDBC Connection, or through native SQL, but this is database specific and different databases have different levels of support. The main issues with serializable transaction isolation are the same as using SELECT ... FOR UPDATE (see above for the gory details), in addition everything read is locked, so you cannot decide to only lock certain objects at certain times, but lock everything all the time. This can be a major concurrency issue for transactions with common read-only data, and can lead to deadlocks.

How database implement serializable transaction isolation differs between databases. Some databases (such as Oracle) can perform serializable transaction isolation in more of an optimistic sense, than the typically pessimistic implementation. Instead of each transaction requiring locks on all the data as it is read, the row versions are not checked until the transaction is committed, if any of the data changed an exception is thrown and the transaction is not allowed to commit.