Transactions in RavenDB

Before I get into the details regarding RavenDB, I’ll start with a quick overview of what ACID is.

ACID

ACID is a set of four “rules” that tell us transactions will behave the way we expect them to. The rules are that transactions must be Atomic, Consistent, Isolated, and Durable. But anyone can tell you that. Here’s a quick breakdown of what that actually means to us:

  • Atomic: “all or nothing”. If a transaction represents multiple changes, all must happen or none can happen. A common example of this is in a banking scenario where I write you a check for $10. The transaction involved includes two steps:
    1. Subtract $10 from my account
    2. Add $10 to your account.

    If my account does not have $10 in it, then $10 should not be added to your account. We’ll fail at the first step and stop. The other side of this is that if $10 cannot be added to your account (maybe your account was closed by the time the check was processed), then $10 should not be taken out of my account.

  • Consistent: you will usually hear this defined as “a committed transaction should leave the database in a valid state.” Beyond not allowing transactions to violate constraints, cascades, or triggers, consistency also determines what others see if they read data during or shortly after the transaction’s life. This is also where RavenDB differs most from what you’re used to in the SQL world.
  • Isolated: when multiple transactions are executed at the same time, the end result will be exactly the same as if they had been executed sequentially. This comes into play when two transactions are acting on the same piece of data. The best explanation of this is on Wikipedia, here: http://en.wikipedia.org/wiki/ACID#Isolation_failure.
  • Durable: once the database lets me know that something is written, I should be 100% confident that regardless of any hardware or other failure, my data will still be there. The exception to this is losing the entire database and having to restore from a backup, but that doesn’t really count.
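
To make the “all or nothing” rule concrete, here is a minimal in-memory sketch of the check example from the Atomic bullet. The Bank class and its Open, BalanceOf, and Transfer methods are hypothetical names invented for this illustration; none of this is RavenDB’s API. The only point is that both steps are validated before either balance is touched.

[csharp]
using System;
using System.Collections.Generic;

// Hypothetical toy bank, for illustration only (not RavenDB code).
class Bank
{
    private readonly Dictionary<string, decimal> balances = new Dictionary<string, decimal>();

    public void Open(string account, decimal balance) { balances[account] = balance; }
    public decimal BalanceOf(string account) { return balances[account]; }

    // All or nothing: check both steps first, then apply both together.
    public bool Transfer(string from, string to, decimal amount)
    {
        if (!balances.ContainsKey(from) || !balances.ContainsKey(to)) return false; // closed or unknown account
        if (balances[from] < amount) return false;                                  // not enough money
        balances[from] -= amount;
        balances[to] += amount;
        return true;
    }
}

static class Program
{
    static void Main()
    {
        var bank = new Bank();
        bank.Open("mine", 5m);
        bank.Open("yours", 0m);

        // I only have $5, so the $10 check is rejected and neither balance changes.
        bool succeeded = bank.Transfer("mine", "yours", 10m);
        Console.WriteLine(succeeded);
        Console.WriteLine(bank.BalanceOf("mine"));
        Console.WriteLine(bank.BalanceOf("yours"));
    }
}
[/csharp]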

What does this look like in RavenDB?

RavenDB supports both implicit and explicit transactions. By implicit, I mean that in normal usage of RavenDB you will already have transactions built in. For example, take a look at this code that transfers $10 from one account to another.

[csharp]
public void TransferMoney(string fromAccountNumber, string toAccountNumber, decimal amount) {
    using (var session = Store.OpenSession()) {
        session.Advanced.UseOptimisticConcurrency = true;
        var fromAccount = session.Load<Account>("Accounts/" + fromAccountNumber);
        var toAccount = session.Load<Account>("Accounts/" + toAccountNumber);
        fromAccount.Balance -= amount;
        toAccount.Balance += amount;
        session.Store(fromAccount);
        session.Store(toAccount);
        session.SaveChanges();
    }
}
[/csharp]

Here’s what happens line by line:

2: I use the document store to open a session object.
3: I tell the session to use Optimistic Concurrency. This causes RavenDB to do some internal checking and verify that the accounts were not modified between us loading them and calling SaveChanges(). Note: at the time of this writing (RavenDB build 960), this behavior occurs regardless of the setting of the UseOptimisticConcurrency property. I’m not sure if this is intentional or an unintentional side effect of a change made to UseOptimisticConcurrency between builds 888 and 960.
4-5: Fetch the two account objects that we are using.
6: Subtract the amount from the first account’s Balance.
7: Add the amount to the second account’s Balance.
8-9: Store the changes to both accounts into the session object.
10: Call SaveChanges() on the session object. Until this is called, all changes are held in memory by the session object and not actually sent to the server. Calling this is what actually sends the transaction to the server.

The session object represents a single atomic transaction that can be isolated and made durable. Consistency, however, depends on how (and when) you plan on retrieving the committed data.

Isolation

The atomic and durable parts of a RavenDB session are easy to understand, but consistency and isolation are where things can get a little more confusing. Let’s begin with Isolation.

Isolation itself is automatically taken care of by RavenDB, but I think the topic of isolation merits more detailed explanation. Here is what you can expect from your transactions in Raven:

  1. Any changes your session makes won’t be visible to any other session until you successfully call SaveChanges().
  2. If two transactions occur simultaneously, none of the changes from one can spill into the other.
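
The first guarantee can be pictured with a small toy model. This is only a sketch under my own assumptions: ToyStore and ToySession are hypothetical classes written for this post, not how RavenDB is implemented. The session buffers changes in memory, and SaveChanges() hands the whole batch to the store in one step.

[csharp]
using System;
using System.Collections.Generic;

// Toy model only: not RavenDB's implementation.
class ToyStore
{
    private readonly Dictionary<string, string> documents = new Dictionary<string, string>();
    private readonly object gate = new object();

    public string Load(string id)
    {
        lock (gate)
        {
            string value;
            return documents.TryGetValue(id, out value) ? value : null;
        }
    }

    // Apply a whole batch under one lock so no reader sees half of it.
    public void Commit(Dictionary<string, string> batch)
    {
        lock (gate)
        {
            foreach (var change in batch) documents[change.Key] = change.Value;
        }
    }
}

class ToySession
{
    private readonly ToyStore store;
    private readonly Dictionary<string, string> pending = new Dictionary<string, string>();

    public ToySession(ToyStore store) { this.store = store; }

    public void Store(string id, string value) { pending[id] = value; } // held in memory only
    public string Load(string id) { return store.Load(id); }
    public void SaveChanges() { store.Commit(pending); pending.Clear(); }
}

static class Program
{
    static void Main()
    {
        var store = new ToyStore();
        var writer = new ToySession(store);

        writer.Store("accounts/1", "balance: 90");
        // Another session cannot see the change yet: nothing has been saved.
        Console.WriteLine(new ToySession(store).Load("accounts/1") == null);

        writer.SaveChanges();
        // After SaveChanges() the change is visible to other sessions.
        Console.WriteLine(new ToySession(store).Load("accounts/1"));
    }
}
[/csharp]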

Consistency ..eventually?

Consistency is where a lot of the misconceptions of RavenDB come into play, mainly because it’s handled so differently than it is in the SQL world. Using SQL, you are probably used to something called immediate consistency (and probably not even aware that something else exists). Immediate consistency is a consistency model in which an update to the database is immediately visible to all readers of that data once it is reported to the writer that the update has been successful. In RavenDB and NoSQL in general, we typically see something called eventual consistency.

Eventual consistency in RavenDB means that if I make a change to a document and save it, a query on any changed properties may return “stale” data. I know you’re probably thinking “Whaaaat!?”, but bear with me on this one. As an example, let’s say that we need to make some changes to a product on our ecommerce site. The product has a name, a category, and a price. Let’s say I need to change a product’s category and price. Here is my current product:

[csharp]
{
 Name: "Microwave",
 Category: "Home Goods",
 Price: "99.95"
}
[/csharp]

And this is my new product, in the more appropriate category and also now on sale:

[csharp]
{
 Name: "Microwave",
 Category: "Appliances",
 Price: "39.95"
}
[/csharp]

Immediately after I make the changes, anyone visiting the Appliances category may not see the Microwave product. However, anyone going to the Microwave product page (either from a bookmark or from the Home Goods category) will see the new price and category. The reason for this is the difference between querying an index and loading a document. When we visit the categories page, we are querying against an index. When we visit the product’s page, we are loading the product directly from the database by its ID.

[csharp]
public List<Product> ProductsInCategory(string category)
{
    using (var session = store.OpenSession())
    {
        return session.Query<Product>().Where(p => p.Category == category).ToList();
    }
}

public Product LoadProduct(string productId)
{
    using (var session = store.OpenSession())
    {
        return session.Load<Product>(productId);
    }
}
[/csharp]

What we see here is that when we query we experience eventual consistency, but when we load, we experience immediate consistency. The challenge is that we can only load by the ID of an object, so to take advantage of that immediate consistency we must design our applications with that in mind.
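
If the difference between querying and loading still feels abstract, the following toy model may help. It is a sketch under my own assumptions, not RavenDB’s indexing code: ToyStore, QueryByCategory, and RunIndexing are hypothetical names, and the explicit RunIndexing() call simply stands in for indexing that happens some time after the write.

[csharp]
using System;
using System.Collections.Generic;
using System.Linq;

class Product
{
    public string Id;
    public string Category;
}

// Toy model only: the index is refreshed by an explicit call.
class ToyStore
{
    private readonly Dictionary<string, Product> documents = new Dictionary<string, Product>();
    private Dictionary<string, List<string>> categoryIndex = new Dictionary<string, List<string>>();

    // A document write is visible to Load right away.
    public void Save(Product product) { documents[product.Id] = product; }
    public Product Load(string id) { return documents[id]; }

    // A query reads the index, which may lag behind the documents.
    public List<string> QueryByCategory(string category)
    {
        List<string> ids;
        return categoryIndex.TryGetValue(category, out ids) ? ids : new List<string>();
    }

    // Rebuild the index from the current documents.
    public void RunIndexing()
    {
        categoryIndex = documents.Values
            .GroupBy(p => p.Category)
            .ToDictionary(g => g.Key, g => g.Select(p => p.Id).ToList());
    }
}

static class Program
{
    static void Main()
    {
        var store = new ToyStore();
        store.Save(new Product { Id = "products/1", Category = "Home Goods" });
        store.RunIndexing();

        // Move the product to a new category; the index has not caught up yet.
        store.Save(new Product { Id = "products/1", Category = "Appliances" });
        Console.WriteLine(store.Load("products/1").Category);         // load by ID sees the change
        Console.WriteLine(store.QueryByCategory("Appliances").Count); // query is stale

        store.RunIndexing();
        Console.WriteLine(store.QueryByCategory("Appliances").Count); // index has caught up
    }
}
[/csharp]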

Conflicts

As I demonstrated above, the issue of conflicts appears to be an easy problem to solve using the UseOptimisticConcurrency property. And in such a simple example, it is. Things get a little more complicated when your application cannot be designed in such a way to allow the loading, mutating, and saving of an object all inside of one session (and no, you don’t want to keep a session around for the lifetime of the object.. trust me on this one). Consider the scenario where you have a website with an edit form. When the user hits the edit form, the object they are editing is loaded from the database, the session is disposed, and the object is sent back to the client. It may be a while until we get that object back for saving, which means that we could have a greater likelihood of that object being changed elsewhere by the time we get it back.

You may assume that UseOptimisticConcurrency would solve that problem for us, but hold it right there. It doesn’t; not without a hint from us anyway. Behind the scenes, RavenDB uses something called an Etag. An Etag is like a version number. Any time a change is made to the database, the global Etag increments. Any documents that are changed have their Etag set to the global Etag (before the increment). So for simplicity, let’s say that when a user loads a document, its Etag is 2. Before they are done making changes, another user makes an edit and now the document’s Etag is 6. (See diagram)

For RavenDB to help us solve this problem, we need to keep track of the Etag ourselves across RavenDB sessions. Here is what that code would look like:

[csharp]
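public Account LoadAccount(string accountNumber)
{
    using (var session = store.OpenSession())
    {
        var account = session.Load<Account>("Accounts/" + accountNumber);
        // Keep the Etag on the object so it survives after this session is disposed.
        account.Etag = session.Advanced.GetEtagFor(account);
        return account;
    }
}

public void SaveAccount(Account account)
{
    using (var session = store.OpenSession())
    {
        session.Advanced.UseOptimisticConcurrency = true;
        // Pass the remembered Etag so this new session can detect changes made since the load.
        session.Store(account, account.Etag, "Accounts/" + account.AccountNumber);
        session.SaveChanges();
    }
}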
[/csharp]

We need to manage this ourselves in this scenario because the session normally stores the Etag for each object it gets from the database, but here we dispose the session along with the Etags it holds. The next session has never seen your object before and therefore doesn’t know what Etag it had when it was loaded (or if it was even loaded at all).
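
To show why carrying the Etag across sessions matters, here is a toy version of the check. This is a sketch under stated assumptions, not RavenDB’s implementation: ToyStore, EtagOf, and Put are hypothetical names, and I use a plain integer counter where RavenDB has its own Etag type. A write that arrives with an outdated Etag is rejected instead of silently overwriting the other user’s edit.

[csharp]
using System;
using System.Collections.Generic;

// Toy model only: real RavenDB Etags are not plain integers.
class ToyStore
{
    private readonly Dictionary<string, string> documents = new Dictionary<string, string>();
    private readonly Dictionary<string, long> etags = new Dictionary<string, long>();
    private long globalEtag = 0;

    public long EtagOf(string id) { return etags[id]; }

    // expectedEtag == null means "write unconditionally".
    public void Put(string id, string value, long? expectedEtag)
    {
        long current;
        bool exists = etags.TryGetValue(id, out current);
        if (expectedEtag.HasValue && exists && current != expectedEtag.Value)
            throw new InvalidOperationException("Concurrency conflict on " + id);

        documents[id] = value;
        etags[id] = globalEtag; // the document takes the global Etag...
        globalEtag++;           // ...and then the global Etag increments
    }
}

static class Program
{
    static void Main()
    {
        var store = new ToyStore();
        store.Put("accounts/1", "original", null);

        // Our user opens the edit form and we remember the Etag they saw.
        long seenByUs = store.EtagOf("accounts/1");

        // Meanwhile, another user saves an edit.
        store.Put("accounts/1", "edited by someone else", seenByUs);

        try
        {
            // Our save still carries the old Etag, so it is rejected.
            store.Put("accounts/1", "our late edit", seenByUs);
        }
        catch (InvalidOperationException e)
        {
            Console.WriteLine(e.Message);
        }
    }
}
[/csharp]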

Explicit Transactions

Lastly, a post about transactions should at least include something about the more traditional approach to transactions. While not all that useful in RavenDB, sometimes you will want your transactions to cross session boundaries. Doing this with RavenDB is extremely easy: you just need to wrap your multiple session.SaveChanges() calls inside a TransactionScope from the System.Transactions namespace. Here’s a quick bit of code showing how you might use a transaction to treat two sessions as one transaction.

[csharp]
using (var transaction = new TransactionScope())
{
    using (var session1 = store.OpenSession())
    {
        session1.Store(new Account());
        session1.SaveChanges();
    }
    using (var session2 = store.OpenSession())
    {
        session2.Store(new Account());
        session2.SaveChanges();
    }
    transaction.Complete();
}
[/csharp]

One very important thing to note, however, is that transaction.Complete() followed by the implicit Dispose() at the end of the using block is a non-blocking asynchronous call. The commit actually happens on a background thread. This means that if you use a transaction this way and immediately try to read that data, you may get stale data. You can read more about it on the RavenDB site here: http://ravendb.net/docs/faq/working-with-dtc.
