Event sourcing is one of the most explicit ways of expressing the dimension of time in changing data. There are many benefits to emphasizing changes instead of the current state of a database. It can, however, be very tricky to get right; any time spent on it is not spent on building real business logic, and above all, I wonder why we are building this ourselves on a project-to-project basis.
Even when you’re using an event-sourcing framework, you’re putting something in between the actual application and the database: yet another layer adding complexity, yet another framework to learn about, and yet another thing that can interfere with other logic.
So I’m asking myself here, why is the database itself not taking care of the event sourcing we are looking for?
The thing is, it already does. It may not look like event sourcing from the outside, but every database keeps a log (sometimes called a journal) of every change that comes along. The reason is simple: it needs that log to recover from a power failure. A database that only does so-called in-place updates can easily end up in an inconsistent state where it doesn’t know which blocks of data have already been updated and which have not. There are different ways of implementing this (e.g., before and after images), but the essence of a database log is that it is a consecutive trace of all the changes committed to the database. After a power failure, the log can be used to check recently changed blocks and repair them where needed.
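The repair mechanism described here can be sketched as a toy write-ahead log: every change is appended to the journal before the state is updated in place, so the current state can always be rebuilt by replaying the journal. All names below are made up for illustration; real databases do this at the level of disk blocks and fsync calls.

```python
class ToyStore:
    """A toy key-value store with a write-ahead log for crash recovery."""

    def __init__(self):
        self.log = []    # append-only journal (would be flushed to disk)
        self.data = {}   # current state, updated in place

    def put(self, key, value):
        # Rule: append to the journal *before* the in-place update, so a
        # crash between the two steps can always be repaired.
        self.log.append({"op": "put", "key": key, "value": value})
        self.data[key] = value

    def recover(self):
        # After a power failure, replay the journal from the start to
        # reconstruct a consistent current state.
        self.data = {}
        for entry in self.log:
            if entry["op"] == "put":
                self.data[entry["key"]] = entry["value"]


store = ToyStore()
store.put("balance", 100)
store.put("balance", 250)

# Simulate losing the in-place state, then repair it from the journal.
store.data = {}
store.recover()
print(store.data)  # {'balance': 250}
```

Note that the journal in this sketch is exactly the "consecutive trace of all changes" the text describes: the current state is derivable from it at any time.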
With the right design, future databases could very well deliver us the same features for which we need to implement event sourcing today. And more!
Let’s first list the presumed benefits of event sourcing:
- The possibility of tracing back is ideal for auditing and debugging a system.
- Registering a single truth makes it possible to rebuild the current data set when needed.
- If needed, it is possible to correct an older event and see the consequences by rebuilding the current data set.
- It is possible to keep different views of the same data.
- It makes it possible to see the intent or origin of a change.
- An append-only model is efficient and scalable.
Auditing and Debugging
The auditing and debugging benefits are obvious. Be aware, however, that before the term “event sourcing” was even invented, similar mechanisms were already implemented in applications all over the world, often because of audit-related requirements. A common approach was to use database triggers together with additional history tables. In that respect, event sourcing is just another way of implementing this need. But now imagine a database that lets you query individual changes. The database log already registers every data change; it is just a matter of making those changes available in a suitable way.
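The trigger-plus-history-table approach can be illustrated in a few lines of SQLite; the table and column names here are made up for the sketch:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE account (
        id      INTEGER PRIMARY KEY,
        balance INTEGER NOT NULL
    );

    -- History table filled by a trigger: the classic pre-event-sourcing
    -- way of keeping an audit trail.
    CREATE TABLE account_history (
        account_id  INTEGER,
        old_balance INTEGER,
        new_balance INTEGER,
        changed_at  TEXT DEFAULT CURRENT_TIMESTAMP
    );

    CREATE TRIGGER account_audit
    AFTER UPDATE ON account
    BEGIN
        INSERT INTO account_history (account_id, old_balance, new_balance)
        VALUES (OLD.id, OLD.balance, NEW.balance);
    END;
""")

conn.execute("INSERT INTO account (id, balance) VALUES (1, 100)")
conn.execute("UPDATE account SET balance = 250 WHERE id = 1")
conn.execute("UPDATE account SET balance = 300 WHERE id = 1")

rows = conn.execute(
    "SELECT old_balance, new_balance FROM account_history ORDER BY rowid"
).fetchall()
print(rows)  # [(100, 250), (250, 300)]
```

Every update is recorded without the application code knowing about it, which is exactly the duplication of effort the article argues a change-querying database would make unnecessary.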
This is not a new concept. The term “temporal database” was coined decades ago. If you want to dive into the subject, consider reading Developing Time-Oriented Database Applications in SQL [Snodgrass, 1] or other works on the same subject. There are even databases with built-in features for tracing back changes, like Oracle Workspace Manager and PostgreSQL, just to name a few. So it clearly can be done. It is just a subject that needs more attention from future database-software designers.
Rebuilding the Database
I always have a bit of a problem with the “single truth” and “rebuild” arguments for event sourcing. The fact that you can have discrepancies between the event log and the current data set is a problem created only by the introduction of event sourcing itself. And the possibility of correcting a historic event and then rebuilding the whole database sounds great, but I’m not sure it’s a good general practice. We are probably better off with a compensating action in most cases. You cannot recalculate the consequences of such a correction anyway when external systems are involved.
Others see the rebuild aspect as a chance to keep an image of the current data in memory, resulting in memory-speed query performance. I don’t think, however, that this is a fundamental argument for event sourcing. Why reinvent the wheel if a database could do this automatically? More and more databases have the ability to run in memory. And with the expected arrival of nonvolatile memory like Intel’s 3D XPoint (Optane), it makes even more sense to let the database scientists figure this out for us.
As mentioned, another argument for event sourcing is the possibility of having multiple views on the same data. But we should not forget that one of those views is most likely the representation of the factual data that we would normally regard as the normalized data model. It is only because event sourcing builds this from individual changes that the current state of a system itself becomes a view in this regard. Our concern should only be with views that actually derive additional data. But that is not a new thing either. As Martin Fowler explained in one of his talks [Fowler, 2], we have always used views or separate reporting databases for easier or better-performing reports.
Taking all this into account, there is only one category of views left that justifies a rebuild. Such a view must meet two conditions:
- We did not think of it from the start. Otherwise, we could have populated it with data from the beginning.
- Individual changes play a role in it. Otherwise, we could rebuild it from the current data set.
You cannot rebuild this type of view without having access to all the change events. However, in my vision of a future database in which you can query both the current data and all related changes, you can achieve the exact same goal.
Another benefit of event sourcing is that it registers the intent or trigger of a change. I think a database could solve that by attaching some metadata to every transaction. We already have database transactions as a way of bundling multiple changes. If we supply a transaction with the identity of the end user and maybe the ID of the screen that triggered the change, we have all the information we need.
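A minimal sketch of this idea, with made-up names: a transaction is just a bundle of changes plus the metadata that captures its origin.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class Change:
    entity: str
    attribute: str
    value: object

@dataclass
class Transaction:
    """A bundle of changes plus the metadata that captures its intent."""
    user: str                      # identity of the end user
    screen: str                    # ID of the screen that triggered it
    changes: list = field(default_factory=list)
    committed_at: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc)
    )

tx = Transaction(user="alice", screen="order-entry")
tx.changes.append(Change("order:42", "status", "shipped"))
tx.changes.append(Change("order:42", "shipped_at", "2024-05-01"))

# Every change now carries its origin via the enclosing transaction.
print(tx.user, tx.screen, len(tx.changes))
```

With the metadata stored once per transaction rather than per change, the intent of each change is recoverable without duplicating it everywhere.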
Finally, let’s go into the benefits of the append-only model. The idea of not overwriting existing data is indeed very appealing, and it resonates well in a world in which we talk a lot about immutability and functional programming. But did you know that many databases are already built this way? Traditionally, databases were indeed about memorizing a current state, with the log only there for recovery. But more and more database implementations are based on an architecture in which the log itself is the database. If you’re interested, have a look at A Performance Evaluation of Log-Only Temporal Object Database Systems [Wang, 3]. It is just the simpler concept. In-place updating had benefits like clustering of data for better performance, but those had to do with the relatively high seek times of hard disks, and that is a thing of the past.
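The log-only architecture can be reduced to a very small sketch: the append-only log is the only data structure, and reads scan it backwards for the most recent entry per key. This is a toy model under the stated assumptions; real log-only systems add indexes and compaction to keep reads fast.

```python
# The append-only log *is* the database: writes never update in place.
log = []

def put(key, value):
    log.append((key, value))    # only ever append

def get(key):
    for k, v in reversed(log):  # latest entry for the key wins
        if k == key:
            return v
    return None

put("status", "pending")
put("status", "shipped")
print(get("status"))  # shipped
```

Because nothing is ever overwritten, the full change history of every key is still sitting in the log, which is what makes such a database implicitly event-sourced.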
A log-only database that implicitly keeps track of all changes can be regarded as an event-sourcing database. And it can actually do better than we can with whatever framework we invent.
Given a schema-full database, it can interpret every detail of the changes it logs. So compared to scanning through a list of JSON-serialized events, it can cope with structured queries, like:
- What was the last change of a given entity, attribute, or relationship?
- What did the database look like at a certain historic moment, traveling through time like we are used to with version control (e.g., git)?
- What are all the changes of a given user?
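A hypothetical in-memory model of such a fully interpreted change log shows what these three queries could look like; all field names and data below are made up for illustration.

```python
from dataclasses import dataclass
from datetime import datetime

@dataclass
class LoggedChange:
    at: datetime
    user: str
    entity: str
    attribute: str
    value: object

change_log = [
    LoggedChange(datetime(2024, 1, 1), "alice", "order:1", "status", "new"),
    LoggedChange(datetime(2024, 1, 2), "bob",   "order:1", "status", "paid"),
    LoggedChange(datetime(2024, 1, 3), "alice", "order:1", "status", "shipped"),
]

# 1. The last change of a given entity and attribute.
last = max(
    (c for c in change_log
     if c.entity == "order:1" and c.attribute == "status"),
    key=lambda c: c.at,
)

# 2. Time travel: the value of an attribute at a historic moment.
def as_of(entity, attribute, moment):
    changes = [c for c in change_log
               if c.entity == entity and c.attribute == attribute
               and c.at <= moment]
    return max(changes, key=lambda c: c.at).value if changes else None

# 3. All changes made by a given user.
by_alice = [c for c in change_log if c.user == "alice"]

print(last.value)                                        # shipped
print(as_of("order:1", "status", datetime(2024, 1, 2)))  # paid
print(len(by_alice))                                     # 2
```

Because every change is stored in interpreted form (entity, attribute, user, timestamp) rather than as an opaque serialized event, each of these questions is an ordinary query rather than a custom replay.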
By the way, to know the actual (end) user who triggered a change, we just need to go back to a situation in which database users equal actual end users. That would make a lot of sense anyway; it is mostly for historical reasons that we don’t do this anymore. But that is a whole different subject, and I will probably get back to it in another article.
I hope that future database-software developers keep this idea of queryable history in mind. It fits perfectly in the trend of log-only databases and of synchronizing data with other database instances or devices, and it will free us from reinventing the wheel again and again.
1) Developing Time-Oriented Database Applications in SQL, Richard T. Snodgrass, Morgan Kaufmann Publishers, San Francisco, 1999, ISBN 1-55860-436-7.
2) The Many Meanings of Event-Driven Architecture, Martin Fowler, GOTO Conference, 2017.
3) A Performance Evaluation of Log-Only Temporal Object Database Systems, X. Sean Wang, Proceedings of the 12th International Conference on Scientific and Statistical Database Management, 2000.