Archive for December, 2006

Object Oriented Databases (OODBMS)

Wednesday, December 20th, 2006

I used to be something of an expert on object-oriented database systems (OODBMSs) and how to use them. The way you get to be an expert at a technology is simply to get ahead of the adoption curve and spend a bunch of time figuring out how to make the technology work before it becomes common knowledge. At that point, you can charge premium rates, because you have scarce knowledge in your possession.

I got to this position by working for some adventurous folks who were willing to take a chance on the hype. The first OODBMS I came across was ObjectStore, used from C++. That project didn’t launch, but we built lots of prototypes and I got a good indoctrination into the ups and downs of OODBMS lore. Later on I was exposed to others: Versant, Poet, and some lesser-known ones. As with relational databases, once you get the hang of one, the rest are pretty similar. The key concepts are:

1) OODBMSs allow you to create one or more named “roots”. A root is basically a variable: you ask for the object at root “foo” and get it back. Some give you only one root; in that case you almost always just stick a hash/map/dictionary at the root and pretend you have several anyhow. The root is your entry point to the data.

2) All object manipulations/accesses must be done within a transaction context. So you end up digging through your app looking for sensible transaction boundaries. For a web app, you typically begin a transaction at the beginning of a request and commit it just before sending the response. You want to keep transactions short so as not to have other users waiting on locks.

3) Objects become part of the database via “reachability”. Upon commit, the OODBMS “traces” your object graph starting at the root, calculates the changes to the graph, and writes those changes to the database. Any new object reachable from the root automatically becomes part of the database. While this might sound expensive, it is generally quite cheap.
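To make the three concepts concrete, here is a toy, in-memory sketch in Python. ToyObjectStore and Customer are hypothetical names invented for illustration, not the API of ObjectStore, Versant, Poet, or OmniBase. There is no disk I/O, and commit does not diff modified objects; assuming the simplest possible tracing scheme, it just walks the graph from the named roots and assigns an object ID to anything newly reachable.

```python
import itertools

ATOMIC = (str, bytes, int, float, bool, type(None))  # values stored inline, not as objects


class ToyObjectStore:
    """Hypothetical in-memory stand-in for an OODBMS; not any real product's API."""

    def __init__(self):
        self._roots = {}                 # named roots: the entry points into the data
        self._persistent = {}            # id(obj) -> (object id, obj) for stored objects
        self._oid_counter = itertools.count(1)
        self._in_txn = False

    def begin(self):
        self._in_txn = True

    def _require_txn(self):
        if not self._in_txn:
            raise RuntimeError("object access outside a transaction")

    def root(self, name):
        self._require_txn()
        return self._roots[name]

    def set_root(self, name, obj):
        self._require_txn()
        self._roots[name] = obj

    @staticmethod
    def _references(obj):
        """Objects directly referenced by `obj` (container elements or attributes)."""
        if isinstance(obj, dict):
            return list(obj.values())
        if isinstance(obj, (list, tuple, set)):
            return list(obj)
        return list(vars(obj).values()) if hasattr(obj, "__dict__") else []

    def commit(self):
        """Trace from every root; newly reachable objects become persistent."""
        self._require_txn()
        newly_stored, seen = 0, set()
        stack = list(self._roots.values())
        while stack:
            obj = stack.pop()
            if isinstance(obj, ATOMIC) or id(obj) in seen:
                continue
            seen.add(id(obj))
            if id(obj) not in self._persistent:
                self._persistent[id(obj)] = (next(self._oid_counter), obj)
                newly_stored += 1
            stack.extend(self._references(obj))
        self._in_txn = False  # a real OODBMS would write the changes to disk here
        return newly_stored


class Customer:
    def __init__(self, name):
        self.name = name
        self.orders = []


store = ToyObjectStore()
store.begin()
store.set_root("customers", {})        # a single root holding a dictionary
print(store.commit())                  # 1: only the dictionary

store.begin()
store.root("customers")["alice"] = Customer("Alice")
print(store.commit())                  # 2: the customer and her orders list
```

The customer became persistent only because it was put into the dictionary hanging off a root; nothing ever called a save method on it.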

So you generally open a transaction, look up an object from a root, navigate to the object of interest, make changes, and then commit the transaction. Many OODBMSs also let you hang onto an object reference across transactions, but the reference can only be used within a transaction: trying to read data through it outside of a transaction fails with an exception. This makes redrawing user interfaces problematic, since a repaint that happens between transactions can’t read the objects it is supposed to display.
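The session pattern, and the trouble a held reference causes after commit, can be sketched roughly as follows. ToyDatabase, PersistentRef, TransactionClosedError, and Account are hypothetical, and the toy skips the real commit/abort work; it only models the rule that a persistent reference may be held across transactions but used only inside one.

```python
from contextlib import contextmanager


class TransactionClosedError(Exception):
    """Raised when a persistent reference is used outside a transaction."""


class ToyDatabase:
    """Hypothetical OODBMS stand-in; not the API of ObjectStore, OmniBase, etc."""

    def __init__(self, roots):
        self._roots = roots
        self.in_transaction = False

    @contextmanager
    def transaction(self):
        self.in_transaction = True
        try:
            yield self
        finally:
            # A real system would commit (or abort on error) here.
            self.in_transaction = False

    def root(self, name):
        if not self.in_transaction:
            raise TransactionClosedError(f"cannot look up root {name!r} outside a transaction")
        return PersistentRef(self, self._roots[name])


class PersistentRef:
    """A reference you may keep across transactions but only use inside one."""

    def __init__(self, db, target):
        object.__setattr__(self, "_db", db)
        object.__setattr__(self, "_target", target)

    def _check(self, name):
        if not self._db.in_transaction:
            raise TransactionClosedError(f"cannot access {name!r} outside a transaction")

    def __getattr__(self, name):   # only called for attributes the proxy itself lacks
        self._check(name)
        return getattr(self._target, name)

    def __setattr__(self, name, value):
        self._check(name)
        setattr(self._target, name, value)


class Account:
    def __init__(self, owner, balance):
        self.owner = owner
        self.balance = balance


db = ToyDatabase({"account": Account("alice", 100)})
with db.transaction():                  # open a transaction
    acct = db.root("account")           # look up an object from a root
    acct.balance += 25                  # make changes
    view = {"owner": acct.owner, "balance": acct.balance}  # transient copy for the UI
# leaving the block "commits"; `acct` is still held, but now unusable

print(view)                             # {'owner': 'alice', 'balance': 125}
try:
    acct.balance                        # e.g. a UI repaint between transactions
except TransactionClosedError as err:
    print(err)                          # cannot access 'balance' outside a transaction
```

Copying the values into `view` while the transaction is open is the kind of transient copy a UI ends up needing: the copy survives the commit, the reference does not.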

OODBs come from the CAD world, where you have a network of a zillion objects, all slightly different, and mapping them onto a regular container like a database table would be really expensive. They’re really good at this object persistence game.

OODBs are seductive. They are easy to get started with. For one thing, you don’t have to design a data model, just your object model. Your code is your model. You make objects, stick them in containers, and forget about them. Sounds great, right?

But as anyone who has lived with an OODBMS for any length of time knows, object databases are great until they’re not, and then they truly suck. Here’s why:

1) Concurrency is very poor. As I mentioned, OODBs come from the CAD world and work well for storing complex CAD models. But CAD models are seldom updated concurrently by large numbers of people. As you modify objects within a transaction, the OODB has to obtain locks on the modified objects to guarantee consistency. Unfortunately, none of them (that I know of) implement object-level locking. Most implement locking at the memory page level, so spurious lock conflicts, where two unrelated objects happen to share a memory page, can be common. Resolving these conflicts can be expensive. And because all work must happen within a transaction, transactions tend to be on the long side, which makes conflicts even more likely.

2) Constantly re-fetching data in every transaction makes keeping user interface elements up to date very expensive. There is no user-level in-memory caching unless you write your own code to create transient copies.

3) Schema migration is hard, if not impossible. Your class definition is your storage format. Adding a field to a class makes your in-memory model inconsistent with the slabs of bits you wrote out before you added the field. There are ways around this. The usual one is to give a class a single instance variable (ivar) holding a dictionary, so that new “fields” become new keys rather than new ivars. Otherwise, there are usually some very user-unfriendly scripts that have to be run, and in many cases the database must be taken offline to run them. So much for your three-nines availability.

4) Death by a trillion bug fixes. I can’t speak for all of them, but ObjectStore required the database to be taken offline and an update script to be run for every upgrade. For a site that is supposed to be up all the time, this isn’t acceptable. So upgrades were deferred, and that led us straight into the next problem.

5) OODBMS providers have limited resources and only support versions up to a year old. If you get too far out of date and your database goes down, you are flat out of luck; the support people won’t help you. Only a really large organization could afford to keep up with all the little point-fix releases ObjectStore made in a year. We couldn’t afford the manpower or the downtime.

6) Bugs are forever. If you put a bug into your program that damages the object model, it becomes enshrined in the database. Subsequent read code that finds the malformed chunk of the object model will usually fail. Subtle corruptions build up, making a full database walk harder and harder to complete. Conventional databases can avoid this by implementing appropriate constraints.

7) No security. Any screwball developer can destroy your reference data (usually stored in ordered collections hanging off the root). A conventional relational database can safeguard important data with roles, permissions, and constraints.

8) Garbage collection is not universally available, so orphaned junk is common. Some OODBs provide GC utilities, but they can fail if there is corrupt data (see items 6 and 7).

9) No ad hoc query capability. You have to write a new program to view any data at all. You need to write programs to update reference data. You need a program to do anything at all with your data. No fixing problems with a quick line of SQL. Searching for unanticipated patterns is difficult.
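Two of these problems are easy to see in miniature. The first part below models page-level locking (problem 1) with an assumed 4 KB page and a made-up object layout; the second shows the single-dictionary-ivar workaround for schema changes (problem 3). PageLockManager, the offsets, and Customer are all hypothetical, not any vendor’s lock manager or storage format.

```python
# Part 1: page-level locking. PAGE_SIZE and the object offsets are assumptions.
PAGE_SIZE = 4096  # bytes per page (assumed)


class PageLockManager:
    """Hypothetical lock table that locks whole pages, never single objects."""

    def __init__(self, page_size=PAGE_SIZE):
        self.page_size = page_size
        self.owners = {}  # page number -> id of the transaction holding the write lock

    def write_lock(self, txn, offset):
        """Try to write-lock the page containing `offset`; False means a conflict."""
        page = offset // self.page_size
        holder = self.owners.get(page)
        if holder is not None and holder != txn:
            return False
        self.owners[page] = txn
        return True

    def release_all(self, txn):
        """Drop every lock held by `txn`, as happens at commit or abort."""
        self.owners = {p: t for p, t in self.owners.items() if t != txn}


locks = PageLockManager()
customer_at, invoice_at, product_at = 100, 2000, 5000  # hypothetical object offsets
print(locks.write_lock("A", customer_at))  # True
print(locks.write_lock("B", invoice_at))   # False: unrelated object, same page 0
print(locks.write_lock("B", product_at))   # True: page 1 is free
locks.release_all("A")
print(locks.write_lock("B", invoice_at))   # True once A has committed


# Part 2: one dictionary ivar instead of one ivar per field.
class Customer:
    def __init__(self, **fields):
        self.fields = dict(fields)  # adding a "field" later just means a new key

    def get(self, key, default=None):
        return self.fields.get(key, default)


old = Customer(name="Alice")  # stored before anyone thought of email addresses
new = Customer(name="Bob", email="bob@example.com")
print(old.get("email", "<none>"))  # <none>
print(new.get("email"))            # bob@example.com
```

Transaction B never touched A’s object, yet it waits on A anyway; the longer A’s transaction runs, the longer B waits. The dictionary trick avoids re-shaping stored objects, at the cost of giving up any checking of which “fields” a class is supposed to have.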

I’ve been bitten by all of these issues at one time or another and have recently inherited an application written using a Smalltalk OODB called OmniBase. Debugging this application is extremely painful because launching a debugger results in the transaction being terminated and all object references becoming invalid. Thus, the data that might provide a clue as to the source of the error is gone. Additionally, while the author claims to provide support, he simply collects fees and then tells you that your application doesn’t run in his environment, blames you for writing rotten code, and declines future contact.

So this dog has to go.

Fortunately, you can get most of the benefits of an OODB without the drawbacks by using an object-relational mapping (ORM) framework. I’ve selected GLORP, an open source mapping framework that is improving all the time, and found that I can implement support for part of OmniBase’s API with very little change to the user interface, which is written in Seaside under Squeak.
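GLORP itself is a Smalltalk framework with its own way of declaring mappings, which this doesn’t try to reproduce. As a rough illustration of the general ORM idea only, here is a hand-rolled mapper in Python using the standard-library sqlite3 module; Customer and CustomerMapper are hypothetical names. Loaded objects are plain in-memory values, the table carries a constraint, and the data stays open to ad hoc SQL.

```python
import sqlite3
from dataclasses import dataclass
from typing import Optional


@dataclass
class Customer:
    name: str
    email: Optional[str] = None
    id: Optional[int] = None     # primary key, assigned by the database on first save


class CustomerMapper:
    """Hypothetical hand-written mapper; not GLORP's API."""

    def __init__(self, conn):
        self.conn = conn
        conn.execute(
            "CREATE TABLE IF NOT EXISTS customer ("
            "id INTEGER PRIMARY KEY, name TEXT NOT NULL, email TEXT)"
        )

    def save(self, c):
        """Insert new objects, update existing ones, then commit."""
        if c.id is None:
            cur = self.conn.execute(
                "INSERT INTO customer (name, email) VALUES (?, ?)", (c.name, c.email))
            c.id = cur.lastrowid
        else:
            self.conn.execute(
                "UPDATE customer SET name = ?, email = ? WHERE id = ?",
                (c.name, c.email, c.id))
        self.conn.commit()
        return c

    def find_by_name(self, name):
        """Rebuild a plain Customer object from its row, or return None."""
        row = self.conn.execute(
            "SELECT id, name, email FROM customer WHERE name = ?", (name,)).fetchone()
        return None if row is None else Customer(id=row[0], name=row[1], email=row[2])


conn = sqlite3.connect(":memory:")
mapper = CustomerMapper(conn)
mapper.save(Customer("Alice"))
mapper.save(Customer("Bob", "bob@example.com"))

alice = mapper.find_by_name("Alice")
print(alice)  # Customer(name='Alice', email=None, id=1)

# The data is ordinary rows, so a quick line of SQL still works:
print(conn.execute("SELECT COUNT(*) FROM customer WHERE email IS NULL").fetchone()[0])  # 1
```

The `NOT NULL` constraint is the kind of guard problem 6 says object databases lack: saving a Customer with no name raises `sqlite3.IntegrityError` instead of quietly enshrining the bad object.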

Next time, I’ll talk a little bit about how this works.

Well, that was exciting

Wednesday, December 20th, 2006

In case you haven’t been watching, the Pacific NW got hammered by a big storm last week. Trees were knocked down, taking power and cable lines with them and blocking roads. We lost power for a day and internet for a week, and the main entrance to our neighborhood was blocked by fallen power cables, with trees hanging dangerously over the roadway from the wreckage of the lines.

To make things extra fun, a pump failed and my lot began to flood. I rented a portable pump to get the water level low enough to work on the failed one, and installed a new pump on Monday. Good thing, as it is raining hard again today.

Today, the internet came back on and they cleared the road. For me, things are back to normal, but a week without internet was kind of a drag.