Archive for January, 2007

Database Transplant Successful

Wednesday, January 31st, 2007

OmniBase is dead.

I killed it.

The replacement with GLORP has been successful. The rate of error discovery has stabilized and, after about 3 weeks of production experience and many small fixes, is now below OmniBase’s on its best day.

A few things I’ve learned along the way:

1) Put limits on all search queries. All of them. Too many times in the last couple weeks the system became unresponsive for a couple minutes as it fetched every blessed row in the database. If you’re getting that much stuff, you’re not gonna find it. I’ve limited all searches to 20 rows.
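One way to enforce such a cap in a single place is sketched below. It is written in Python against an in-memory SQLite database purely for illustration; the helper name limited_search, the MAX_ROWS constant, and the person table are assumptions, not part of the application described here.

```python
import sqlite3

MAX_ROWS = 20  # the cap mentioned above; the constant name is illustrative


def limited_search(conn, sql, params=(), limit=MAX_ROWS):
    """Run a SELECT, never returning more than `limit` rows."""
    # Wrap the caller's query so the cap applies even if the caller forgot one.
    capped = f"SELECT * FROM ({sql}) LIMIT ?"
    return conn.execute(capped, (*params, limit)).fetchall()


# Hypothetical table with far more rows than anyone would page through.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO person (name) VALUES (?)",
                 [(f"user{i}",) for i in range(1000)])

rows = limited_search(conn, "SELECT id, name FROM person WHERE name LIKE ?", ("user%",))
print(len(rows))
```

Routing every search through one helper like this means no individual query can forget the limit.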

2) Use plain old objects wherever possible. OODBs have a history of being rotten at schema migration, so in the old design all objects used dictionaries for ivar storage. This made it easy to add or remove ivars without getting messed up in the OODB, as the object’s ‘format’ is always the same. However, all that hashing, fetching, and searching isn’t free. I didn’t realize how expensive it was until I ripped the dictionaries out and replaced them with vanilla ivars. Zooooom!

Because the mapping is from the objects to the RDBMS through GLORP on an attribute level rather than relying on blitting a whole object as a chunk, the format of the objects is decoupled from the database representation. Changes in the object’s binary format have no bearing on the database representation.

3) When generating accessors – add lazy initialization where nil is not an acceptable value.
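A lazily initializing accessor might look like the following sketch. It is in Python rather than Smalltalk, and the Account class and its notes attribute are hypothetical names chosen for the example.

```python
class Account:
    """Hypothetical domain class; `notes` must never be None."""

    def __init__(self):
        self._notes = None  # not yet initialized

    @property
    def notes(self):
        # Lazily create the collection the first time it is asked for.
        if self._notes is None:
            self._notes = []
        return self._notes


acct = Account()
acct.notes.append("first contact")  # works even though nothing set notes up front
print(acct.notes)
```

Callers never see the uninitialized state, so objects loaded from rows with missing values still behave sensibly.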

4) Keep it meta. The meta model is the most important thing we have. With a good meta model all else can be rapidly replaced. Consequently, I think the next step in evolution – conversion of the meta model to use Magritte and then replacement of the UI to use Magritte as well – will radically improve agility.

Meanwhile, to prepare, I’m updating the version of Seaside that the application is built on. This is turning out to be harder than I thought as Seaside has moved on quite a bit and many classes in my old version have been simply replaced in the newer one.

The really great thing about using meta facilities like GLORP and Magritte is that the application continues to shrink until it is mostly just the meta model.

Less code, fewer bugs.

Tell Congress: Support the Genetic Information Nondiscrimination Act (GINA)

Friday, January 19th, 2007

You can be completely healthy and still denied health or life insurance based on your DNA! As scientists identify genetic causes of diseases, insurance companies are using that information to deny coverage to Americans. Grandpa had Alzheimer’s? You might carry the gene – and be turned down for coverage! GINA has hit the House. Show your support!


I have just been turned down for coverage at Regence Blue Shield because of genetic testing that showed I have hereditary hemochromatosis. The diagnosis was early, no damage has occurred, a maintenance regimen of monthly phlebotomies will keep my iron levels normal, and they still won’t write me a policy.


Thanks for writing to your US Representative and asking him to support GINA.



Records and Objects

Tuesday, January 2nd, 2007

I’ve been slow in posting this because I’ve been trying to make a deadline and there hasn’t been time to do anything else. But at the moment, the data migration scripts are running and I’m just babysitting a pair of computers watching for problems, tweaking the code, and restarting the migration.

And it is taking forever. The old OmniBase db is about 880M, and a fair amount of it is garbage. A hot backup script written by OmniBase’s author takes over 2 hours to run. It doesn’t reinflate the objects either. This is just not scaling. At any given time, I figure this application hosts perhaps 20 users tops. On those days, it crawls.

The users have been wanting extracts for generating annual reports for the last 4 months. I have written numerous scripts to try to generate this for them. They always fail. Often, after a few hundred reads, OmniBase will get itself into a state with some hanging locks and will need to be killed. A script that tries to touch every object in the database takes well over 30 hours to run. Why? I’m not sure. Also, users are asking for reports that involve complex traversals that take too long. Not enough entry points into the data. There has to be a better way.

Enter PostgreSQL. The best and most free database of the free solutions. Postgres rocks. It just works. The tools are nice. The data type selection is extensive. It is easy to extend with additional scripting languages. The most magical (to me) utility is pg_dump. This little gem can back up an entire fully populated postgres database with equivalent data, hot while the app is running, in about 10 seconds. The resulting file is around 140M. I’m sold.

Except that, because of the ‘magic’ nature of OODBs, this application is written (mostly) as if all objects are just in memory – so converting to a SQL oriented format just wasn’t going to be feasible.

So I adopted GLORP and set out to emulate the existing database api. This has been successful. Wildly so.

Glorp uses a meta model – actually two of them – one for the database, one for the objects, plus a mapping model in between. The whole bundle is contained in a class called DescriptorSystem. DescriptorSystem is abstract and uses the Template Method Pattern to help you build out the models. You subclass DescriptorSystem and then override methods like allClassNames, allTableNames and so forth. The initialize method will call these and then start iterating over them, calling methods with names derived from items in the list. So if you have a class Login mapped to a table LOGIN, you need to add ‘Login’ to the list returned by allClassNames, ‘LOGIN’ to the list returned by allTableNames, and methods called classModelForLogin, which creates and returns the appropriate class model describing Login, and tableForLOGIN, which returns an initialized DatabaseTable.
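The template-method arrangement described above can be sketched roughly as follows. This is a toy Python stand-in, not Glorp's implementation: the method names are snake_case analogues of the selectors mentioned in the text, and the dictionaries returned for the class model and table are placeholders for real GlorpClassModel and DatabaseTable objects.

```python
class DescriptorSystem:
    """Toy stand-in for the abstract base; not Glorp's real implementation."""

    def __init__(self):
        # Template method: the base class drives the build and looks up
        # hook methods whose names are derived from the listed names.
        self.class_models = {
            name: getattr(self, f"class_model_for_{name}")()
            for name in self.all_class_names()
        }
        self.tables = {
            name: getattr(self, f"table_for_{name}")()
            for name in self.all_table_names()
        }

    def all_class_names(self):
        raise NotImplementedError

    def all_table_names(self):
        raise NotImplementedError


class LoginDescriptorSystem(DescriptorSystem):
    """Hypothetical subclass describing one class and one table."""

    def all_class_names(self):
        return ["Login"]

    def all_table_names(self):
        return ["LOGIN"]

    def class_model_for_Login(self):
        # Placeholder for a class model: attribute name -> type.
        return {"name": str, "password": str}

    def table_for_LOGIN(self):
        # Placeholder for a table definition: column name -> column type.
        return {"ID": "serial", "NAME": "varchar", "PASSWORD": "varchar"}


system = LoginDescriptorSystem()
print(sorted(system.class_models), sorted(system.tables))
```

The subclass only supplies the lists and the per-name hooks; the order of construction stays in the base class.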

Inside of the classModelForLogin, you create a GlorpClassModel, give it a name, and then you describe attributes by calling methods like

newAttributeNamed: aSymbol type: aClass

for single values and toOne relationships or

newAttributeNamed: aSymbol collection: collectionClass of: aClass

for toMany relationships. Notice that this adds the typing information for attributes that is typically missing in Smalltalk.

The other model represents the underlying database and includes objects like DatabaseTable, DatabaseColumn, ForeignKeyConstraint, DatabaseSequence, DatabaseIndex and such. Stuff all relational databases have. You describe your database procedurally, same as you did with the class model, only you implement tableForLOGIN and say things like:

system tableNamed: aString
addColumnNamed: aString type: aDatabaseType
addForeignKeyConstraint: (ForeignKeyConstraint sourceField: srcField targetField: toField)

You have to be totally explicit here: if you need a foreign key field, you have to define it. If you need a link table, you need to define it. You have to specify the field for the primary key – which means the object has to have a field for the primary key. Glorp doesn’t figure any of that stuff out for you.

Finally, you specify the mapping from the class model to the table model. You do this by creating Descriptors (one for each class) and Mappings. Mappings can represent either data fields, or relationships. There are several kinds. For instance, a many to many mapping will involve specifying the primary keys of each table, the link table, and the mappings from primary keys to link table fields. Most of this can be derived from the class model’s relationship and foreign key constraints. But you still have to specify it.
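To make the many to many case concrete, here is a rough Python sketch of the information such a mapping has to carry and the join it implies. The ManyToManyMapping class, its field names, and the LOGIN, ROLE, and LOGIN_ROLE tables are all hypothetical; Glorp's own mapping classes are not shown here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ManyToManyMapping:
    attribute: str        # attribute on the source object
    source_table: str
    source_key: str       # primary key column of the source table
    target_table: str
    target_key: str       # primary key column of the target table
    link_table: str
    link_source_fk: str   # link column pointing at source_key
    link_target_fk: str   # link column pointing at target_key

    def select_sql(self):
        """SQL fetching the target rows related to one source row."""
        return (
            f"SELECT t.* FROM {self.target_table} t "
            f"JOIN {self.link_table} l ON l.{self.link_target_fk} = t.{self.target_key} "
            f"WHERE l.{self.link_source_fk} = ?"
        )


# Hypothetical mapping: a Login has many Roles through LOGIN_ROLE.
roles = ManyToManyMapping(
    attribute="roles",
    source_table="LOGIN", source_key="ID",
    target_table="ROLE", target_key="ID",
    link_table="LOGIN_ROLE",
    link_source_fk="LOGIN_ID", link_target_fk="ROLE_ID",
)
print(roles.select_sql())
```

Every field here is something that could in principle be derived from the class model and the foreign key constraints, which is the point made above about boilerplate.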

At this point, you are probably thinking – I have 100 classes in my model. That’s 300 methods I have to write! And it is all mostly boilerplate! I couldn’t agree more. All you really need is an appropriate metamodel.

In a previous life, I wrote WebObjects applications. WebObjects has an ORM called the Enterprise Objects Framework or EOF. It was light years ahead of its time. Now, it’s about average as Apple has neglected WebObjects terribly and nobody I know uses it anymore. EOF came with a great program called EOModeler and stored the model in text files in PList format. I have code that reads and writes PLists in every language I know – they are insanely useful.

The EOModel was the first file format that described all of the meta information required by a typical ORM library. So I did the easy thing – leveraged EOModel files to build DescriptorSystems. (I was not alone in realizing this). I subclassed DescriptorSystem and created EODescriptorSystem which reads an EOModel file and builds a DescriptorSystem from it. (I also wrote my own EOModeler application in Java Swing, both as an experiment, and out of frustration because Apple was neglecting WebObjects to an extent that the tools were falling apart. It works well enough but the experience of writing it put me off Java for good).

This experience teaches me that any reasonably expressive meta model can be leveraged to build a descriptor system. In my current porting project, the original developers ended up creating their own ad hoc meta model with explicit modeling of attributes, relationships, and types. I leveraged this to automatically generate a descriptor system from the classes themselves (all domain classes have a common root), and was able to infer link tables, foreign keys, and so forth from the meta model. So from the model I get the schema and the descriptor system.

Of course, translating to relational format required a few subtle changes to the object model, so I also added code to construct the table model directly from PostgreSQL. By comparing it with the schema generated from the classes, I can work out the ALTER scripts needed to automate schema migration. This is working great and building out new domain objects has become really easy. I would say with Seaside and this infrastructure, I have surpassed Rails for ease of development.
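The idea of deriving a schema from the meta model and diffing it against the live database can be sketched in a few lines. This Python version is an illustration only: the META_MODEL layout, the function names, and the live dictionary standing in for what would be read from PostgreSQL are all assumptions, and it handles only added tables and columns, not drops or type changes.

```python
# Hypothetical meta model: class name -> {attribute: type or ("many", target class)}
META_MODEL = {
    "Login": {"name": "varchar", "roles": ("many", "Role")},
    "Role": {"title": "varchar"},
}


def schema_from_model(model):
    """Derive {table: {column: type}}, including inferred link tables."""
    schema = {}
    for cls, attrs in model.items():
        columns = {"id": "integer"}  # every table gets a primary key
        for attr, kind in attrs.items():
            if isinstance(kind, tuple):  # to-many: infer a link table
                target = kind[1]
                link = f"{cls}_{target}".lower()
                schema[link] = {f"{cls.lower()}_id": "integer",
                                f"{target.lower()}_id": "integer"}
            else:
                columns[attr] = kind
        schema[cls.lower()] = columns
    return schema


def alter_statements(wanted, actual):
    """Compare the generated schema with the live one and list the additions."""
    statements = []
    for table, columns in sorted(wanted.items()):
        if table not in actual:
            cols = ", ".join(f"{c} {t}" for c, t in columns.items())
            statements.append(f"CREATE TABLE {table} ({cols})")
            continue
        for column, col_type in columns.items():
            if column not in actual[table]:
                statements.append(f"ALTER TABLE {table} ADD COLUMN {column} {col_type}")
    return statements


# Stand-in for the table model that would be read from the live database.
live = {"login": {"id": "integer"}, "role": {"id": "integer", "title": "varchar"}}
wanted = schema_from_model(META_MODEL)
migration = alter_statements(wanted, live)
for statement in migration:
    print(statement)
```

With the live schema above, the sketch proposes adding the missing name column to login and creating the inferred login_role link table.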

But I think I can do better and have begun to investigate Magritte. Stay tuned.