I’ve been slow in posting this because I’ve been trying to make a deadline and there hasn’t been time to do anything else. But at the moment, the data migration scripts are running and I’m just babysitting a pair of computers watching for problems, tweaking the code, and restarting the migration.
And it is taking forever. The old OmniBase db is about 880M, and a fair amount of that is garbage. A hot backup script written by OmniBase’s author takes over two hours to run, and it doesn’t reinflate the objects either. This just isn’t scaling. At any given time, I figure this application hosts perhaps 20 users tops; on the days it sees that kind of load, it crawls.
The users have been wanting extracts for generating annual reports for the last 4 months. I have written numerous scripts to try to generate these for them. They always fail. Often, after a few hundred reads, OmniBase will get itself into a state with some hanging locks and will need to be killed. A script that tries to touch every object in the database takes well over 30 hours to run. Why? I’m not sure. Users are also asking for reports that involve complex traversals that take too long; there just aren’t enough entry points into the data. There has to be a better way.
Enter PostgreSQL. The best of the free database solutions. Postgres rocks. It just works. The tools are nice. The data type selection is extensive. It is easy to extend with additional scripting languages. The most magical (to me) utility is pg_dump. This little gem can back up the fully populated Postgres database (the same data migrated from OmniBase), hot, while the app is running, in about 10 seconds. The resulting file is around 140M. I’m sold.
Except that, because of the ‘magic’ nature of OODBs, this application is written (mostly) as if all objects were simply in memory, so converting it to an explicitly SQL-oriented style of code just wasn’t going to be feasible.
So I adopted GLORP and set out to emulate the existing database API. This has been successful. Wildly so.
Glorp uses a meta model, or rather two of them: one for the database and one for the objects, plus a mapping model in between. The whole bundle is contained in a class called DescriptorSystem. DescriptorSystem is abstract and uses the Template Method pattern to help you build out the models. You subclass DescriptorSystem and then override methods like allClassNames, allTableNames, and so forth. The initialize method calls these and then iterates over the results, calling methods whose names are derived from the items in each list. So if you have a class Login mapped to a table LOGIN, you need to add ‘Login’ to the list returned by allClassNames and ‘LOGIN’ to the list returned by allTableNames, and to write a method classModelForLogin that creates and returns the class model describing Login, and a method tableForLOGIN that returns an initialized DatabaseTable.
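To make the shape concrete, here is a minimal sketch of such a subclass, assuming a single Login class mapped to a LOGIN table; the class name MyDescriptorSystem and the ClassName >> selector notation are just for illustration:

DescriptorSystem subclass: #MyDescriptorSystem
	instanceVariableNames: ''
	classVariableNames: ''
	poolDictionaries: ''
	category: 'MyApp-Persistence'

MyDescriptorSystem >> allClassNames
	"The domain classes this system knows how to map."
	^#('Login')

MyDescriptorSystem >> allTableNames
	"The tables those classes map onto."
	^#('LOGIN')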
Inside classModelForLogin, you create a GlorpClassModel, give it a name, and then describe attributes by calling methods like
newAttributeNamed: aSymbol type: aClass
for single values and toOne relationships, or
newAttributeNamed: aSymbol collection: collectionClass of: aClass
for toMany relationships. Notice that this adds the typing information for attributes that is typically missing in Smalltalk.
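For example, a classModelForLogin might look roughly like this. The id, username, person, and sessions attributes (and the Person and Session classes) are made up for the sketch, and the exact accessor for attaching the described class varies a little between Glorp versions:

classModelForLogin
	"Build the class model for Login; the attributes shown are illustrative."
	| model |
	model := GlorpClassModel new.
	model describedClass: Login.
	model newAttributeNamed: #id.
	model newAttributeNamed: #username type: String.
	model newAttributeNamed: #person type: Person.
	model newAttributeNamed: #sessions collection: OrderedCollection of: Session.
	^model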
The other model represents the underlying database and includes objects like DatabaseTable, DatabaseColumn, ForeignKeyConstraint, DatabaseSequence, DatabaseIndex and such. Stuff all relational databases have. You describe your database procedurally, same as you did with the class model, only you implement tableForLOGIN and say things like:
system tableNamed: aString
addColumnNamed: aString type: aDatabaseType
addForeignKeyConstraint: (ForeignKeyConstraint sourceField: srcField targetField: toField)
You have to be totally explicit here: if you need a foreign key field, you have to define it. If you need a link table, you have to define it. You have to specify the field for the primary key, which means the object itself has to carry a field for the primary key. Glorp doesn’t figure any of that out for you.
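Continuing the made-up Login example, a tableForLOGIN might look something like this. The column names and the PERSON table are invented, platform is the descriptor system’s database platform, and newer Glorp versions pass the table in as an argument (tableForLOGIN: aTable) rather than having you create and return it:

tableForLOGIN
	"Describe the LOGIN table explicitly: primary key, data columns, and one foreign key."
	| table id personId |
	table := DatabaseTable new.
	table name: 'LOGIN'.
	id := table createFieldNamed: 'id' type: platform int4.
	id bePrimaryKey.
	table createFieldNamed: 'username' type: (platform varChar: 50).
	personId := table createFieldNamed: 'person_id' type: platform int4.
	table addForeignKeyFrom: personId to: ((self tableNamed: 'PERSON') fieldNamed: 'id').
	^table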
Finally, you specify the mapping from the class model to the table model. You do this by creating Descriptors (one for each class) and Mappings. Mappings can represent either data fields or relationships, and there are several kinds. For instance, a many-to-many mapping involves specifying the primary keys of each table, the link table, and the mappings from the primary keys to the link table fields. Most of this could be derived from the class model’s relationships and the foreign key constraints, but you still have to spell it out.
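A descriptor method for the hypothetical Login class might look roughly like this, with two direct mappings and two relationship mappings; Glorp can usually infer the joins from the primary and foreign keys in the table model, and the Session side would need its own table and descriptor (not shown):

descriptorForLogin: aDescriptor
	"Tie the Login class model to the LOGIN table and map its attributes."
	| table |
	table := self tableNamed: 'LOGIN'.
	aDescriptor table: table.
	(aDescriptor newMapping: DirectMapping) from: #id to: (table fieldNamed: 'id').
	(aDescriptor newMapping: DirectMapping) from: #username to: (table fieldNamed: 'username').
	(aDescriptor newMapping: OneToOneMapping) attributeName: #person.
	(aDescriptor newMapping: OneToManyMapping) attributeName: #sessions.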
At this point, you are probably thinking: I have 100 classes in my model. That’s 300 methods I have to write! And it is all mostly boilerplate! I couldn’t agree more. All you really need is an appropriate metamodel to generate them from.
In a previous life, I wrote WebObjects applications. WebObjects has an ORM called the Enterprise Objects Framework, or EOF. It was light years ahead of its time. Now it’s about average, as Apple has neglected WebObjects terribly and nobody I know uses it anymore. EOF came with a great program called EOModeler and stored the model in text files in PList format. I have code that reads and writes PLists in every language I know; they are insanely useful.
The EOModel was the first file format that described all of the meta information required by a typical ORM library. So I did the easy thing – leveraged EOModel files to build DescriptorSystems. (I was not alone in realizing this). I subclassed DescriptorSystem and created EODescriptorSystem which reads an EOModel file and builds a DescriptorSystem from it. (I also wrote my own EOModeler application in Java Swing, both as an experiment, and out of frustration because Apple was neglecting WebObjects to an extent that the tools were falling apart. It works well enough but the experience of writing it put me off Java for good).
This experience teaches me that any reasonably expressive meta model can be leveraged to build a descriptor system. In my current porting project, the original developers ended up creating their own ad hoc meta model with explicit modeling of attributes, relationships, and types. I leveraged this to automatically generate a descriptor system from the classes themselves (all domain classes have a common root), and was able to infer link tables, foreign keys, and so forth from the meta model. So from the model I get the schema and the descriptor system.
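As an illustration of the idea only (the DomainObject root and this particular override are not the project’s actual code), once the domain classes carry their own metadata, the hard-coded lists can be computed instead of written:

allClassNames
	"Enumerate the domain hierarchy instead of maintaining the list by hand."
	^DomainObject allSubclasses collect: [:each | each name]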
Of course, translating to relational format required a few subtle changes to the object model, so I also added code to construct the table model directly from PostgreSQL and, by comparing it with the schema generated from the classes, can figure out alter scripts to automate schema migration. This is working great and building out new domain objects has become really easy. I would say with Seaside and this infrastructure, I have surpassed Rails for ease of development.
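To give a flavor of the introspection side, and only as a made-up sketch (the diffing and the generation of the alter statements are not shown), the live schema can be read straight out of PostgreSQL’s information_schema through a Glorp session’s accessor:

columnsIn: aSession forTable: aTableName
	"Answer the column_name / data_type rows PostgreSQL reports for aTableName."
	^aSession accessor executeSQLString:
		'SELECT column_name, data_type FROM information_schema.columns WHERE table_name = ''' , aTableName asLowercase , ''''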
But I think I can do better and have begun to investigate Magritte. Stay tuned.