I hate being sick.
Down with the mother of all colds
February 9th, 2007
Seaside update complete!
February 6th, 2007
That was easier than I thought, thanks to a nearly complete port done by my predecessor. However, I have finally thrown in the towel and upgraded the entire application to a Squeak 3.8 image. 3.8 brings a lot of baggage I don’t need, like character encodings and TrueType font rendering.
But enough tools, such as SqueakMap and GLORP, have become dependent on the newer APIs that integrating new code was getting to be too hard. So I started with a clean 3.8 image, loaded Monticello packages one at a time, and eventually ended up with a completely new build. Then I upgraded the Seaside version.
With a new build and upgraded infrastructure, I then successfully loaded Magritte into the image. So now I have been working through the tutorials.
More on that later.
Database Transplant Successful
January 31st, 2007
OmniBase is dead.
I killed it.
Replacing it with GLORP has been a success. The rate of error discovery has stabilized and, after about three weeks of production experience and many small fixes, is now below OmniBase’s on its best day.
A few things I’ve learned along the way:
1) Put limits on all search queries. All of them. Too many times in the last couple of weeks, the system became unresponsive for a couple of minutes as it fetched every blessed row in the database. If you’re getting that much stuff, you’re not gonna find it. I’ve limited all searches to 20 rows.
2) Use plain old objects wherever possible. OODBs have a history of being rotten at schema migration, so all objects used dictionaries for instance variable (ivar) storage. This makes it easy to add or remove ivars without getting messed up in the OODB, as the object’s ‘format’ is always the same. However, all that hashing, fetching, and searching isn’t free. I didn’t realize how expensive it was until I ripped it out and replaced the dictionaries with vanilla ivars. Zooooom!
Because the mapping goes from the objects to the RDBMS through GLORP at the attribute level, rather than blitting a whole object as a chunk, the format of the objects is decoupled from the database representation. Changes in the object’s binary format have no bearing on the database representation.
3) When generating accessors, add lazy initialization wherever nil is not an acceptable value.
4) Keep it meta. The meta model is the most important thing we have. With a good meta model all else can be rapidly replaced. Consequently, I think the next step in evolution – conversion of the meta model to use Magritte and then replacement of the UI to use Magritte as well, will radically improve agility.
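Lessons 1 and 3 are language-neutral, so here is a minimal sketch of both in Python. It uses the standard sqlite3 module as a stand-in for GLORP and PostgreSQL. The `customer` table, `search_customers`, and `Customer` are hypothetical names for illustration and are not part of the application.

```python
import sqlite3

SEARCH_LIMIT = 20  # lesson 1: every search is capped at 20 rows


def search_customers(conn, name_fragment, limit=SEARCH_LIMIT):
    """Return at most SEARCH_LIMIT (id, name) rows whose name contains name_fragment."""
    limit = min(limit, SEARCH_LIMIT)  # callers may ask for fewer rows, never more
    cursor = conn.execute(
        "SELECT id, name FROM customer WHERE name LIKE ? ORDER BY name LIMIT ?",
        (f"%{name_fragment}%", limit),
    )
    return cursor.fetchall()


class Customer:
    """Plain object with ordinary attributes and one lazily initialized accessor."""

    def __init__(self, name, orders=None):
        self.name = name
        self._orders = orders  # may arrive as None (a NULL) from the mapping layer

    @property
    def orders(self):
        # lesson 3: None is never a useful answer here, so create the list on first access
        if self._orders is None:
            self._orders = []
        return self._orders


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany("INSERT INTO customer (name) VALUES (?)",
                     [(f"cust{i:03d}",) for i in range(100)])
    print(len(search_customers(conn, "cust")))  # 20, not 100
```

The cap sits inside the one search function rather than at each call site. As a result, no caller can forget it.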
Meanwhile, to prepare, I’m updating the version of Seaside that the application is built on. This is turning out to be harder than I thought as Seaside has moved on quite a bit and many classes in my old version have been simply replaced in the newer one.
The really great thing about using meta facilities like GLORP and Magritte is that the application keeps shrinking until it is mostly just the meta model.
Less code, fewer bugs.
Tell Congress: Support the Genetic Information Nondiscrimination Act (GINA)
January 19th, 2007
You can be completely healthy and still be denied health or life insurance based on your DNA! As scientists identify genetic causes of diseases, insurance companies are using that information to deny coverage to Americans. Grandpa had Alzheimer’s? You might carry the gene and be turned down for coverage! GINA has hit the House. Show your support!
I have just been turned down for coverage at Regence Blue Shield because of genetic testing that showed I have hereditary hemochromatosis. The diagnosis was early, no damage has occurred, and a maintenance regimen of monthly phlebotomies will keep my iron levels normal, yet they still won’t write me a policy.
Thanks for writing to your US Representative and asking him to support GINA.
Records and Objects
January 2nd, 2007
I’ve been slow in posting this because I’ve been trying to make a deadline and there hasn’t been time to do anything else. But at the moment, the data migration scripts are running and I’m just babysitting a pair of computers, watching for problems, tweaking the code, and restarting the migration.
And it is taking forever. The old OmniBase db is about 880 MB, and a fair amount of it is garbage. A hot backup script written by OmniBase’s author takes over 2 hours to run, and it doesn’t even reinflate the objects. This is just not scaling. At any given time, I figure this application hosts perhaps 20 users, tops, and on days when it does, it crawls.
The users have been wanting extracts for generating annual reports for the last 4 months. I have written numerous scripts to try to generate this for them. They always fail. Often, after a few hundred reads, OmniBase will get itself into a state with some hanging locks and will need to be killed. A script that tries to touch every object in the database takes well over 30 hours to run. Why? I’m not sure. Also, users are asking for reports that involve complex traversals that take too long. Not enough entry points into the data. There has to be a better way.
Enter PostgreSQL, the best, and most free, of the free database solutions. Postgres rocks. It just works. The tools are nice, the selection of data types is extensive, and it is easy to extend with additional scripting languages. The most magical (to me) utility is pg_dump. This little gem can back up an entire, fully populated Postgres database holding the equivalent data, hot while the app is running, in about 10 seconds. The resulting file is around 140 MB. I’m sold.
Except that, because of the ‘magic’ nature of OODBs, this application is written (mostly) as if all objects are just in memory – so converting to a SQL oriented format just wasn’t going to be feasible.
So I adopted GLORP and set out to emulate the existing database API. This has been successful. Wildly so.
Glorp uses a meta model (actually two of them: one for the database, one for the objects) plus a mapping model in between. The whole bundle is contained in a class called DescriptorSystem. DescriptorSystem is abstract and uses the Template Method Pattern to help you build out the models. You subclass DescriptorSystem and then override methods like allClassNames, allTableNames and so forth. The initialize method will call these and then start iterating over them, calling methods with names derived from items in the list. So if you have a class Login mapped to a table LOGIN, you need to add ‘Login’ to the list returned by allClassNames, ‘LOGIN’ to the list returned by allTableNames, and methods called classModelForLogin, which creates and returns the appropriate class model describing Login, and tableForLOGIN, which returns an initialized DatabaseTable.
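To make the Template Method idea concrete, here is a rough Python analogue. It is not GLORP’s code. Plain dicts stand in for GlorpClassModel and DatabaseTable, and the snake_case method names (`class_model_for_Login`, `table_for_LOGIN`) are hypothetical counterparts of the Smalltalk selectors described above.

```python
class DescriptorSystem:
    """Abstract base in the style described above: subclasses supply name
    lists plus one builder method per name (Template Method pattern)."""

    def __init__(self):
        # The template: walk each list and call a method whose name is derived
        # from the entry, like classModelForLogin / tableForLOGIN.
        self.class_models = {name: self._call("class_model_for_", name)
                             for name in self.all_class_names()}
        self.tables = {name: self._call("table_for_", name)
                       for name in self.all_table_names()}

    def _call(self, prefix, name):
        builder = getattr(self, prefix + name, None)
        if builder is None:
            raise NotImplementedError(f"{type(self).__name__} must define {prefix}{name}")
        return builder()

    # Hooks every concrete subclass must override.
    def all_class_names(self):
        raise NotImplementedError

    def all_table_names(self):
        raise NotImplementedError


class LoginDescriptorSystem(DescriptorSystem):
    def all_class_names(self):
        return ["Login"]

    def all_table_names(self):
        return ["LOGIN"]

    def class_model_for_Login(self):
        # stands in for a class model: attribute name -> type
        return {"id": int, "user_name": str}

    def table_for_LOGIN(self):
        # stands in for a DatabaseTable: column name -> SQL type
        return {"ID": "serial primary key", "USER_NAME": "varchar(40)"}
```

Forgetting one of the per-name methods fails loudly at construction time. This is roughly the experience of leaving out a `tableFor...` method.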
Inside of the classModelForLogin, you create a GlorpClassModel, give it a name, and then you describe attributes by calling methods like
newAttributeNamed: aSymbol type: aClass
for single values and toOne relationships or
newAttributeNamed: aSymbol collection: collectionClass of: aClass
for toMany relationships. Notice that this adds the typing information for attributes that is typically missing in Smalltalk.
The other model represents the underlying database and includes objects like DatabaseTable, DatabaseColumn, ForeignKeyConstraint, DatabaseSequence, DatabaseIndex and such. Stuff all relational databases have. You describe your database procedurally, same as you did with the class model, only you implement tableForLOGIN and say things like:
system tableNamed: aString
addColumnNamed: aString type: aDatabaseType
addForeignKeyConstraint: (ForeignKeyConstraint sourceField: srcField targetField: toField)
You have to be totally explicit here. If you need a foreign key field, you have to define it. If you need a link table, you have to define it. You have to specify the field for the primary key, which means the object has to have a field for the primary key. Glorp doesn’t figure any of that stuff out for you.
Finally, you specify the mapping from the class model to the table model. You do this by creating Descriptors (one for each class) and Mappings. Mappings can represent either data fields, or relationships. There are several kinds. For instance, a many to many mapping will involve specifying the primary keys of each table, the link table, and the mappings from primary keys to link table fields. Most of this can be derived from the class model’s relationship and foreign key constraints. But you still have to specify it.
At this point, you are probably thinking: I have 100 classes in my model. That’s 300 methods I have to write! And it is all mostly boilerplate! I couldn’t agree more. All you really need is an appropriate metamodel.
In a previous life, I wrote WebObjects applications. WebObjects has an ORM called the Enterprise Objects Framework, or EOF. It was light years ahead of its time. Now it’s about average, as Apple has neglected WebObjects terribly and nobody I know uses it anymore. EOF came with a great program called EOModeler and stored the model in text files in PList format. I have code that reads and writes PLists in every language I know; they are insanely useful.
The EOModel was the first file format that described all of the meta information required by a typical ORM library. So I did the easy thing – leveraged EOModel files to build DescriptorSystems. (I was not alone in realizing this). I subclassed DescriptorSystem and created EODescriptorSystem which reads an EOModel file and builds a DescriptorSystem from it. (I also wrote my own EOModeler application in Java Swing, both as an experiment, and out of frustration because Apple was neglecting WebObjects to an extent that the tools were falling apart. It works well enough but the experience of writing it put me off Java for good).
This experience taught me that any reasonably expressive meta model can be leveraged to build a descriptor system. In my current porting project, the original developers ended up creating their own ad hoc meta model with explicit modeling of attributes, relationships, and types. I leveraged this to automatically generate a descriptor system from the classes themselves (all domain classes have a common root), and was able to infer link tables, foreign keys, and so forth from the meta model. So from the model I get the schema and the descriptor system.
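Here is a minimal sketch of that kind of generator. It is written in Python rather than Smalltalk and assumes a much simpler meta model than the real one. Each hypothetical `ClassMeta` lists typed attributes and named relationships. To-one relationships become foreign-key columns, and to-many relationships become link tables.

```python
from dataclasses import dataclass, field

# Hypothetical mapping from meta-model attribute types to PostgreSQL column types.
SQL_TYPES = {int: "integer", str: "text", float: "double precision", bool: "boolean"}


@dataclass
class Relationship:
    name: str
    target: str            # name of the related class
    to_many: bool = False


@dataclass
class ClassMeta:
    name: str
    attributes: dict = field(default_factory=dict)      # attribute name -> Python type
    relationships: list = field(default_factory=list)   # list of Relationship


def generate_schema(classes):
    """Return ({table: {column: sql_type}}, [(table, column, referenced_table)])."""
    tables, foreign_keys = {}, []
    for meta in classes:
        table = meta.name.upper()
        columns = {"id": "serial primary key"}   # every table gets a surrogate key
        for attr, py_type in meta.attributes.items():
            columns[attr] = SQL_TYPES[py_type]
        for rel in meta.relationships:
            target = rel.target.upper()
            if rel.to_many:
                # to-many: infer a link table holding both primary keys
                link = f"{table}_{rel.name.upper()}"
                owner_col, target_col = f"{table.lower()}_id", f"{target.lower()}_id"
                tables[link] = {owner_col: "integer", target_col: "integer"}
                foreign_keys += [(link, owner_col, table), (link, target_col, target)]
            else:
                # to-one: infer a foreign-key column on the owning table
                columns[f"{rel.name}_id"] = "integer"
                foreign_keys.append((table, f"{rel.name}_id", target))
        tables[table] = columns
    return tables, foreign_keys
```

Treating every to-many as a link table is a simplification. A one-to-many could instead put the foreign key on the target table, and a self-referential relationship would need distinct link-column names, which this sketch does not handle.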
Of course, translating to relational format required a few subtle changes to the object model, so I also added code to construct the table model directly from PostgreSQL and, by comparing it with the schema generated from the classes, can figure out alter scripts to automate schema migration. This is working great and building out new domain objects has become really easy. I would say with Seaside and this infrastructure, I have surpassed Rails for ease of development.
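The migration step boils down to diffing two table models. Here is a hedged Python sketch that assumes both sides have been reduced to `{table: {column: type}}` dictionaries. In PostgreSQL, the live side could be read from `information_schema.columns`, though its type names would need normalizing before comparison. The function name `alter_script` is hypothetical.

```python
def alter_script(desired, actual):
    """Compare {table: {column: sql_type}} dicts and return PostgreSQL statements
    that move `actual` (read from the live database) toward `desired`
    (generated from the classes)."""
    statements = []
    for table, columns in desired.items():
        if table not in actual:
            # whole table is missing: create it
            cols = ", ".join(f"{c} {t}" for c, t in columns.items())
            statements.append(f"CREATE TABLE {table} ({cols});")
            continue
        existing = actual[table]
        for column, sql_type in columns.items():
            if column not in existing:
                statements.append(f"ALTER TABLE {table} ADD COLUMN {column} {sql_type};")
            elif existing[column] != sql_type:
                statements.append(f"ALTER TABLE {table} ALTER COLUMN {column} TYPE {sql_type};")
        for column in existing:
            if column not in columns:
                # destructive: review before running
                statements.append(f"ALTER TABLE {table} DROP COLUMN {column};")
    return statements
```

Generated DROP COLUMN statements destroy data, so a script like this is best reviewed by hand before it runs.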
But I think I can do better and have begun to investigate Magritte. Stay tuned.
Object Oriented Databases (OODBMS)
December 20th, 2006
I used to be something of an expert on Object Oriented Database Systems, how to use them, etc. The way you get to be an expert at a technology is to simply get ahead of the adoption curve and spend a bunch of time figuring out how to make the technology work before it becomes common knowledge. At this point, you can charge premium rates, as you have scarce knowledge in your possession.
I got to this position by working for some adventurous folks that were willing to take a chance on the hype. The first one I came across was ObjectStore with C++. The project didn’t launch, but we built lots of prototypes and I got a good indoctrination into the ups and downs of OODBMS lore. Later on I was exposed to some others, Versant, Poet, and some lesser known ones. Like relational databases, once you get the hang of one, the rest are pretty similar. The key concepts are:
1) OODBS allow you to create one or more named “roots”. A root is basically a variable – you ask for the object at root “foo” and get it back. Some only give you one root. If you only get one root, then almost always you just stick a hash/map/dictionary at the root and pretend you have several anyhow. The root is your entry point to the data.
2) All object manipulations/accesses must be done within a transaction context. So you end up digging through your app looking for sensible transaction boundaries. For a web app, you typically begin a transaction at the beginning of a request and commit it just before sending the response. You want to keep transactions short so as not to have other users waiting on locks.
3) Objects become part of the database via “reachability”. The OODBMS will “trace” your object graph starting at the root upon commit, calculate changes to the graph, and then write the changes to the database. Any new objects reachable from the root object automatically become part of the database. While this might sound expensive, it is generally quite cheap.
So you generally open a transaction, lookup an object from a root, navigate to the object of interest, make changes, and then commit the transaction. Many also let you hang onto an object reference across transactions. The object reference can only be accessed within a transaction – trying to read data from it outside of a transaction will fail with an exception. This makes redrawing user interfaces problematic.
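To illustrate roots, transactions, and persistence by reachability together, here is a toy in-memory model in Python. It is not the API of ObjectStore, Versant, Poet, or OmniBase. The class and method names are hypothetical. A real OODB would also write only the changed objects to disk rather than recomputing the whole reachable set.

```python
class ToyObjectStore:
    """In-memory stand-in for an OODB: named roots, transactions, and
    persistence by reachability. Not modeled on any particular product."""

    def __init__(self):
        self._roots = {}
        self._persistent = {}          # id(obj) -> obj, keeps stored objects alive
        self._in_transaction = False

    def _check(self):
        if not self._in_transaction:
            raise RuntimeError("object access outside a transaction")

    def begin(self):
        self._in_transaction = True

    def set_root(self, name, obj):
        self._check()
        self._roots[name] = obj

    def root(self, name):
        self._check()
        return self._roots[name]

    def commit(self):
        # Trace the graph from every root; whatever is reachable is persistent.
        self._check()
        reached, stack = {}, list(self._roots.values())
        while stack:
            obj = stack.pop()
            if id(obj) in reached:
                continue
            reached[id(obj)] = obj
            stack.extend(_children(obj))
        self._persistent = reached
        self._in_transaction = False

    def is_persistent(self, obj):
        return self._persistent.get(id(obj)) is obj


def _children(obj):
    """Objects directly referenced by obj (containers and plain instances only)."""
    if isinstance(obj, dict):
        return list(obj.values())
    if isinstance(obj, (list, tuple, set)):
        return list(obj)
    if hasattr(obj, "__dict__"):
        return list(vars(obj).values())
    return []


class Part:
    def __init__(self, name):
        self.name = name
        self.subparts = []
```

After a commit, a `Part` hung off a root becomes persistent even though it was never saved explicitly. Touching a root outside `begin()`/`commit()` raises, which mirrors the exception behavior described above.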
OODBs come from the CAD world where you have a network of a zillion objects, all slightly different, where mapping them to a regular container like a db table would be really expensive. They’re really good at this object persistence game.
OODBs are seductive. They are easy to get started with. For one thing, you don’t have to do a data model, just your object model. Your code is your model. You make objects, stick them in containers, and forget about them. Sounds great, right?
But as anyone who has lived with an OODBMS for any period of time knows, Object databases are great, until they’re not, and then they truly suck. Here’s why:
1) Concurrency is very poor. As I mentioned, OODBs come from the CAD world and work well for storing complex CAD models. But CAD models are seldom updated concurrently by large numbers of people. As you modify objects within a transaction, the OODB has to obtain locks on your modified objects to guarantee consistency. Unfortunately, none of them (that I know of) implement object-level locking. Most implement locking at the memory page level, so spurious lock conflicts, where two unrelated objects share a memory page, can be common. Resolving these conflicts can be expensive. Because all work must happen within a transaction, transactions tend to be on the long side.
2) Constant re-fetching of data every transaction makes keeping user interface elements up to date very expensive. There is no user level in-memory caching without writing user level code to create transient copies.
3) Schema migration is hard, if not impossible. Your object defines your format. Adding a field to a class makes your in-memory model inconsistent with the slabs of bits you wrote out before you added the field. There are ways around this. The usual one is to have one ivar that is a dictionary. Otherwise, there are usually some very user un-friendly scripts that have to be run. In many cases, the database must be taken offline to do this. So much for your three nines availability.
4) Death by a trillion bug fixes. I can’t speak for all, but ObjectStore would require the database be taken offline and an update script be run for every upgrade. For a site that is supposed to be up all the time, this isn’t acceptable. So upgrades were deferred. When we did this, we found that
5) OODBMS providers have limited resources and will only support versions up to one year old. If you get too far out of date and your db goes down, you are flat out of luck. The support people won’t help you. Only a really large organization could afford to keep up with all the little point fix releases ObjectStore made in a year – we couldn’t afford the man power or the down time.
6) Bugs are forever. If you put a bug into your program that damages the object model, it becomes enshrined in the database. Subsequent read code that finds the malformed chunk of the object model will usually fail. Subtle corruptions build up over time, making a full database walk harder and harder to complete. Conventional databases can avoid this by implementing appropriate constraints.
7) No security. Any screwball developer can destroy your reference data (usually stored in ordered collections off of the root). A conventional relational database can safeguard important data with roles, permissions, and constraints.
8) Garbage collection is not universally available. Orphaned junk is common. Some OODBs provide GC utilities, but they can fail if there is corrupt data (see items 6 and 7).
9) No ad hoc query capability. You have to write a new program to view any data at all. You need to write programs to update reference data. You need a program to do anything at all with your data. No fixing problems with a quick line of SQL. Searching for unanticipated patterns is difficult.
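The page-level locking problem from item 1 is easy to see in a toy model. This Python sketch assumes each object lives at a known byte offset and that locks are taken on fixed-size pages. The 4096-byte page size and the offsets are illustrative and are not taken from any particular product.

```python
PAGE_SIZE = 4096  # bytes; illustrative only


def page_of(offset, page_size=PAGE_SIZE):
    """Page number holding the byte at `offset`."""
    return offset // page_size


def conflicts(write_set_a, write_set_b, page_size=PAGE_SIZE):
    """Compare two transactions' write sets (object name -> byte offset).

    Returns (object_conflict, page_conflict): whether they modify the same
    object, and whether a page-locking store would make them collide."""
    object_conflict = bool(set(write_set_a) & set(write_set_b))
    pages_a = {page_of(o, page_size) for o in write_set_a.values()}
    pages_b = {page_of(o, page_size) for o in write_set_b.values()}
    page_conflict = bool(pages_a & pages_b)
    return object_conflict, page_conflict
```

For example, `conflicts({"invoice_17": 8200}, {"customer_42": 8700})` returns `(False, True)`. The two transactions touch unrelated objects, yet both objects land on page 2, so a page-locking store would make one transaction wait for the other.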
I’ve been bitten by all of these issues at one time or another and have recently inherited an application written using a Smalltalk OODB called OmniBase. Debugging this application is extremely painful because launching a debugger results in the transaction being terminated and all object references becoming invalid. Thus, the data that might provide a clue as to the source of the error is gone. Additionally, while the author claims to provide support, he simply collects fees and then tells you that your application doesn’t run in his environment, blames you for writing rotten code, and declines future contact.
So this dog has to go.
Fortunately, you can get most of the benefits of an OODB without the drawbacks by using an Object Relational Mapping framework. I’ve selected GLORP, an open source mapping framework that is improving all the time, and found that I can implement support for the part of OmniBase’s API the application uses with very little change to the user interface, which is written in Seaside under Squeak.
Next time, I’ll talk a little bit about how this works.
Well, that was exciting
December 20th, 2006
In case you haven’t been watching, the Pacific NW got hammered last week with a big storm. Trees were knocked down, taking power and cable lines with them and blocking roads. We lost power for a day, internet for a week, and the main entrance to our neighborhood was blocked with fallen power cables and trees hanging dangerously over the roadway from the wreckage of the power wires.
To make things extra fun, a pump failed and my lot began to flood. I rented a portable pump to get the water level low enough to work on the failed one, and installed a new one on Monday. Good thing, as it is raining hard again today.
Today, the internet came back on and they cleared the road. For me, things are back to normal, but a week without internet was kind of a drag.
Education vs Training
November 29th, 2006
Scoble points to a post by Steve Sloan, who has been teaching a podcasting/new media class at SJSU. It seems the administration would prefer to teach tools rather than theory.
I taught CS at the University of Colorado, Denver for several years. Just one course per semester. We had many discussions about just this problem. UCD is a satellite campus. Most courses are taught at night. Students often have full time jobs. They want better jobs. They go to school specifically to get better jobs, and they don’t have a lot of spare resources to spend on school. They just want to learn the latest hot technologies like Java and .NET and be marketable. In other words, they want job training.
The problem with job training is that the information is perishable. In the mid-90s all the rage was C++. Many universities, ours included, altered their curriculum to use C++ as the teaching language in order to be more appealing to students. After all, you can teach theory using pretty much any general purpose programming language.
Except that C++ is a terrible teaching language. (Actually, it’s just a terrible language.) It is too complicated, and I wasted many classroom hours helping students cope with the quirks in the language instead of focusing on the content. And now, the C++ knowledge is mostly useless. Nearly all C++ work has been supplanted by Java work. So the students need to retrain.
A better idea is to use languages that can most clearly illustrate the concepts with minimum extraneous complexity. More languages means more viewpoints, and tends to make students understand that the language or tool isn’t that important. It is the underlying concepts.
The University is between a rock and a hard place. With education funding cuts, they need to attract students to survive. To attract students, they need to offer classes the students want to take. Students find training classes most attractive, as they offer instant gratification. But training classes are like candy. They’re not good for you in the long run and the fix is short lived. Education is more like vegetables. It is good for you, but maybe not so pleasant to digest. The University would like to stick to vegetables, but if no one orders them, they have to sell candy too.
Which is unfortunate. The state of the software industry is deplorable. I think 90% of the people programming out there ought to be doing something else. They suck at their job and aren’t even educated enough to understand that they suck much less how they suck. One trick ponies, they flounder if given a problem that isn’t pre-solved in their platform. FACT: the best indicator that a candidate is likely to fail the interview process at big river books is if they characterize themselves as a “Java Architect” or “Java Developer”. It usually means they don’t know anything else. They think Java (or .NET) is the pinnacle of software achievement. Without a proper education, they can’t conceive of anything else.
The real solution is to properly fund universities as institutes of higher learning and stick to education. If the universities want to also offer vocational training, that’s fine. Just don’t cheapen the academic programs by offering “degrees”. Certificates of completion should be adequate.
Downriver
November 27th, 2006
For the past two and a half years I worked at Amazon.com. It was fun for the first year, with so many old assumptions and prejudices shattered. But Amazon is a special case. For most normal sized systems, my old design sense was pretty solid.
Still, it was a horizon broadening experience and I enjoyed that. I managed teams of people and we built software and I liked that as a change from the endless parade of crummy short term java contracts I was getting.
But I left last month. I joined as something of a new manager. My pay grade was commensurate with my lack of experience in that area. But eventually I grew weary of it and was itching to get back to doing nifty code if I could find a way to do it on my terms. Which means dynamic expressive languages and I get creative control of the technology. No “You’re the architect – so you’ll use this language and that vendor’s solution”. Huh, I thought I was the architect.
The other main driver to leave is the lack of work/life balance. This isn’t Amazon specific. This is US company specific. In the US, if you work for an established company, this is just how it is. You get 2, maybe 3 weeks of vacation and a few holidays here and there. You are expected to put in 50 hours a week. With ever rising property values and congested highways, you have to live about an hour away from work, meaning you lose 2 unbillable hours a day just travelling to work. You’re working your butt off, but you can’t enjoy the fruits of your labor.
I lived in France for about half a year. I’ve seen how Europeans live. They take 5-6 weeks of paid vacation. They can take long leaves of absence. They are able to travel the world. In the US, you can’t get enough days off to drive across the country, much less travel abroad. No wonder we are such an ignorant xenophobic lot.
I have a boat. I’d like to take the boat in the summer and explore Puget Sound, where I live. I’ll need about 4 contiguous weeks to do it. I couldn’t get the time off. Why have a boat if I can’t take the time to enjoy it?
I have friends abroad. I can never get the time to go see them. I have the money. Just not the time. Again, this is lame. So I walked. I give up on work camp America. US companies say they can’t find qualified workers. We’re around. But your terms stink. Improve them or go pound sand.
I left the big company to work for myself. I build software using tools I like. Unconventional, but productive and low-cost tools like Squeak and Seaside. I use other things too, depending on requirements. I work when I want to, from anywhere I like.
I think this is the future as more and more of my colleagues are opting for this kind of situation. The big company life holds no attraction for the seasoned employee.
Happy Thanksgiving!
November 23rd, 2006
I love to cook, so this is my “recreation” day.
Turkey, pear stuffing, apricot cranberry sauce, yams, fruit salad, baked pears with hazelnut chocolate sauce for dessert.
What’s in your oven?