Archive for the ‘squeak’ Category

Summer of Rails

Sunday, September 21st, 2008

I’ve now worked on three different Rails applications. One of them was from scratch; the other two I took over from someone else. The thing I like most about Rails is Active Record – it just works and it is easy to use – even for existing databases (although it takes a bit more work to specify the mappings).

I have a project coming up that would probably be a great Seaside candidate. The database has to be PostgreSQL (according to the client). There is a native Cocoa component – I’ll probably give BaseTen a try. For the web component, the obvious candidates are Rails (although I don’t know the state of Rails with PostgreSQL – only MySQL), and Seaside/Glorp – but I need Glorp to work like Active Record since the DB will be the master source of record for the schema.

Sadly, it doesn’t look like Glorp’s ActiveRecord on Squeak is ready for prime time, so I might have to finish that implementation myself.

Looking for work again

Monday, March 3rd, 2008

I’ve been doing a lot of PHP lately – shopping cart integration, web service clients, lots of ecommerce stuff. But I’m coming to the end of these projects and am looking for new challenges. Got a project or position that I can do remotely? Drop me a line.

I’m a Google Summer of Code Mentor!

Wednesday, March 21st, 2007

Are you a student? Want to make some cash? Want to do it with Smalltalk? Sign up for one of my Google Summer of Code projects.

Seaside update complete!

Tuesday, February 6th, 2007

That was easier than I thought – thanks to a nearly complete port done by my predecessor. However, I have finally thrown in the towel and upgraded the entire application to a 3.8 Squeak image. 3.8 brings a lot of baggage I don’t need – like character encodings and TrueType font rendering.

However, enough tools such as SqueakMap and Glorp have become dependent on the newer APIs that integrating new code was getting to be too hard. So I started with a clean 3.8 image, began loading Monticello packages, and eventually ended up with a completely new build. Then I upgraded the Seaside version.

With a new build and upgraded infrastructure, I then successfully loaded Magritte into the image. So now I have been working through the tutorials.

More on that later.

Database Transplant Successful

Wednesday, January 31st, 2007

OmniBase is dead.

I killed it.

The replacement with GLORP has been successful. Rate of error discovery has stabilized and, after about 3 weeks of production experience and many small fixes, is now below OmniBase’s on its best day.

A few things I’ve learned along the way:

1) Put limits on all search queries. All of them. Too many times in the last couple weeks the system became unresponsive for a couple minutes as it fetched every blessed row in the database. If you’re getting that much stuff, you’re not gonna find it. I’ve limited all searches to 20 rows.

2) Use plain old objects wherever possible. OODBs have a history of being rotten at schema migration, so in the old application all objects used dictionaries for ivar storage. This makes it easy to add or remove ivars without getting messed up in the OODB, as the object’s ‘format’ is always the same. However, all that hashing, fetching, and searching isn’t free. I didn’t realize how expensive it was until I ripped the dictionaries out and replaced them with vanilla ivars. Zooooom!

Because the mapping is from the objects to the RDBMS through GLORP on an attribute level rather than relying on blitting a whole object as a chunk, the format of the objects is decoupled from the database representation. Changes in the object’s binary format have no bearing on the database representation.

3) When generating accessors – add lazy initialization where nil is not an acceptable value.

4) Keep it meta. The meta model is the most important thing we have. With a good meta model all else can be rapidly replaced. Consequently, I think the next step in evolution – converting the meta model to use Magritte and then replacing the UI with one built on Magritte as well – will radically improve agility.
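Points 1 to 3 are easy to show in miniature. This is only a sketch, written in Python so that it runs standalone; `search`, `MAX_ROWS`, `Customer` and `orders` are made-up names and are not part of the application or of GLORP. Against a real database the cap would go into the query itself (a SQL LIMIT clause) rather than being applied in memory.

```python
from itertools import islice

MAX_ROWS = 20  # hypothetical hard cap applied to every search (point 1)


def search(rows, predicate, limit=MAX_ROWS):
    """Return at most `limit` matching rows; stops scanning once the cap is reached."""
    return list(islice((row for row in rows if predicate(row)), limit))


class Customer:
    """Plain attributes rather than a dictionary of ivars (point 2)."""

    def __init__(self, name):
        self.name = name
        self._orders = None  # nothing created yet

    @property
    def orders(self):
        # Lazy initialization (point 3): callers never see None.
        if self._orders is None:
            self._orders = []
        return self._orders
```

The accessor is the only place that knows the collection may not exist yet, so the rest of the code can append to `customer.orders` without a nil check.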

Meanwhile, to prepare, I’m updating the version of Seaside that the application is built on. This is turning out to be harder than I thought as Seaside has moved on quite a bit and many classes in my old version have been simply replaced in the newer one.

The really great thing about using meta facilities like Glorp and Magritte is that the application continues to shrink until it is mostly just the meta model.

Less code, less bugs.

Records and Objects

Tuesday, January 2nd, 2007

I’ve been slow in posting this because I’ve been trying to make a deadline and there hasn’t been time to do anything else. But at the moment, the data migration scripts are running and I’m just babysitting a pair of computers watching for problems, tweaking the code, and restarting the migration.

And it is taking forever. The old OmniBase db is about 880M, and a fair amount of it is garbage. A hot backup script written by OmniBase’s author takes over 2 hours to run. It doesn’t reinflate the objects either. This is just not scaling. At any given time, I figure this application hosts perhaps 20 users tops. On those days, it crawls.

The users have been wanting extracts for generating annual reports for the last 4 months. I have written numerous scripts to try to generate this for them. They always fail. Often, after a few hundred reads, OmniBase will get itself into a state with some hanging locks and will need to be killed. A script that tries to touch every object in the database takes well over 30 hours to run. Why? I’m not sure. Also, users are asking for reports that involve complex traversals that take too long. Not enough entry points into the data. There has to be a better way.

Enter PostgreSQL. The best and most free database of the free solutions. Postgres rocks. It just works. The tools are nice. The data type selection is extensive. It is easy to extend with additional scripting languages. The most magical (to me) utility is pg_dump. This little gem can back up an entire fully populated postgres database with equivalent data, hot while the app is running, in about 10 seconds. The resulting file is around 140M. I’m sold.

Except that, because of the ‘magic’ nature of OODBs, this application is written (mostly) as if all objects are just in memory – so converting to a SQL oriented format just wasn’t going to be feasible.

So I adopted GLORP and set out to emulate the existing database API. This has been successful. Wildly so.

Glorp uses a meta model – actually two of them – one for the database, one for the objects, plus a mapping model in between. The whole bundle is contained in a class called DescriptorSystem. DescriptorSystem is abstract and uses the Template Method Pattern to help you build out the models. You subclass DescriptorSystem and then override methods like allClassNames, allTableNames and so forth. The initialize method will call these and then start iterating over them, calling methods with names derived from items in the list. So if you have a class Login mapped to a table LOGIN, you need to add ‘Login’ to the list returned by allClassNames, ‘LOGIN’ to the list returned by allTableNames, and methods called classModelForLogin, which creates and returns the appropriate class model describing Login, and tableForLOGIN, which returns an initialized DatabaseTable.
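The dispatch-by-derived-name idea can be sketched in a few lines. This is not Glorp's code, just a Python stand-in for the pattern described above: the method names mirror the Smalltalk ones, and the dictionaries returned by the builders are placeholders for real class models and DatabaseTable objects.

```python
class DescriptorSystem:
    """Abstract base: subclasses list names and supply one builder per name."""

    def all_class_names(self):
        raise NotImplementedError

    def all_table_names(self):
        raise NotImplementedError

    def __init__(self):
        # Template method: the base class drives the iteration and looks up
        # builder methods whose names are derived from the listed items.
        self.class_models = {
            name: getattr(self, "class_model_for_" + name)()
            for name in self.all_class_names()
        }
        self.tables = {
            name: getattr(self, "table_for_" + name)()
            for name in self.all_table_names()
        }


class MyDescriptorSystem(DescriptorSystem):
    def all_class_names(self):
        return ["Login"]

    def all_table_names(self):
        return ["LOGIN"]

    def class_model_for_Login(self):
        # attribute name -> type (placeholder for a real class model)
        return {"id": int, "name": str, "password": str}

    def table_for_LOGIN(self):
        # column name -> SQL type (placeholder for a real table description)
        return {"ID": "integer", "NAME": "varchar(64)", "PASSWORD": "varchar(64)"}


system = MyDescriptorSystem()
```

Forgetting to write `table_for_LOGIN` fails at construction time with an AttributeError, which is roughly the experience of forgetting one of the derived-name methods.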

Inside of the classModelForLogin, you create a GlorpClassModel, give it a name, and then you describe attributes by calling methods like

newAttributeNamed: aSymbol type: aClass

for single values and toOne relationships or

newAttributeNamed: aSymbol collection: collectionClass of: aClass

for toMany relationships. Notice that this adds the typing information for attributes that is typically missing in Smalltalk.

The other model represents the underlying database and includes objects like DatabaseTable, DatabaseColumn, ForeignKeyConstraint, DatabaseSequence, DatabaseIndex and such. Stuff all relational databases have. You describe your database procedurally, same as you did with the class model, only you implement tableForLOGIN and say things like:

system tableNamed: aString
addColumnNamed: aString type: aDatabaseType
addForeignKeyConstraint: (ForeignKeyConstraint sourceField: srcField targetField: toField)

You have to be totally explicit here. If you need a foreign key field, you have to define it. If you need a link table, you need to define it. You have to specify the field for the primary key – which means the object has to have a field for the primary key. Glorp doesn’t figure any of that stuff out for you.

Finally, you specify the mapping from the class model to the table model. You do this by creating Descriptors (one for each class) and Mappings. Mappings can represent either data fields, or relationships. There are several kinds. For instance, a many to many mapping will involve specifying the primary keys of each table, the link table, and the mappings from primary keys to link table fields. Most of this can be derived from the class model’s relationship and foreign key constraints. But you still have to specify it.
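To make the many-to-many case concrete, here is what the relational side looks like when everything is spelled out. This is a sketch using Python's sqlite3 module rather than Glorp and PostgreSQL, and the `person`, `grp` and `person_grp` tables are invented for illustration. The link table, both foreign keys and both primary keys are declared by hand, which is exactly the information a many-to-many mapping has to be told about.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce FKs by default
conn.executescript("""
    CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE grp    (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    -- The link table is declared explicitly, with one foreign key per side.
    CREATE TABLE person_grp (
        person_id INTEGER NOT NULL REFERENCES person(id),
        grp_id    INTEGER NOT NULL REFERENCES grp(id),
        PRIMARY KEY (person_id, grp_id)
    );
""")

with conn:
    conn.executemany("INSERT INTO person VALUES (?, ?)", [(1, "ann"), (2, "bob")])
    conn.executemany("INSERT INTO grp VALUES (?, ?)", [(10, "admins"), (20, "staff")])
    conn.executemany("INSERT INTO person_grp VALUES (?, ?)", [(1, 10), (1, 20), (2, 20)])


def groups_of(person_id):
    """Follow person.id -> person_grp.person_id, then person_grp.grp_id -> grp.id."""
    rows = conn.execute(
        "SELECT g.name FROM grp g "
        "JOIN person_grp pg ON pg.grp_id = g.id "
        "WHERE pg.person_id = ? ORDER BY g.name",
        (person_id,),
    )
    return [name for (name,) in rows]
```

The join in `groups_of` is the piece an ORM generates for you once it knows the two key-to-link-field pairs.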

At this point, you are probably thinking – I have 100 classes in my model! That’s 300 methods I have to write! And it is all mostly boilerplate! I couldn’t agree more. All you really need is an appropriate metamodel.

In a previous life, I wrote WebObjects applications. WebObjects has an ORM called the Enterprise Objects Framework or EOF. It was light years ahead of its time. Now, it’s about average as Apple has neglected WebObjects terribly and nobody I know uses it anymore. EOF came with a great program called EOModeler and stored the model in text files in PList format. I have code that reads and writes PLists in every language I know – they are insanely useful.

The EOModel was the first file format that described all of the meta information required by a typical ORM library. So I did the easy thing – leveraged EOModel files to build DescriptorSystems. (I was not alone in realizing this). I subclassed DescriptorSystem and created EODescriptorSystem which reads an EOModel file and builds a DescriptorSystem from it. (I also wrote my own EOModeler application in Java Swing, both as an experiment, and out of frustration because Apple was neglecting WebObjects to an extent that the tools were falling apart. It works well enough but the experience of writing it put me off Java for good).

This experience teaches me that any reasonably expressive meta model can be leveraged to build a descriptor system. In my current porting project, the original developers ended up creating their own ad hoc meta model with explicit modeling of attributes, relationships, and types. I leveraged this to automatically generate a descriptor system from the classes themselves (all domain classes have a common root), and was able to infer link tables, foreign keys, and so forth from the meta model. So from the model I get the schema and the descriptor system.

Of course, translating to relational format required a few subtle changes to the object model, so I also added code that constructs the table model directly from PostgreSQL and, by comparing it with the schema generated from the classes, works out the alter scripts needed to automate schema migration. This is working great and building out new domain objects has become really easy. I would say with Seaside and this infrastructure, I have surpassed Rails for ease of development.
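The comparison step is simple once both schemas are in the same shape. The sketch below is an illustration in Python, not the code from the project: it assumes each schema has already been reduced to a `{table: {column: sql_type}}` dictionary, and it only handles new tables, added columns and dropped columns. Type changes, constraints and tables that exist only in the database are ignored.

```python
def alter_statements(desired, actual):
    """Return DDL that moves `actual` toward `desired`.

    Both arguments map table name -> {column name: SQL type}.
    """
    statements = []
    for table, columns in sorted(desired.items()):
        if table not in actual:
            cols = ", ".join(f"{name} {sql_type}" for name, sql_type in columns.items())
            statements.append(f"CREATE TABLE {table} ({cols});")
            continue
        for name, sql_type in columns.items():
            if name not in actual[table]:
                statements.append(f"ALTER TABLE {table} ADD COLUMN {name} {sql_type};")
        for name in actual[table]:
            if name not in columns:
                statements.append(f"ALTER TABLE {table} DROP COLUMN {name};")
    return statements


# Schema generated from the classes vs. schema read back from the database.
from_classes = {"login": {"id": "integer", "name": "text", "email": "text"}}
from_database = {"login": {"id": "integer", "name": "text", "legacy": "text"}}

script = alter_statements(from_classes, from_database)
```

In practice you would review the generated script before running it, since a dropped column takes its data with it.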

But I think I can do better and have begun to investigate Magritte. Stay tuned.

Object Oriented Databases (OODBMS)

Wednesday, December 20th, 2006

I used to be something of an expert on Object Oriented Database Systems, how to use them, etc. The way you get to be an expert at a technology is to simply get ahead of the adoption curve and spend a bunch of time figuring out how to make the technology work before it becomes common knowledge. At this point, you can charge premium rates as you have scarce knowledge in your possession.

I got to this position by working for some adventurous folks that were willing to take a chance on the hype. The first one I came across was ObjectStore with C++. The project didn’t launch, but we built lots of prototypes and I got a good indoctrination into the ups and downs of OODBMS lore. Later on I was exposed to some others, Versant, Poet, and some lesser known ones. Like relational databases, once you get the hang of one, the rest are pretty similar. The key concepts are:

1) OODBs allow you to create one or more named “roots”. A root is basically a variable – you ask for the object at root “foo” and get it back. Some only give you one root. If you only get one root, then almost always you just stick a hash/map/dictionary at the root and pretend you have several anyhow. The root is your entry point to the data.

2) All object manipulations/accesses must be done within a transaction context. So you end up digging through your app looking for sensible transaction boundaries. For a web app, you typically begin a transaction at the beginning of a request and commit it just before sending the response. You want to keep transactions short so as not to have other users waiting on locks.

3) Objects become part of the database via “reachability”. The OODBMS will “trace” your object graph starting at the root upon commit, calculate changes to the graph, and then write the changes to the database. Any new objects reachable from the root object automatically become part of the database. While this might sound expensive, it generally is quite cheap.

So you generally open a transaction, lookup an object from a root, navigate to the object of interest, make changes, and then commit the transaction. Many also let you hang onto an object reference across transactions. The object reference can only be accessed within a transaction – trying to read data from it outside of a transaction will fail with an exception. This makes redrawing user interfaces problematic.
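The three concepts fit in a toy. The class below is a deliberately naive in-memory stand-in written in Python; `ToyStore` and its methods are invented for this sketch and do not correspond to any real OODBMS API. There is no disk, no locking and no change tracking, only named roots, a transaction boundary, and a reachability trace at commit.

```python
class ToyStore:
    """Named roots, transactions, and persistence by reachability (in memory only)."""

    def __init__(self):
        self.roots = {}        # root name -> container object (the entry points)
        self._persistent = []  # objects found reachable at the last commit
        self._in_transaction = False

    def __enter__(self):
        self._in_transaction = True
        return self

    def __exit__(self, exc_type, exc, tb):
        if exc_type is None:
            self.commit()
        self._in_transaction = False
        return False  # never swallow exceptions

    def root(self, name):
        if not self._in_transaction:
            raise RuntimeError("objects may only be accessed inside a transaction")
        return self.roots.setdefault(name, {})

    def commit(self):
        # Trace the object graph from the roots; whatever is reached is "stored".
        seen, reached, stack = set(), [], list(self.roots.values())
        while stack:
            obj = stack.pop()
            if id(obj) in seen:
                continue
            seen.add(id(obj))
            reached.append(obj)
            if isinstance(obj, dict):
                stack.extend(obj.values())
            elif isinstance(obj, (list, tuple, set)):
                stack.extend(obj)
            elif hasattr(obj, "__dict__"):
                stack.extend(vars(obj).values())
        self._persistent = reached

    def is_persistent(self, obj):
        return any(candidate is obj for candidate in self._persistent)
```

Typical use is `with store as tx: tx.root("accounts")["ann"] = account`. Nothing ever calls a save method on `account`; it becomes persistent only because it is reachable from a root when the transaction commits.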

OODBs come from the CAD world where you have a network of a zillion objects, all slightly different, where mapping them to a regular container like a db table would be really expensive. They’re really good at this object persistence game.

OODBs are seductive. They are easy to get started with. For one thing, you don’t have to do a data model, just your object model. Your code is your model. You make objects, stick them in containers, and forget about them. Sounds great, right?

But as anyone who has lived with an OODBMS for any period of time knows, Object databases are great, until they’re not, and then they truly suck. Here’s why:

1) Concurrency is very poor. As I mentioned, OODBs come from the CAD world and work well for storing complex CAD models. But CAD models are seldom updated concurrently by large numbers of people. As you modify objects within a transaction, the OODB has to obtain locks on your modified objects to guarantee consistency. Unfortunately, none of them (that I know of) implement object level locking. Most implement locking at the memory page level. Spurious lock conflicts where two unrelated objects share a memory page can be common. Resolving these conflicts can be expensive. Because all work must happen within a transaction, transactions tend to be on the long side.

2) Constant re-fetching of data every transaction makes keeping user interface elements up to date very expensive. There is no user level in-memory caching without writing user level code to create transient copies.

3) Schema migration is hard, if not impossible. Your object defines your format. Adding a field to a class makes your in-memory model inconsistent with the slabs of bits you wrote out before you added the field. There are ways around this. The usual one is to have one ivar that is a dictionary. Otherwise, there are usually some very user un-friendly scripts that have to be run. In many cases, the database must be taken offline to do this. So much for your three nines availability.

4) Death by a trillion bug fixes. I can’t speak for all, but ObjectStore would require the database be taken offline and an update script be run for every upgrade. For a site that is supposed to be up all the time, this isn’t acceptable. So upgrades were deferred. When we did this, we found that

5) OODBMS providers have limited resources and will only support versions up to one year old. If you get too far out of date and your db goes down, you are flat out of luck. The support people won’t help you. Only a really large organization could afford to keep up with all the little point fix releases ObjectStore made in a year – we couldn’t afford the manpower or the down time.

6) Bugs are forever. If you put a bug into your program that damages the object model, it becomes enshrined in the database. Subsequent read code that finds the malformed chunk of the object model will usually fail. Subtle corruptions build up over time, making a full database walk harder and harder to complete. Conventional databases can avoid this by implementing appropriate constraints.

7) No security. Any screwball developer can destroy your reference data (usually stored in ordered collections off of the root). A conventional relational database can safeguard important data with roles, permissions, and constraints.

8) Garbage Collection is not universally available. Orphaned junk is common. Some OODBs provide GC utilities, however they can fail if there is corrupt data (see items 6 and 7).

9) No ad hoc query capability. You have to write a new program to view any data at all. You need to write programs to update reference data. You need a program to do anything at all with your data. No fixing problems with a quick line of SQL. Searching for unanticipated patterns is difficult.
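For contrast, here is what items 6 and 9 look like on the relational side. This is a sketch using Python's sqlite3 module with an invented `ticket`/`status` schema; it shows constraints rejecting malformed rows before they are stored, and an unanticipated question answered with one line of SQL. Roles and permissions (item 7) are not shown because SQLite does not have them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK enforcement off by default
conn.executescript("""
    CREATE TABLE status (code TEXT PRIMARY KEY);
    CREATE TABLE ticket (
        id     INTEGER PRIMARY KEY,
        title  TEXT NOT NULL,
        status TEXT NOT NULL REFERENCES status(code)
    );
    INSERT INTO status VALUES ('open'), ('closed');
    INSERT INTO ticket VALUES (1, 'printer on fire', 'open'), (2, 'typo', 'closed');
""")


def try_insert(row):
    """Return True if the row was stored, False if a constraint rejected it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute("INSERT INTO ticket VALUES (?, ?, ?)", row)
        return True
    except sqlite3.IntegrityError:
        return False


def count_by_status():
    # An ad hoc question nobody planned for, answered without writing a new program.
    return dict(conn.execute(
        "SELECT status, COUNT(*) FROM ticket GROUP BY status ORDER BY status"
    ))
```

A buggy caller that passes a missing title or an unknown status gets an error at write time, instead of leaving a malformed object for some later reader to trip over.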

I’ve been bitten by all of these issues at one time or another and have recently inherited an application written using a Smalltalk OODB called OmniBase. Debugging this application is extremely painful because launching a debugger results in the transaction being terminated and all object references becoming invalid. Thus, the data that might provide a clue as to the source of the error is gone. Additionally, while the author claims to provide support, he simply collects fees and then tells you that your application doesn’t run in his environment, blames you for writing rotten code, and declines future contact.

So this dog has to go.

Fortunately, you can get most of the benefits of an OODB without the drawbacks by using an Object Relational Mapping framework. I’ve selected GLORP, an open source mapping framework that is improving all the time, and found that I can implement part of OmniBase’s API on top of it with very little change to the user interface, which is written in Seaside under Squeak.

Next time, I’ll talk a little bit about how this works.

Painters

Friday, October 20th, 2006

About three or four years ago, I was playing around with a concept I called Bricks. It was a way of factoring drawing operations out of UI components to make it easy to create new looks. The drawing operations were encapsulated into objects called Painters. I was doing it in Squeak using Morphic. It looked like this:

It must have been a good idea because I just ran across something similar from the Java Swing people.

So I’ve decided to dust it off and pick it up again. There have been a lot of changes in Morphic and I’ve given up on reworking the event delivery system, choosing to work with Morphic’s system, warts and all. I still think splitting drawing and layout will be valuable.
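For readers who have not seen the idea, the general shape of factoring drawing out of a component looks something like the following. This is a generic sketch in Python, not the Bricks code and not Morphic's API; `Painter`, `FlatPainter`, `BevelPainter` and `Button` are invented names, and the "canvas" is just a list that records drawing operations.

```python
class Painter:
    """Knows how to draw one look; holds no widget state of its own."""

    def paint(self, widget, canvas):
        raise NotImplementedError


class FlatPainter(Painter):
    def paint(self, widget, canvas):
        canvas.append(f"rect {widget.width}x{widget.height}")
        canvas.append(f"text {widget.label!r}")


class BevelPainter(Painter):
    def paint(self, widget, canvas):
        canvas.append(f"bevel {widget.width}x{widget.height}")
        canvas.append(f"text {widget.label!r}")


class Button:
    """The component keeps state and behaviour; drawing is delegated."""

    def __init__(self, label, width, height, painter):
        self.label = label
        self.width = width
        self.height = height
        self.painter = painter

    def draw(self, canvas):
        self.painter.paint(self, canvas)
```

Changing the look of every button then means swapping the painter object, not subclassing the component.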

Sanskrit, Smalltalk, and Fireworks

Sunday, October 15th, 2006

There’s a combination I won’t soon forget. I spent yesterday evening at the Santa Cruz home of Dan Ingalls celebrating the 10th birthday of Squeak. Dan showed a video of a lecture he gave with his father on the development of an OCR system for recognizing Sanskrit text.

A number of very interesting people attended including some folks from Argentina who are marketing a Smalltalk based oil field production planning system. Also in attendance were David A. Smith – a leading architect of Croquet, Andreas Raab, Craig Latta who is doing some amazing work on a Smalltalk based system called Spoon, and a number of other thoroughly interesting people too numerous to mention here.

As a bonus, there was a fireworks display which we viewed from Dan’s terrace. It’s always interesting when a bunch of Smalltalkers get together; they can be such an eclectic bunch.

Blanchard’s Law

Tuesday, September 5th, 2006

When, in the course of developing software, you get the growing feeling that code generation would be a great idea, it actually means that the environment or language in which you are working isn’t powerful enough for the task at hand.

Years ago I was a C++ fan. It was the first language I encountered that had user definable classes and objects and we used it to build Motif GUIs that fronted legacy character based systems. After a while we found that the amount of boilerplate code we were writing was becoming an impediment to change and someone came up with the idea of specifying a number of aspects of the system as models and generating the code from the model.

As I was the lead of the business domain model, this approach was particularly attractive to me as we had dozens of classes that were all implemented using the same idioms, but whose set of attributes changed in minor ways with some frequency. Furthermore, we weren’t actually implementing the business logic within the classes as they were being mapped into an inference engine that provided enforcement of cross object validation rules. So code generation seemed like a great labor saving solution.

Problems occurred fairly soon after with generated code being occasionally hand modified by some maintenance programmer and changes being lost down the road. The generated code had size impacts as well. It’s easy to generate a program too large to compile and run on your current hardware. Our compile times for the system went from 2 to over 30 hours – a problem we solved by parallelizing the build process across a fleet of machines. The project was ultimately cancelled and we didn’t have a chance to see if the giant could fly. But I was left with some lingering doubts as to the efficacy of generating code.

Later on I encountered CORBA. CORBA seemed pretty cool – like Remote Procedure Calls (RPC) but a little easier to use. Interface Definition Language (IDL) looked just like C++ with a couple minor tweaks. You generated your stubs, filled in your code, and you were off. At least until you wanted to expand or change an interface. Maintaining all the generated source files got cumbersome and once again added considerably to the build times.

Java RMI was the same kind of thing – define Java interfaces (instead of IDL) and a whole bunch of marshalling code gets generated. But you end up with an awful lot of code in the end.

A bit later in my career, I encountered dynamic languages. I got a glimpse of HP’s DST (Distributed Smalltalk) which implemented CORBA using a single proxy object that could stand in for any object. NextStep’s Portable Distributed Objects (PDO) was a similar take on this theme. These smart proxies were made possible by the dynamic typing and message sending nature of these languages. Static languages were stuck with code generation because they had to satisfy the compiler’s type constraints.
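The single-proxy trick is easy to demonstrate in any language with dynamic message dispatch. The sketch below uses Python's `__getattr__` hook, which plays the role that doesNotUnderstand: plays in Smalltalk; `RemoteProxy` and `local_transport` are invented names, and the "transport" simply calls a local object where a real system would marshal the call over the wire.

```python
class RemoteProxy:
    """One proxy class that can stand in for any object: no generated stubs."""

    def __init__(self, send):
        self._send = send  # callable(method_name, args, kwargs) -> result

    def __getattr__(self, name):
        # Only called for attributes not found normally, i.e. every "remote" method.
        if name.startswith("_"):
            raise AttributeError(name)

        def forward(*args, **kwargs):
            return self._send(name, args, kwargs)

        return forward


def local_transport(target):
    """Fake transport that dispatches to a local object instead of a network peer."""

    def send(name, args, kwargs):
        return getattr(target, name)(*args, **kwargs)

    return send
```

The same `RemoteProxy` class fronts a list, a string, or any domain object. Adding a method to the target requires no regeneration step, which is the point being made about static languages and their stubs.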

Consequently, I view the growing trend towards code generation with some horror. No wonder our systems are insanely resource intensive while delivering very little value. After a decade and a half developing software, I have yet to encounter a situation where code generation is a good idea in the end. The answer isn’t to automate your inefficiency; it is to eliminate it. If this is not possible, then it is time to evaluate different tools and approaches.