Archive for September, 2006

Relax, It’s Nothing!

Tuesday, September 12th, 2006

Originally published August 14, 2001

What’s the most frequently seen message produced by Java programs?

It’s got to be the NullPointerException. In an effort to avoid this dreaded message, programmers have adopted an idiom that looks something like:

if(object != null) object.doSomething();

Which basically means – only send the message if there’s something there to receive it. That’s how you keep the exception from being thrown. The idiom must be burdensome, as NullPointerExceptions continue to pop up at unexpected times and the fix is invariably to put this sort of test on the line from which the exception is raised.

This is an idiom adapted from C and later C++, where null is memory address zero by convention. Since important chunks of the operating system live down in that memory region, modern operating systems protect that region from fiddling and view any attempt to reference something off of memory address zero as a likely programmer error. So it’s not allowed, in the name of protecting the operating system.

C and C++ programmers, in an effort to keep their programs alive long enough to report what’s going on, check pointers to make sure that they aren’t null before using them. This is because the penalty for using a null pointer in these environments is death.

These days, applications are developed using higher level languages, like Java and Smalltalk. These have no pointers at all. Instead we have object references, and it’s not possible for the programmer to endanger the operating system by messaging a null object reference. (Objective C uses pointers as object references, but the programmer never directly dereferences them – it’s all done by the Objective C runtime.)

So now, released from the need to steer clear of the operating system’s defense mechanisms, we can take a step back and think about what sending a message to nothing means.

Shouting across the street to someone who isn’t there may make you feel silly – but nothing really happens. Same for sending a letter to a non-existent address. If there’s no return address, the post office will eventually give up and toss it. Nothing too terrible there. There’s simply no one to receive the message.

So why does Java take the position that messaging null is so great a catastrophe that the programmer must be burdened with handling an exception? Especially when most of the time, the programmer’s method of avoiding the unwanted exception is to add a test for null before sending the message. In other words, the programmer will fix it with:

if(object != null) object.doSomething();

which is just a way of saying “if there’s no one to receive the message – do nothing”. Worse, the programmer has to add this check to every single location in the code that tries to make use of the object reference.

This is the same nuisance condition we identified with typing of object references. Recall that the dynamic languages provided a means of handling this condition in a central location via doesNotUnderstand, while the Java version required the programmer to handle it at each call site.

A similar situation exists with the use of null in Java. It must be checked for and handled at each call site. There must be a better way.

On the dynamic side of the world things are simpler. In Smalltalk, null (actually, in Smalltalk it’s referred to as ‘nil’) is a global singleton object of the class UndefinedObject. It implements hardly any messages, and so messaging nil results in nil receiving doesNotUnderstand. The default behavior of doesNotUnderstand in nil is to halt the program and throw a debugger around it. Many deployed systems change this behavior on deployment to log the message and stack trace, or to simply return nil.

In Objective C, messaging nil results in nil being returned by default. This behavior can be changed in the runtime by adding a hook function to the runtime that could do logging, or raise an exception.

In either case, the consequences of messaging nil are under the control of the programmer and can range from totally benign to fatal, depending on the developer’s preference and the application domain. Experience with developing applications in this environment has shown that messaging nil is nearly always harmless, and not having to place tests for nil before every message send results in smaller, cleaner, and easier to understand code.
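To make the idea of a centrally controlled nil concrete for Java readers, here is a minimal sketch that approximates it with java.lang.reflect.Proxy. It is an illustration only, not how Smalltalk or Objective C implement nil. The names Greeter, Nil, POLICY and orNil are all invented for the example, and the technique only works for references typed as interfaces.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;
import java.util.ArrayList;
import java.util.List;

// Hypothetical interface, invented for this sketch.
interface Greeter {
    String greet(String name);
    int greetingsSent();
}

final class Nil {
    // Every message sent to a nil stand-in is recorded here.
    static final List<String> LOG = new ArrayList<>();

    // The single place that decides what messaging nil means.
    // Change this body to throw instead, and every call site changes at once.
    private static final InvocationHandler POLICY = (proxy, method, args) -> {
        LOG.add(method.getName());
        Class<?> type = method.getReturnType();
        // Primitive return types cannot carry null, so answer a zero value.
        if (type == boolean.class) return false;
        if (type == char.class) return '\0';
        if (type == byte.class) return (byte) 0;
        if (type == short.class) return (short) 0;
        if (type == int.class) return 0;
        if (type == long.class) return 0L;
        if (type == float.class) return 0f;
        if (type == double.class) return 0d;
        return null; // object-returning and void methods
    };

    // Builds a stand-in for "nothing" that implements the given interface.
    static <T> T of(Class<T> type) {
        return type.cast(Proxy.newProxyInstance(
                type.getClassLoader(), new Class<?>[] { type }, POLICY));
    }

    // Converts a possibly-null reference into one that is safe to message.
    static <T> T orNil(T value, Class<T> type) {
        return value != null ? value : of(type);
    }
}

final class NilDemo {
    public static void main(String[] args) {
        Greeter missing = Nil.orNil(null, Greeter.class);
        // No null test at the call sites, and no exception.
        System.out.println(missing.greet("world"));
        System.out.println(missing.greetingsSent());
        System.out.println(Nil.LOG);
    }
}
```

The point of the sketch is the location of the decision: whether messaging nothing is ignored, logged, or fatal is written once in POLICY rather than repeated as a test before every message send.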

Plus, the applications don’t crash nearly as often. This is yet another example of how a feature in Java that is intended to improve software reliability actually undermines it.

Blanchard’s Law

Tuesday, September 5th, 2006

When in the course of developing software, the growing feeling that code generation would be a great idea actually means that the environment or language in which you are working isn’t powerful enough for the task at hand.

Years ago I was a C++ fan. It was the first language I encountered that had user-definable classes and objects, and we used it to build Motif GUIs that fronted legacy character-based systems. After a while we found that the amount of boilerplate code we were writing was becoming an impediment to change, and someone came up with the idea of specifying a number of aspects of the system as models and generating the code from the model.

As I was the lead of the business domain model, this approach was particularly attractive to me, as we had dozens of classes that were all implemented using the same idioms, but whose set of attributes changed in minor ways with some frequency. Furthermore, we weren’t actually implementing the business logic within the classes, as they were being mapped into an inference engine that provided enforcement of cross-object validation rules. So code generation seemed like a great labor-saving solution.

Problems occurred fairly soon after, with generated code being occasionally hand-modified by some maintenance programmer and those changes being lost down the road when the code was regenerated. The generated code had size impacts as well. It’s easy to generate a program too large to compile and run on your current hardware. Our compile times for the system went from 2 to over 30 hours – a problem we solved by parallelizing the build process across a fleet of machines. The project was ultimately cancelled and we didn’t have a chance to see if the giant could fly. But I was left with some lingering doubts as to the efficacy of generating code.

Later on I encountered CORBA. CORBA seemed pretty cool – like Remote Procedure Calls (RPC) but a little easier to use. Interface Definition Language (IDL) looked just like C++ with a couple of minor tweaks. You generated your stubs, filled in your code, and you were off. At least until you wanted to expand or change an interface. Maintaining all the generated source files got cumbersome and once again added considerably to the build times.

Java RMI was the same kind of thing – define Java interfaces (instead of IDL) and a whole bunch of marshalling code gets generated. But you end up with an awful lot of code in the end.

A bit later in my career, I encountered dynamic languages. I got a glimpse of HP’s DST (Distributed Smalltalk) which implemented CORBA using a single proxy object that could stand in for any object. NextStep’s Portable Distributed Objects (PDO) was a similar take on this theme. These smart proxies were made possible by the dynamic typing and message sending nature of these languages. Static languages were stuck with code generation because they had to satisfy the compiler’s type constraints.
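The shape of such a single stand-in proxy can be sketched in Java with the reflective java.lang.reflect.Proxy facility, which builds the stand-in at run time instead of from generated source files. This is a rough illustration of the idea only, not how DST or PDO were implemented. Account, SimpleAccount, Forwarder and standInFor are hypothetical names, and where a distributed system would marshal the call over the wire, this sketch simply records the message name and forwards it locally.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Proxy;
import java.util.ArrayList;
import java.util.List;

// Hypothetical service interface, invented for this sketch.
interface Account {
    int balance();
    void deposit(int amount);
}

// Hypothetical implementation standing in for the "remote" object.
final class SimpleAccount implements Account {
    private int balance;
    public int balance() { return balance; }
    public void deposit(int amount) { balance += amount; }
}

final class Forwarder {
    // Names of the messages that passed through any stand-in.
    static final List<String> SENT = new ArrayList<>();

    // One handler body serves every interface; nothing is generated per type.
    static <T> T standInFor(Class<T> type, T target) {
        InvocationHandler handler = (proxy, method, args) -> {
            SENT.add(method.getName()); // a remote proxy would marshal the call here
            try {
                return method.invoke(target, args);
            } catch (InvocationTargetException e) {
                throw e.getCause(); // surface the target's own exception
            }
        };
        return type.cast(Proxy.newProxyInstance(
                type.getClassLoader(), new Class<?>[] { type }, handler));
    }
}

final class ForwarderDemo {
    public static void main(String[] args) {
        Account account = Forwarder.standInFor(Account.class, new SimpleAccount());
        account.deposit(5);
        System.out.println(account.balance());
        System.out.println(Forwarder.SENT);
    }
}
```

Adding a method to Account requires no change to Forwarder, which is the property the generated-stub approaches lacked.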

Consequently, I view the growing trend towards code generation with some horror. No wonder our systems are insanely resource intensive while delivering very little value. After a decade and a half developing software, I have yet to encounter a situation where code generation is a good idea in the end. The answer isn’t to automate your inefficiency; it is to eliminate it. If that is not possible in your current environment, then it is time to evaluate different tools and approaches.