Archive for the ‘C++’ Category

Looking for work again

Monday, March 3rd, 2008

I’ve been doing a lot of PHP lately – shopping cart integration, web service clients, lots of ecommerce stuff. But I’m coming to the end of these projects and am looking for new challenges. Got a project or position that I can do remotely? Drop me a line.

Got Class?

Tuesday, October 31st, 2006

In Object Oriented languages, the class is the definition of a kind of object. Classes are arranged in tree-like hierachies with the roots of the tree being very general and the leaves being most specific. So a dessert is a kind of food, and cake is a kind of dessert. Because the class serves as the definition of an object, it makes sense that objects are created by using the class as a template. In other words, classes are used to create objects of the class’s type.

All of the languages discussed so far, C++, Java, Smalltalk, and Objective C have the notion of a class. What is different is how the languages represent the class itself.

C++ has no runtime representation of a class at all. Instead, all class information is soaked up by the compiler and represented using standard functions and globals with the class name prepended to their “local” names. The way a programmer designates a function as belonging to a class is to declare it within the class’s declaration using the very overworked keyword ’static’.

class CPlusPlusClass : public BaseObject
{
public:
static void classMember(); // declare the function

}

void CPlusPlusClass::classMember() {…} // define the function

CPlusPlusClass.classMember(); // really just a long name for a global function

Ditto for class variables. There’s no such thing in C++ really. Instead the language fakes it by letting you declare the variable in the class declaration’s namespace and in the end what you’ve got is a global with a very long name. Since there’s no class object, there’s no “this” variable and no way to do anything the least bit object oriented in class methods without explicitly naming super classes.

There’s also no built-in way to abstract object creation. When one wishes to create an object in C++, one uses the ‘new’ operator in conjunction with the class name, which coincidentally is also the required name of the initialization function. So in C++ we write:

BaseObject *c = new CPlusPlusClass();
Notice that here – at the moment of object creation, it is necessary to reveal the actual type of the object. If one desires a more abstract method of instance creation, it is necessary to adopt a factory idiom. Of course, converting to a factory creation mechanism from the use of operator new requires extensive modification of code, similar to the extensive work required to handle a new kind of exception.

Smalltalk takes the opposite extreme. Classes are represented as regular objects and are arranged into an inheritance hierarchy. The special variable “self” refers to the class object itself and ’super’ refers to the object that represents the classes superclass. Smalltalk class methods exhibit full polymorphism and class variables are made available to class methods and instance methods in the class’s instances. The Smalltalk class takes responsibility for instance creation and is thus a factory by default.

For instance, in the Squeak implementation, ImageReadWriter is an abstract class for reading and writing image files. Concrete subclasses exist for JPEG, GIF, PNG and so forth. The abstract base class makes use of class hierarchy navigation to find the correct subclass to render the image data. It looks something like this (error handling omitted):

“Find the first subclass that claims to be able to understand the binary stream’s data”

readerClass := self withAllSubclasses detect: [:subclass | binaryStream reset. (subclass new on: binaryStream) understandsImageFormat].

“Instantiate a reader from the class”
reader := readerClass new on: (binaryStream reset).

“Return the image from the reader”
^reader nextImage.

The beauty of this approach is that one can always write a new subclass of ImageReadWriter for a new image format and the new format is instantly supported without having to deal with registries or factory idioms.

Techniques like this demonstrate the power of using the object paradigm to represent classes.

Objective C uses a similar approach to Smalltalk with a couple of odd constraints. Objective C class objects lack support for class member variables. Instead one fakes it with static (as in C variables with internal linkage) variables. Also, Objective C performs lazy loading of libraries which means that not all subclasses are necessarily present at any given time. So the subclasses trick is a little hard to perform.

On the other hand, Objective C uses the same object creation mechanism as Smalltalk. So its still possible that the actual class of the object is not the same as the class that created it for you. Thus, the power of using objects to represent classes is mostly preserved.

Java’s approach is closer to C++ than to Smalltalk. While Java has ‘objects’ of type java.lang.Class for representing classes, these ‘objects’ have more in common with C structs than with objects in any other sense. Object creation is not performed by the class. Rather Java slavishly copies C++’s use of operator ‘new’.

Java also makes use of a ’static’ keyword to denote ‘class methods’. But these ‘methods’ are actually nothing but functions with long names. In fact, its not possible to do anything polymorphic in a class method. There isn’t even a way to get ahold of the class object representing the class.

public class A
{
// something that doesn’t work – no ‘this’
static void printClassName1() { System.out.println(”"+this.class); }

// this doesn’t work either
static void printClassName2() { System.out.println(”"+getClass()); }

// this does – but in subclasses it will be wrong
static void printClassName3() { System.out.println(”"+A.class); }
}

So crippling is the decision to not provide proper class objects, that the Enterprise Java Bean designers found it necessary to simulate class objects with the notion of “Home” objects. The “Home” object is intended to provide the ability to find existing objects or create new ones for a given class. This is definitely behavior that would ordinarily be put into a class object.

If we had one.

I Love You – Now Change

Monday, October 2nd, 2006

Which company would you prefer to invest in?

Company A:

“Our company is a dynamic organization with the agility to quickly adapt to new market conditions.”or

Company B:

“Our company is stable, well structured, and organized. What we are doing now is a perfect basis for everything we will do in the future.”(Sounds a little like the Bush administration)

The dynamic organization that can change quickly is going to be more successful than a static organization that is set in its ways.

So how come the software industry pundits continue to try to push static programming systems over dynamic ones when dynamic systems are generally more successful? Its senseless.

In C++ and Java, the assumption is that the superclass designer knows best. Whatever interface the original developer has exposed is expected to be the perfect and complete interface for all time. There is no way to extend an existing interface without owning the source code to it.

The implementation, it is recognized, might not be exactly correct in all cases, so the implementation is left open to extension via the one mechanism made available – subclassing (maybe – Java actually allows the most arrogant developers to forbid subclassing via the ‘final’ keyword).

The problem with leaving only subclassing is that subclassing, by itself only provides for extension of the system, not for modification. Fans of Robert C Martin or Bertrand Meyer might recognize this as The Open Closed Principle. Sadly, The Open Closed Principle is only works if you happen to work for a static company like Company B.

The harsh reality is that organizations are organic – they evolve and grow to adapt to new environment conditions. Failure to evolve is death. How can you modify your organization if the software that runs your organization is closed to modification by design? Worse, the underlying tools and technology on which your software is built actually work to enforce The Open Close Principle.

So how else to evolve your system? Objective C has a construct called a Method Category or more commonly just “category”. A category is a collection of methods for a class that may be loaded dynamically – or not. These are collections of additional methods to be added to existing classes. These additions may be made part of the organizations core software assets, or they can be application specific extensions that are too specialized for general consumption.

For instance, a web services application may find it convenient to add some methods to the string class for parsing up web requests, but the billing system doesn’t need this category of methods and so doesn’t bother to load it.

Categories can also make adapting an existing class to a new protocol easy. Not having a number of separate adaptor classes all over the place keeps the number of classes low and the conceptual size of the application architecture smaller.

Finally, categories can allow the user of a class to replace a buggy or inappropriately implemented method with a new implementation without having the source code.

Another useful tool is the ability to replace one class with another. The Objective C tool for this is known as “posing”. One writes a subclass of the original class and then says

[NewClass poseAs: [OldClass class]];

Now saying [OldClass new] actually constructs an instance of NewClass. This can be handy for sneaking superclasses into the class hierarchy and also for debugging around code you don’t own.

Using these techniques, along with message forwarding and delegation, subclassing takes a back seat to application assembly and drops from the most used tool to the mechanism of last resort. After all, its much better to simply arrange the classes you already have into the right structure than to create entirely new code with entirely new bugs.

Smalltalk has similar mechanisms and results in similar designs. Method categories exist and can be loaded as packages. Posing is done quite easily by replacing a class in the Smalltalk class dictionary with another class and executing a become: on all of the old classes instances. Its easy to insert new classes anywhere in the hierarchy and all of the code is easily accessible and modifiable. Subclassing in Smalltalk application development is a relatively rare event.

Of course, if you’re sure you know what you’re doing – perhaps Java and C++ are the right languages. If you’re sure, that is.

Relax, It’s Nothing!

Tuesday, September 12th, 2006

Originally published August, 14, 2001

What’s the most frequently seen message produced by Java programs?

Its got to be the NullPointerException. In efforts to avoid this dreaded message, programmers have adopted an idiom that looks something like:

if(object != null) object.doSomething();

Which basically means – only send the message if there’s something there to receive it. Thats how you keep the exception from being thrown. The idiom must be burdensome as NullPointerExceptions continue to pop up at unexpected times and the fix is invariably to put this sort of test on the line from which the exception is raised.

This is an idiom adapted from C and later C++ where null is memory address zero by convention. Since important chunks of the operating system live down in that memory region, modern operating systems protect that region from fiddling and view trying to reference anything off of memory address zero as a likely programmer error. So its not allowed in the name of protecting the operating system.

C and C++ programmers, in efforts to get their programs to live long enough to let the program report whats going on, check pointers to make sure that they aren’t null before using them. This is because the penalty for using a null pointer in these environments is death.

These days, applications are developed using higher level languages, like Java and Smalltalk. These have no pointers at all. Instead we have object references and its not possible for the programmer to endanger the operating system by messaging a null object reference. (Objective C uses pointers as object references, but the programmer never directly dereferences them – its all done by the Objective C runtime).

So now, released from the need to steer clear of the operating system’s defense mechanisms, we can take a step back and think about what sending a message to nothing means.

Shouting across the street to someone who isn’t there may make you feel silly – but nothing really happens. Same for sending a letter to a non-existent address. If there’s no return address, the post office will eventually give up and toss it. Nothing too terrible there. There’s simply no one to receive the message.

So why does Java take the position that messaging null is so great a catastrophe that the programmer must be burdened with handling an exception? Especially when most of the time, the programmer’s method of avoiding the unwanted exception is to add a test for null before sending the message. In other words, the programmer will fix it with:

if(object != null) object.doSomething();

which is a just a way of saying “if there’s noone to receive the message – do nothing”. Worse, the programmer has to add this check to every single location in the code that tries to make use of the object reference.

This is the same nuisance condition we identified with typing of object references. Recall that the dynamic languages provided a means of handling this condition in a central location via the doesNotUnderstand, while the Java version required the programmer to handle it at each call site.

A similar situation exists with the use of null in Java. It must be checked for and handled at each call site. There must be a better way.

On the dynamic side of the world things are simpler. In Smalltalk, null (actually, in Smalltalk its referred to as ‘nil’) is a global singleton object of the class UndefinedObject. It implements hardly any messages and so messaging nil results in nil receiving doesNotUnderstand. The default behavior of doesNotUnderstand in nil is to halt the program and throw a debugger around it. Many deployed systems change this behavior on deployment to log the behavior and stack trace, or to simply return nil.

In Objective C, messaging nil results in nil being returned by default. This behavior can be changed in the runtime by adding a hook function to the runtime that could do logging, or raise an exception.

In either case, the consequences of messaging nil are under the control of the programmer and can range from totally benign to fatal, depending on the developer’s preference and the application domain. Experience with developing applications in this environment has shown that messaging nil is nearly always harmless, and not having to place tests for nil before every message send results in smaller, cleaner, and easier to understand code.

Plus, the applications don’t crash nearly as often. This is yet another example of how a feature in Java that is intended to improve software reliability, actually undermines it.

Blanchard’s Law

Tuesday, September 5th, 2006

When in the course of developing software, the growing feeling that code generation would be a great idea actually means that the environment or language in which you are working isn’t powerful enough for the task at hand.

Years ago I was a C++ fan. It was the first language I encountered that had user definable classes and objects and we used it to build Motif GUI’s that fronted legacy character based systems. After awhile we found that the amount of boilerplate code we were writing was becoming an impediment to change and someone came up with the idea of specifying a number of aspects of the system as models and generating the code from the model.

As I was the lead of the business domain model, this approach was particularly attractive to me as we had dozens of classes that were all implemented using the same idioms, but whose set of attributes changed in minor ways with some frequency. Furthermore, we weren’t actually implementing the business logic within the classes as they were being mapped into an inference engine that provided enforcement of cross object validation rules. So code generation seemed like a great labor saving solution.

Problems occurred fairly soon after with generated code being occasionally hand modified by some maintenance programmer and changes being lost down the road. The generated code had size impacts as well. It’s easy to generate a program too large to compile and run on your current hardware. Our compile times for the system went from 2 to over 30 hours – a problem we solved by parallelizing the build process across a fleet of machines. The project was ultimately cancelled and we didn’t have a chance to see if the giant could fly. But I was left with some lingering doubts as to the efficacy of generating code.

Later on I encountered CORBA. CORBA seemed pretty cool – like Remote Procedure Calls (RPC) but a little easier to use. Interface Definition Language (IDL) looked just like C++ with a couple minor tweaks. You generated your stubs, filled in your code, and you were off. At least until you wanted to expand or change an interface. Maintaining all the generated source files got cumbersome and once again added considerably to the build times.

Java RMI was the same kind of thing – define Java interfaces (instead of IDL) and a whole bunch of marshalling code gets generated. But you end up with an awful lot of code in the end.

A bit later in my career, I encountered dynamic languages. I got a glimpse of HP’s DST (Distributed Smalltalk) which implemented CORBA using a single proxy object that could stand in for any object. NextStep’s Portable Distributed Objects (PDO) was a similar take on this theme. These smart proxies were made possible by the dynamic typing and message sending nature of these languages. Static languages were stuck with code generation because they had to satisfy the compiler’s type constraints.

Consequently, I view the growing trend towards code generation with some horror. No wonder our systems are insanely resource intensive while delivering very little value. After a decade and a half developing software, I have yet to encounter a situation where code generation is a good idea in the end. The answer isn’t to automate your inefficiency. If this is not possible, then it is time to evaluate different tools and approaches.

Function Calling vs Message Sending

Thursday, August 24th, 2006

“What happens if your reference refers to a different type of object than you expect?”

Its probably a programming error.

Or not.

In C++, because of the way vtable dispatching works, the program will either crash (hopefully) or the wrong member function will be called. Either way – the end result is probably fatal.

Java is slightly better. In order to call a member function, the java compiler checks to make sure that the function is defined in one of the interfaces implemented by the type of the object reference. The problem arises when the object reference has been widened to a more general type that does not define the desired member function.

How can this happen?

public class A extends Object { public void anAThing() {} … }

A myA = new A(); // make an A
someList.add(myA); // put it in a list … someList.get(0).anAThing; // error – List.get(int) returns Object.

so we have to downcast the Object reference returned by someList.get() to an A reference

((A)someList.get(0)).anAThing(); // Might be OK if someList has an A

Casting is completely type unsafe. Although in Java the result of incorrect casting is an exception. So we could write:

try { ((A)someList.get(0)).anAThing(); }
catch(ClassCastException ex) true

which allows us to handle the casting error and continue. This is a huge step up from the C++ behavior of crashing in that it allows the programmer some control over what to do if the cast is wrong. If we desire a more conventional means of writing this, it is possible to use the instanceof operator and do the check before the cast.

if(someList.get(0) instanceof A) ((A)someList.get(0)).anAThing();

which is pretty much the same thing. Of course, this forces programmers to clutter their code with all sorts of tests or try/catch blocks. It seems to me that getting away from that sort of thing was precisely the reason everybody wanted to switch to object oriented programming in the first place. Polymorphism replaced an awful lot of if ladders and switch statements and here we are putting them back in to work around a runtime system that throws an exception if we guess incorrectly about the type of an object.

Not to mention the idea that its quite possible to have this situation:

public class A extends Object { public void doAThing(); …}
public class B extends Object { public void doAThing(); …}

Object myB = new B();
((B)myB).doAThing(); // fine
((A)myB).doAThing(); // error – object is not an A!

which seems just silly. We have an object that we are pretty sure implements the operation doAThing – but that’s not good enough. We have to know exactly which interface this particular object implements in order to call that method. Thus, type defines protocol in the statically typed world.

The problem is that such an arrangement assumes the existence of what Bart Kosko calls crisp sets and hierarchies. The world isn’t nearly so neat. Its fuzzy. B might well be capable of performing some of A’s operations and in some (but perhaps not all) circumstances, B might be an excellent stand-in for an A. Its a more accurate model. To quote Kosko again. “fuzz up – precision up”.

OK, so what about the dynamically typed languages? How are they better? Assume we have the same classes A and B derived from Object and each implements doAThing.

| anA |

anA := A new.
anA doAThing.

anA := B new.
anA doAThing.

anA := ‘this is a string’.
anA doAThing.

This all works except for the last line when anA refers to a String rather than an instance of A or B. String definitely doesn’t implement doAThing. So what happens?

First, it helps to know that the runtime systems for Smalltalk and Objective C are “message sending” rather than “function calling”. When the programmer tries to send a message to an object that doesn’t respond to that message, the runtime packages the message up as an object and calls a catch-all message instead. In Smalltalk this is usually called doesNotUnderstand: message.

In Objective C, a special message called forwardInvocation: is called to give the programmer a chance to send the message to some other object such as a delegate. The default implementation of forwardInvocation: doesn’t do any forwarding. Instead it just calls another method doesNotRespondToSelector: which raises an exception.

The programmer may choose to respond to these messages in a class specific way – polymorphically, by overriding forwardInvocation: or doesNotUnderstand:

Doing this moves the error handling to a central location rather than forcing the programmer to scatter it throughout the code at the call locations. The end result is cleaner, smaller code.

Plus there’s a bonus. Being able to forward messages provides a clean mechanism for building chain of command patterns and allows an object to be “decorated” with new behaviors dynamically.

Its also easy to do distributed computing by having the forwardInvocation method perform remote procedure calls over the network without the need to do clumsy code generation of proxies and stubs common in C, CORBA, and Java RMI programs. A single proxy class can stand in for any kind of object.

The doesNotUnderstand: message can also provide a trigger for database fetching and object faulting. When a message is sent to a simple database query object that implements almost no messages, doesNotUnderstand: is invoked, the database query is executed, and the object replaced with the results of the fetch. The message is then delivered to the newly fetched object. Such faulting mechanisms can simplify programming and virtually eliminate the need for application programmers to directly interact with a database API.

These extra capabilities are nearly impossible to implement in the statically typed environments and this is clearly a case when the dynamically typed environment yields simpler application code (we don’t have all those try/catch blocks or instanceof tests). Simpler application code means greater reliability with reduced programmer effort. This all translates to faster development times and lower costs.