Archive for the ‘smalltalk’ Category

Sanskrit, Smalltalk, and Fireworks

Sunday, October 15th, 2006

There’s a combination I won’t soon forget. I spent yesterday evening at the Santa Cruz home of Dan Ingalls celebrating the 10th birthday of Squeak. Dan showed a video of a lecture he gave with his father on the development of an OCR system for recognizing Sanskrit text.

A number of very interesting people attended, including some folks from Argentina who are marketing a Smalltalk-based oil field production planning system. Also in attendance were David A. Smith, a leading architect of Croquet; Andreas Raab; Craig Latta, who is doing some amazing work on a Smalltalk-based system called Spoon; and a number of other thoroughly interesting people too numerous to mention here.

As a bonus, there was a fireworks display, which we viewed from Dan’s terrace. It’s always interesting when a bunch of Smalltalkers get together; they can be such an eclectic bunch.

I Love You – Now Change

Monday, October 2nd, 2006

Which company would you prefer to invest in?

Company A:

“Our company is a dynamic organization with the agility to quickly adapt to new market conditions.”

or

Company B:

“Our company is stable, well structured, and organized. What we are doing now is a perfect basis for everything we will do in the future.” (Sounds a little like the Bush administration)

The dynamic organization that can change quickly is going to be more successful than a static organization that is set in its ways.

So how come the software industry pundits continue to try to push static programming systems over dynamic ones when dynamic systems are generally more successful? It’s senseless.

In C++ and Java, the assumption is that the superclass designer knows best. Whatever interface the original developer has exposed is expected to be the perfect and complete interface for all time. There is no way to extend an existing interface without owning the source code to it.

The implementation, it is recognized, might not be exactly correct in all cases, so the implementation is left open to extension via the one mechanism made available – subclassing (maybe – Java actually allows the most arrogant developers to forbid subclassing via the ‘final’ keyword).

The problem with leaving only subclassing is that subclassing, by itself, only provides for extension of the system, not for modification. Fans of Robert C. Martin or Bertrand Meyer might recognize this as The Open Closed Principle. Sadly, The Open Closed Principle only works if you happen to work for a static company like Company B.

The harsh reality is that organizations are organic – they evolve and grow to adapt to new environmental conditions. Failure to evolve is death. How can you modify your organization if the software that runs your organization is closed to modification by design? Worse, the underlying tools and technology on which your software is built actually work to enforce The Open Closed Principle.

So how else to evolve your system? Objective C has a construct called a Method Category or more commonly just “category”. A category is a collection of methods for a class that may be loaded dynamically – or not. These are collections of additional methods to be added to existing classes. These additions may be made part of the organization’s core software assets, or they can be application specific extensions that are too specialized for general consumption.

For instance, a web services application may find it convenient to add some methods to the string class for parsing up web requests, but the billing system doesn’t need this category of methods and so doesn’t bother to load it.

Categories can also make adapting an existing class to a new protocol easy. Not having a number of separate adaptor classes all over the place keeps the number of classes low and the conceptual size of the application architecture smaller.

Finally, categories can allow the user of a class to replace a buggy or inappropriately implemented method with a new implementation without having the source code.

Another useful tool is the ability to replace one class with another. The Objective C tool for this is known as “posing”. One writes a subclass of the original class and then says

[NewClass poseAs: [OldClass class]];

Now saying [OldClass new] actually constructs an instance of NewClass. This can be handy for sneaking superclasses into the class hierarchy and also for debugging around code you don’t own.

Using these techniques, along with message forwarding and delegation, subclassing takes a back seat to application assembly and drops from the most used tool to the mechanism of last resort. After all, it’s much better to simply arrange the classes you already have into the right structure than to create entirely new code with entirely new bugs.

Smalltalk has similar mechanisms, which result in similar designs. Method categories exist and can be loaded as packages. Posing is done quite easily by replacing a class in the Smalltalk class dictionary with another class and executing a become: on all of the old class’s instances. It’s easy to insert new classes anywhere in the hierarchy and all of the code is easily accessible and modifiable. Subclassing in Smalltalk application development is a relatively rare event.

Of course, if you’re sure you know what you’re doing – perhaps Java and C++ are the right languages. If you’re sure, that is.

Blanchard’s Law

Tuesday, September 5th, 2006

When in the course of developing software, the growing feeling that code generation would be a great idea actually means that the environment or language in which you are working isn’t powerful enough for the task at hand.

Years ago I was a C++ fan. It was the first language I encountered that had user-definable classes and objects, and we used it to build Motif GUIs that fronted legacy character-based systems. After a while we found that the amount of boilerplate code we were writing was becoming an impediment to change and someone came up with the idea of specifying a number of aspects of the system as models and generating the code from the model.

As I was the lead of the business domain model, this approach was particularly attractive to me as we had dozens of classes that were all implemented using the same idioms, but whose set of attributes changed in minor ways with some frequency. Furthermore, we weren’t actually implementing the business logic within the classes as they were being mapped into an inference engine that provided enforcement of cross object validation rules. So code generation seemed like a great labor saving solution.

Problems occurred fairly soon after with generated code being occasionally hand modified by some maintenance programmer and changes being lost down the road. The generated code had size impacts as well. It’s easy to generate a program too large to compile and run on your current hardware. Our compile times for the system went from 2 to over 30 hours – a problem we solved by parallelizing the build process across a fleet of machines. The project was ultimately cancelled and we didn’t have a chance to see if the giant could fly. But I was left with some lingering doubts as to the efficacy of generating code.

Later on I encountered CORBA. CORBA seemed pretty cool – like Remote Procedure Calls (RPC) but a little easier to use. Interface Definition Language (IDL) looked just like C++ with a couple of minor tweaks. You generated your stubs, filled in your code, and you were off. At least until you wanted to expand or change an interface. Maintaining all the generated source files got cumbersome and once again added considerably to the build times.

Java RMI was the same kind of thing – define Java interfaces (instead of IDL) and a whole bunch of marshalling code gets generated. But you end up with an awful lot of code in the end.

A bit later in my career, I encountered dynamic languages. I got a glimpse of HP’s DST (Distributed Smalltalk) which implemented CORBA using a single proxy object that could stand in for any object. NextStep’s Portable Distributed Objects (PDO) was a similar take on this theme. These smart proxies were made possible by the dynamic typing and message sending nature of these languages. Static languages were stuck with code generation because they had to satisfy the compiler’s type constraints.

Consequently, I view the growing trend towards code generation with some horror. No wonder our systems are insanely resource intensive while delivering very little value. After a decade and a half developing software, I have yet to encounter a situation where code generation is a good idea in the end. The answer isn’t to automate your inefficiency; it is to eliminate it. If that is not possible with the tools you have, then it is time to evaluate different tools and approaches.

Closures for Java?

Thursday, August 24th, 2006

Gilad Bracha of Strongtalk fame has a post talking about the proposed introduction of closures to Java.

I consider Java to be so broken at this point that I view this as “lipstick for a pig” and agree with Bracha’s position that it’s unfortunate that closures were delayed to the point that all the common idioms that might have been done with closures have been established using less elegant techniques. In other words, I think it’s too late.

It’s a chain of constraints produced by a couple of root decisions. First was the insistence on mandatory manifest typing. The Strongtalk system and Objective C language both support optional manifest typing. Objective C does it to provide compile time warnings for programmers to help them catch mistakes. Strongtalk actually uses the type information to call type optimized versions of operations. This technique is part of the foundation of the HotSpot JVM.

The second design problem is Java’s choice of C style function calling semantics vs Smalltalk’s message sending. Message sending is the more flexible and uniform technique. You have an entity at some distance, you send it a message asking it to do something. If it knows what you are asking, it does it and replies with your result. If it doesn’t know what to do, some default action occurs. This is how network available services perform and is also how Smalltalk objects interact.

Java, C++, and C take the position that the compiler has assured that the receiver is of a type that MUST understand the message you are sending. They do this by constraining the messages you are allowed to send at compile time. This is meant to prevent errors but it also needlessly limits program flexibility and forces the developer to deal with a second interaction model – one for network resources and another for local ones. The Smalltalker has only one interaction model to contend with.

This is important and profound. In fact, Alan Kay, the man who coined the term “Object Oriented” laments the choice of words saying: “Object-oriented programming is about messages, not the objects. We worry about the objects, but it’s the messages that matter.”

But Java is function-call oriented. Consequently, the closures end up looking like local function declarations – which means they are awkward and ugly to declare and use inline and they are littered with extraneous type information. And they will most likely not catch on.

Java is what it is. Like a mutant frog it is trying to make it to land, asymptotically approaching Smalltalk, a language first released in 1980, and yet, it’s clear that it can’t get there and I suspect it will die trying. At least, I hope it will.

Function Calling vs Message Sending

Thursday, August 24th, 2006

“What happens if your reference refers to a different type of object than you expect?”

It’s probably a programming error.

Or not.

In C++, because of the way vtable dispatching works, the program will either crash (hopefully) or the wrong member function will be called. Either way – the end result is probably fatal.

Java is slightly better. In order to call a member function, the Java compiler checks to make sure that the function is defined in one of the interfaces implemented by the type of the object reference. The problem arises when the object reference has been widened to a more general type that does not define the desired member function.

How can this happen?

public class A extends Object { public void anAThing() {} … }

A myA = new A(); // make an A
someList.add(myA); // put it in a list
…
someList.get(0).anAThing(); // error – List.get(int) returns Object.

So we have to downcast the Object reference returned by someList.get() to an A reference:

((A)someList.get(0)).anAThing(); // Might be OK if someList has an A

Casting is completely type unsafe, although in Java the result of an incorrect cast is an exception. So we could write:

try { ((A)someList.get(0)).anAThing(); }
catch(ClassCastException ex) { /* handle the bad cast and carry on */ }

which allows us to handle the casting error and continue. This is a huge step up from the C++ behavior of crashing in that it allows the programmer some control over what to do if the cast is wrong. If we desire a more conventional means of writing this, it is possible to use the instanceof operator and do the check before the cast.

if(someList.get(0) instanceof A) ((A)someList.get(0)).anAThing();

which is pretty much the same thing. Of course, this forces programmers to clutter their code with all sorts of tests or try/catch blocks. It seems to me that getting away from that sort of thing was precisely the reason everybody wanted to switch to object oriented programming in the first place. Polymorphism replaced an awful lot of if ladders and switch statements and here we are putting them back in to work around a runtime system that throws an exception if we guess incorrectly about the type of an object.
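For anyone who wants to run the two styles side by side, here is a minimal, self-contained sketch. The names CastA, CastDemo, castAndCatch, and checkThenCast are made up for illustration; only the cast, the ClassCastException, and the instanceof test come from the snippets above.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the class A used in the text.
class CastA {
    String anAThing() { return "did an A thing"; }
}

class CastDemo {
    // Style 1: cast first and recover from the exception afterwards.
    static String castAndCatch(List<Object> someList) {
        try {
            return ((CastA) someList.get(0)).anAThing();
        } catch (ClassCastException ex) {
            return "not an A";
        }
    }

    // Style 2: test with instanceof before casting.
    static String checkThenCast(List<Object> someList) {
        Object first = someList.get(0);
        if (first instanceof CastA) {
            return ((CastA) first).anAThing();
        }
        return "not an A";
    }

    public static void main(String[] args) {
        List<Object> good = new ArrayList<>();
        good.add(new CastA());
        List<Object> bad = new ArrayList<>();
        bad.add("this is a string");

        System.out.println(castAndCatch(good));
        System.out.println(castAndCatch(bad));
        System.out.println(checkThenCast(good));
        System.out.println(checkThenCast(bad));
    }
}
```

Either way, every call site that might see the wrong type has to carry its own check.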

Not to mention the idea that it’s quite possible to have this situation:

public class A extends Object { public void doAThing() {} … }
public class B extends Object { public void doAThing() {} … }

Object myB = new B();
((B)myB).doAThing(); // fine
((A)myB).doAThing(); // error – object is not an A!

which seems just silly. We have an object that we are pretty sure implements the operation doAThing – but that’s not good enough. We have to know exactly which interface this particular object implements in order to call that method. Thus, type defines protocol in the statically typed world.
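The same situation as a runnable sketch. ThingA, ThingB, and SameMethodDemo are hypothetical names standing in for A and B; the point is only that the cast is judged by class, not by whether the method exists.

```java
// Two unrelated classes that happen to offer the same operation.
class ThingA {
    String doAThing() { return "A did it"; }
}

class ThingB {
    String doAThing() { return "B did it"; }
}

class SameMethodDemo {
    // Tries to treat the object as a ThingA and reports the outcome.
    static String callAsThingA(Object candidate) {
        try {
            return ((ThingA) candidate).doAThing();
        } catch (ClassCastException ex) {
            // ThingB has doAThing() too, but the cast checks the class, not the method.
            return "rejected " + candidate.getClass().getSimpleName();
        }
    }

    public static void main(String[] args) {
        System.out.println(callAsThingA(new ThingA()));
        System.out.println(callAsThingA(new ThingB()));
    }
}
```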

The problem is that such an arrangement assumes the existence of what Bart Kosko calls crisp sets and hierarchies. The world isn’t nearly so neat. It’s fuzzy. B might well be capable of performing some of A’s operations and in some (but perhaps not all) circumstances, B might be an excellent stand-in for an A. It’s a more accurate model. To quote Kosko again: “fuzz up – precision up”.

OK, so what about the dynamically typed languages? How are they better? Assume we have the same classes A and B derived from Object and each implements doAThing.

| anA |

anA := A new.
anA doAThing.

anA := B new.
anA doAThing.

anA := 'this is a string'.
anA doAThing.

This all works except for the last line when anA refers to a String rather than an instance of A or B. String definitely doesn’t implement doAThing. So what happens?

First, it helps to know that the runtime systems for Smalltalk and Objective C are “message sending” rather than “function calling”. When the programmer tries to send a message to an object that doesn’t respond to that message, the runtime packages the message up as an object and sends a catch-all message instead. In Smalltalk this catch-all message is doesNotUnderstand:.

In Objective C, a special message called forwardInvocation: is sent to give the programmer a chance to send the message to some other object such as a delegate. The default implementation of forwardInvocation: doesn’t do any forwarding. Instead it just calls another method, doesNotRecognizeSelector:, which raises an exception.

The programmer may choose to respond to these messages in a class specific way – polymorphically, by overriding forwardInvocation: or doesNotUnderstand:

Doing this moves the error handling to a central location rather than forcing the programmer to scatter it throughout the code at the call locations. The end result is cleaner, smaller code.

Plus there’s a bonus. Being able to forward messages provides a clean mechanism for building chain of responsibility patterns and allows an object to be “decorated” with new behaviors dynamically.

It’s also easy to do distributed computing by having the forwardInvocation: method perform remote procedure calls over the network without the need to do clumsy code generation of proxies and stubs common in C, CORBA, and Java RMI programs. A single proxy class can stand in for any kind of object.
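For comparison, the nearest thing Java ships with is java.lang.reflect.Proxy, and it only approximates the idea: it can stand in for interfaces declared ahead of time, not for arbitrary classes. The sketch below assumes a hypothetical ForwardingHandler that simply passes each call to a local target; a remote proxy would send the call over the network at the marked spot instead.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.List;

// Hypothetical handler: every call on the proxy arrives here as a Method plus
// arguments, much like a message packaged up by the runtime.
class ForwardingHandler implements InvocationHandler {
    private final Object target;
    final List<String> seen = new ArrayList<>();

    ForwardingHandler(Object target) {
        this.target = target;
    }

    @Override
    public Object invoke(Object proxy, Method method, Object[] args) throws Throwable {
        seen.add(method.getName());
        try {
            // A remote proxy would serialize the call and send it over the wire here;
            // this sketch just forwards it to a local target.
            return method.invoke(target, args);
        } catch (InvocationTargetException ex) {
            throw ex.getCause(); // rethrow what the target itself threw
        }
    }

    // Builds a stand-in for any interface type; classes cannot be proxied this way.
    static <T> T standInFor(Class<T> iface, ForwardingHandler handler) {
        Object proxy = Proxy.newProxyInstance(
                ForwardingHandler.class.getClassLoader(),
                new Class<?>[] { iface },
                handler);
        return iface.cast(proxy);
    }

    public static void main(String[] args) {
        // The same handler class stands in for two unrelated interfaces.
        ForwardingHandler textHandler = new ForwardingHandler("hello");
        CharSequence text = standInFor(CharSequence.class, textHandler);
        System.out.println(text.length());
        System.out.println(textHandler.seen);

        ForwardingHandler itemsHandler = new ForwardingHandler(Arrays.asList("a", "b"));
        Collection<?> items = standInFor(Collection.class, itemsHandler);
        System.out.println(items.size());
    }
}
```

The handler is written once, but the caller still has to name an interface up front, which is exactly the constraint the Smalltalk and Objective C versions do not have.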

The doesNotUnderstand: message can also provide a trigger for database fetching and object faulting. When a message is sent to a simple database query object that implements almost no messages, doesNotUnderstand: is invoked, the database query is executed, and the object replaced with the results of the fetch. The message is then delivered to the newly fetched object. Such faulting mechanisms can simplify programming and virtually eliminate the need for application programmers to directly interact with a database API.
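A rough Java rendering of the faulting idea, again leaning on java.lang.reflect.Proxy and therefore limited to interfaces. FaultHandler and faultFor are hypothetical names, and the Supplier stands in for a real database query. Java has nothing like become:, so the stand-in is not replaced by the fetched object; it stays in place and forwards to it. The sketch also assumes the query never returns null and ignores thread safety.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;
import java.util.Arrays;
import java.util.List;
import java.util.function.Supplier;

// Hypothetical fault: holds a query and runs it only when the first message arrives.
class FaultHandler implements InvocationHandler {
    private final Supplier<?> query; // stand-in for a real database fetch
    private Object fetched;          // null until the fault fires
    int fetchCount = 0;

    FaultHandler(Supplier<?> query) {
        this.query = query;
    }

    @Override
    public Object invoke(Object proxy, Method method, Object[] args) throws Throwable {
        if (fetched == null) {
            fetched = query.get(); // the fault fires: run the query once
            fetchCount++;
        }
        try {
            return method.invoke(fetched, args); // deliver the message to the fetched object
        } catch (InvocationTargetException ex) {
            throw ex.getCause();
        }
    }

    static <T> T faultFor(Class<T> iface, FaultHandler handler) {
        return iface.cast(Proxy.newProxyInstance(
                FaultHandler.class.getClassLoader(), new Class<?>[] { iface }, handler));
    }

    public static void main(String[] args) {
        FaultHandler handler = new FaultHandler(() -> Arrays.asList("row 1", "row 2"));
        List<?> rows = faultFor(List.class, handler);
        System.out.println(handler.fetchCount); // nothing fetched yet
        System.out.println(rows.size());        // first message triggers the fetch
        System.out.println(handler.fetchCount);
    }
}
```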

These extra capabilities are nearly impossible to implement in the statically typed environments and this is clearly a case when the dynamically typed environment yields simpler application code (we don’t have all those try/catch blocks or instanceof tests). Simpler application code means greater reliability with reduced programmer effort. This all translates to faster development times and lower costs.

Wikis – The new generation.

Wednesday, August 16th, 2006

Scoble is investigating collaborative tools, primarily chat and wiki tools. Seaside and Squeak are powering some really cool new capabilities, like LogoWiki, a wiki that allows people to embed executable Logo programs, which is useful for making educational sites about geometry and introductory programming. LogoWiki is built upon Pier, a wiki so rich in features that it approaches the level of a content management system. Given that the wiki was invented by a Smalltalker, this seems like a return to the wiki’s roots.

BadPage.info has a new purpose?

Wednesday, May 17th, 2006

Scoble claims that the MS Word team finally generates clean HTML. I’ll believe it when it passes checks at http://badpage.info with no warnings or errors.

It will be nice for people to actually care about web standards. The web 2.0 people have to, because bad DOMs lead to broken JavaScript. But so many large websites are still awful.

One more day

Wednesday, April 5th, 2006

and I’ll have my job automated to the point where I can do it in 15 minutes a week. Thanks to Squeak, Seaside, a bunch of code I have lying around, and good old fashioned focused laziness, I expect I’ll have my current function fully performed by a little web app and 15 minutes of manual updating.

You can’t touch this kind of thing in Java.

Why I hate Microsoft Software – part I

Monday, April 3rd, 2006

At work we have a bunch of web pages that display information in tabular format. I have a project that needs to track a bunch of changes to the data in the database displayed in web pages in tabular format. I manage projects in Excel because MS Project is impossibly complex for the average person and Project X isn’t ready yet.

(When it ships, Project X will rock – it uses ObjectiveCLIPS which means it will be easy to hack its behavior and, more importantly, will act logically in the first place. But I digress)

Anyhow, I need to update my spreadsheet copy of this tabular data from the web page every week to make status reports. It seems to me that the sensible thing would be for HTML tables I select and copy to be put onto the clipboard in some format that Excel understands to mean TABLE. That would be the sensible thing. Which is likely why it doesn’t happen. Pasting a copied HTML table into Excel results in all the text being concatenated into a single field as one long string. What idiot thought that was the right thing to do? So much for application integration. They own the entire thing and they can’t make it work sensibly.

Instead, I had to rely on Squeak and the HTML parser I built for the validator at http://www.badpage.info to extract the data into a tabular format (I used an array of dictionaries) that I could then use to output a CSV file (comma separated values) that Excel will recognize.
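The original was done in Squeak and isn’t shown here. Purely as an illustration of the last step, here is a minimal sketch of turning rows held as dictionaries into CSV text, written in Java to match the other snippets on this page. CsvSketch, toCsv, and the column names are hypothetical, and the HTML parsing itself is left out.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class CsvSketch {
    // Quote a field only when it contains a comma, a quote, or a line break.
    static String escape(String field) {
        boolean needsQuotes = field.contains(",") || field.contains("\"")
                || field.contains("\n") || field.contains("\r");
        return needsQuotes ? "\"" + field.replace("\"", "\"\"") + "\"" : field;
    }

    // columns fixes the header order; keys missing from a row become empty fields.
    static String toCsv(List<String> columns, List<Map<String, String>> rows) {
        StringBuilder out = new StringBuilder();
        out.append(line(columns));
        for (Map<String, String> row : rows) {
            List<String> fields = new ArrayList<>();
            for (String column : columns) {
                String value = row.get(column);
                fields.add(value == null ? "" : value);
            }
            out.append(line(fields));
        }
        return out.toString();
    }

    // Joins escaped fields with commas and ends the line.
    private static String line(List<String> fields) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < fields.size(); i++) {
            if (i > 0) sb.append(',');
            sb.append(escape(fields.get(i)));
        }
        return sb.append('\n').toString();
    }

    public static void main(String[] args) {
        Map<String, String> row = new LinkedHashMap<>();
        row.put("Item", "Widget, large");
        row.put("Status", "done");
        List<Map<String, String>> rows = new ArrayList<>();
        rows.add(row);
        System.out.print(toCsv(new ArrayList<>(row.keySet()), rows));
    }
}
```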

Which resulted in me writing an entire app in Seaside and pitching Excel altogether. I mean, if I’ve got to write code anyhow, hey.

I do think this is one of those tasks that DabbleDB was made for. Pity it’s not out yet.

Intel native Squeak VM for Mac OS X

Sunday, April 2nd, 2006

I’ve finally figured out how to build an Intel version of the Squeak VM for Unix. Thanks to some generous help from John M McIntosh I’ve managed to get the Unix VM running faster than usual. I stick to the Unix VM to avoid any compatibility surprises as I develop on my Mac, but deploy to a Linux box in a hosting facility. The Unix VM has some advantages for interfacing with command line tools as well. If you’d like a copy, go ahead and grab it.

On the other hand, if all you are doing is playing on the Mac, then the Carbon based VM that John maintains is about 20% faster. It is also a universal binary. The Unix build I provide is Intel only.