Thursday, January 18, 2007

MySpace is Show Biz

Great article by David F. Carr on the behind-the-scenes mayhem at MySpace, whose exponential growth from a few hundred thousand users to 30 million users overloaded the system.

"MySpace started small, with two Web servers talking to a single database server. Originally, they were 2-processor Dell servers loaded with 4 gigabytes of memory."

Again and again MySpace was rewritten to add replicated databases, SANs, ASP.NET, and 64-bit servers. One interesting lesson was the addition of caching:

To further lighten the burden on its storage systems when it reached 17 million accounts, in the spring of 2005 MySpace added a caching tier—a layer of servers placed between the Web servers and the database servers whose sole job was to capture copies of frequently accessed data objects in memory and serve them to the Web application without the need for a database lookup.
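A caching tier like this usually works as a read-through lookup: check memory first, and fall back to the database only on a miss. The sketch below is a single-process illustration that assumes a plain dict as the in-memory store and a time-to-live for staleness; `CachingTier`, `load_from_db`, and `fake_db` are hypothetical names, not MySpace's implementation (the article describes a separate tier of cache servers).

```python
import time
from typing import Callable, Dict, Hashable, Tuple


class CachingTier:
    """Read-through cache in front of a hypothetical database loader."""

    def __init__(self, load_from_db: Callable[[Hashable], object], ttl_seconds: float = 60.0) -> None:
        self._load = load_from_db          # stand-in for a real database lookup
        self._ttl = ttl_seconds            # how long a cached copy is trusted
        self._store: Dict[Hashable, Tuple[float, object]] = {}
        self.hits = 0
        self.misses = 0

    def get(self, key: Hashable) -> object:
        now = time.monotonic()
        entry = self._store.get(key)
        if entry is not None and now - entry[0] < self._ttl:
            self.hits += 1
            return entry[1]                # served from memory, no database lookup
        self.misses += 1
        value = self._load(key)            # miss or expired: go to the database
        self._store[key] = (now, value)
        return value

    def invalidate(self, key: Hashable) -> None:
        """Drop the cached copy, e.g. after the underlying row changes."""
        self._store.pop(key, None)


# Usage with a fake database that records every lookup it serves.
db_calls = []


def fake_db(user_id):
    db_calls.append(user_id)
    return {"id": user_id, "name": f"user{user_id}"}


cache = CachingTier(fake_db, ttl_seconds=300)
cache.get(42)
cache.get(42)   # second read is served from memory
cache.get(7)
```

Three reads cost only two database lookups; the hard part in production is `invalidate`, i.e. knowing when a cached copy has gone stale.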

That caching tier sounds a lot like what GigaSpaces is offering: a Linda-like tuple space middleware that replaces the messaging paradigm with an associative shared-memory approach, using intensive caching and heuristics to move data closer to its clients.
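In Linda, processes coordinate by putting tuples into a shared space and retrieving them by pattern ("associative" lookup) instead of addressing messages to each other. Here is a toy, single-process sketch that assumes `None` acts as a wildcard in templates and that reads never block (real Linda `in`/`rd` wait until a match appears). `in` is a Python keyword, so the destructive read is called `take`; none of this is GigaSpaces' actual API.

```python
import threading
from typing import Any, List, Optional, Tuple


class TupleSpace:
    """Toy Linda-style tuple space: shared memory addressed by content."""

    def __init__(self) -> None:
        self._tuples: List[Tuple[Any, ...]] = []
        self._lock = threading.Lock()

    @staticmethod
    def _matches(template: Tuple[Any, ...], tup: Tuple[Any, ...]) -> bool:
        # Same arity, and every non-None template field equals the tuple's field.
        return len(template) == len(tup) and all(
            t is None or t == v for t, v in zip(template, tup)
        )

    def out(self, tup: Tuple[Any, ...]) -> None:
        """Write a tuple into the space."""
        with self._lock:
            self._tuples.append(tup)

    def rd(self, template: Tuple[Any, ...]) -> Optional[Tuple[Any, ...]]:
        """Return the first matching tuple, leaving it in the space."""
        with self._lock:
            for tup in self._tuples:
                if self._matches(template, tup):
                    return tup
        return None

    def take(self, template: Tuple[Any, ...]) -> Optional[Tuple[Any, ...]]:
        """Remove and return the first matching tuple (Linda's `in`)."""
        with self._lock:
            for i, tup in enumerate(self._tuples):
                if self._matches(template, tup):
                    return self._tuples.pop(i)
        return None


space = TupleSpace()
space.out(("order", 1, "pending"))
space.out(("order", 2, "pending"))
first = space.rd(("order", None, "pending"))  # copy; tuple stays in the space
job = space.take(("order", 2, None))          # removed; no other worker can take it
```

A worker that calls `take` claims the job simply by matching on content; the producer never needs to know which worker exists.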

Scott Rosenberg makes the point that Web 2.0 successes may look easy from the outside, but behind the scenes there are huge challenges. Like Show Biz!

Friday, January 12, 2007

Why use code generation techniques

Mark Baker comments on the problems with (XML) document validation. Any DTD that is too detailed is a time-limited contract: ProductType may be "1", "2", or "3" today, but a year from now types "4" and "5" may be allowed.

True, but validating, or what used to be called "laundering your input", still has to be done. Whether it's in the gatekeeper or the business object, the code has to be there.
Mark asks: "But that's really no way to write long-lived software, is it?"
Yes, it's fine actually. Who cares where the error message "producttype '4' not allowed" comes from? All that matters is that at any instant it either processes valid messages or returns a suitable error message.
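A minimal sketch of the gatekeeper version, assuming messages arrive as dicts with a hypothetical `producttype` key. `ALLOWED_PRODUCT_TYPES`, `Order`, `gatekeeper`, and `process` are illustrative names; the point is only that the caller sees either a processed message or a suitable error, wherever the check lives.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical contract: the product types accepted today.
# Widening it next year ("4", "5") means editing this one set.
ALLOWED_PRODUCT_TYPES = {"1", "2", "3"}


@dataclass
class Order:
    """Business object; it assumes its input has already been laundered."""
    product_type: str
    quantity: int


def gatekeeper(message: dict) -> Optional[str]:
    """Return an error message if the message is invalid, else None."""
    product_type = str(message.get("producttype", ""))
    if product_type not in ALLOWED_PRODUCT_TYPES:
        return f"producttype '{product_type}' not allowed"
    return None


def process(message: dict) -> str:
    """Either process a valid message or return a suitable error message."""
    error = gatekeeper(message)
    if error is not None:
        return error
    order = Order(product_type=str(message["producttype"]),
                  quantity=int(message.get("quantity", 1)))
    return f"accepted type {order.product_type} x{order.quantity}"
```

Moving the check into `Order` would change where the error message comes from, not what the caller sees.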

If splitting validation code from the business logic represents a maintainability problem, then consider code generation. Frameworks like StringTemplate make code generation easy. CodeGen tackles the common problem that a single feature requires changes up and down the software stack. The traditional approach was to make all the changes manually and rely on comments and checklists to ensure every required change was done. With CodeGen you can use a more literate programming style: define the feature in one place, generate the various bits of code, and inject them into the appropriate places in your code. Patterns such as Chain of Responsibility help.
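A small sketch of "define in one place, generate the rest", using Python's standard-library `string.Template` as a much simpler stand-in for StringTemplate (a separate Java library, not used here). The product-type list is the single definition; a Python validator and a SQL `CHECK` constraint are generated from it. The table name `orders`, the constraint name, and `validate_product_type` are hypothetical.

```python
from string import Template

# Single source of truth (hypothetical): the product types the system accepts.
PRODUCT_TYPES = ["1", "2", "3"]

# Template for the application-layer validator.
VALIDATOR_TEMPLATE = Template(
    "def validate_product_type(value):\n"
    "    if value not in $allowed:\n"
    "        return \"producttype '\" + value + \"' not allowed\"\n"
    "    return None\n"
)

# Template for the database-layer constraint.
SQL_TEMPLATE = Template(
    "ALTER TABLE orders ADD CONSTRAINT chk_product_type "
    "CHECK (product_type IN ($sql_list));"
)


def generate(types):
    """Generate both layers' code from the one list of product types."""
    python_src = VALIDATOR_TEMPLATE.substitute(allowed=repr(tuple(types)))
    sql_src = SQL_TEMPLATE.substitute(sql_list=", ".join(f"'{t}'" for t in types))
    return python_src, sql_src


python_src, sql_src = generate(PRODUCT_TYPES)

# In a real build the generated source would be written into the code tree;
# exec() just loads it here so the sketch is self-contained.
namespace: dict = {}
exec(python_src, namespace)
validate_product_type = namespace["validate_product_type"]
```

Allowing types "4" and "5" next year becomes a one-line change to `PRODUCT_TYPES` followed by regeneration, instead of a checklist of edits across layers.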

The Beauty of Unreal

Further to my recent post on example-centric programming, Tony responded with:
I entirely agree with the audience-targeting critique you make. I was thinking about the historical roots of programming, the social aspects, and how they must be considered, not just for fuzzy artsy reasons but because they are part of the sociology of skills and attitudes that mold the field (we are the medium of programming).
I think programming has strong roots in telephony, in code-breaking, and in mathematics. These roots influence the need for interfacing at the hardware level, efficiency, and high abstraction.

The PC revolution is in a way an extension of the business view of data processing, which was very much separate from the scientific programming done on minicomputers. In fact, now that I think of it, this may be why DEC missed the boat on PCs: PCs are for business and are now appliances, not really computers but data processors (ironic given the PDP name; there's a story there...).

So what this guy is advocating is a reconciliation of these two worlds, a wrapper view. I'm not sure that can fly in social terms, for the reasons you identify, and because we have two worlds in software: programming in the old telephony, code-breaking, and math sense, and the data processing view with forms, HTML, windows, and high-level object libraries.

Alan Kay's deep thoughts on similar topics are here:

I hadn't really thought about the two ends of programming like that before. I guess there are two cultures. Part of my confusion comes from watching Jonathan Edwards's Subtext video, where he shows how it works. Although it meets all the aims in the manifesto (text-is-a-dead-end, don't-copy-and-paste, avoid-control-flow), the video feels very mathematical, like Lisp programming with recursion. Recursion is a strange concept to most people, like postfix notation. In this sense Subtext (as it exists in the video) fails as end-user programming. Ordinary users will prefer the 10 PRINT "HELLO" style of programming.
Perhaps he could take Subtext toward more of a wrapper that works at the component level. Who was that guy in the 1980s who talked about software ICs? Brad Cox. Connect the output of this algorithm to that report view...
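A toy illustration of the software-IC idea: components are black boxes with pins, and the "program" is just the wiring. This is not Brad Cox's design or Subtext; `Component`, `ReportView`, and the sort/top-three example are hypothetical, and each component is assumed to have exactly one input and one output.

```python
from typing import Any, Callable, List


class Component:
    """A 'software IC': a black box with an input pin and an output pin.
    Behaviour comes from wiring outputs to inputs, not from writing control flow."""

    def __init__(self, name: str, transform: Callable[[Any], Any]) -> None:
        self.name = name
        self._transform = transform
        self._outputs: List["Component"] = []

    def connect(self, target: "Component") -> "Component":
        """Wire this component's output pin to the target's input pin."""
        self._outputs.append(target)
        return target  # returning the target lets wiring read left to right

    def receive(self, value: Any) -> None:
        """Accept a value on the input pin and pass the result downstream."""
        result = self._transform(value)
        for target in self._outputs:
            target.receive(result)


class ReportView(Component):
    """A sink component that simply records what reaches it."""

    def __init__(self, name: str) -> None:
        self.rows: List[Any] = []
        super().__init__(name, self._record)

    def _record(self, value: Any) -> Any:
        self.rows.append(value)
        return value


# Wire: sort -> top three -> report view.
sorter = Component("sort", sorted)
top_three = Component("top3", lambda xs: xs[-3:])
report = ReportView("report")
sorter.connect(top_three).connect(report)
sorter.receive([5, 1, 9, 3, 7])
```

The end user only chooses parts and draws connections; the loops and recursion live inside the components.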

Speaking of "Language Matters" and "It's usability, stupid", have a look at the Unreal Tournament engine. It has become the gaming engine of choice for lots of top games. It's a very clean design: a virtual machine and a simple scripting language (UnrealScript), extended with a few key gaming features, such as states.

Unreal Engine architecture
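To show why language-level states are handy: the same event name runs different code depending on the object's current state, and anything a state does not handle falls back to a default. This Python sketch only mimics the idea; it is not UnrealScript syntax, and `Guard`, `dispatch`, and the event names are hypothetical (`goto_state` is loosely modelled on UnrealScript's `GotoState`).

```python
from typing import Callable, Dict, List, Optional


class Guard:
    """Hypothetical game character whose event handlers depend on its state."""

    def __init__(self) -> None:
        self.state: Optional[str] = None
        self.log: List[str] = []
        # State name -> event name -> handler. Unlisted events use default_<event>.
        self._states: Dict[str, Dict[str, Callable[..., None]]] = {
            "Idle": {"see_player": self._idle_see_player},
            "Attacking": {"see_player": self._attack_see_player,
                          "lose_player": self._attack_lose_player},
        }
        self.goto_state("Idle")

    def goto_state(self, name: str) -> None:
        self.log.append(f"{self.state} -> {name}")
        self.state = name

    def dispatch(self, event: str, *args) -> None:
        """Route an event to the current state's handler, else the default."""
        handler = self._states.get(self.state, {}).get(event)
        if handler is None:
            handler = getattr(self, f"default_{event}", None)
        if handler is not None:
            handler(*args)

    def default_lose_player(self) -> None:
        self.log.append("nothing to lose")

    def _idle_see_player(self, who: str) -> None:
        self.log.append(f"spotted {who}")
        self.goto_state("Attacking")

    def _attack_see_player(self, who: str) -> None:
        self.log.append(f"firing at {who}")

    def _attack_lose_player(self) -> None:
        self.log.append("lost target")
        self.goto_state("Idle")


g = Guard()
g.dispatch("lose_player")             # Idle has no handler: default runs
g.dispatch("see_player", "player1")   # Idle -> Attacking
g.dispatch("see_player", "player1")   # same event, different behaviour
g.dispatch("lose_player")             # Attacking -> Idle
```

Without states, each handler would need its own `if state == ...` chain; building states into the language removes that boilerplate.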

Wednesday, January 10, 2007

Example-centric programming

This Jonathan Edwards guy is brilliant. His demo of Subtext is the most creative thing in programming in years. But he's basically trying to replace 50 years of programming culture: programs as text. It's doomed to fail, like Charles Simonyi's Intentional Programming, but it may throw up interesting mashups.

End-user programming is a dubious thing. It comes and goes as the Next Big Thing. The problem is that most people don't have the patience to write and debug software, no matter how wonderful the syntax is. Even something simple (to programmers) like control flow is a powerful and therefore dangerous notion. If you remove it, you drastically limit the expressiveness of the language; if you leave it in, people will stumble into the pitfalls.

End-user content, though, is a killer app. My 10-year-old son creates animations, web sites, and videos. A big part of video games now is design-your-own-superhero: choose body type, hair colour, clothes, etc. Players often spend longer on it than on playing the game. This isn't programming in the traditional sense. It's a type of visual programming where users assemble pre-existing components that know how to fit together. The components do the heavy lifting, while the user still feels they are "telling the computer what to do". And isn't that what programming is all about? Drag-and-drop telephony toolkits for IVRs are much the same :)