Monday, December 03, 2007

Learning Java

Most developers live in one camp or the other. Few people who work inside the Microsoft ecosystem (Win32, .Net, ASP.Net) spend much time in Java, and vice versa. That's what makes religious wars about languages and IDEs so lame: most programmers have never seriously worked with both.

That's why it's refreshing to see Eric Sink trying out Java. I spent last winter moving from C# and Visual Studio to Java and Eclipse, and can echo many of his points:

  • Java string comparison (you can't use ==) is a huge gotcha for C# programmers. It fails silently: no compiler warning, no runtime exception, because == is perfectly legal Java; it just compares references rather than contents (use equals()). CheckStyle or some other lint-like tool can catch this; see the demo after this list.
  • The key bindings take getting used to: F11, F6, F5, and F8 in Eclipse versus F5, F10, F11, and F5 in Visual Studio for Debug, Step Over, Step Into, and Continue. I could re-map these, but I think it's better to simply get used to them (which gets harder as one gets older!).
  • The Java ecosystem is much more active and innovative. Spring, Hibernate, and other fascinating efforts occur here. Over in the .Net world, people generally sit around and wait for Microsoft to do something. Or they port good ideas from the Java world.
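
Here's a minimal demo of the gotcha (the new String call forces a distinct object; with two literals, string interning can mask the bug):

public class StringCompare {
    public static void main(String[] args) {
        String a = "hello";
        String b = new String("hello");   // distinct object, equal contents

        System.out.println(a == b);       // false: == compares references
        System.out.println(a.equals(b));  // true: equals() compares contents
    }
}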

Thursday, August 09, 2007

Chief Programmer Teams

I found an ancient programming book: Top-Down Structured Programming Techniques by Clement L. McGowan and John R. Kelly (1975). It describes Harlan Mills's project at IBM, where he pioneered Chief Programmer Teams (CPT). Guess what: it worked. The reasons for its success are easier to see from an agile perspective. Early-70s software development was primitive. The existing paradigm, structured programming, seemed solely concerned with control flow. Avoiding spaghetti logic is a good thing, but what about data? There was no talk of global variables and how to avoid them, and none of data structures, design patterns, module dependencies, or frameworks.

CPT's good features
  • Code centric. Unlike later methodologies that became focused on design artifacts such as diagrams and object models, CPT was focused on code: good, clean, readable code accessible to all team members.
  • "Automated" development environment. In the sense that programmers were supposed to focus on programming and a librarian (human being) managed builds, test runs, source code control, and backup. This also promoted early integration of all code (avoid Big Bang System Integration nightmares). And buildable-code every day.
  • Top-down programming. Write the top-level code first and stub out the lower levels, so module M is written with all the lower-level modules it calls stubbed out. This led to buildable code every day and some sort of testing (see the sketch after this list).
  • "structured programming" approach promotes abstract thinks. The topmost levels consist of fn calls to lower levels. In order to understand the top level you really need to understand what each subsystem call does -- its contract.
  • Chief Programmer does the design. Other methodologies split design work out across sub-system teams, leading to inconsistencies in naming, approaches, and quality.
  • Promotes metrics ('development accounting'): number of bugs, number of builds, lines of code.
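
To make the top-down idea concrete, here's a sketch of what a stubbed module might look like in Java (my example; the 1975 original would have been PL/I or assembler, with the librarian rather than an IDE running the builds):

// Top-level module, written first. Lower-level modules are stubs,
// so the system builds and runs from day one.
public class PayrollRun {
    public void run() {
        loadEmployees();
        computePay();
        printCheques();
    }

    // Stubs, filled in as decomposition proceeds.
    private void loadEmployees() { System.out.println("stub: loadEmployees"); }
    private void computePay()    { System.out.println("stub: computePay"); }
    private void printCheques()  { System.out.println("stub: printCheques"); }
}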

CPT's bad features

The Chief Programmer (CP) is supposed to be many things:
  • a senior-level programmer
  • an experienced professional
  • highly skilled in analysis, spec, design, coding, testing, and integration
  • capable of managing a team (cost, time, resource requirements)
  • capable of working with senior management
  • capable of working with the customer
There is no talk of prototyping. Top-down programming really requires that you know where you're going: you're supposed to decompose, and decompose again. What if you reach a lower level and realize that a module is impossible or impractical to implement?

Top-down also assumes monolithic software, where the programmer controls main() and the entire application stack. And in the old days that's exactly how it was. Modern systems are event-driven, loosely coupled, and cross multiple technology boundaries. A user browses to a web site, whose server-side code invokes a web service, which updates a database, which fires a trigger, which queues a job request in MSMQ, which generates some content, which notifies 'subscribers' via RSS. The programmer relies on the underlying platform to connect these pieces, using mechanisms and interfaces of the platform's (not the programmer's) choosing.


All in all, CPT reminds me of open-source programming Linus Torvalds-style, pre-Internet where everybody is in one room.

Monday, June 18, 2007

On Code-Generation Tools

The 90th Percentile had an article bashing code generation as a programming technique. The author suspects that visual programming is no better (and in many ways worse) than textual programming: the generated code is often unreadable, and in the case of proprietary tools, you're forever dependent on the tool vendor for bug fixes and updates.

The IVR industry is especially wedded to visual tools, because for DTMF apps, whose structure is basically a tree, they actually work well. However, the tools promote the anti-pattern of putting all the business logic in the IVR equivalent of the onClick event.

I would defend code-generation for cross-language problems such as build and deploy tools, or backup tools that are a combination of code and scripts.

Tuesday, June 12, 2007

Mondrian at Google

Python's inventor Guido van Rossum is at Google. His first project was a code-review tool called Mondrian, described here and in a video.

This is a revealing glimpse of a 21st-century software development organization.
  • Heavy use of tools to automate the organization's own development process.
  • Social, not silos. Developers can view other developers' Mondrian dashboards. The tool is an enabler, not a rule enforcer, and the review process itself builds relationships between senior and junior staff.
  • Save everything. Mondrian saves every reviewed source file and all comments. Great for resolving customer problems months later. Also great for tracking metrics.


Other notes
  • Data is encrypted on the hard disk, so there are no privacy worries when a server is thrown out.
  • Google uses Perforce (p4), but with no developer branches! That means code reviews must work with files on developers' machines; they use NFS so anyone can browse anyone else's machine.
  • It runs on one box! In Python.
  • It uses Google BigTable.

Wednesday, June 06, 2007

Replacing the OS

Marc Andreessen once said that "the combination of Java and a Netscape browser would relegate the operating system to its original role as an unimportant collection of slightly buggy device drivers." Pretty funny, considering Microsoft has fifty billion in cash and Java is nowhere to be seen on the desktop.

Yet the idea remains tantalizing. Change the phrase to "JavaScript and a web browser" and we have AJAX. Or Adobe Flex. Or Google Gears.

In fact one could make a thin client platform out of one of these AJAX technologies and then replace the OS with a minimal set of services, like the GEOS operating system. Feasible but highly improbable, at least on the desktop. Yet if some new device appeared, larger than a cell phone but smaller than a laptop, it's a whole new ball game. Instant-on is a feature I would dearly love to have, and it's not going to happen on Windows.

Tuesday, June 05, 2007

Speech to Text coming to cell phones

This video introduces Morpheus's upcoming speech-to-text technology. The reason this matters is audio bandwidth. The phone network is based on 64 kbps audio with 4 kHz of bandwidth, which is fine for human listeners but missing a lot of the higher frequencies that speech recognition engines need to improve their accuracy. That's why phone-based speech rec uses discrete grammars, which are simpler to recognize. Desktop speech rec can do full dictation because a high-quality audio path exists.

Morpheus (and others) use network-based speech rec. The user's device captures the audio and does some basic processing before streaming it as data. Network-based speech rec engines receive the data, do the recognition, and send back the recognized text. Not only does this avoid the audio bandwidth problems, it also avoids running the speech rec engine on a CPU-limited cell phone.

This still isn't perfect dictation accuracy; the Morpheus video mentions roughly a 10% error rate. So it's not really ready for dictating blog posts from your phone yet, but accuracy is tightly tied to CPU power, which improves every year.

As the VUI design blog says, the recent acquisitions of BeVocal and TellMe are perhaps being driven by interest in network-based speech.

Tuesday, May 22, 2007

Pat Helland is back

Pat Helland, the guy who really understands the concept of time in computing, is back. I saw him talk at a Microsoft event in Ottawa. Distributed transactions are bad; he believes a reserve-and-cancel approach to business transactions is preferable.

He's also the guy who calls XML the "cardboard" of computing, as in you wrap up something in cardboard, stick a label on it, and send it somewhere.

Friday, April 20, 2007

YouTube video



I was in Southern Ontario last week and saw some amazing wind-energy windmills near Lake Erie, close to Long Point. These things are massive and dominate the skyline in a War of the Worlds sort of way. Thanks to Ian Bigham for the tour -- I took a short movie:

[embedded video]


PS. Farmers are limited to one windmill per 50 acres so although $5000 per year is nice, it's hard to host lots of windmills.

Friday, April 13, 2007

Is Microsoft Dead, or just Sleeping?

Paul Graham's article is intriguing. Of course Microsoft isn't dead; as Joel Spolsky likes to remind us, they have enough money (40 billion or so) in the bank to continue for years, decades even. They can buy any talent or startup they want. They can throw a hundred programmers at a problem faster than you can say absorb-and-extend. But are they dead as a threat? He makes a good case.

Software can be split into: the OS, business apps, games, email, entertainment, and e-commerce. The last four have moved off the PC. Games are on consoles or online (World of Warcraft). Email and entertainment are on Google, YouTube, and friends. E-commerce (travel, eBay, banking) is browser-based. The only things left are business apps and the OS.

Microsoft Office rules biz apps, and given the huge migration costs (data migration and training), that isn't about to go away. CAD/CAM and other specialty software will stay on the PC for now.

As for the OS, it really depends on the others. If you can do games, email, etc on some other OS, why buy Windows?

The fly in Paul Graham's argument is price. Apple used to love riding the demand curve. They'd bring out Macs at $4000 and let the die-hard fans buy. Then they'd lower the price and well-off firms would pony up. Finally the price would drop to its competitive mass-market level. What's to stop Microsoft doing the same, dropping the price of Windows to $10 and Office to $40? At that price, Linux's warts start to look more troubling.

Another point in Microsoft's favour is the PC itself. With prices down below $500, the main rationale for thin clients goes away. If you have a 3 GHz CPU with 2 GB of RAM, it makes little sense to use it only as a glorified terminal. By its very existence, uses will be found for it. Therefore client-side software won't disappear, and Microsoft's "home turf" isn't going away. That doesn't mean they'll stay on top, but as the Hotmail founder said when bought out for $460 million: "when you have millions of customers, it's too big a lead for anyone else to catch up".

Privacy is the killer of online apps. A few bad news stories about data theft or government back doors into Google's code, and the whole web-app thing gets riskier.

Update: Fouad's new blog Developer's Vista covers this topic. Also, this article suggests the Xbox is a five-billion-dollar money loser and a "disastrous endeavour". Good thing they can afford it.

Thursday, April 12, 2007

DRM and the Viacom - Google Suit

Scott Rosenberg blogged on the Viacom - Google lawsuit. He pulls apart some fallacies in a NYT article, and calls Viacom's suit more of the same strategy -- sue the customer.

A commenter made the standard comment that people don't mind paying for legal stuff if it's convenient.

>Cheap is always good, but most people actually prefer legal and reasonable over illegal and free.


I just had to reply (rant) that this goes against basic economics. People prefer to pay less. Always. The success of YouTube and Napster is proof of this. The only thing pushing people toward legal downloading is the very lawsuits that Scott's article criticizes, and even with them the chances of being prosecuted are vanishingly small. Yes, iTunes makes money, but that's mainly one-time purchases of classic rock by the 5% of people willing to pay. And the leakage from DRM to DRM-free content is a one-way trip: once Sheryl Crow's hits are on a hundred million computers, no record-company iTunes clone is going to get them off.

Content is expensive to create, and (digital) content is free to copy. How can these two facts ever be reconciled? I don't see any good resolution short of a police state, or artists relying on a pass-the-hat form of existence. When every rock concert can be streamed live from cell phones to the internet in HD quality, what's left for the artists...

Wednesday, April 11, 2007

Adding Simplicity - A Great Architecture Blog

Artima referred (via Martin Fowler) to Dan Pritchett's excellent blog Adding Simplicity. Dan is an eBay architect and covers the scalability and reliability issues of distributed web-based systems.

The title of the blog comes from the notion that a great design is achieved "not when there is nothing left to add, but when there is nothing left to take away". The phrase is counter-intuitive: the idea that one has to work at adding simplicity. Ira Glass has a great series of videos, nominally on storytelling, that are really about how one becomes good at a craft. One big point is the need to throw stuff away. He says something like: any time you hear a good song or radio play, it's the result of ruthless hacking, trimming, and removing. Things that are good are good because the bad stuff has been taken out. Duh. Yet creative people (and software architects would include themselves here) like to believe that ideas and creativity are precious, that inspiration shouldn't be tampered with. I certainly see this in software systems where designers get carried away with the "beauty" of their creations over, say, usefulness to the task at hand.

For software architecture, the key question about simplicity is: simple for whom? The clients of an architecture are testers, developers (who fix, extend, port, and scale), users, and administrators. Making something more complicated for one group (such as requiring that all transactions be idempotent) can actually make things simpler for the people running the production systems.
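
As a sketch of what that complication buys you (hypothetical names; a real system would persist the table rather than keep it in memory): callers attach a request id, and replays of the same request become harmless.

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// An idempotent operation: retries and duplicate deliveries are safe,
// which is what makes life simpler for the production staff.
public class PaymentService {
    private final Map<String, String> processed = new ConcurrentHashMap<>();

    public String applyPayment(String requestId, int amountCents) {
        // A replayed requestId returns the original result instead of
        // charging the customer twice.
        return processed.computeIfAbsent(requestId, id -> doCharge(amountCents));
    }

    private String doCharge(int amountCents) {
        return "charged " + amountCents + " cents"; // real work goes here
    }
}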

Tuesday, April 10, 2007

Dreaming In Code -- The Myth of the Magic Closet

Scott Rosenberg's new book Dreaming In Code is a great read for anyone involved in software design over the last decade or two. I'm only on Chapter 4, but it's shaping up well. The book is about a failed project, Chandler, which set out to build a "cross-platform, open-source PIM in the spirit of Lotus Agenda". The words "in the spirit of" are the big red flag, because they reveal a massive hole in the product spec. As Joel's review of Dreaming in Code says, "Whenever the spec describes the product in terms of adjectives (“it will be extremely cool”) rather than specifics (“it will have brushed-aluminum title bars and all the icons will be reflected a little bit, as if placed on a grand piano”) you know you’re in trouble."

Chandler's failure is important because it was being built by top talent; famous names from Apple, Netscape, and other illustrious software projects. That a software project failed is hardly new; but that _these_ guys failed reveals a lot about the state of software development. As Rosenberg says, bridges are built pretty much on time and on budget. Why can't software, after fifty years, be the same?

In defence of the Chandler team, they were tackling a known holy grail. I call it The Magic Closet. Imagine people with messy houses being able to toss all their clutter into a closet, and the closet magically sorts and arranges it. Later you reach in for your swim mask, your high school yearbook, or that little hex screwdriver for your glasses, and it's right there. Chandler applied this idea to people's data. Computers have massive storage and a fabulous ability to process information. Surely we can build something where you dump in all your contacts, e-mails, photos, pictures, and assorted notes, and it magically arranges and cross-references the data into useful information.

Magic Closets have been tried before. Most recently, Microsoft's WinFS (a descendant of the Cairo project) was an attempt to replace the file system with a database and gobs of tags and meta-tags. Failed. Let's look at some reasons why Magic Closet projects fail. Of course hindsight is 20/20 and it's easy to criticize, but a repeated failure is not just a fluke; it must reveal something important about our conception of software.

Reasons for failure:
* lack of a data model
* if you need an AI, say so
* too many smart people in a room
* leader is not a programmer

Lack of a Data Model.

Most software can be described in Model-View-Controller terms. The Model is a media-independent representation of the data the app is about: for a word processor it's text; for a video editor it's audio and video clips; for a PIM it's people, contacts, appointments, and messages. Usually software is written model first, with views and controllers added afterward. Not only is it important to define the model early (because changing it later is painful), but a working model allows development of views and controllers to proceed in parallel. Doing things in a different order is weird.

Chandler wanted to be revolutionary. A simple PIM would have a Person record with a PhoneNumber field, so answering the question "What's Joe's phone number?" is easy. But a single field is restrictive. People have home phones, mobile phones, fax lines, and work numbers. People have multiple homes (summer and winter). Offices have multiple phone numbers (switchboard and private direct lines). A fancier PIM would support multiple phone numbers per person, but using a fixed schema: each person has a primary phone number and a bunch of extras, so for many purposes the software can still act as if Joe has a single number. But Chandler didn't like fixed schemas; they wanted an open-ended system of tuples. Not only could a person have unlimited phone numbers, there wouldn't even be a primary one. There might be rules to determine (based on, say, time of day) what the primary number is. Or consider your doctor (Joe) who turns out to also be the father of your daughter's boyfriend. Now he has two roles: you wouldn't phone his home number to renew a prescription, and you wouldn't phone his clinic to invite him to a BBQ. Chandler wanted to capture all these "use cases". Except now answering "What is Joe's phone number?" is hard. It depends. A skeptic might suggest that the user could simply specify "phone Joe at home", but that defeats the spirit of Lotus Agenda that Chandler wanted: the software should know what's going on and (magically) call Joe's cottage when that is the appropriate choice.
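
A toy illustration of the two approaches (my sketch, not Chandler's actual design):

import java.util.ArrayList;
import java.util.List;

// Fixed schema: "What's Joe's phone number?" is a field access.
class SimplePerson {
    String name;
    String primaryPhone;
    List<String> otherPhones = new ArrayList<>();
}

// Open-ended tuples, roughly the Chandler spirit: any attribute, any
// context, no primary anything. Now the same question needs rules.
class Fact {
    String subject;   // "Joe"
    String attribute; // "phone", "role", ...
    String value;     // "555-1234", "doctor", ...
    String context;   // "clinic", "cottage", "weekends", ...
}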

Perhaps a data model for this sort of thing is possible. But it's wildly complicated. And users won't use something they can't understand.

If you need an AI, say so

Chandler was supposed to parse useful information out of e-mails and other documents. This requires artificial intelligence. If that's what they really wanted, the Chandler team should have hired dozens of PhDs for a ten-year project to first research and then develop such an AI. Otherwise the software is just guessing. And as anyone who's used speech recognition software can tell you, being right 90% of the time is not good enough; having to review and correct your input is just too painful.

Too Many Smart People In a Room

TMSPIAR. An excess of talent is a big problem; too many cooks, as they say. I've been in meetings like that and they are highly problematic. Talented people want big scope and big challenges. People at these meetings are either cowed by the high-wattage talent or get into pissing matches. And when a project is open-ended (a code word for vague), just about any design choice can be justified as a requirement.

Surely almost any software project can have a single architect. There might be a few exceptions, but a PIM is hardly an F-16. A single mind can produce a better and more consistent design than a committee.

Leader is Not a Programmer

I don't know if this one is true. Mitch Kapor wrote Lotus 1-2-3, but his degree is in psychology. I'm guessing he is non-technical in a way that only a technical person can understand. Non-technical leaders can challenge people in good ways, but they are a risk factor for software projects.

That's it for now; I'll review the rest of the book later.

Tuesday, March 06, 2007

Happy Trails -- The Joy Of Logging

Logging is one of the most important infrastructure activities in a piece of software. Good logging saves hours of support and troubleshooting time. For instance, every error that is logged should point to a single place in the source code where the error occurred. You can do this with unique error-code numbers. I often just vary the error text slightly, so errorFileNotFound says "can't find file %s" in one place and "can't find file: %s" in another. The ":" is all I need to know where to look.
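
For example (java.util.logging here, just to keep the sketch self-contained):

import java.util.logging.Logger;

class FileLoader {
    private static final Logger log = Logger.getLogger("app");

    // Same error, two call sites: the tiny difference (":" or not)
    // tells me exactly which line produced the message.
    void openConfig(String path)  { log.severe("can't find file " + path); }
    void openGrammar(String path) { log.severe("can't find file: " + path); }
}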

I used to develop GUIs, where logging wasn't necessary. In those old days of manual testing, you simply ran the app and could see immediately if the labels didn't line up or the selection rectangle was wonky. Then I moved into telephony, where servers run in very programmer-unfriendly environments. Think ten minutes to reboot a PBX, after you've found the PBX technician to do it. And if that weren't enough, random errors were common, such as "some callers never heard the final prompt".

The only solution is good logs. Good being defined as:
  • Lots of detail. Each log message should include date and time (down to the millisecond), thread, and logging level. Log files should of course be ordinary text files.
  • Configurable logging levels, so you can increase, decrease, or turn off logging without having to rebuild, or even restart, the software.
  • Rolling log files with a quota, so the hard disk won't fill up.
  • Errors marked in a very visible way (that is, easy to search for).
Log messages don't have to be pretty. Only programmers look at them.
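
Most of that list can be had cheaply. A minimal sketch using java.util.logging (file names and sizes are arbitrary):

import java.util.logging.FileHandler;
import java.util.logging.Level;
import java.util.logging.Logger;
import java.util.logging.SimpleFormatter;

public class LogSetup {
    public static Logger create() throws Exception {
        Logger log = Logger.getLogger("trace");
        // Rolling quota: 10 files of 5 MB each, so the disk never fills up.
        FileHandler handler = new FileHandler("trace%g.log", 5_000_000, 10, true);
        handler.setFormatter(new SimpleFormatter()); // plain text with date & time
        log.addHandler(handler);
        log.setLevel(Level.FINE); // adjustable at runtime, no restart needed
        return log;
    }
}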

The inventor of Forth, Charles H. Moore, declared that exactly two stacks were what was needed. I always found that remark suspect, having been taught that software should do something zero, one, or an infinite number of times. However, I now believe that exactly two log files are needed. One is the trace log described above: heavy in detail and ugly to read. The second is an error log that contains only errors plus startup and shutdown messages. The error log is several orders of magnitude smaller, so you can see several days of activity at a glance. All error-log messages also appear in the trace log so you can cross-correlate.

Someone once remarked that they loved Star Trek because in every episode some key piece of equipment broke, but they always managed to fix it. Scotty would say "run the Level 3 diagnostics". We didn't know what Level 3 diagnostics were, but we know we want them for our software. Good software should have some idea of how sick or healthy it is. A basic health monitor would track the number of errors, the number of sessions, the number of users, and other high-level stuff, and write a snapshot to the error log once per hour. That helps quickly pinpoint when a problem occurred.
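
A minimal health monitor might be nothing more than a few counters and a timer (hypothetical names):

import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;
import java.util.logging.Logger;

public class HealthMonitor {
    public final AtomicLong errors = new AtomicLong();
    public final AtomicLong sessions = new AtomicLong();

    // Snapshot the counters to the error log once per hour.
    public void start(Logger errorLog) {
        ScheduledExecutorService timer = Executors.newSingleThreadScheduledExecutor();
        timer.scheduleAtFixedRate(
            () -> errorLog.info("health: errors=" + errors.get()
                    + " sessions=" + sessions.get()),
            1, 1, TimeUnit.HOURS);
    }
}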

In fact, counters are one kind of trail: a thing that tracks software behaviour. My favourite is a Trail object that you feed strings. It builds up a trail of those strings separated by ';', such as "RING;ANS;PLAY;DELAY;PLAY;DISC": a very short summary of what the software did on a phone call. Trails are good for logging, but even better for unit tests. Adding counters and trails to your classes is a great way to track what they did. You can add wrappers or fancy mock objects, but simple trails often do the job. For example, consider a class that has to send and receive complicated TCP/IP messages. There's a lot to unit test here, including partial or garbled messages and broken connections, not to mention race conditions and deadlock! But a simple m_msgCount++ in the send method is very helpful in unit tests, because you can quickly check how many messages were sent. Yes, the counter only tracks a method call; it doesn't prove that TCP/IP activity actually occurred. But you have unit tested the lower-level I/O classes, haven't you?
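
The whole Trail idea fits in a dozen lines. A sketch:

import java.util.StringJoiner;

// Collect short event strings; a unit test then compares the whole
// trail with a single string equality.
public class Trail {
    private final StringJoiner events = new StringJoiner(";");

    public void add(String event) { events.add(event); }

    @Override public String toString() { return events.toString(); }

    public static void main(String[] args) {
        Trail t = new Trail();
        t.add("RING"); t.add("ANS"); t.add("PLAY"); t.add("DISC");
        System.out.println(t); // RING;ANS;PLAY;DISC
    }
}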

At a philosophical level, trails are useful because software exists in time. You can't see software the way you can see a bridge or a house; all you can see are tiny portions of source code, or pixels on a screen, or other output. We've all come across a bug in the source code and wondered "how the hell did this ever work?". There are many paths through the code, and the broken path may simply be skipped, except of course when it matters, like when you're doing a demo. Basically, software is invisible, and trails help you see it.

SpeakRight 0.0.2 is out

Here. It's definitely starting to take shape; simple directed dialog apps can now be built.

Next up: mixed initiative and more SROs.

Monday, February 26, 2007

Announcing the SpeakRight Framework

There's been a big change in direction for me in the last month: I'm going open-source. I spent the last month learning Java and Eclipse... and creating an open-source VoiceXML framework called the SpeakRight Framework. It's a code-based approach to writing speech recognition apps that, I believe, results in much higher code re-use and therefore faster development. Also, modern Java IDEs are amazingly smart and helpful; good Intellisense (called Code Assist in Eclipse) is as good as a GUI-based, property-page approach.

Currently SpeakRight has only been tested on the Voxeo community site. It's VoiceXML 2.0.

A key part of the plan is "SROs". SpeakRight Objects are re-usable speech objects that, being open-source, should encourage a growing body of components out of which to build speech apps.

Can't wait to try SpeakRight out on Linux, and on MSS 2007!

Tuesday, February 13, 2007

Grokking StringTemplate

The StringTemplate template engine is a powerful tool for generating markup text or code. It does, though, take a little getting used to. The templates may look like a type of programming language, but there are some key differences. And since templates aren't compiled, bad syntax may just fail silently. Luckily it's quick to try things.

Update: I have corrected errors in the original post. Namely, $if(it.countIsOne)$ is the syntax for referencing $it$ inside an $if$.

First of all, let's start with something that works. We want a field template to output a prompt tag for each prompt in the list promptL. Assume promptL contains an ArrayList of two strings, "a" and "b". This is the XML we want (ignoring whitespace issues):

<field>
<prompt>some a</prompt>
<prompt>some b</prompt>
</field>


Here are the templates that successfully did this.

//generate the field tag
field(promptL) ::= << <field> $prompts(promptL)$ </field> >>

//generate a tag for each item in the list. There are other prompt tags so we parameterize tagname
prompts(promptL) ::= << $promptL:prompt(tagname="prompt"); separator="\n"$ >>

//output a single tag
prompt(tagname) ::= << <$tagname$>some $it$</$tagname$>
>>


Now let's add one thing to the output and see what template changes are required. Let's say we want each prompt tag to have a count attribute that is "1" for the first tag, "2" for the second, and so on. Like this:

<field>
<prompt count="1">some a</prompt>
<prompt count="2">some b</prompt>
</field>


This is easy. We can just use the built-in $i$ to do this:

prompt(tagname) ::= << <$tagname$ count="$i$">some $it$</$tagname$>
>>

Now let's say we want to output the count attribute only when the count is greater than "1". This is where things got tricky.

This DOESN'T WORK. ST doesn't let you compare values in $if$; that would go against the spirit of ST, whose goal is a strict separation between presentation and model logic.

prompt(tagname) ::= << $if($it.count$=="1")$ <$tagname$>some $it$</$tagname$>
$else$
<$tagname$ count="$i$">some $it$</$tagname$>
$endif$
>>
Maybe we can replace the strings in promptL with Java objects that have a getCountIsOne method. We'll also need a getText method to get the string. Recall that $it.someFunc$ resolves in Java to a search for a method named getSomeFunc().

This WORKS! Just remember the syntax: leave the '$' characters off $it$ when inside an $if$; you're inside $...$ already.

prompt(tagname) ::= << $if(it.countIsOne)$ <$tagname$>some $it.text$</$tagname$>
$else$
<$tagname$ count="$i$">some $it.text$</$tagname$>
$endif$
>>

Another approach is template application. We change our Java method to getCountNotOne and have it return null when it's the first item in the list, and the count otherwise.

This WORKS! We apply the template docountattr to $it$:

prompt(tagname) ::= << <$tagname$ $it.countNotOne:docountattr()$>$it.text$</$tagname$>
>>
//a way of outputting count attr only when count not 1
docountattr() ::= << count="$it$">>

Template application is powerful. You can pass parameters in as well.

THIS WORKS! We're applying the template prompt with two parameters, tagname and bargeIn.

//generate a tag for each item in the list. There are other prompt tags so we parameterize tagname
prompts(promptL) ::= << $promptL:prompt(tagname="prompt",bargeIn="true"); separator="\n"$ >>

prompt(tagname, bargeIn) ::= << $if(it.countIsOne)$ <$tagname$ bargeIn="$bargeIn$">some $it.text$</$tagname$>
$else$
<$tagname$ count="$i$"
bargeIn="$bargeIn$">some $it.text$</$tagname$>
$endif$
>>


Once you've grokked things, StringTemplate gives you excellent separation of presentation from logic, and is very good at extracting what it needs from ordinary Java objects.
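
For completeness, here's roughly what the Java driver side looks like, assuming the StringTemplate 3.x API and a hypothetical group file vxml.stg containing the templates above (DefaultTemplateLexer selects the $...$ delimiters):

import java.io.FileReader;
import java.util.Arrays;
import org.antlr.stringtemplate.StringTemplate;
import org.antlr.stringtemplate.StringTemplateGroup;
import org.antlr.stringtemplate.language.DefaultTemplateLexer;

public class RenderField {
    public static void main(String[] args) throws Exception {
        StringTemplateGroup group = new StringTemplateGroup(
                new FileReader("vxml.stg"), DefaultTemplateLexer.class);
        StringTemplate field = group.getInstanceOf("field");
        field.setAttribute("promptL", Arrays.asList("a", "b"));
        System.out.println(field.toString());
    }
}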

Friday, February 09, 2007

Learning Java

After years in the Microsoft world I am finally learning Java and Eclipse. For anyone familiar with C#, Java is extremely close but feels its age. No properties, delegates, or events? I can live without them. Eclipse, however, is another story. WHAT AN AMAZING PIECE OF SOFTWARE! Miles ahead of Visual Studio in many areas. Mainly, it just works, which is a huge kudo for any piece of complicated software.

And Eclipse is free. In fact, one person with a laptop, GMail, Eclipse, and SourceForge is the equivalent of a five person company from the 1980s.

Open-source software is unstoppable in many ways; you just can't compete with free. Software tool vendors have to keep moving into more and more niche areas, since all the big categories, like source control, IDEs, and bug tracking, are well covered by OSS. That being said, Microsoft has something like 50 billion dollars in the bank. They could not earn a cent for the next ten years and still keep paying every employee. That's kind of unstoppable too!

Saturday, February 03, 2007

Daily backup of a laptop

Thanks to Gerald Gibson Jr.'s great article I now have a free way to do daily backups of my laptop. Windows natively supports the zip format (as "compressed folders"), and the article presents a C# program that generates ZIP files using the Shell API.

(Be sure to get the latest version from Gerald's web site)

My app zips several important directories into zip files in c:\zip. Then, using a free account at box.net, I upload the zip files daily. The free account has a limit of 1 GB and each file must be less than 10 MB, but for $5 a month you get much larger maximums.

Not the finest solution; but it'll do for now. KISS.
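
For what it's worth, the same idea is only a few lines of Java; this is a sketch, not Gerald's program (his is C# and uses the Shell API), and the paths are made up:

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

public class ZipDir {
    public static void zip(Path srcDir, Path zipFile) throws IOException {
        List<Path> files;
        try (Stream<Path> walk = Files.walk(srcDir)) {
            files = walk.filter(Files::isRegularFile).collect(Collectors.toList());
        }
        try (ZipOutputStream out = new ZipOutputStream(Files.newOutputStream(zipFile))) {
            for (Path p : files) {
                // zip entry names use forward slashes regardless of OS
                String name = srcDir.relativize(p).toString().replace('\\', '/');
                out.putNextEntry(new ZipEntry(name));
                Files.copy(p, out);
                out.closeEntry();
            }
        }
    }

    public static void main(String[] args) throws IOException {
        zip(Paths.get("C:/docs"), Paths.get("C:/zip/docs.zip"));
    }
}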

Update: A Firefox extension called GSpace turns your GMail account into an FTP-like site; each file you upload becomes an e-mail. GMail now has 2.5 GB of storage.

Thursday, January 18, 2007

MySpace is Show Biz

Great article by David F. Carr on the behind-the-scenes mayhem at MySpace, whose exponential growth from a few hundred thousand users to 30 million overloaded the system.

"MySpace started small, with two Web servers talking to a single database server. Originally, they were 2-processor Dell servers loaded with 4 gigabytes of memory"

Again and again MySpace was rewritten to add replicated databases, SANs, ASP.Net, and 64-bit servers. One interesting lesson was the addition of caching:

"To further lighten the burden on its storage systems when it reached 17 million accounts, in the spring of 2005 MySpace added a caching tier—a layer of servers placed between the Web servers and the database servers whose sole job was to capture copies of frequently accessed data objects in memory and serve them to the Web application without the need for a database lookup."

This sounds a lot like what GigaSpaces is offering: Linda-style middleware that replaces the messaging paradigm with an associative shared-memory approach, with intense caching and heuristics to move data closer to its clients.
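
The caching-tier idea in miniature is just cache-aside: check memory first, hit the database only on a miss. A toy sketch (an in-process map; MySpace's tier was a fleet of dedicated servers):

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

public class CacheAside<K, V> {
    private final Map<K, V> cache = new ConcurrentHashMap<>();
    private final Function<K, V> database; // fallback lookup on a miss

    public CacheAside(Function<K, V> database) { this.database = database; }

    public V get(K key) {
        return cache.computeIfAbsent(key, database); // miss -> db, then cached
    }
}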

Scott Rosenberg makes the point that Web 2.0 successes may look easy from the outside, but behind the scenes there are huge challenges. Like Show Biz!

Friday, January 12, 2007

Why to use code generation techniques

Mark Baker comments on the problems with (XML) document validation. Any DTD that is too detailed is a time-limited contract: ProductType may be "1", "2", or "3" today, but a year from now types "4" and "5" may be allowed.

True, but validating, or what used to be called "laundering your input", still has to be done. Whether it's in the gatekeeper or the business object, the code has to be there.

>But that's really no way to write long-lived software, is it?

Yes, it's fine actually. Who cares where the error message "producttype '4' not allowed" comes from? All that matters is that at any instant in time the system either processes valid messages or returns a suitable error message.

If splitting validation code from the business logic creates a maintainability problem, then consider code generation. Frameworks like StringTemplate make code generation easy. CodeGen tackles the common problem that one feature requires changes up and down the software stack. The traditional way was to make all the changes manually and rely on comments and checklists to ensure every required change was done. With CodeGen you can use a more literate programming style: define things in one place, generate the various bits of code, and inject them into the appropriate places. Patterns such as Chain of Responsibility help.
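
A toy example of the define-once idea (hypothetical; a real project would drive this from a schema file and a template engine rather than string concatenation):

import java.util.List;
import java.util.stream.Collectors;

public class GenValidator {
    public static void main(String[] args) {
        // The allowed product types are defined exactly once, here.
        List<String> allowed = List.of("1", "2", "3");
        String checks = allowed.stream()
                .map(t -> "t.equals(\"" + t + "\")")
                .collect(Collectors.joining(" || "));
        // Emit the gatekeeper's validation method; when "4" becomes legal,
        // regenerate instead of hunting through the whole stack by hand.
        System.out.println("static boolean isValidProductType(String t) {\n"
                + "    return " + checks + ";\n"
                + "}");
    }
}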

The Beauty of Unreal

Further to my recent post on example-centric programming, Tony responded with:
I entirely agree with the audience-targeting critique you make. I was thinking about the historical roots of programming, the social aspects, and how they must be considered, not just for fuzzy artsy reasons but because they are part of the sociology of skills and attitudes that mold the field (we are the medium of programming). I think programming has strong roots in telephony, code-breaking, and mathematics. These roots influence the need for interfacing at the hardware level, efficiency, and high abstraction.

The PC revolution is in a way an extension of the business view of data processing, which was very much separate from the scientific programming done on mini-computers. In fact, now that I think of it, this may be why DEC missed the boat on PCs: PCs are for business and are now appliances, not really computers but data processors (ironic given the PDP name; there's a story there...).

So what this guy is advocating is a reconciliation of these two worlds, a wrapper view. Not sure that can fly in social terms, for the reasons you identify, and because we have two worlds in software: programming in the old telephony/code-breaking/math sense, and the data-processing view with forms, HTML, windows, and high-level object libraries.

Alan Kay's deep thoughts on similar topics are here.

I hadn't really thought about the two ends of programming like that before. I guess there are two cultures. Part of my confusion comes from watching his SubText video, where he shows how it works. Although it meets all the aims in the manifesto (text-is-a-dead-end, don't-copy-and-paste, avoid-control-flow), the feel of the video is very mathematical. It feels like LISP programming with recursion, and recursion is a strange concept to most people, like postfix notation. In this way SubText (as it exists in the video) is a failure for end-user programming. Ordinary users will prefer the 10 PRINT "HELLO" style of programming.
Perhaps he can take SubText into more of a wrapper thing that works at the component level. Who was that guy in the 1980s? Brad Cox, who talked about software ICs. Connect the output of this algorithm to that report view...

Speaking of "Language Matters" and "It's usability, stupid", have a look at the Unreal Tournament Engine. It's become that gaming engine of choice for lots of top games. A very clean design with a virtual machine and a simple language, but extended with a few key gaming features, like states.

Unreal Engine architecture

Wednesday, January 10, 2007

Example-centric programming

This Jonathan Edwards guy is brilliant. His demo of Subtext is the most creative thing in programming in years. But he's basically trying to replace 50 years of programming culture: programs as text. Doomed to fail, like Charles Simonyi's Intentional Programming, but it may throw up interesting mashups.

End-user programming is a dubious thing. It comes and goes as the Next Big Thing. The problem is that most people don't have the patience to write and debug software, no matter how wonderful the syntax is. Even something simple (to programmers) like control flow is a powerful and therefore dangerous notion. If you remove it, you drastically limit the expressiveness of the language; if you leave it in, people will stumble into the pitfalls.

End-user content, though, is a killer app. My 10-year-old son creates animations, web sites, and videos. A big part of video games now is the design-your-own-superhero: choose body type, hair colour, clothes, etc. Kids often spend longer on that than on playing the game. This isn't programming in the traditional sense; it's a type of visual programming where users assemble pre-existing components that know how to fit together. The components do the heavy lifting, while the user still feels they are "telling the computer what to do". And isn't that what programming is all about? Drag-and-drop telephony toolkits for IVRs are much the same :)