Tuesday, October 31, 2006

When more is less

In retooling our QA lab we've dropped the number of machines by half...and become more productive! Our work involves constantly re-installing Windows service packs, hardware drivers, and assorted telephony software. Turns out that looking after lots of machines was wasting time. We used to have no organized way of managing the machines; we didn't know what was already on them, especially after a week of testing.

The cry always went out for more machines, which arrived "clean" and ready to go. Of course, after a testing cycle the new machine could end up just as messed up as all the others. Later, when the next release rolled around, it was "we need more machines" all over again.

Reminds me of the old tale about the workbench with 25 screwdrivers but only 24 storage slots. In the beginning each screwdriver is in its appropriately marked slot, except #25, which someone has borrowed. Then another worker borrows #15. When the first worker returns, he puts #25 into the only open slot, #15's slot. Then he borrows #7. Now the second worker returns and puts #15 into slot #7. Pretty soon all the screwdrivers are in the wrong slots. The whole storage system falls apart because of a single missing slot.

Monday, October 30, 2006

Even in Windows-land, OSS is here

We installed Subversion and Trac here last week to replace StarTeam for defect tracking and source control. Firefox and GMail/Yahoo/Hotmail are rampant. Asterisk answers the phone. All these open-source and free tools have replaced commercial products. Even here in a traditionally Microsoft shop, OSS is starting to go mainstream. And we didn't need to switch to Linux to do it. Yes, the Asterisk box runs Linux, but it's just a box on the network, like GMail is a service on the larger network called the Internet. Who cares what the GMail servers run? Windows clients can use it happily.

Yes, we still use Exchange, Visual Studio, and GoToMeeting. But in five years?

Tuesday, October 17, 2006

Ballmer on Vista

Scott Rosenberg has an interesting article on Ballmer's comments that Vista has come out pretty much as expected. I think there are about 30,000 programmers at the Redmond campus. It boggles the mind how that many coders can get anything done that doesn't duplicate, simulate, or break what others are doing. For many years they coped through specialization. Each programmer got something tiny, like a single Win32 API call, to work on.

Then after XP they pinned the needle the other way, and a rewrite orgy began. It produced some good things. XAML is good. They completely rewrote Windows' TCP/IP stack, which is quite an act of chutzpah, and now it supports multi-core CPUs in useful ways. But overall they ended up suffering exactly the sort of problems that a Big Bang approach would bring.

In a way it's a smart move though. Most people are quite happy with XP. Any next version had to be a long-term, next-generation thing. People will love Vista in 2009.

Wednesday, October 04, 2006

Estimating Software Projects

Estimating how long it will take to finish a software project is one of the toughest parts of software engineering. Luckily it's easy to get better: start making estimates now and track your accuracy.

Two rules of thumb:
  1. Make a guess and double it. I find this one surprisingly accurate. Probably because I tend to work on the same sorts of projects; but then, don't most programmers?
  2. Brooks's book The Mythical Man-Month says that every app has a core set of functionality that can be done in X weeks. To create a product will take 9X because of all the wrapper code for config/reporting/error handling, and all the user documentation that needs to be provided. People constantly underestimate this factor of 9.

We normally provide estimates that include design, coding, and user docs, but not QA. My manager likes an estimate to be broken down into a list of tasks that fits on a single page. Tasks that you are most unsure of should be clearly identified by using a range such as "4-10 weeks".
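
Here is a rough sketch of how such a one-page task list could be totalled, with the two rules of thumb alongside. The task names and week counts are made up for illustration, and the helper names (Task, summarize, double_the_guess, product_estimate) are my own, not from any tool.

```python
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    low_weeks: float
    high_weeks: float  # same as low_weeks when you are sure of the task


def summarize(tasks):
    """Add up the low and high ends of every task range."""
    low = sum(t.low_weeks for t in tasks)
    high = sum(t.high_weeks for t in tasks)
    return low, high


def double_the_guess(guess_weeks):
    """Rule of thumb 1: make a guess and double it."""
    return guess_weeks * 2


def product_estimate(core_weeks, factor=9):
    """Rule of thumb 2: core functionality in X weeks means a product in 9X."""
    return core_weeks * factor


# Hypothetical task list; QA is deliberately left out.
tasks = [
    Task("Design", 2, 2),
    Task("Core call flow", 4, 10),  # the task we are least sure of
    Task("User docs", 1, 2),
]

low, high = summarize(tasks)
print(f"Estimate: {low}-{high} weeks")
```

Keeping the range on each task, rather than a single number, makes it obvious which line items are driving the uncertainty in the total.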

The whole game of making & meeting estimates is about what features are "in" and what's "out". I like to tie estimates to an explicit list of what features users will and won't get. Requirements are bound to change, and managers sometimes have a habit of forgetting that your estimate was tied to a feature set.

As usual, Joel has written on this already.

Wednesday, August 23, 2006

You can't tune silence

Tuning refers to the post-deployment tweaking of grammars in a speech app. Your QA department can test the features, but there's nothing like real callers in all their variations of speaking styles to really probe your app's grammars. If the open-source mantra is "with enough eyeballs all bugs are shallow", then the speech app mantra should be "with enough speakers all grammar bugs are shallow".

Tuning is time-consuming because a human has to listen to the audio recordings of hundreds of phone calls. If the system can log speech rec errors, then the person doing the tuning can zero in on those utterances. If not, and we will see how this can happen, tuning becomes very difficult.

Out-of-grammar errors are the easiest to detect because the speech recognizer returns a NoRec error. Some of these are due to coughs, background noise or other audio problems that grammar changes can't fix. Others are due to variations in pronunciation that your grammar doesn't know about. People's names are notoriously hard this way because: (a) names are multi-lingual in origin, and (b) many people will mispronounce a name they've only read. We had a Mr Biber here, pronounced bee-ber, but many people would say bye-ber. Tuning revealed this, as well as the fact that some people would say only the last name. The grammar needed to be changed to cover all these possibilities.

The more difficult problem is what we call the Rumplestiltskin error. Most speech engines want to recognize. Try saying "Rumplestiltskin" to a speech-rec auto-attendant. You'll be surprised at how often it will find a (wrong) match. Saying something completely outside the grammar will cause a medium-confidence recognition of some wrong word in the grammar. Of course you can affect this by raising the recognition and confidence thresholds, but that may cause other unwanted recognition problems. Confirmation, for example, is an added step in the dialog that callers can grow to resent. The problem with a Rumplestiltskin error is that it's invisible; the system really has no knowledge that a recognition error has occurred. This makes finding the error a time-consuming task for the tuner. The caller may be saying an acceptable alternative for a word, but until we discover this and add it to the grammar, the system will not be pleasing this caller.
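
To make the threshold trade-off concrete, here is a minimal sketch of the accept/confirm/reject decision most speech apps end up with. The function name and the threshold values are invented for illustration; real engines expose their own confidence scales and settings.

```python
def classify(confidence, accept_at=0.80, reject_below=0.40):
    """Decide what the dialog does with a recognition result.

    confidence is assumed to be a score between 0.0 and 1.0.
    The default thresholds are hypothetical, not from any engine.
    """
    if confidence >= accept_at:
        return "accept"   # act on the result without asking
    if confidence < reject_below:
        return "reject"   # treat as a NoRec and re-prompt
    return "confirm"      # ask the caller "Did you say ...?"


# A Rumplestiltskin-style false match at medium confidence:
print(classify(0.60))                   # default thresholds
print(classify(0.60, accept_at=0.50))   # looser threshold, wrong word accepted silently
print(classify(0.85, accept_at=0.90))   # stricter threshold, good matches now need confirming
```

Loosening the accept threshold lets wrong matches through without a trace, while tightening it pushes correct recognitions into the confirmation step that callers resent.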

Tuning becomes a big problem when dynamic grammars are used, because you can't do any pre-deployment tuning. The app will read a list of phrases from a database, say Little League team names, and generate a grammar. Consider the "Nowell Gnats". It's not unusual for the TTS engine to pronounce a word differently from what the speech rec engine accepts as a pronunciation. This is bad because your prompt may say it as "you can say Nole Nats..." whereas the speech rec engine wants "Now-well Nats".

It's easy to end up with a situation where the app won't accept any sensible pronunciation for a word, and even tells callers the wrong pronunciation to use! All without generating any recognition errors. Silent failure.

Can the speech rec platform help? Yes, here are a couple of ideas. First, the app should have a "tuning mode" that can be set temporarily. It increases reco thresholds to force more confirmations, and logs all rejected confirmations as candidates for tuning. Second, the system should have a batch process that studies patterns of calls. If the same person calls back several times in a row and never seems to complete a transaction, then these calls are also tuning candidates.
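
The batch-process idea might look something like the sketch below. It assumes a call log with a caller ID, a start time and a "transaction completed" flag; the Call record, the field names and the default limits (three attempts within 30 minutes) are all assumptions of mine, not features of any speech platform.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Call:
    caller_id: str
    start_minute: int  # minutes since the start of the log
    completed: bool    # did the caller finish a transaction?


def tuning_candidates(calls, min_attempts=3, window_minutes=30):
    """Return caller IDs with repeated, uncompleted calls close together."""
    by_caller = defaultdict(list)
    for call in sorted(calls, key=lambda c: c.start_minute):
        by_caller[call.caller_id].append(call)

    flagged = []
    for caller_id, history in by_caller.items():
        streak = []  # recent failed attempts for this caller
        for call in history:
            if call.completed:
                streak = []  # a success resets the streak
                continue
            # Keep only failed attempts that fall inside the time window.
            streak = [s for s in streak
                      if call.start_minute - s.start_minute <= window_minutes]
            streak.append(call)
            if len(streak) >= min_attempts:
                flagged.append(caller_id)
                break
    return flagged


# Hypothetical log: the first caller retries three times and never finishes.
log = [
    Call("555-0100", 0, False), Call("555-0100", 5, False), Call("555-0100", 9, False),
    Call("555-0101", 0, False), Call("555-0101", 3, True),
    Call("555-0102", 0, False), Call("555-0102", 100, False), Call("555-0102", 200, False),
]
print(tuning_candidates(log))
```

The recordings behind each flagged caller are then the ones worth a human listen, which is far fewer than the hundreds of calls a tuner would otherwise wade through.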

Tuesday, August 08, 2006

Bye Bye Speech Server -- Hello SPS

So, Microsoft Speech Server is being folded into Office Communications Server, and will be known as Speech Platform Services. This was announced at today's SpeechTek conference. Office Communications Server (OCS) is basically Live Communications Server, which does presence and SIP call routing.

I have fairly mixed views about this move. First, the negatives. Yet again, Microsoft is making a mid-course correction with their speech offering. At the beginning the vision was multimodal speech. Then it was telephony-based speech rec by web developers (by sprinkling "SALT" on their web pages to speech-enable them). Then it was telephony-based speech rec using more standard methods, such as VoiceXML. Now it's speech rec as part of an enterprise information system. For developers actually trying to build applications, all these course corrections are unnerving, to say the least. If the overall goal, as Microsoft says, is to create an "ecosystem" then they need to realize that stability is a key factor. Wandering climate change isn't helping.

Now the positives. OCS is a strategic product for Microsoft. Communication is becoming a key part of every organization. Communication across devices. Synchronous and asynchronous communication. Features such as: presence, IM, VoIP, video, Find-Me, Follow-Me, and ad-hoc conferencing. Think EMail++. For speech rec to be bundled into such a strategic product is a huge win. It also shows that the speech folks at Microsoft have been listening and learning. Changing course often works out better than not changing course!

Let's hope the transition to OCS is as seamless as possible for existing MSS speech developers.

Nice Clutch Play by Microsoft

Anyone who does speech recognition demos knows how they can go bad. This one by Microsoft's Rob Chambers (YouTube) is about as bad as it could get. After mis-recognizing the presenter's first utterance, the recognizer got even worse at understanding his correction attempts like "select all and delete".

Turns out it was a Longhorn bug with its audio software. Today Rob did the demo again, at SpeechTek no less! Eight minutes long and it worked fabulously. It was a nice comeback and a gutsy move -- way to go guys!

Friday, June 16, 2006

Charlie Wilson's War -- Book Review

The subtitle says it all: "The extraordinary story of the largest covert operation in history". George Crile has written a fascinating account of the CIA's role in the Afghan resistance to the Soviet invasion of Afghanistan.

Charlie Wilson was a Democratic congressman from Texas. Most of the book deals with his efforts in Washington to get funding and support to his beloved mujahideen. Washington is revealed not as a massive bureaucratic system, or machine, but as a personality-driven clique. One person, with the right skills (and Mr Wilson seems to have been a master), can find and grab the levers of power over literally billions of dollars in funding.

The other feature is the Alice-in-Wonderland nature of the world of the 1980s vs today's post-9/11 world. Then it was the Soviets who invaded a Muslim country and faced a fierce insurgency. The US supported the insurgency at the urging of a Democratic president. They supported (and were cheered by) fundamentalist jihadis, giving them secret weapons for attacking convoys and shelling bases.

One interesting question regarding today's Iraq is the lack of a Stinger missile. It made a huge difference in the Afghan war, as it would in Iraq. Strange that twenty years on there aren't tons of Stingers and their imitations available on the black market.

A movie is being made of the book.

C# Cookbook -- Book Review

If you program in C# then buy this book. It's an indispensable reference with 300 code snippets of everything you never have time to look up. Snippets on things like: better ways to use collections, regular expressions, generics, delegates, exceptions, reflection, I/O, XML, and security.

MSDN is great if you already know the name of the class or function that you're interested in. But if you don't, then it can be very hard to find what you need. Date formatting, for example, is under string.Format() instead of under DateTime.

C# Cookbook is goal-driven. Each topic, such as "Writing a TCP Server", "Increasing StringBuilder Performance" or "Creating a Priority Queue", is a full solution to a programming problem. It covers C# 2.0, so generics and anonymous methods are included.

Put it beside your desk. Read ten snippets a day, and in a month you'll be a better programmer.

Wednesday, June 07, 2006

Eating with Kids at the Bar-B-Barn

Eating with kids at restaurants can be a challenge, but last weekend in Montreal we hit a new high (or low). The Bar-B-Barn is a classic Quebec chicken & ribs place that takes its tradition seriously. With puritanical zeal, they have kept the same menu and decor: department of highways road-stripe-yellow paint, pine panelling, and ancient posters of hockey players.

The kids couldn't believe we were planning to eat in this low-ceilinged room with creaky floors and exposed heating ducts. My older son couldn't get over the menu. He kept asking "But there's only two items on the menu?". Yes, I'd reply: chicken or ribs. "Just two things!?" and "Is that really it?" For someone used to the multi-page menus of a typical eatery, it blew his mind.

Son #2 is precise and his criticisms were detailed: the ribs had too much sauce, the fries were too long, and "this bun sucks!". Summing up, he concluded that "at least a crappy restaurant could have better lighting". Needless to say this sort of side commentary dulls the enjoyment of the others at the table. But we all got through it, and the food is actually pretty good. A broadening experience; for when I asked days later if they had told their friends about the Bar-B-Barn, they said "it wasn't that bad."

Friday, May 26, 2006

The Joy of Ethereal

Ethereal really saved our bacon recently on an MSS + SIP project. Calls from one gateway worked, but another gateway in California wouldn't work. Turned out that the RTP packet length was 20 msec on the "good" gateway, and 10 msec on the "bad" one.
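
For anyone chasing the same kind of mismatch, here is a small sketch of the arithmetic behind it. It assumes G.711 audio (8000 samples per second, one byte per sample), which is my assumption and may not match the codec on your gateways; the function names are also mine. The second function estimates the packet time from the RTP timestamp column of a capture and ignores timestamp wraparound.

```python
from collections import Counter


def payload_bytes(ptime_ms, sample_rate=8000, bytes_per_sample=1):
    """Audio bytes carried in each RTP packet for a given packet time."""
    return sample_rate * ptime_ms // 1000 * bytes_per_sample


def ptime_from_timestamps(timestamps, clock_rate=8000):
    """Estimate packet time in msec from consecutive RTP timestamps.

    Uses the most common timestamp step, so one lost packet
    does not skew the answer.
    """
    deltas = [b - a for a, b in zip(timestamps, timestamps[1:])]
    step = Counter(deltas).most_common(1)[0][0]
    return step * 1000 / clock_rate


print(payload_bytes(20), payload_bytes(10))
print(ptime_from_timestamps([0, 160, 320, 480]))
print(ptime_from_timestamps([0, 80, 160, 240]))
```

Under those assumptions the "good" gateway would show 160-byte payloads with timestamps stepping by 160, and the "bad" one 80-byte payloads stepping by 80.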

Whoever thought up the idea of human-readable message protocols should get a Nobel Prize. Protocols like SIP, HTTP and RSS are so much easier to troubleshoot because of this. The bad old days of RPC, ASN.1, and ISDN are hopefully nearly gone.

Of course, security may force encryption of the human-readable protocols, so we may end up back where we started...