006. Zero Defects

Infinite bugs

Go back to 005. Keeping Busy with Cross-Platform OOP

Thank you for reading along and the great comments! This post tells the story of my first memo, Zero Defects, and the impact it had on me and all of Apps. Microsoft was a company that wrote code but we also wrote memos, especially in Apps. Memos were often 20 or more pages long and printed for circulation via interoffice mail—we were building those tools after all! Also, this is my first performance review.

Zero Defects

When it came time to write my first performance review, my list of accomplishments was simply “attended ADC” and, looking forward, my only goal was to “make ET++ work.”

Still, I was nervous about writing mine. So, I shot off an email to Melissa Birch (MelissaB) asking for some tips. She was on the Word team, which was in the very final stages of shipping the original version of Word for Windows 1.0, code-named Opus, another long and late project. She graduated from Brown University in 1987 (the same year I graduated from Cornell). Melissa was tall, polite, and formal. We shared the East Coast rhythm and sensibilities. MelissaB was an astute engineer, tuned into the challenges of making projects work. I knew she could help.

Thanks to fellow Apps Tools developer Kirk Glerum (KirkG), I’d made my way through the gauntlet of a seemingly cliquey lunchroom of regular tables and was seated at a table with MelissaB, KirkG, DuaneC, Jodi Green (JodiG), and many others on the Word dev team.

Kirk was a hacker’s hacker. He spelled his name Glærum (which wasn’t spelled that way by the facilities name-changers and was quite a trick to type on US English keyboards in MS-DOS) in reference to his family’s Nordic heritage. KirkG was as Northwest as one could get—he was the first to inform me that I was no longer allowed to use an umbrella. He grew up in Oregon. He attended the University of Washington and was a die-hard Husky fan right down to his purple Converse (when meeting people, where someone went to college was often the first fact Softies revealed since so many of us were straight from college). Most interesting was how he ordered a sandwich at the cafeteria. When asked what he would like, he always said, “Surprise me.” I could never have ordered like that. I later learned his business card listed his title as “Software Alchemist”—back then you could make up your job title and mine said “Computer Scientist” since I was convinced I would eventually go back to graduate school.

A New Yorker, JodiG joined Microsoft early on. It was immediately apparent that she was a manager because at lunch she was always asking other members of the team about their bugs and progress. She was another graduate of Brown. It was common for graduates of the same school to find each other at Microsoft, even if they weren’t classmates, because alma mater recruiting was something developers did, mostly because they knew the department, classes, and professors. I would soon be making recruiting trips to Cornell.

Microsoft Memo
To: Applications developers and testers
From: Chris Mason
Date: 6/20/89
Subject: Zero-defects code
Cc: Mike Maples, Steve Ballmer, Applications Business Unit managers and department heads

Zero defects

On May 12th and 13th, the applications development managers held a retreat with some of their project leads, Mike Maples, and other representatives of Applications and Languages. My discussion group investigated techniques for writing code with no defects. This memo describes the conclusions which we reached.

Zero-defects code is the Holy Grail of programming. We are not suggesting that this phantasm is attainable on the first try, but we think we know how to get there. There is a crucial need to do this. In OBU, for example, Mac and PC Word were very late, and Win Word continues to be late, in each case because we had many more bugs than we anticipated. Large numbers of bugs at the end of a project make scheduling impossible for project leads and life unbearable for programmers and testers. Zero defects has actually been achieved on software projects; it is not an impossible goal.

Zero defects must be the new performance standard for development. A “defect” occurs when something that is labeled “done” does not conform to the requirements. We need to understand our methods, and strive to improve them in order to prevent defects from happening, or recover from them if they do happen. You’ll be able to measure your success by the reduced time from code complete to shipping. You can improve the quality of your code, and if you do, the rewards for yourself and for Microsoft will be immense. The hardest part is to decide that you want to write perfect code.
The Zero Defects memo. This is what most memos looked like. The template was based on the MS-DOS Word memo template. With Windows 3.0 and TrueType there was a Microsoft font that people could install and the letter M in the font would add the styled Microsoft logo. If the font wasn’t installed, then the memo would be an “M Memo,” which shows up a lot in the exhibits for the antitrust case. This memo is online in many places including http://sriramk.com/memos. Chris Mason led Word development. Mike Maples was the SVP of Apps (we will meet him shortly), and Steve Ballmer was Mike’s counterpart running Systems. You can tell this memo is pretty formal since it spells out full names and email distribution lists. This memo came out three weeks before I started work.

To help me with my review, MelissaB, over lunch, talked about the new mantra at Microsoft called Zero Defects. We would continue this discussion over a long email thread, as was all too common.

Zero Defects was a memo circulated by the leading development managers (the most senior engineering managers) in Apps and Languages. It was an effort to get a handle on product death marches and ever-increasing bug counts, which were feeding a growing sense that such problems were inevitable as products became more complex.

A key underlying argument put forth was that we were rewarding developers for checking in new code and declaring a feature done, even if it was not. Testers then found a lot of basic bugs. Those bugs kept more interesting testing from taking place, and more code was quickly written to fix them, which delayed new work while testers continued to find even more bugs.

In any software project, adding or changing code introduced a bug approximately 10 percent of the time (a number floating around in academic circles for decades), whether it was fixing one line of code or adding whole new capabilities. The cycle of trying to complete a feature by finding bugs could never really end. This was called infinite bugs, and it was plaguing the development of Omega, Microsoft’s first Windows database, and to some degree Opus, Microsoft’s first Windows word processor, which began in 1984 but did not RTM until 1989.

RTM, release to manufacturing, was a phrase heard constantly. Everything was about getting to RTM. RTM was the ultimate goal. RTM was shipping. For the first decade or so of Microsoft, RTM literally meant to a factory, a Canyon Park facility about 10 miles north of Redmond where there was a shrink-wrap assembly line of boxes, manuals, and floppy disks. At the end of every product, at RTM, teams took a trip to Canyon Park and watched the first boxes roll off the line. We might have made software but we shipped it in boxes to retail stores.

The specification for Opus from BillG famously was “build the best word processor ever” and finish by October 1985, to align with the release of Windows. This was likely the first edict for Apps to align with a Systems schedule, a topic that emerged again and again. It was also as likely as two golf balls colliding mid-air.

Zero Defects was probably one of the most profound engineering documents I had ever read, and yet it was also common sense and blindingly simple. I remember one sentence well: “The hardest part is to decide that you want to write perfect code.” This was an impactful memo, in part because it was my first exposure to the collision between the idealized world of hacking and the pragmatic world of engineering products for millions. It was so practical and made so much sense, yet it was such a dramatic change from the hacker ethos that the most and fastest coding wins.

It might sound over the top to call a single memo that is literally about how to code without bugs “profound.” Certainly, for me it was profound because it was the first business memo I read that was also about why we are doing what we are doing, not just how. In a broader context, however, the memo was about the novel enterprise that was Microsoft at the time. Apps was building software for millions of people who were not trained computing professionals. That was new, for everyone. This memo was a realization that the company was at a crossroads and the old way, the way of hackers and hobbyists, was no longer acceptable.

This memo also marked a change in the entire Apps division, now numbering hundreds of people. With two big projects that were late and buggy, Omega and Opus, and several other challenging projects such as the recall of Macintosh Word for quality issues, Apps needed to do something different. No other company was building software at scale across so many categories and so many platforms as Microsoft Apps was doing. While all this was going on, Excel version 3.0, for both Windows and Macintosh, was under development and would soon ship merely 11 days late and with rock solid quality—a feat that would not be bested for a decade or more.

From my vantage point, Zero Defects marked the start of the Apps culture of shipping: a culture that included an organizational structure to scale development teams, a process to plan and schedule products, techniques to maintain engineering throughput, and methods for ascertaining quality through the entire development schedule. Excel 3.0 would be proof that projects could be on time and have superb quality. Apps would iterate and hone this process for years to come, but it is neat to have a sense for when it all began.

That is somewhat hindsight. The memo would have been more profound to me if I had ever experienced a death march or worked on a large and complex shipping code base. I had no experience shipping a product, so what did I know? As MelissaB shared her perspective with me, I came to understand what ZD, as we called it, really meant. It was a response to the fact that there was no system-wide integrity (in an engineering sense) and that the wrong people were writing too much code. Some developers wrote a lot of code to make it look like a feature worked, even if all the boundary conditions weren’t handled or if the code was fat (a Microsoft expression for verbose code that took too much memory or was too slow; as a quick reminder, PaulA’s and BillG’s original Microsoft BASIC fit in four kilobytes of memory, or about two pages of this book). Worse, those developers received high praise for getting “so much done.” System-wide, the schedule was used not as a tool to get work done but more as a system to stretch goals without reflecting the complexity and interdependence of the work of the team.

Something MelissaB said to me during one of our lunches and many emails on the subject resonated for decades and proved to be a cornerstone, not only for me but for how the entire Apps division (and later Office then Windows) operated. Her reading of Zero Defects and her own observation was that everyone should be focused on clearly communicating what work they did and be measured by achieving that. No games. No stretch goals. No race to check in things that were not yet done. No doing the minimal work to make a feature demonstrable. Later, we came to call this promise and deliver.

Melissa also connected some dots for me and explained that the way groups were rewarding people who ultimately contributed more bugs than code was in practice rewarding some men and penalizing some women on the team. Teams that were small had few women. There was no hiding that fact. Melissa’s view was that the women were routinely delivering the code they said they would, when they said they would, and at the same time getting feedback about the need to do more. As potentially controversial as such a statement was, it was simple to demonstrate by looking through the schedules and at the RAID database. While Microsoft was just starting to appreciate how different developers worked, Melissa’s explanation of ZD in the context of specific people and their approaches to work (and rewards) made everything far more concrete. The specifics Melissa shared became a rallying cry for me later in my career as a manager and I would often share what she taught me.

Zero bugs every day

I mean this literally: your goal should be to have a working, nearly-shippable product every day. This doesn’t mean that when you go home every night you have removed all the bugs from work in progress. It simply means that when a programmer says a feature is complete, it is totally complete: all error and boundary cases work, all interactions with the rest of the product have been dealt with, test documentation or code to exercise the feature are checked in.

Your project should have a state or directory from which anyone can create a current “clean” copy of the product. One way to handle this is to stage the checkin. Modules can be checked in to a “working” directory for history and as insurance against hard disk failure, then checked in to the “clean” directory when the feature is complete.
The title of the memo came from this section and this was the start of really making sure the product was shippable every day. Many projects said they would do this, but at the time few were able to really claim success. In Systems this became known, ultimately, as “eating your own dogfood.”

Underlying this was the beginning of the idea of continuous quality. Every day the product should be shippable and of high quality. Work that was not yet completed was not part of the checked in (completed) code, but that code was kept in sync with the main product. This is something analogous to today’s continuous integration and continuous delivery, and it took decades for Microsoft to achieve this level of engineering, which began in a moment of crisis and self-reflection. Although this seems obvious today, software projects were not typically run in this fashion, certainly not PC software.
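
For readers who think in code, the staged check-in the memo describes can be sketched roughly as follows. This is a toy illustration in modern Python, not anything Apps actually ran in 1989. The names Feature, Project, check_in_working, and promote_to_clean are all hypothetical, and the three flags simply restate the memo's definition of "totally complete."

```python
from dataclasses import dataclass, field


@dataclass
class Feature:
    """A unit of work a programmer may declare complete (hypothetical model)."""
    name: str
    boundary_cases_pass: bool = False   # all error and boundary cases work
    interactions_handled: bool = False  # interactions with the rest of the product dealt with
    tests_checked_in: bool = False      # test documentation or code is checked in

    def is_complete(self) -> bool:
        # "Complete" means every one of the memo's conditions holds.
        return (self.boundary_cases_pass
                and self.interactions_handled
                and self.tests_checked_in)


@dataclass
class Project:
    """Two stages: 'working' for history and backup, 'clean' for the shippable product."""
    working: dict = field(default_factory=dict)
    clean: dict = field(default_factory=dict)

    def check_in_working(self, feature: Feature) -> None:
        # Anything may land here, finished or not.
        self.working[feature.name] = feature

    def promote_to_clean(self, name: str) -> bool:
        # Only totally complete features may reach the clean stage.
        feature = self.working[name]
        if not feature.is_complete():
            return False
        self.clean[name] = feature
        return True


if __name__ == "__main__":
    project = Project()
    project.check_in_working(Feature("word-count", boundary_cases_pass=True))
    print(project.promote_to_clean("word-count"))  # half-done work stays out of clean
```

The design point is that work in progress is never lost, yet it never counts as done until it clears the same bar as everything else in the clean copy.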

That’s the long way of explaining that MelissaB’s answer for my performance review question was to “make sure you put in that you will practice Zero Defects in everything you do.” That was a bit cynical, but it worked for us, and everyone. Those were performance reviews circa 1989.

With those goals in hand, we spent the fall of 1989 and winter of 1990 hacking away at ET++ and making it work on OS/2 and Windows. Little sample programs, like a calculator, worked. Along the way we found a lot of bugs in the new version of the Microsoft C compiler. And we continued to test out the programs we created on Windows 3.0.

From my performance review: The concept of Zero Defects has been my guiding principle.
This is what I wrote about Zero Defects in my performance memo. I really laid it on in this review—my “guiding principle” and all. You can see how I basically reverse-engineered the Zero Defects memo into review text. Good grief.

In many ways, those early months were a second ADC or an ADC practicum. The opportunity to be on the ground floor of a new computer language was great, and the added challenge of trying to make it work on a bunch of operating systems that didn’t work only added to the fun and, also, the frustration. I guess I had not really considered that my job might be frustrating. It had not yet occurred to me how truly messy the company was.

Nevertheless, experiencing this while waiting for other groups to finish so we could collaborate seemed better than any alternative. Once performance reviews were complete, thinking about all my friends shipping Windows 3.0, Word 1.0, and Excel 3.0 left little doubt that my project was busy work and that it was dragging on.

Learning together

This isn’t the final word. If you find techniques that help you produce better-quality code, please tell Dave Moore, Doug Klunder, or myself so we can make everyone aware of them.
Also from Zero Defects. In one of the first internal memes (we did not have that word back then), the “no bugs” sign started to make its way around in various forms, along with any number of small plastic bugs, stuffed bugs, fly swatters, and any other representation of bugs.

Go on to 007. Windows 3.0 Buzz