I loved those sandwich guys; it totally made their day to construct me something new each lunchtime. Once they slapped mustard between two slices and handed it over, laughing their heads off. It was awesome.
It was very you! And not me at all!
There is one important story from the offsite meeting that created the ZD memo. Chris Mason agreed to write the memo that was the work of the group and we were all OK with that because Chris was the person who created the breakthrough. We were basically stuck, kind of pointing fingers and/or throwing up our hands about the problem until Chris broke the logjam by saying something to the effect of, "Here is what I think I do that contributes to the problem." That allowed all of us to start being at least a little honest about what the negative factors were. That's how we got to the statements about the wrong person writing too much code and the other personal responsibility factors expressed in the memo. Chris' decision to be self-critical was good leadership and taught us all an important lesson.
Super cool. A lot of early offsites seem to have been characterized by one person “pounding the table” and creating a breakthrough moment.
This is amazing. In my career it became pretty clear that industrial-grade software development did not happen at computer manufacturers, and that significant improvements seemed to come from enterprises and their efforts to have stable, reliable systems. When software became the business, the need for an engineering discipline was also up against some serious mythology. The ZD pursuit is very impressive.
We were developing Windows apps starting with Windows 1.0 on PCs. There were no good Windows debug tools available from Microsoft at the time. To be able to debug we wrote some assembly code that would output text to a secondary EGA (or VGA?) monitor. So on the main monitor you would see your Windows app running and on the secondary monitor you would see debug information in real time. The Windows app source code would have PrintDebug statements in it, wrapped in a conditional "if debug" check. We got rid of a lot of bugs that way.
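For readers who have not seen this pattern, here is a minimal sketch of conditional debug printing to a secondary channel. It is a modern Python analogue, not the original assembly code: the `DEBUG` flag, `make_print_debug`, and `handle_message` are hypothetical names, and `sys.stderr` merely stands in for the second monitor.

```python
import io
import sys

# Hypothetical build flag, standing in for the "if debug" conditional.
DEBUG = True


def make_print_debug(stream, enabled=DEBUG):
    """Return a print_debug function that writes to a secondary channel."""
    def print_debug(fmt, *args):
        if not enabled:
            return  # release build: the call does nothing
        stream.write("[dbg] " + (fmt % args) + "\n")
        stream.flush()  # show the line immediately, as on a live monitor
    return print_debug


# stderr plays the role of the secondary monitor (an assumption of this sketch).
print_debug = make_print_debug(sys.stderr)


def handle_message(msg, wparam):
    """Hypothetical app code with a debug statement left in the source."""
    print_debug("msg=%s wparam=%d", msg, wparam)
    return wparam * 2


if __name__ == "__main__":
    # The result goes to the main output; the trace goes to the side channel.
    print(handle_message("WM_PAINT", 21))
```

The point of the design is that the debug statements stay in the source permanently and are switched off by one flag, so the main display is never disturbed by diagnostic text.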
The MS Apps teams used the system debugger to accomplish this. The debug Windows kernel in the SDK had a reasonable amount of diagnostic info for API errors and leaked system objects. All of the communication went through the serial port to a z19 terminal or equivalent. We had switch boxes that allowed us to connect the terminal to different PCs or Macs based on which system was the target.
A lot of debug code used the serial port. All system APIs were wrapped so that we could trace OS activity. This was very helpful when it came to complex window message interactions, which proliferated with DDE and OLE embeddings.
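As an illustration of what wrapping an API for tracing can look like, here is a small Python sketch. It is not the Apps group's actual code: `traced`, `trace_log`, and `send_message` are hypothetical, and an in-memory list stands in for the serial terminal.

```python
import functools


def traced(log):
    """Wrap a function so each call and its result are appended to `log`."""
    def decorate(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            # Record the call before it happens (positional arguments only).
            log.append(f"-> {fn.__name__}{args}")
            result = fn(*args, **kwargs)
            # Record the return value once the call completes.
            log.append(f"<- {fn.__name__} = {result!r}")
            return result
        return wrapper
    return decorate


# Stand-in for the terminal on the serial port (an assumption of this sketch).
trace_log = []


@traced(trace_log)
def send_message(hwnd, msg):
    """Hypothetical stand-in for a system API call."""
    return 0


if __name__ == "__main__":
    send_message(1, "WM_USER")
    for line in trace_log:
        print(line)
```

Because every call passes through the wrapper, the log shows the order in which calls were made and what each returned, which is the kind of information that helps with tangled message interactions.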
The Excel memory manager was extensively instrumented, and debug menus allowed memory maps and key structures to be dumped to the terminal. Later the debuggers had this functionality, but in the early days we only had the assembly-level system debuggers available, which did not have extensive symbols.
It's fascinating to read the Word post mortem in this day and age. I glanced over the table with the roster of people involved and my first reaction was that must be the folks who got to do the post mortem. Nope... that's everyone who worked on the product. Ever.
Ha! The post mortems are amazing. There's a whole collection of those! It was a big part of shipping in Apps.
Did you find that BillG, SteveB and others at the top would actively push this Zero Defects approach as well? Or did the Memo start at the mid-high level and then become like a ‘spiritual’ document amongst the workers? Always interesting to see if the upper management also lived by the same principles and/or they (like BillG) helped convey the value of the Zero Defects memo.
It was taken very seriously in the apps group, but not at all seriously across the company. The practice of the development leaders meeting and reflecting on what went right/wrong on a project (post-mortem) was universal, but ZD added another layer that reinforced the post-mortem practice and opened the door to systemic thinking vs the typical thinking that this person or piece of technology was good or bad. In apps we took the idea of renewal of practices between projects seriously, and that is a major reason we were able to scale successfully to thousands of engineers and a product as large and complex as Office. Also, I don't believe we suffered any outright project failures after this point, although delays and death marches at a lower level would come and go as complexity temporarily outstripped the engineering practices in use.
What's shocking to me is just how long it took for the Systems group to really operate in the zero-defects mode. I credit you and Steven for bringing in that mentality post Vista. Any thoughts on why it took so long? It feels like NT had their own equivalent of zero-defects up until NT 4 and then it was lost as the team grew.
Different cultures. I will dive into this quite a bit. It is easy to say one was wrong or one was better, but the root of the cultures comes from the different needs to create the business. Systems had to build an ecosystem from scratch when all the partners had alternatives (why make devices and drivers for MS-DOS when CP/M, Tandy, etc. are out there?). Apps had to win over customers who were entrenched with successful products (like 1-2-3 and WP).
Thank you for your kind words!
It depended on what the management team valued. If you read the ZD memo you see it explicitly mentioned that management support is necessary and that MikeMap (Mike Maples) will give it. A very important part of Engineering Excellence (much later in the timeline) was this: making sure the VPs and GMs knew they were as responsible for how the work was done as for what work was done.