041. Scaling the Office Infrastructure and Platform
“I don’t know how to read this without getting concerned.” –BillG replying to an Office96 status report April 1995
Perhaps more than any particular feature in what would become Office 97, though there were a lot of features, the biggest innovation was building the organization and culture supporting a shared infrastructure and platform team. Before Office 97, Microsoft had decidedly switched to selling Office, yet we continued to build Word, Excel, and PowerPoint, and we were organized that way. The Office96 product cycle, starting in 1994 (in parallel with what became Office 95), built out the new team, OPU (the Office Product Unit), and new approaches to creating shared code and infrastructure. Not only did this come to define the Office product and organization, in many ways it defined my own career and high-order bit.
In shipping at scale, it is not enough to agree on what needs to be done. There also needs to be agreement on how it gets done.
While the big apps were successful in their own right, won reviews, sold incredibly well, had high customer satisfaction, and were made by teams that were exemplars of the MikeMap value system, they evolved with different engineering systems. Differing in detail, they all accomplished the same: shipping high quality (for the era, or at least higher quality than everyone else), while striving for a ready-to-ship product every single day of the process.
To developers and testers, the micro details of how this worked across teams for committing changes, check-in tests, unit testing, localization, and more were highly evolved. Each step in the process was tuned to the “unique” needs of each product’s engineering organization, or perhaps tribe is a better description. Minor differences would be amplified both at scale and across teams. A common example was how much time in the schedule was reserved or buffered for the unknown. Excel, with its record of getting closer and closer to an on-time RTM, could be seen as either extremely hardcore or excessively conservative, but managing the day-to-day progression through the schedule was critical and very much a key part of the culture.
It is a statement about Microsoft culture that the better Excel got at hitting projected ship dates and shipping award-winning products, the more the discussion across Microsoft (outside the Excel team) centered around how conservative the team was getting. To Excel, they were just being hardcore. This was best symbolized by the leather biker jackets developers earned as a ship gift adorned with a Recalc or Die logo. Times were different.
The idea that getting better at daily engineering and hitting scheduled milestones was somehow a sign of being less aggressive or grandiose in plans gets to the heart of the divergence of Microsoft cultures across the company. The Apps teams not only wrote the Zero Defects memo but internalized a cultural attribute of promise and deliver. Much of the rest of Microsoft seemed to have succumbed to the idea that such a process (or philosophy) was somehow less hardcore or even wimpy. There was a strong belief among the over-promise side of the house that building a platform was simply more difficult than building applications (never mind that the applications were also platforms, but I digress) and that there was a real difference in impact if a platform cut features the way Apps did in order to ship. I can say this because many times when it came to collaborating across the company I was on the receiving end of comments along the lines of “yes, but that’s only because it is just an app.” In my weaker moments I would say the quiet parts aloud, such as “yes, but you’ll never ship.”
Apps, meaning Office, was the more fragile growth engine of the company, and the bigger opportunity for profits. Office depended on customers choosing to buy Office over an existing product they already owned or a competitor’s product. That decision benefitted from a new version of Office drawing interest to new capabilities, and over time would come to depend on much more profitable corporate deals (as we will learn in the next chapter). Windows, on the other hand, was going out on most every new PC (at a much lower price than Office but a much higher attach rate to a PC). Whether an updated version of Windows was on the PC or not, PC sales were going up primarily due to businesses buying first PCs for many employees, and in a bit of a twist those customers often preferred the current or even previous version of Windows anyway. There was certainly a pop that came from a new Windows, especially when timed with updated PCs for back to school or holiday, but no one was confused over the revenue drivers. These differences in the business models directly led to the variation (and tensions) in development processes, and also to the differences in how each business evolved and innovated well into the future. We were all products of our environment.
As an example, within Office each program management team (Word, Excel, PowerPoint) developed unique approaches to overall design and feature selection. When there were differences in design or prioritization the discussion would inevitably turn to a claim along the lines of “Excel users are different [than Word or PowerPoint users].” Each team was focused on ease of use, or what we often called “basic use,” yet maintained a different idea of the prototypical personas using the product.
Putting these together, there were three tensions at play in building Office96:
Enlisting support across executives for an overall plan. The normal process of each former Business Unit, then Product Unit, doing this on its own no longer sufficed.
Developing a plan with buy-in from the dev managers and test managers for how Office96 was to be built—the tooling, day-to-day dev process, and the overall testing and verification, through to localization.
Deciding what to build that represented a suite while continuing to recognize that to the outside world, customers and press, the category battles might not be yesterday’s news even if the Microsoft strategy was all Office.
I wish there was a lot of a story to tell about how this played out, but in reality, “decisions were made” in a bottom-up or distributed manner. The rest was going to be in execution. The Office Product Unit was formed while the product plans were created in parallel, thus much of Office96 would be characterized by OPU and the Apps in a state of tension over planning and execution. Ultimately, this made for a bumpy Office96 release filled with many new execution challenges, but it also built the foundation for an execution machinery that would become unprecedented and largely responsible for what ultimately became the largest and most secure business at Microsoft.
The Office96 plan had two main pillars:
The Apps product units embarked on deep, category-defining features, continuing to make inroads against legacy MS-DOS competitors and to win against Windows competitors. At the outset the suite included Word, Excel, and PowerPoint within the DAD organization and Access in the Tools group. These products underwent significant architectural work consistent with a full 24-month schedule. The initial plan was to continue to ship Mail and Schedule+, though this would change completely as we will see.
The Office Product Unit built a set of features shared across the apps and then they integrated those features into one or more apps (this is a key tenet about creating shared infrastructure), leaving the other apps to do integration work on their own. In addition, OPU would, by nature of code and also influence, make sure the suite was designed for consistency and integration across the apps.
The OPU features ranged from a lot of heavy but straightforward lifting to some of the most sophisticated refitting of features envisioned by the apps. In contemporary terms, OPU was both an infrastructure team and a platform team. In terms of infrastructure, OPU drove a new shared engineering and quality process (led by JonDe and GrantG) and created shared components essentially representing a platform upon which to build Office applications, providing the code (APIs) for many common application paradigms across user interface, text handling, graphics and drawing, and much more.
As a successful product engineering team scales and a product line grows, there is an inevitable desire to gain efficiencies of engineering scale and an ability to expand the product line efficiently. This all sounds perfectly reasonable until you realize doing any of this runs strongly counter to the very forces that got the teams to success in the first place. Changing processes sounds risky when it took so much work to get to the current state. Sharing code always sounds much more difficult than not sharing code.
Sharing code always means either replacing something that already exists in a winning product with new code from someone else or adding code that does not completely and fully understand the unique needs of the winning product or its customers. As is almost always the case, the shared code is viewed as bloated, overly complex, or simply does more than needed. Despite the recent success in using shared open source code, the more established a product is the less likely it is to see code from the outside as a preferred path. In 1996, it was always about performance, memory management, or simply complexity. The technical buzzsaw would evolve to include security, manageability, and even privacy/safety—the reasons might change but the goal of avoiding shared code remained. Shared code is a way of ceding your autonomy to another group. Developers have traditionally maintained an attitude of NIH, not invented here, as shorthand for the distrust of OPC, other people’s code.
As a note, startups today love shared code, often extolling the value of open source as a way of achieving a good deal in short order. Generally, as we’ve seen to date, with success such reception to outside code is tempered.
The benefits to sharing are enormous, and that is what leads teams to take on these challenges. If a product team can create infrastructure and platform assets, then more engineers can focus on category-specific work while also making it easy to add entirely new products to the business with substantially less effort. Office had Word, Excel, PowerPoint, and now Access, but the world of productivity software was vast and it made no sense not to try our hand at personal information management, drawing, note-taking, project management, desktop publishing, or a host of new categories. OPU would be a key part of how to scale both out and over in productivity, and Office96 would be our collective growing pains.
To best illustrate this, let’s look at some of the specifics of what OPU did. The diversity, breadth, and frankly aggressiveness is due to JonDe and his engineering leadership that pushed to do a lot in the first release of shared code out of the gate in early 1994 (a few months before I joined the team). The body of code was packaged in a Windows DLL (dynamic-link library, a Windows mechanism for packaging executable code to be shared, and also the source of endless frustration in the world known as DLL Hell, but I digress though will return to this topic soon enough). The DLL file was ultimately named MSO97.DLL, though sometimes called mee-so (for MS Office) in conversation. Along with MSO, there were a few other files as well as a test harness called Lime that could exercise many of the capabilities. Lime would grow over time and eventually prove out just how much of a platform we were building.
MSO contained code designed to be shared across applications, bringing with that engineering efficiency, experience consistency, higher quality (doing something once brings that), improved performance, and even more features because generally that is what happened with a dedicated effort.
Features were the currency of Apps teams. Features defined contributions. The more visible and customer facing, generally speaking the better. Therefore it was important for OPU to have its own features, not just be a dumping ground for the grunge work that the big teams traditionally farmed out or de-prioritized. An example of this was Setup, the code that copied bits from floppies to hard drives. Almost always getting this done was a last-minute sprint and shunted to new hires or even contractors. Apps teams were more than happy to have OPU take this over (without giving up any resources of course). Creating OPU was not going to go that way, so the portfolio consisted of a fair share (or more) of grunge, cool features, and even an app of its own, the Binder. This type of portfolio was critical to the successful creation of an OPU team and culture, giving it an identity beyond simply the plumbing team, so to speak.
Over time and several releases, MSO would be viewed by the entire organization not as a tax or effort bolted on the side, but as an asset and more importantly a starting point and platform. The journey of building the Office platform would start with the tension and difficulty described herein, and end with new features defaulting to shared efforts, new apps spinning up quickly with MSO, and the organization finding a balance between platform infrastructure and category-specific innovation.
Every product (or even organization) at scale finds itself at some point of the swinging pendulum of centralized versus distributed efforts. Often this is viewed through the lens of what is good for the broader business, but at each end of the pendulum is an on-the-ground view of challenge. These views are as predictable as the broader swings. When moving from a distributed to centralized effort (or resources), the formerly distributed accountability will find every reason to doubt the capabilities and necessity, and ultimately viability, of a centralized effort. Over time, the same people and organization come to rely heavily on the shared team and actively push work to centralized efforts.
This dynamic characterized most everything in OPU.
In the work I do with companies today, the topic of scaling, sharing, and building new products efficiently over time is one of the most popular lessons I have the opportunity to share. My own experience was a journey of a career of scaling, sharing, and collaborating that occupied the next 15 years of work. We spent a good deal of time in 2000 describing some of this for a Harvard Business School case, which for many years was used to teach a combination of customer-informed product development and shifting an organization to sharing (see Microsoft Office 2000, MacCormack and Herman).
At the sophisticated end of the platform features was a shared drawing layer, code named Escher. The Microsoft art collection, which had a significant job to do to fill the reception areas and lobbies with something, mostly featured Northwest contemporary artists but had one original M. C. Escher hanging in the building 17 atrium. The acquisition of that was championed by Art Committee member and first Office vice president, ChrisP. Sometimes even Apps had cute code names.
Escher was a big effort spanning all the apps and especially PowerPoint, where much of the lower-level graphics code would be implemented. The integration of Escher into Word was done by a new OPU team, staffed with developers from across Apps. Having engineers who had worked in each of the apps’ code bases was critical to building shared code to work across those products (again these were the massive products and code bases of Word, Excel, and PowerPoint) and a key decision JonDe made in staffing the teams.
Like all of the new shared features, there was a constant debate across the Apps teams and OPU as to the value of the feature for each team. OPU was in a constant state of selling the value of shared code and the idea that sharing enables teams to get more than they might need, basically for free. Except in practice nothing was free as each app inherited compatibility and complexity that it decided it did not need. Drawing was a great example of that, but it was not even the most controversial.
Every app in Office had some support for drawing, but none were particularly deep and all seemed to serve category specific use cases. Word was able to embed drawings as regions within a document, much like a photo, which is how most people thought about adding illustrations to business documents, if they could draw. Business memos and other documents using simple drawings that could float on top of a document, much like an acetate layer, greatly enriched documents and were recently added to Word but still relatively limited. Even more exciting was the ability to use broadly the fancy text that became known as WordArt, which was new but constrained, as with drawings, to be embedded in regions and not used arbitrarily throughout a document. The complexity of creating feature-rich and deeply integrated drawing tools was daunting in Word. To mitigate this, the lead engineer, Peter Engrav (PeterEn), volunteered to lead the integration of Escher into Word from within the new OPU. A key tool for managing the shared features was that OPU would lead the integration into one of the main apps, thereby learning firsthand the complexity and also minimizing the work to the app.
Excel had elaborate tools for charting (candlestick, donut, 2.5 dimensional, etc.) and some minimal tools for doing callouts and basic graphics on sheets. There was a great deal of resistance to features that were deemed “not something Excel users requested” or even features that were viewed as less than professional or business-like. Whether that resistance was genuine or simply a sort of buzzsaw didn’t really matter. The Escher team constantly received inbound “doubt” over any features simply from the perspective of it not being interesting to Excel users.
At the other end of the need spectrum was PowerPoint, which was basically a big drawing program. Why would a drawing program want to use a shared code base, as that was their entire domain? As though to emphasize the maximum complexity of sharing code across these apps, PowerPoint’s main concern was that Escher wasn’t enough for them competitively, simply because they were spending time putting drawing in Word and Excel—neither of which appreciated drawing as much.
See how that worked? That’s the “middle” that OPU found itself in as a platform team for already successful products.
Escher would go through many rounds of adding features for compatibility with what was there and removing features because of schedule constraints, along with challenging debates over features versus integration. The end result, however, was a tsunami of graphics features across the product. Every product picked up integrated capabilities previously found only in high-end and rarely owned professional tools including drawing shapes, modern graphics file formats including transparency, photo handling, shading, animated GIFs (like the “best viewed in Internet Explorer” logo on all the HTML files we created) and even an integrated and vastly richer variation of WordArt, the curvy, glowing, bubble-text so popular with grade school children and small business signs. A huge part of Escher was that much of the shared work was also done from within the PowerPoint team itself. PowerPoint was also located in Silicon Valley and this was the first time we had embarked on sim-shipping deeply integrated code across a plane flight.
While the debate over Escher was intense, the debate over the core or primary user interaction (meaning the user interface) in apps was even more so. The core user interaction in Office took place through toolbars, which were a primary source of app innovation—so much so that the image on the box and most screenshots in the press were of the toolbars. In an effort to build a suite, one sold with a value proposition of consistency and muscle memory, it was only natural that we tried to share toolbars—do them right, do them once as Lotus claimed to do.
In modern context, this might seem trivial, but at the time this was a key innovation. With the different teams on different schedules, but with a shared DNA and understanding of potential solutions, it was no surprise that there was some common evolution, along with opportunities to be a little bit better, or different, depending on perspective. Toolbars proved legendary in this regard.
One of the first battles we found ourselves in was over the design of toolbars. Word and Excel had each designed and tested their own toolbar implementation and arrived at different heights—15 versus 16 pixels. Trivial to mention, but research done separately by Word and Excel, surprisingly, showed that Excel and Word users had different preferences—obviously due to test design or some other factor since it is ludicrous to think this differed by app. This might not have mattered except that the main marketing demonstration of Office showed Excel embedded charts within Word. Clicking on the chart loaded the Excel toolbars and caused a one-pixel shift in the document. As if that weren’t enough, there were equally divergent views over the design of the tooltip, the little text that appeared when the mouse was held over a button explaining what the icon might be. This invention had those that believed the tips should be white and those who fought for yellow, not to mention debates over the delay, stickiness, amount of text, whether the keyboard shortcut should be there, and whether there was a choice to disable them. Even the simple features, no matter how new and clever, were impossibly difficult to coordinate.
Ever the diplomat, Andrew Kwatinetz (AndrewK) spent the better part of the product cycle ironing out, negotiating, and pleading for consistency across the products. Andrew was already deeply experienced, as an intern and college hire, in both Word and Excel user interface design and had already proven himself to be one of the next-generation leaders of OPU. Early in the product cycle, Andrew sketched out all the places across the product that lacked consistency and coherency as an original volunteer in the newly formed OPU (and its prior form, the Apps Interoperability Group), and he had begun to map out plans to bring the product together and innovate in user experience.
Having committed to sharing the code, we finally had in one place all of the buttons, menus, and commands for all of Office—thousands of entries in a single place. Pete Morcos (PeteMor), a recent college hire, arduously managed them all by maintaining a database of every icon, command name, tooltip, menu string, status bar text, and keyboard accelerator in the product. The difficulty and attention to detail required was only matched by the long-term value for consistency, localization, user assistance, and most of all ease of use. One of the most significant differences between Office and most other tools, even today, was the sheer breadth and simultaneous depth of features, something that would become even more apparent as web pages came to the forefront. Each application had over 1000 commands (buttons, menus, etc.) with something over 2500 unique commands in Office96. The scope of the product would be further amplified by the platform APIs available through Visual Basic for Applications, another major shared effort that enabled developers to build custom applications based on Office.
The sharing was enormously difficult, taking a toll on the OPU team and frustrating the Apps teams. We were a year or so into the project and, while we were clearly making progress, we were also moving more slowly than we needed to. We had not taken the time to adapt the organization to sharing nor did we really consider the breadth of the undertaking.
The team was so frustrated that JonDe and I decided to have a meeting with VP ChrisP and SVP PeteH to discuss “the situation.” It was a combination of us asking for help and us being called to the carpet for the situation bubbling up to them from the Apps teams.
My own memory refers to this meeting as the one when JonDe said, “People think Jon has lost his marbles” and thus I recall the meeting as the marbles meeting.
I had put together slides with some basic philosophical problems we had been dealing with across primarily OPU, Word, and Excel. There was nothing really new in the deck. I had previously sent a couple of really long emails basically warning that things were challenging, and progress was slow. PeteH was hearing the other side of this from the Apps leaders—how things were slow because of OPU’s shared code that wasn’t needed and features they didn’t want slowing things down, making the products bigger and slower, while taking time away from doing features that could win customers and reviews. I’m not exaggerating.
The key moment in the meeting was when JonDe explained how crazy things had become. A year earlier, Jon was leading the Excel team through a hugely successful release and prior to that he had been a key leader for the entire history of the product. He epitomized everything about DAD culture. Yet all these developers that idolized and looked up to him suddenly believed he’d lost his mind and somehow gone crazy, drunk off the Kool-Aid of shared code. His old team stopped believing his schedule estimates or even architectural approaches. Jon had clearly “lost his marbles”. We were deadlocked by the “Word users are different,” “Excel users are different,” and “OPU is wasteful” mindsets.
We vented and PeteH listened. Still, it felt like there was not going to be any immediate change. I had hoped they would do something simple like send mail to everyone saying to listen to us and this is the way it is. In hindsight, that was desperate and totally the wrong way to solve the issues, but our frustration levels were intolerably high.
Somehow, things did change, though. PeteH and ChrisP worked quietly in the background doing more to reinforce both the strategy and execution of Office96, the focus on shared code, the consistent experience, and the notion of one team working together to make Office. This happened in all the right ways through mostly small or 1:1 meetings. That was the DAD culture. Pete was savvy enough to know the team would not react positively to some sort of commandment or over-the-top edict about sharing. The subtle persuasion and repetition were what the team needed and got. Eventually, the Apps leaders were reaching out more, and over the following weeks we saw how the climate changed.
Our view was that BillG would be quite proud of the sharing. We thought for sure the idea of breaking down the barriers between apps and improving the architecture of everything would be viewed extremely positively.
The product was still impossibly difficult to run, though we had stable daily builds due to sheer force of will from JonDe, GrantG, and the development and test managers, but there was two years of work ahead as things started feeling better. Even with bumps on the road ahead, I was feeling good about it all.
Every month, I gathered up the status from across the project for an email report. Each team (Word, Excel, and PowerPoint as well as Escher and all the OPU contributions) contributed a section with information on progress: the PDL, or product development list, in reference to the spreadsheet of all active projects. The individual apps also created PDL reports, even though our goal was a single product release. There were two important items to cover. The first was whether the project was on time. Office96 appeared to be generally running on time at this point, so the update was benign, though unknown to us we were also naively optimistic.
The second part of this update included the process that was near and dear to DAD, which was adds/cuts. Throughout the development, particularly after an eight- or ten-week milestone, each team, at a granular level (individual developer), reevaluated the list of work items (tasks taking about a day of development or so) and considered the progress made versus progress required. The result was almost always feature cuts—removing proposed features from the product. There was also learning along the way. There were also adds: enhancements, new options, or reworked features. JeffH had always taught me that transparency and completeness were critical to how BillG thought, so my PDLs were works of art in those attributes. I worked super hard to bring the product to life with some clarity.
Upon receiving one PDL for Word, BillG replied to let us know two things:
First, we were cutting too much. “the number of cuts is truly amazing to me” he wrote in red text. In fact, in the effort to be honest, my PDL looked like we were gutting the product every month. That was not the case. In DAD, the basic approach was cutting is shipping, so in order to ship we would scale back features as we learned more. That was how the process worked and everyone was comfortable with that, at least within DAD.
I felt horrible for the team and certain the email would result in people quitting and Apps using it as a chance to say, “OPU was a bad idea.” That concern was followed by worry that I was going to get fired. Were we on a path to a bad product? Was I leading us in the wrong direction? Was I messing up? Or perhaps this was all a communication problem.
Separately, Bill chose to highlight some specific features that he felt strongly about. This inadvertently (honest, it was unintentional) allowed the Apps teams to say that the OPU efforts and resources were robbing the apps of features that Bill would prioritize. Ugh.
The right thing to do was to show BillG some progress, but without the ceremony of a full review.
We were early in the development of Office96 and most features were merely crawling. Most of us were not running the product on a daily basis as it was not ready (called self-hosting), and certainly it was neither ready for BillG to use nor was it a polished demo. Nevertheless, I took it upon myself to set up some time and march over to BillG’s office with my laptop, talk through the PDL, and show him some carefully curated features. It was a risk, but so was debating in email or letting the issue fester.
It was a quick 30-minute “drive by,” and one of many that I routinely did over the years. I made clear I was not showing features specific to Word, Excel, or PowerPoint—the dynamics of the DAD organization would not have looked kindly upon that as I had no responsibility for those features. Rather my goal was to put Bill at ease over the investments of shared features. I showed off the toolbars (called command bars as they brought unification of both toolbars and menus), Escher drawing, highlighting the depth of the work we were doing. This was enough to put him at ease for the time being. It was a good lesson on how the verbose nature of the email status report was mostly undermining the goal of showing off progress.
The demo went so well that we separately held another demo session at the end of the milestone. In this one we filled the room with members of each team and everyone got to show off the work of their team. It also served as a reminder that while there were plenty of shared features, a large part of the value of the release came from the domain or app-specific features across Word, Excel, and PowerPoint. We had a new saying now, which became “we’re selling Office, now we’re making Office, but people use the individual apps”.
After the meeting, I nudged Bill to send a nice note summarizing what he saw, which served to solidify the progress we were making across the team and undo some of the earlier nonsense.
And it was a good lesson that working software beats a status report. Onward.