Hardcore Software by Steven Sinofsky
051. HTML: Opportunity, Disruption, or Wedge
“Allowing Office documents to be rendered very well by other peoples browsers is one of the most destructive things we could do to the company.” —Email from BillG
While we knew the time was wrong to build a whole new Office out of one of the new disruptive technologies, we did need to arrive at a strategy for HTML. After the debacle of the file format changes for Office 97, the allure of HTML was everywhere. The enterprise customers we intended to impress were fed up with the traditional (and ever-changing) binary file formats in Office. HTML had achieved the status of “magic beans” and could solve any (and all) problems. But how?
First thing Saturday, December 5, 1998, at 9:44AM (which might as well have been 5:44AM), I received an unsolicited mail from BillG with the subject line “Office rendering,” copying the full management chain in case they weren’t busy that morning. I joke; this was normal, even though Bill had been increasingly focused on broader issues lately. For some reason (no context provided), Bill had read something somewhere that gave him concerns about the use of HTML in Office. We already had a plan, which he knew about, but he was now having second thoughts. It was not unusual to have to back up and go through our logic and approach to get to an admission that we were not nuts, or perhaps to tweak something. This was a big issue, though, and cut to the core of our second highest priority in Office9, “HTML in Office9”. Bill had some concerns, to put it mildly.
When we changed the traditional binary file formats in Office 97, we caused a real disruption in work. Suddenly files were being emailed within and outside the company and they simply couldn’t be opened if the recipient had the wrong version of the file. Worse, even if a converter was available there was a pretty good chance after a few edits the document returned in email would look funny or even wrong. What had served our industry so well—the binary format that represented the internal data structures of each application—had hit a wall with the combination of email and slow deployment of the latest version of software. Without new file formats we were really stuck because every new feature was represented as a change in the data structures and file formats. That’s just how things were done.
The browser and HTML were the cool new thing, but they also held out a promise of a universal platform for viewing documents. Enterprise customers and industry analysts were enamored with the idea that HTML was a resilient, text-based format. If people used different browsers, they could still read documents with just a few formatting hiccups, and all they needed was a browser and not some expensive new version of a productivity tool. Plus, everything just seemed better in a browser, better than the old File Open dialog, connecting to network drives, endlessly navigating folders in hopes of finding something. Just click on a cool blue link and up pops the most current sales numbers or marketing plan. Little details like being connected to high-speed internet from a laptop, something that was nearly impossible outside large office buildings in major cities, would take years, or even a decade, to address.
We knew all this. Solving these problems was our plan. The big problem? No one knew how those cool documents that were so easy to read, so resilient, so friendly, so snappy, and so much better could be made. What tools would typical knowledge workers use to create web pages? What server would they be stored on? Office. FrontPage.
Even if we had some ideas, there were many questions about the role of Word and PowerPoint, and to some extent Excel, given the increasing preeminence of browsing. In a world of browsing web pages created with the relatively simple formats in HTML, where and how would tools designed for sophisticated print-formatted documents fit in?
The most complex cross-group feature of Office9 was the second pillar of the vision, HTML file creation. Demonstrating that these large apps could be relevant in the face of the WWW was a major part of our strategic challenge, especially within Microsoft.
Our choice not to do browser Office, Java Office, or components of Office was a big miss for many in the company who saw those technologies as synonymous with embracing the WWW. The answer in the vision (and in High Hopes) was using Office to participate in the WWW, using the apps to create HTML documents that could be viewed in the browser. In a sense, this was turning the WWW into a giant online printer or document repository for businesses. FrontPage powered this ability to publish documents from the desktop PC; we called it the two-way web.[1]
A response to potentially disruptive technologies is for the incumbent product to do a bit of a jiu-jitsu move and attempt to turn the disruptive technology into a feature within the product: rather than build a whole new product out of the new technology, embrace the technology as part of the existing product. That’s what we were doing with HTML. Rather than rewrite Office in HTML, what if we made HTML a feature of Office? Strategically, some might view this as defensive and certainly not as dramatic as turning the existing business inside out or upside down, as championed by some. The bet was that, as theoretically cool as Office in HTML might be, we were a long way off from browsers being able to do that. Even with Internet Explorer rapidly gaining share and Netscape seemingly unraveling, we were years from the Internet Explorer team dedicating its efforts to building productivity tools like Office on the browser platform.
Our Office Advisory Council was extremely positive about HTML. There was a huge wave of effort at large companies (the OAC represented over one million desktops) surrounding questions about how to use the browser for intranets for collaboration and document sharing (like our http://officeweb). OAC members loved the idea of having a focus area beyond deployment and administration aimed at LORGs because it gave them a seat at the strategy table, not only operational efforts. They were excited to be evangelists of creating documents with Office that could be viewed in a browser and easily saw potential for solving their document distribution challenges.
To make this work, we needed to do a crazy amount of work to twist HTML into representing as much of Office capabilities as we could, no easy task. We knew ultimately our goal was to move to HTML as a fully native file format, meaning it could be used in place of the de facto standard .DOC, .XLS, and .PPT. The capabilities of HTML were rather spartan compared to Office and difficult to work with. HTML was designed for minimal online documents in a browser. Office handled the myriad capabilities for print documents and sophisticated online presentation. Myriad is an understatement. Typically, people think of Office in terms of document formatting commands, but that leaves little room for even basic formulas in Excel, or presentation template semantics in PowerPoint, or even the simplest of page footnotes in Word (just to name a couple of examples out of thousands I could list). All of those would need some representation in HTML as well.
This is where my view diverged from BillG’s view. Having gone through the painful Office 97 transition and not wanting a repeat, I saw our file formats as a liability, something to mitigate. BillG saw them as a significant asset, a proprietary asset, and he loved proprietary assets. File formats raised the switching costs for customers who would move to a competitor. He was right. Unfortunately, our biggest competitor was our old version, and what we were inadvertently doing was raising the barrier for customers to upgrade to (also known as buy and deploy) the new version of Office.
While we often had disagreements, we didn’t often see things as so starkly opposite. BillG had historically focused on those proprietary levers because that’s how the industry grew up. All products were open because everyone was building a platform, while at the same time those points of openness were “protected” by proprietary defenses. An API, user interface, data formats, programming language, were all combinations of wide-open platforms and proprietary elements.
When the topic of HTML as a file format came up in strategic conversations, especially with BillG, the discussion quickly turned to a view that HTML implied ceding strategic control of file formats to either a competitor or to what might become a standards body—that was the worst of all outcomes. Seeing this type of situation, as an advantage or risk, was something BillG was always good at. While I might have personally recoiled at the idea of having something proprietary simply to have those control points in the product, it was not just good business, but the kind of business routinely practiced across technology. In an era of open source, proprietary innovation is often viewed as old school or passé, when in fact it is more vibrant than ever (behind today’s cloud is all open-source software made proprietary by remaining in data centers).
The debate for Office9 was far more grounded because the current state of the art for HTML was still limited. The most recent innovation in HTML was the addition of tables, enabling many scenarios, such as presenting financial data in a spreadsheet fashion, though they still lacked most of the formatting used in Excel. More interesting was a great lesson for me in how rapidly new technologies diffused when there was an incredibly strong demand to improve them immediately. Every content website, those trying to show stories like a newspaper or magazine, struggled with the most basic formatting problems while trying to get something, anything, that looked reasonably professional to display in a browser in a reliable way. With the original specification of HTML, most sites looked a bit like ransom notes—lots of colors, font sizes, bullets, and that awful blinking text. That’s all that people had to work with.
Tables were designed for presenting data, tabular data. Quickly, web developers realized tables were perfect for placing text on the screen in precise spots. They could be used to create columns like a newspaper, or place photos such that text wraps around them, or even nifty tricks like headlines that span columns. Suddenly, sites were using tables to make fancy documents. These did not look like tables one might see in a statistics book or financial report with bordered rows and columns, but what could be seen were aligned text, spanning headlines, and images with wrapped text. Many web purists were troubled by this because the purity of HTML was lost. In practice, this abuse of HTML, as I called it, also made it difficult to realize many of the benefits of the web like processing documents on servers, automatically generating documents, or even searching and indexing documents as Yahoo was doing. Tables made HTML more complex, but they also made sites look great in the browser. In a sense, HTML was evolving to be a complex file format tuned to online documents. Not what the original creators intended, but it was the browser makers who started calling the shots.
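To make the layout trick concrete, here is a minimal sketch in Python. It is not taken from Office or from any real site of the era; the function name `newspaper_layout` and the sample text are assumptions for illustration. It generates the kind of borderless table that web developers used for page layout: a headline cell spanning every column, followed by one cell per column of text.

```python
from html import escape


def newspaper_layout(headline: str, columns: list[str]) -> str:
    """Build a borderless layout table (hypothetical helper).

    Row 1: a single headline cell that spans all columns.
    Row 2: one top-aligned cell per column of body text.
    """
    if not columns:
        raise ValueError("at least one column of text is required")
    n = len(columns)
    # Each column gets an equal share of the width, as a percentage.
    cells = "".join(
        f'<td width="{100 // n}%" valign="top">{escape(text)}</td>'
        for text in columns
    )
    return (
        '<table border="0" cellspacing="8">'
        f'<tr><td colspan="{n}"><h1>{escape(headline)}</h1></td></tr>'
        f"<tr>{cells}</tr>"
        "</table>"
    )


page = newspaper_layout(
    "Sales Up & Rising",
    ["Left column text.", "Right column text."],
)
print(page)
```

Nothing in the output is tabular data. The table exists only to position text, which is exactly what made pages look good to readers and harder for servers and indexers to interpret.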
Tables were an opportunity for us. They made web pages more complex, making things more difficult for everyone but nicer for humans reading pages. Office tools were perfectly tuned to handling complex user interactions to create nice-looking documents, which could be seamlessly represented in HTML all while editing in Word as one normally did.
Much to the dismay of purists, HTML was quickly becoming an implementation detail that few humans would deal with directly. Computers could easily absorb the complexity while the human just worked with a tool. Office was a great tool. We received many requests to convert Office documents into HTML documents, especially from small businesses and students. While HTML was originally simple to use, perhaps even a bit like using WordPerfect because of tags or codes, as WordPerfect called them, doing anything one could do in Office was impossible. Professionals were using tables and creating increasingly difficult-to-code sites. There was room for Office to make this easy.
Our strategy was to make the most we could of HTML to publish documents, essentially thinking of HTML as an online print command. Program management fanned out across the products to map formatting capabilities of each product to capabilities of HTML. At one end of the spectrum was PowerPoint, which could always save an entire slide as a single image, as was done by early third-party tools. For PowerPoint, though, this lost out on animations, scaling or selecting text, and the richness of the tool. Tables were a perfect way to maintain the layout of slides in a way that worked much better with browsers. PowerPoint had an early start on converting to HTML and was already pushing the limits of what could be rendered in a browser, but it was fantastic. Slides that acted like real web pages where you could select text, resize the window and scale the slide, and even use full screen presentation view.
At the other end of the spectrum, Excel saw little value. Was this another case of “Excel users are different” and trying to foist a one-size-fits-all OPU consistency on to Excel, or was there real utility? One look at what was being published on early websites and one could see the same type of needs that the print world saw, which was that Excel was used to create charts, graphs, and numeric tables that were then incorporated into Word documents. Some of the first uses of the WWW were sharing corporate financial filings, tables of income statements, and balance sheets. While we could push all this work to Word with copy/paste, most of the time that rendered Excel tables as images, making them difficult to print, select, and scale to different size screens. We decided that Excel needed to be an equal citizen in saving HTML. This also made our Office consistency story stronger.
Word’s opportunity was larger, primarily because most people saw what was in the browser as resembling Word documents. Just as most of what we read in print was originally created in Word, the early internet was taking on those same characteristics. Leading the efforts on Word’s program management team were KayW, who had early on recognized the power of web authoring for Office, and Eric Levine (EricLev). EricLev started his career at Microsoft in marketing after college, where he was coxswain on the Harvard crew team. A giant oar adorned the wall of his office (always a pain when he moved offices). Together, KayW and EricLev drove much of the strategy across Office for HTML. Coincidentally, my office was between them, putting me literally in the middle of the HTML strategy.
Kay and Eric were strong proponents of developing what was called round-trip HTML, which meant Word created brand new documents (or read in old documents) and saved them as HTML and later opened them again to make changes. HTML as a first-class format, not only a publish-only format, was as brilliant as it was difficult. As an example, consider something simple like a page header. On a printed page it is obviously a header centered, but how it got there could be the result of many different paths, and to open a file for editing later we needed to preserve that path. A centered heading could be created by hitting the center button obviously, but it could also be centered using tabs, changing the margins, or even putting it inside of a table and then adjusting the table. The text itself could be formatted using the bold button and font size, or it could be adjusted with the Heading style. Looking at an entire printed document and thinking about the permutations for every bit of formatting quickly boggled the mind.
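The round-trip problem can be sketched in a few lines of Python. This is an illustration under stated assumptions, not how Word actually encoded anything: the `Heading` class, the `save` and `load` functions, and the `x-centered-by` attribute are all hypothetical. The point is that three different authoring paths produce the same centered text in a browser, so the saved HTML has to carry extra information if the editor is to reopen the file and recover the path the author took.

```python
from dataclasses import dataclass
from html import escape
from html.parser import HTMLParser


@dataclass(frozen=True)
class Heading:
    text: str
    centered_by: str  # "align", "tabs", or "table": how the author centered it


def save(h: Heading) -> str:
    # Every path renders the same in a browser (centered text). The
    # hypothetical x-centered-by attribute records which path produced it.
    return (
        f'<p align="center" x-centered-by="{escape(h.centered_by)}">'
        f"{escape(h.text)}</p>"
    )


class _Reader(HTMLParser):
    """Collects the paragraph text and the hypothetical origin attribute."""

    def __init__(self) -> None:
        super().__init__()
        self.centered_by = None
        self.text = ""

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self.centered_by = dict(attrs).get("x-centered-by")

    def handle_data(self, data):
        self.text += data


def load(html: str) -> Heading:
    reader = _Reader()
    reader.feed(html)
    reader.close()
    # Without the extra attribute, the original path is unrecoverable,
    # so fall back to plain alignment.
    return Heading(text=reader.text, centered_by=reader.centered_by or "align")


for path in ("align", "tabs", "table"):
    original = Heading("Q3 Results & Outlook", path)
    assert load(save(original)) == original
```

A publish-only format could drop the extra attribute and still look right. A first-class format cannot, and multiplying this by every formatting feature in a document gives a sense of the scale of the work.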
The complexity of simply copying and pasting formatted text from one product to another was already mind-boggling—and an ongoing source of frustration and PSS calls and a competitive problem as well. HTML was an opportunity to improve because all the products worked on supporting the format at the same time. The clipboard, where information is temporarily stored, relied on an age-old format called RTF (rich text format) that many tools supported, but unevenly. If all the tools supported HTML natively, there was a good chance sharing data across tools would improve. That was our plan.
Solving any problem in Office across the main applications was an enormous task because there was so much history. The product, even in the late 1990s, had thousands of features. But HTML and keeping track of all the formatting and how features were saved in files was a next-level effort. KayW and EricLev amassed an incredible amount of knowledge about how HTML was implemented across the products. It worked so well we found many places that browsers were not ready, or, according to the browser teams (including Microsoft’s), we used features in unintended ways. For example, saving a spreadsheet could easily create a table that essentially caused the browser to choke on too many rows or columns. New browser features used by PowerPoint were unevenly implemented across different browsers, making slides show up incorrectly depending on the vendor or version of the browser. Professional web designers were experiencing these same problems in their handcrafted web pages, which they would debug by trial and error in different browsers.
These problems led to the rather heated and ongoing debate between BillG and me. He was itching to tell me, “Told you so.” He saw an opening because he believed we were limiting ourselves to a lame foundation. I saw the foundation as moving rapidly and one we could exploit. He saw the foundation as one that would constrain us and commoditize our product.
Bill was nervous about using HTML because of the loss of proprietary features. Would HTML be slower? Would files be bigger? Would it be easier to clone Office if we constrained it to only features the browser could render? Would fewer people buy Office and rely on a small set of licensed users to do document creation while others skipped buying Office to use a browser? These were not just good questions, but they were all answered in ways that made the strategy appear flawed. My view of the strategy was simply that the browser was happening, and either Office created content for the browser and remained relevant or other tools, not Office, created documents when they needed to be viewed in a browser. This was a “ride the horse in the direction it is going” strategy.
Bill’s mail to me that Saturday morning generated a response from me. That’s how we always worked together. Writing long detailed responses to even polarizing assertions was kind of my thing. Bill’s thing was poking staccato style with a list of assertions, an argument. My replies were many paragraphs with context, pros and cons, and conclusions. I often included screen shots or supporting information. I did that quickly. If Bill’s approach was to “shock” then my response was to “awe”.
In this case, Bill’s shocking email was more than enough to garner the interest of the antitrust regulators during our trial. We weren’t thinking about that on Saturday morning. My reply only showed the tension and, in a sense, amplified the negatives of his comment. This thread was just one of so many like this. Even the most mundane conversation memorialized in email just never goes well. Mark my words.
Bill’s provocative statement was definitely dramatic, but typical:
[A]llowing Office documents to be rendered very well by other peoples [sic] browsers is one of the most destructive things we could do to the company. […] We have to stop putting any effort into this and make sure that Office documents very well depends [sic] on PROPRIETARY IE. Anything else is suicide for our platform. This is a case where Office has to avoid doing something to destory [sic] Windows.
Things were not that bad. But I somehow managed to make them worse, especially in the eyes of the regulators. Because of the variances in browsers and our desire to use HTML, we were working well with the Internet Explorer (IE) team as we looked for ways to maintain an edge over Netscape in rendering documents, and maybe Netscape was not all that keen to work with us. The potential flood of Office documents seemed like an opportunity. On the other hand, rendering better in IE than in other browsers could also be perceived as a corporate initiative.
One person’s strategy, however, is another’s nefarious plot.
For all practical purposes, Office 2000 requires Windows and IE. We started the project trying to be great on all browsers, and even greater on Internet Explorer (from our vision and presentation we did for you), but the momentum inside the company essentially prevents that message from making it through development.
That was what every observer’s worst nightmare might have been. The momentum inside the company was solidly behind IE, as naturally expected—no one was thinking about helping Netscape. Office didn’t require IE. Rather, IE was the only browser interested in what amounted to standard use of HTML in Office. The trial offered up some of the many times Office and Netscape tried to work together, but it is not difficult to imagine that went nowhere.
BillG would almost always default to a proprietary solution as I had learned in our very first meeting when we discussed C++ and his desire to extend C++ to make it easier to create Windows programs. That point of view was most effective when the playing field was level. The tables had turned, and Microsoft benefitted from the use of open standards. The risk was low as I tried, but not so successfully, to explain.
We finished the release with an incredible implementation. The demonstration included being able to have incredibly rich documents in Word, Excel, PowerPoint (and Outlook with fancy email that could be read by any HTML mail client) saved out to a web site running FrontPage. We did have endless debates with HTML purists who absolutely hated how we “abused” HTML. In demonstrations with that crowd, I was routinely asked to show the resulting HTML, which was not human readable. This aspect of our implementation was well ahead of its time, as today even the source to the simplest web page is impossible for a human to digest without tools. Everyone abuses HTML.
For much of the project, BillG and I went back and forth in email. History was not entirely kind to either of us in this debate—while we both got elements right, ultimately where HTML and Office ended up was kind of boring, albeit successful. Given that most reading this never directly experienced Office using HTML, I should fast-forward a bit and finish the story.
To this day, Office is as widely used as ever, and it continues to dominate all other document creation tools. Word, Excel, and PowerPoint ended up contributing massive amounts of information to the WWW, not as .DOC/.XLS/.PPT but rather as PDF, Adobe’s ancient portable document format, which is essentially a printed version of the document.
Nobody expected PDF to dominate. It was some combination of the deployment cycle of new Office apps that supported HTML, lack of awareness that Office could produce HTML, and especially a lack of ways for regular people to share Office as HTML. The biggest thing PDF brought to the solution was a single file that always looked the same and looked exactly like it would look when printed. The problem we could never solve (and we tried) with HTML was how to deal with the explosion of separate files for the pictures, charts, illustrations, and so on typically found in Office documents, a separation intrinsic to the browser.
There is some irony in this endpoint. These portable document formats were my first project when I was Bill’s technical assistant, a hand-off from my predecessor. AaronG and I both thought PDF was potentially super useful (the Acrobat product was new when I became TA). Bill disagreed then for the same reasons HTML with Office was considered not the best idea—he wanted to see Office’s native formats. For any number of reasons, PDF never became the liability he thought it could become. My dream of HTML documents created in Office, showing up in browsers, never quite materialized. Neither of us got to an end-state we wanted, but Office remained relevant.
HTML proved to be enormously beneficial as the underpinnings for an entirely modern and newly designed Office file format, Microsoft Open XML, which became an open standard and was eventually used by other products (and was regulator friendly). HTML was also instrumental in making copy and paste across Office, and from browsers, vastly more reliable. Word can still publish to HTML, and it is still pretty nifty.
We were both partially right. We were both also quite wrong.
1. We (a member of the Office marketing team and I) just made up this term on a plane flight to PC Expo in New York and used it for briefings during the conference. After the stories ran, I would hear from a well-known blogger who thought we used a term he coined. I chalked it up to convergent evolution of terminology, but am adding this footnote to avoid reliving that moment.