Joel on Software
May 1, 2008
00:01
It was seven years ago today when everybody was getting excited about Microsoft's bombastic announcement of Hailstorm, promising that "Hailstorm makes the technology in your life work together on your behalf and under your control."
What was it, really? The idea that the future operating system was on the net, on Microsoft's cloud, and you would log onto everything with Windows Passport and all your stuff would be up there. It turns out: nobody needed this place for all their stuff. And nobody trusted Microsoft with all their stuff. And Hailstorm went away.
I tried to coin a term for the kind of people who invented Hailstorm: architecture astronauts. "That's one sure tip-off to the fact that you're being assaulted by an Architecture Astronaut: the incredible amount of bombast; the heroic, utopian grandiloquence; the boastfulness; the complete lack of reality. And people buy it! The business press goes wild!"
The hallmark of an architecture astronaut is that they don't solve an actual problem... they solve something that appears to be the template of a lot of problems. Or at least, they try. Since 1988 many prominent architecture astronauts have been convinced that the biggest problem to solve is synchronization.
Follow the story, here. I started picking on one company that appeared to be particularly astronautish: Groove, which was trying to rebuild Lotus Notes (a giant synchronization machine) in a peer-to-peer fashion.
Groove had some early success selling secure networks to the military-industrial complex, but didn't make much of a ripple outside that niche. Their real success was in getting bought by Microsoft, which brought Groove's designer and chief architecture-astronaut Ray Ozzie to the role of "Chief Software Architect" at Microsoft, supposedly the technical guy that would keep inventing the future after BillG left so that Steve Ballmer would have some new territory on which to build his next illegal monopoly.
And now Ray Ozzie's big achievement arrives and what is it? (drumroll...) Microsoft Live Mesh. The future of everything. Microsoft is "moving into the cloud."
What's Microsoft Live Mesh?
Hmm, let's see.
"Imagine all your devices—PCs, and soon Macs and mobile phones—working together to give you anywhere access to the information you care about."
Wait a minute. Something smells fishy here. Isn't that exactly what Hailstorm was supposed to be? I smell an architecture astronaut.
And what is this Windows Live Mesh?
It's a way to synchronize files.
Jeez, we've had that forever. When did the first sync web sites start coming out? 1999? There were a million versions. xdrive, mydrive, idrive, youdrive, wealldrive for ice cream. Nobody cared then and nobody cares now, because synchronizing files is just not a killer application. I'm sorry. It seems like it should be. But it's not.
But Windows Live Mesh is not just a way to synchronize files. That's just the sample app. It's a whole goddamned architecture, with an API and developer tools and an insane diagram showing all the nifty layers of acronyms, and it seems like the chief astronauts at Microsoft literally expect this to be their gigantic platform in the sky which will take over when Windows becomes irrelevant on the desktop. And synchronizing files is supposed to be, like, the equivalent of Microsoft Write on Windows 1.0.
It's Groove, rewritten from scratch, one more time. Ray Ozzie just can't stop rewriting this damn app, again and again and again, and taking 5-7 years each time.
And the fact that customers never asked for this feature and none of the earlier versions really took off as huge platforms doesn't stop him.
How on earth does Microsoft continue to pour massive resources into building the same frigging synchronization platforms again and again? Damn, they just finished building something called Windows Live FolderShare and I haven't exactly noticed a stampede to that. I'll bet you've never even heard of it. The 3,398th web site that lets you upload and download files to a place on the Internet. I'm so excited I might just die.
I shouldn't really care. What Microsoft's shareholders want to waste their money building, instead of earning nice dividends from two or three fabulous monopolies, is no business of mine. I'm not a shareholder. It sort of bothers me, intellectually, that there are these people running around acting like they're building the next great thing who keep serving us the same exact TV dinner that I didn't want on Sunday night, and I didn't want it when you tried to serve it again Monday night, and you crunched it up and mixed in some cheese and I didn't eat that Tuesday night, and here it is Wednesday and you've rebuilt the whole goddamn TV dinner industry from the ground up and you're giving me 1955 salisbury steak that I just DON'T WANT. What is it going to take for you to get the message that customers don't want the things that architecture astronauts just love to build? The people? They love twitter. And flickr and delicious and picasa and tripit and ebay and a million other fun things, which they do want, and this so-called synchronization problem is just not an actual problem, it's a fun programming exercise that you're doing because it's just hard enough to be interesting but not so hard that you can't figure it out.
Why I really care is that Microsoft is vacuuming up way too many programmers. Between Microsoft, with their shady recruiters making unethical exploding offers to unsuspecting college students, and Google (you're on my radar) paying untenable salaries to kids with more ultimate frisbee experience than Python, whose main job will be to play foosball in the googleplex and walk around trying to get someone... anyone... to come see the demo code they've just written with their "20% time," doing some kind of, let me guess, cloud-based synchronization... between Microsoft and Google the starting salary for a smart CS grad is inching dangerously close to six figures and these smart kids, the cream of our universities, are working on hopeless and useless architecture astronomy because these companies are like cancers, driven to grow at all costs, even though they can't think of a single useful thing to build for us, but they need another 3000-4000 comp sci grads next week. And dammit foosball doesn't play itself.
Not loving your job? Visit the Joel on Software Job Board: Great software jobs, great people.
April 22, 2008
21:10
The next podcast is up. Today we talked about why we're doing a podcast in the first place, took some questions/suggestions from listeners, and got into a fight over whether programmers should learn C. Guess which side I took.
There are some improvements, already.
First, there's an RSS feed, so you can subscribe and get each weekly podcast pushed to you. Here's how you would subscribe using Apple iTunes, for example:
- Run iTunes
- Choose Advanced | Subscribe to Podcast
- Paste in this URL: http://blog.stackoverflow.com/index.php?feed=podcast
- There is no step 4.
Now, depending on your settings (under Podcasts in Preferences), iTunes will download the latest podcasts and put them on your iPod when you dock it. You don't have to do anything special. I'm not going to post here every time there's a new podcast; you'll have to subscribe.
A couple of people volunteered to help by typing up transcripts for the hearing-impaired, the pressed-for-time, and search engines. That's a great idea! I opened up a wiki where anyone can contribute to the weekly transcript. If you can spare a few minutes to transcribe even a part of the podcast, that would be greatly appreciated by the many readers for whom an audio podcast is inaccessible.
Jeff has a new blog for the podcast at http://blog.stackoverflow.com/ where the podcasts are posted. You can subscribe to that using a normal RSS reader and see the show notes, links to things we mentioned during the podcast, and there will be comments links for discussion.
If you have any comments, ideas, or suggestions, record a short MP3 and email it to podcast@stackoverflow.com. If you don't have the equipment to record an MP3, check out blogtalkradio to find a shockingly easy way to do it with a phone.
I've been working on a way to improve the audio quality. I don't want to make any promises, but next week we'll try to do the show using Skype to get better-than-POTS voice quality.
April 16, 2008
21:43
What is stackoverflow.com?
Nothing, yet.
But here's the concept:
Programmers seem to have stopped reading books. The market for books on programming topics is minuscule compared to the number of working programmers.
Instead, they happily program away, using trial-and-error. When they can't figure something out, they type a question into Google.
And sometimes, the first result looks like it's going to have the answer to their exact question, and they are excited, until they click on the link, and discover that it's a pay site, and the answer is cloaked or hidden or behind a pay-wall, and you have to buy a membership.
And you won't even get an expert answer. You'll get a bunch of responses typed by other programmers like you. Some of the responses will be wrong, some will be right, some may be out of date, and it's hard to imagine that with the cooperative spirit of the internet this is the best thing we programmers have come up with.
Jeff Atwood and I decided to do something about it. We're starting to build a programming Q&A site that's free. Free to ask questions, free to answer questions, free to read, free to index, built with plain old HTML, no fake rot13 text on the home page, no scammy google-cloaking tactics, no salespeople, no JavaScript windows dropping down in front of the answer asking for $12.95 to go away. You can register if you want to collect karma and win valuable flair that will appear next to your name, but otherwise, it's just free.
When I'm building a new product, my policy has always been to keep quiet about it until I have something to ship. But this isn't really a product. This is a free new community site for programmers around the world and we need your help to design it, to program it, and to build it. We want to hear your suggestions, hear your ideas, and we're going to build it right in front of your eyes. Thus, the vaporware announcement.
Every week, Jeff and I talk by phone (he's in California, I'm in New York), and we're going to record those phone calls and throw them up on the web for you to listen in on, and call it a podcast. We have a lot of trouble keeping on topic, so the podcast may be interesting to you even if you don't want to hear about stackoverflow.com. The first episode is up right now. Eventually I imagine we'll figure out this newfangled "RSS" technology and you'll be able to actually subscribe and get fresh episodes delivered into your ears automatically. All in good time.
Jeff's Announcement
PS I'm still CEO of Fog Creek full time. StackOverflow.com is a joint venture between Fog Creek and Jeff Atwood. He's the full time CEO which means he's calling the shots. I'm sort of a consultant on this one.
April 14, 2008
11:14
Registration is now open for Business of Software 2008 (the first ever Joel on Software conference). Neil has lined up great speakers:
- Seth Godin, Business Week's "Ultimate Entrepreneur for the Information Age", is the best-selling author of 7 books (including Permission Marketing and Purple Cow) as well as the most popular eBook of all time.
- Eric Sink, founder of SourceGear, author of "Eric Sink on the Business of Software" and the person who coined the term "Micro ISV"
- Steve Johnson of Pragmatic Marketing and winner of last year's Software Idol competition
- Richard Stallman launched the development of the GNU operating system, now used on tens of millions of computers. Stallman has received the ACM Grace Hopper Award, a MacArthur Foundation fellowship, the Electronic Frontier Foundation's Pioneer award, and the Takeda Award for Social/Economic Betterment
- Paul Kenny is one of the UK's top sales trainers, consultants and speakers. He has worked with many customers on three continents, including IBM, Perot Systems, The Guardian and tens of others.
- Dharmesh Shah is a geek, serial entrepreneur, founder of HubSpot and blogger at OnStartups.com
- Jessica Livingston is author of Founders at Work: Stories of Startups' Early Days and a founder of Y Combinator
- Jason Fried is founder of 37signals (developers of Basecamp and Ruby on Rails) and Signal vs Noise blogger
- Joel Spolsky, aka, "me," noted DJ, has over 600 karma points on the social news site "Reddit."
BoS2008 is in BOSton, September 3-4. Boston is absolutely beautiful in September. The weather is usually perfect. You can go kayaking on the Charles or take the duck tour if you're unambitious. Over 250,000 college students have just arrived, full of completely unjustifiable hope and optimism. The summer tourist crowd has mostly gone home so you can get into museums and historical sites. There are plenty of coffee shops that aren't NASDAQ-listed.
April 9, 2008
15:24
“If the flight attendants on the JAL 747 from Tokyo I'm on right now were to, in a remarkable lapse of Japanese standards of service, throw me off the plane with a parachute, I could do a pretty nice roll when I hit the ground.”
From my Inc. Magazine column for April (subscribe here). And with that, I promise to stop telling shaggy dog stories about my days in the army.
March 17, 2008
09:00
You’re about to see the mother of all flamewars on internet groups where web developers hang out. It’ll make the Battle of Stalingrad look like that time your sister-in-law stormed out of afternoon tea at your grandmother’s and wrapped the Mustang around a tree.
This upcoming battle will be presided over by Dean Hachamovitch, the Microsoft veteran currently running the team that’s going to bring you the next version of Internet Explorer, 8.0. The IE 8 team is in the process of making a decision that lies perfectly, exactly, precisely on the fault line smack in the middle of two different ways of looking at the world. It’s the difference between conservatives and liberals, it’s the difference between “idealists” and “realists,” it’s a huge global jihad dividing members of the same family, engineers against computer scientists, and Lexuses vs. olive trees.
And there’s no solution. But it will be really, really entertaining to watch, because 99% of the participants in the flame wars are not going to understand what they’re talking about. It’s not just entertainment: it’s required reading for every developer who needs to design interoperable systems.
The flame war will revolve around the issue of something called “web standards.” I’ll let Dean introduce the problem:
All browsers have a “Standards” mode, call it “Standards mode,” and use it to offer a browser’s best implementation of web standards. Each version of each browser has its own Standards mode, because each version of each browser improves on its web standards support. There’s Safari 3’s Standards mode, Firefox 2’s Standards mode, IE6’s Standards mode, and IE7’s Standards mode, and they’re all different. We want to make IE8’s Standards mode much, much better than IE7’s Standards mode.
And the whole problem hinges on the little tiny decision of what IE8 should do when it encounters a page that claims to support “standards”, but has probably only been tested against IE7.
What the hell is a standard?
Don’t they have standards in all kinds of engineering endeavors? (Yes.)
Don’t they usually work? (Mmmm…..)
Why are “web standards” so frigging messed up? (It’s not just Microsoft’s fault. It’s your fault too. And Jon Postel’s (1943-1998). I’ll explain that later.)
There is no solution. Each solution is terribly wrong. Eric Bangeman at ars technica writes, “The IE team has to walk a fine line between tight support for W3C standards and making sure sites coded for earlier versions of IE still display correctly.” This is incorrect. It’s not a fine line. It’s a line of negative width. There is no place to walk. They are damned if they do and damned if they don’t.
That’s why I can’t take sides on this issue and I’m not going to. But every working software developer should understand, at least, how standards work, how standards should work, how we got into this mess, so I want to try to explain a little bit about the problem here, and you’ll see that it’s the same reason Microsoft Vista is selling so poorly, and it’s the same issue I wrote about when I referred to the Raymond Chen camp (pragmatists) at Microsoft vs. the MSDN camp (idealists), the MSDN camp having won, and now nobody can figure out where their favorite menu commands went in Microsoft Office 2007, and nobody wants Vista, and it’s all the same debate: whether you are an Idealist (“red”) or a Pragmatist (“blue”).
Let me start at the beginning. Let’s start by thinking about how to get things to work together.
What kinds of things? Anything, really. A pencil and a pencil sharpener. A telephone and a telephone system. An HTML page and a web browser. A Windows GUI application and the Windows operating system. Facebook and a Facebook Application. Stereo headphones and stereos.
At the point of contact between those two items, there are all kinds of things that have to be agreed, or they won’t work together.
I’ll work through a simple example.
Imagine that you went to Mars, where you discovered that the beings who live there don’t have the portable music player. They’re still using boom boxes.
You realize this is a huge business opportunity and start selling portable MP3 players (except on Mars they’re called Qxyzrhjjjjukltks) and compatible headphones. To connect the MP3 player to the headphones, you invent a neat kind of metal jack that looks like this:
Because you control the player and the headphone, you can ensure that your player works with your headphones. This is a ONE TO ONE market. One player, one headphone.
Maybe you write up a spec, hoping that third parties will make different color headphones, since Marslings are very particular about the color of things that they stick in their earlings.
And you forgot, when you wrote the spec, to document that the voltage should be around 1.4 volts. You just forgot. So the first aspiring manufacturer of 100% compatible headphones comes along, his speaker is only expecting 0.014 volts, and when he tests his prototype, it either blows out the headphones, or the eardrums of the listener, whichever comes first. And he makes some adjustments and eventually gets a headphone that works fine and is just a couple of angstroms more fierce than your headphones.
More and more manufacturers show up with compatible headphones, and soon we’re in a ONE TO MANY market.
So far, all is well. We have a de-facto standard for headphone jacks here. The written spec is not complete and not adequate, but anybody who wants to make a compatible headphone just has to plug it into your personal stereo device and test it, and if it works, all is well, they can sell it, and it will work.
Until you decide to make a new version, the Qxyzrhjjjjukltk 2.0.
The Qxyzrhjjjjukltk 2.0 is going to include a telephone (turns out Marslings didn’t figure out cell phones on their own, either) and the headphone is going to have to have a built-in microphone, which requires one more conductor, so you rework the connector into something totally incompatible and kind of ugly, with all kinds of room for expansion:
And the Qxyzrhjjjjukltk 2.0 is a complete and utter failure in the market. Yes, it has a nice telephone thing, but nobody cared about that. They cared about their large collections of headphones. It turns out that when I said Marslings are very particular about the color of things that they stick in their ears, I meant it. Most trendy Marslings at this point have a whole closet full of nice headphones. They all look the same to you (red), but Marslings are very, very finicky about shades of red in a way that you never imagined. The newest high-end apartments on Mars are being marketed with a headphone closet. I kid you not.
So the new jack is not such a success, and you quickly figure out a new scheme:
Notice that you’ve now split the main shaft to provide another conductor for the microphone signal, but the trouble is, your Qxyzrhjjjjukltk 2.1 doesn’t really know whether the headset that’s plugged in has a mic or not, and it needs to know this so it can decide whether to enable phone calls. And so you invent a little protocol… the new device puts a signal on the mic pin, and looks for it on the ground, and if it’s there, it must be a three conductor plug, and therefore they don’t have a mic, so you’ll go into backwards compatibility mode where you only play music. It’s simple, but it’s a protocol negotiation.
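The little protocol above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Headset` class, the `Mode` names, and the idea of modelling a plug by its conductor count are all made up for the example, and nothing here describes a real headset jack.

```python
from dataclasses import dataclass
from enum import Enum


class Mode(Enum):
    MUSIC_ONLY = "music only (backwards compatibility mode)"
    MUSIC_AND_PHONE = "music and phone calls"


@dataclass
class Headset:
    """Hypothetical model of a plugged-in headset."""
    conductors: int  # 3 = old plug without a mic, 4 = split-shaft plug with a mic

    def probe_reaches_ground(self) -> bool:
        # Assumption: on an old three-conductor plug the unsplit shaft
        # bridges the mic contact and ground, so the probe shows up there.
        return self.conductors == 3


def negotiate(headset: Headset) -> Mode:
    """Put a signal on the mic pin and look for it on the ground."""
    if headset.probe_reaches_ground():
        # Old headset: act like the old device and only play music.
        return Mode.MUSIC_ONLY
    return Mode.MUSIC_AND_PHONE


old_headset = Headset(conductors=3)
new_headset = Headset(conductors=4)
print(negotiate(old_headset).value)
print(negotiate(new_headset).value)
```

The point of the sketch is that the new device carries the whole burden: the old headset does nothing different, and the device infers what it is talking to.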
It’s not a ONE-MANY market any more. All the stereo devices are made by the same firm, one after the other, so I’m going to call this a SEQUENCE-MANY market:
Here are some SEQUENCE-MANY markets you already know about:
- Facebook | about 20,000 Facebook Apps
- Windows | about 1,000,000 Windows Apps
- Microsoft Word | about 1,000,000,000 Word documents
There are hundreds of other examples. The key thing to remember is that when a new version of the left-hand device comes out, it has to maintain auto-backwards-compatibility with all the old right-hand accessories meant to work with the old device, because those old accessories could not possibly have been designed with the new product in mind. The Martian headphones are already made. You can’t go back and change them all. It’s much easier and more sensible to change the newly invented device so that it acts like an old device when confronted with an old headphone.
And because you want to make progress, adding new features and functionality, you also need a new protocol for new devices to use, and the sensible thing to do is to have both devices negotiate a little bit at the beginning to decide whether they both understand the latest protocol.
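Here is one way that kind of up-front negotiation might look, as a rough Python sketch. The version numbers, the `LEGACY` constant, and the `choose_protocol` function are hypothetical names invented for this example; the only idea taken from the text is that an old accessory says nothing and the new device falls back to acting like an old device.

```python
LEGACY = 1  # hypothetical number for the original, un-negotiated protocol


def choose_protocol(device_versions, accessory_versions=None):
    """Return the newest protocol version both sides understand.

    An accessory built before negotiation existed advertises nothing
    (accessory_versions is None), so the device behaves like an old device.
    """
    if accessory_versions is None:
        return LEGACY
    common = set(device_versions) & set(accessory_versions)
    # No overlap at all is treated the same way: fall back to the old behavior.
    return max(common) if common else LEGACY


print(choose_protocol({1, 2, 3}))          # old headphone: prints 1
print(choose_protocol({1, 2, 3}, {1, 2}))  # newer headphone: prints 2
```

Note that the fallback lives entirely in the new device, which is the only side you can still change.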
SEQUENCE-MANY is the world Microsoft grew up in.
But there’s one more twist, the MANY-MANY market.
A few years pass; you’re still selling Qxyzrhjjjjukltks like crazy; but now there are lots of Qxyzrhjjjjukltk clones on the market, like the open source FireQx, and lots of headphones, and you all keep inventing new features that require changes to the headphone jack and it’s driving the headphone makers crazy because they have to test their new designs out against every Qxyzrhjjjjukltk clone which is costly and time consuming and frankly most of them don’t have time and just get it to work on the most popular Qxyzrhjjjjukltk 5.0, and if that works, they’re happy, but of course when you plug the headphones into FireQx 3.0 lo and behold they explode in your hands because of a slight misunderstanding about some obscure thing in the spec which nobody really understands called hasLayout, and everybody understands that when it’s raining the hasLayout property is true and the voltage is supposed to increase to support the windshield-wiper feature, but there seems to be some debate over whether hail and snow are rain for the purpose of hasLayout, because the spec just doesn’t say. FireQx 3.0 treats snow as rain, because you need windshield wipers in the snow, Qxyzrhjjjjukltk 5.0 does not, because the programmer who worked on that feature lives in a warm part of Mars without snow and doesn’t have a driver’s license anyway. Yes, they have driver’s licenses on Mars.
And eventually some tedious bore writes a lengthy article on her blog explaining a trick you can use to make Qxyzrhjjjjukltk 5.0 behave just like FireQx 3.0 through taking advantage of a bug in Qxyzrhjjjjukltk 5.0 in which you trick Qxyzrhjjjjukltk into deciding that it’s raining when it’s snowing by melting a little bit of the snow, and it’s ridiculous, but everyone does it, because they have to solve the hasLayout incompatibility. Then the Qxyzrhjjjjukltk team fixes that bug in 6.0, and you’re screwed again, and you have to go find some new bug to exploit to make your windshield-wiper-equipped headphone work with either device.
NOW. This is the MANY-MANY market. Many players on the left hand side who don’t cooperate, and SCRILLIONS of players on the right hand side. And they’re all making mistakes because To Err Is Human.
And of course this is the situation we find ourselves in with HTML. Dozens of common browsers, literally billions of web pages.
And over the years what happens in a MANY-MANY market is that there is a hue and cry for “standards” so that “all the players” (meaning, the small players) have an equal chance of being able to display all 8 billion web pages correctly, and, even more importantly, so that the designers of those 8 billion pages only have to test against one browser, and use “web standards,” and then they will know that their page will also work in other browsers, without having to test every page against every browser.
See, the idea is, instead of many-many testing, you have many-standard and standard-many testing and you need radically fewer tests. Not to mention that your web pages don’t need any browser-specific code to work around bugs in individual browsers, because in this platonic world there are no bugs.
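The arithmetic behind "radically fewer tests" is easy to check. The sketch below assumes, purely for illustration, that one test means checking one page in one browser, or checking one page or one browser against the standard; the function names and the sample numbers are made up.

```python
def many_many_tests(browsers: int, pages: int) -> int:
    """Every page has to be tried in every browser."""
    return browsers * pages


def via_standard_tests(browsers: int, pages: int) -> int:
    """Each browser is tested against the standard once, and so is each page."""
    return browsers + pages


browsers, pages = 5, 1000  # small made-up numbers
print(many_many_tests(browsers, pages))     # prints 5000
print(via_standard_tests(browsers, pages))  # prints 1005
```

The gap only widens as either side grows, because one count is a product and the other is a sum.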
That’s the ideal.
In practice, with the web, there’s a bit of a problem: no way to test a web page against the standard, because there’s no reference implementation that guarantees that if it works, all the browsers work. This just doesn’t exist.
So you have to “test” in your own head, purely as a thought experiment, against a bunch of standards documents which you probably never read and couldn’t completely understand even if you did.
Those documents are super confusing. The specs are full of statements like “If a sibling block box (that does not float and is not absolutely positioned) follows the run-in box, the run-in box becomes the first inline box of the block box. A run-in cannot run in to a block that already starts with a run-in or that itself is a run-in.” Whenever I read things like that, I wonder how anyone correctly conforms to the spec.
There is no practical way to check if the web page you just coded conforms to the spec. There are validators, but they won’t tell you what the page is supposed to look like, and having a “valid” page where all the text is overlapping and nothing lines up and you can’t see anything is not very useful. What people do is check their pages against one browser, maybe two, until it looks right. And if they’ve made a mistake that just happens to look OK in IE and Firefox, they’re not even going to know about it.
And their pages may break when a future web browser comes out.
If you’ve ever visited the ultra-orthodox Jewish communities of Jerusalem, all of whom agree in complete and utter adherence to every iota of Jewish law, you will discover that despite general agreement on what constitutes kosher food, you will not find a rabbi from one ultra-orthodox community who is willing to eat at the home of a rabbi from a different ultra-orthodox community. And the web designers are discovering what the Jews of Mea Shearim have known for decades: just because you all agree to follow one book doesn’t ensure compatibility, because the laws are so complex and complicated and convoluted that it’s almost impossible to understand them all well enough to avoid traps and landmines, and you’re safer just asking for the fruit plate.
Standards are a great goal, of course, but before you become a standards fanatic you have to understand that due to the failings of human beings, standards are sometimes misinterpreted, sometimes confusing and even ambiguous.
The precise problem here is that you’re pretending that there’s one standard, but since nobody has a way to test against the standard, it’s not a real standard: it’s a platonic ideal and a set of misinterpretations, and therefore the standard is not serving the desired goal of reducing the test matrix in a MANY-MANY market.
DOCTYPE is a myth.
A mortal web designer who attaches a DOCTYPE tag to their web page saying, “this is standard HTML,” is committing an act of hubris. There is no way they know that. All they are really saying is that the page was meant to be standard HTML. All they really know is that they tested it with IE, Firefox, maybe Opera and Safari, and it seems to work. Or, they copied the DOCTYPE tag out of a book and don’t know what it means.
In the real world where people are imperfect, you can’t have a standard with just a spec–you must have a super-strict reference implementation, and everybody has to test against the reference implementation. Otherwise you get 17 different “standards” and you might as well not have one at all.
And this is where Jon Postel caused a problem, back in 1981, when he coined the robustness principle: “Be conservative in what you do, be liberal in what you accept from others.” What he was trying to say was that the best way to make the protocols work robustly would be if everyone was very, very careful to conform to the specification, but they should also be extremely forgiving when talking to partners that don’t conform exactly to the specification, as long as you can kind of figure out what they meant.
So, technically, the way to make a paragraph with small text is <p><small>, but a lot of people wrote <small><p> which is technically incorrect for reasons most web developers don’t understand, and the web browsers forgave them and made the text small anyway, because that’s obviously what they wanted to happen.
Now there are all these web pages out there with errors, because all the early web browser developers made super-liberal, friendly, accommodating browsers that loved you for who you were and didn’t care if you made a mistake. And so there were lots of mistakes. And Postel’s “robustness” principle didn’t really work. The problem wasn’t noticed for many years. In 2001 Marshall Rose finally wrote:
Counter-intuitively, Postel’s robustness principle (“be conservative in what you send, liberal in what you accept”) often leads to deployment problems. Why? When a new implementation is initially fielded, it is likely that it will encounter only a subset of existing implementations. If those implementations follow the robustness principle, then errors in the new implementation will likely go undetected. The new implementation then sees some, but not widespread deployment. This process repeats for several new implementations. Eventually, the not-quite-correct implementations run into other implementations that are less liberal than the initial set of implementations. The reader should be able to figure out what happens next.
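The failure mode described above can be shown with a toy example in Python. Everything here is an assumption made up for illustration: the "spec" (a header line must look exactly like `Name: value`), the two regular expressions, and the function names. It is not any real protocol, just the shape of the problem.

```python
import re

# Hypothetical spec: a capitalized name, a colon, one space, then a value.
STRICT = re.compile(r"^[A-Z][A-Za-z-]*: \S.*$")
# A liberal receiver forgives case and stray whitespace.
LIBERAL = re.compile(r"^\s*[A-Za-z-]+\s*:\s*.*$")


def strict_accepts(line: str) -> bool:
    return STRICT.match(line) is not None


def liberal_accepts(line: str) -> bool:
    return LIBERAL.match(line) is not None


def undetected_errors(lines, receivers):
    """Lines that every receiver tried so far accepted, but the spec rejects."""
    return [
        line for line in lines
        if all(accepts(line) for accepts in receivers) and not strict_accepts(line)
    ]


sent = ["Content-Length: 42", "content-length :42"]  # the second line is a bug

# Tested only against liberal receivers, the bug goes unnoticed...
print(undetected_errors(sent, [liberal_accepts]))
# ...until a less liberal receiver shows up and rejects it.
print([line for line in sent if not strict_accepts(line)])
```

The sender's bug was there all along; the forgiving receivers just postponed the day anyone found out.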
Jon Postel should be honored for his enormous contributions to the invention of the Internet, and there is really no reason to fault him for the infamous robustness principle. 1981 is prehistoric. If you had told Postel that there would be 90 million untrained people, not engineers, creating web sites, and they would be doing all kinds of awful things, and some kind of misguided charity would have caused the early browser makers to accept these errors and display the page anyway, he would have understood that this is the wrong principle, and that, actually, the web standards idealists are right, and the way the web “should have” been built would be to have very, very strict standards and every web browser should be positively obnoxious about pointing them all out to you and web developers that couldn’t figure out how to be “conservative in what they emit” should not be allowed to author pages that appear anywhere until they get their act together.
But, of course, if that had happened, maybe the web would never have taken off like it did, and maybe instead, we’d all be using a gigantic Lotus Notes network operated by AT&T. Shudder.
Shoulda woulda coulda. Who cares. We are where we are. We can’t change the past, only the future. Heck, we can barely even change the future.
And if you’re a pragmatist on the Internet Explorer 8.0 team, you might have these words from Raymond Chen seared into your cortex. He was writing about how Windows XP had to emulate buggy behavior from old versions of Windows:
Look at the scenario from the customer’s standpoint. You bought programs X, Y and Z. You then upgraded to Windows XP. Your computer now crashes randomly, and program Z doesn’t work at all. You’re going to tell your friends, “Don’t upgrade to Windows XP. It crashes randomly, and it’s not compatible with program Z.” Are you going to debug your system to determine that program X is causing the crashes, and that program Z doesn’t work because it is using undocumented window messages? Of course not. You’re going to return the Windows XP box for a refund. (You bought programs X, Y, and Z some months ago. The 30-day return policy no longer applies to them. The only thing you can return is Windows XP.)
And you’re thinking, hmm, let’s update this for today:
Look at the scenario from the customer’s standpoint. You bought programs X, Y and Z. You then upgraded to Windows Vista. Your computer now crashes randomly, and program Z doesn’t work at all. You’re going to tell your friends, “Don’t upgrade to Windows Vista. It crashes randomly, and it’s not compatible with program Z.” Are you going to debug your system to determine that program X is causing the crashes, and that program Z doesn’t work because it is using insecure window messages? Of course not. You’re going to return the Windows Vista box for a refund. (You bought programs X, Y, and Z some months ago. The 30-day return policy no longer applies to them. The only thing you can return is Windows Vista.)
The victory of the idealists over the pragmatists at Microsoft, which I reported in 2004, directly explains why Vista is getting terrible reviews and selling poorly.
And how does it apply to the IE team?
Look at the scenario from the customer’s standpoint. You visit 100 websites a day. You then upgraded to IE 8. On half of them, the page is messed up, and Google Maps doesn’t work at all.
You’re going to tell your friends, “Don’t upgrade to IE 8. It messes up every page, and Google Maps doesn’t work at all.” Are you going to View Source to determine that website X is using nonstandard HTML, and Google Maps doesn’t work because it is using non-standard JavaScript objects from old versions of IE that were never accepted by the standards committee? Of course not. You’re going to uninstall IE 8. (Those websites are out of your control. Some of them were developed by people who are now dead. The only thing you can do is go back to IE 7).
And so if you’re a developer on the IE 8 team, your first inclination is going to be to do exactly what has always worked in these kinds of SEQUENCE-MANY markets. You’re going to do a little protocol negotiation, and continue to emulate the old behavior for every site that doesn’t explicitly tell you that they expect the new behavior, so that all existing web pages continue to work, and you’re only going to have the nice new behavior for sites that put a little flag on the page saying, “Yo! I grok IE 8! Give me all the new IE 8 Goodness Please!”
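As a rough sketch of that kind of negotiation (an illustration, not the IE team's code): the browser looks for an opt-in marker on the page and falls back to the old behavior when it is missing. The marker checked here is the X-UA-Compatible meta tag that was proposed for IE 8; the function name and the two mode strings are made up for the example.

```python
from html.parser import HTMLParser


class _OptInFinder(HTMLParser):
    """Scans a page for a meta tag that opts in to IE 8 behavior."""

    def __init__(self):
        super().__init__()
        self.opted_in = False

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # Attribute values can be None for valueless attributes.
        attr = {name: (value or "") for name, value in attrs}
        is_compat_tag = attr.get("http-equiv", "").lower() == "x-ua-compatible"
        content = attr.get("content", "").lower().replace(" ", "")
        if is_compat_tag and "ie=8" in content:
            self.opted_in = True


def choose_rendering_mode(html):
    """Return a hypothetical mode name: new behavior only for pages that ask."""
    finder = _OptInFinder()
    finder.feed(html)
    return "ie8-standards" if finder.opted_in else "ie7-compatible"
```

Every page that says nothing gets the old behavior, which is the whole point: the 8 billion existing pages never have to change.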
And indeed that was the first decision announced by the IE team on January 21st. The web browser would accommodate existing pages silently so that nobody had to change their web site by acting like the old, buggy IE7 that web developers hated.
A pragmatic engineer would have to come to the conclusion that the IE team’s first decision was right. But the young idealist “standards” people went nuclear.
IE needed to provide a web standards experience without requiring a special “Yo! I’m tested with IE 8!” tag, they said. They were sick of special tags. Every frigging web page has to have thirty seven ugly hacks in it to make it work with five or six popular browsers. Enough ugly hacks. 8 billion existing web pages be damned.
And the IE team flip-flopped. Their second decision, and I have to think it’s not final, their second decision was to do the idealistic thing, and treat all sites that claim to be “standards-compliant” as if they have been designed for and tested with IE8.
Almost every web site I visited with IE8 is broken in some way. Websites that use a lot of JavaScript are generally completely dead. A lot of pages simply have visual problems: things in the wrong place, popup menus that pop under, mysterious scrollbars in the middle. Some sites have more subtle problems: they look ok but as you go further you find that a critical form won’t submit or leads to a blank page.
These are not web pages with errors. They are usually websites which were carefully constructed to conform to web standards. But IE 6 and IE 7 didn’t really conform to the specs, so these sites have little hacks in them that say, “on Internet Explorer… move this thing 17 pixels to the right to compensate for IE’s bug.”
And IE 8 is IE, but it no longer has the IE 7 bug where it moved that thing 17 pixels left of where it was supposed to be according to web standards. So now code that was written that was completely reasonable no longer works.
IE 8 can’t display most web pages correctly until you give up and press the “ACT LIKE IE7” button. The idealists don’t care: they want those pages changed.
Some of those pages can’t be changed. They might be burned onto CD-ROMs. Some of them were created by people who are now dead. Most of them were created by people who have no frigging idea what’s going on and why their web page, which they paid a designer to create 4 years ago, is now not working properly.
The idealists rejoiced. Hundreds of them descended on the IE blog to actually say nice things about Microsoft for the first time in their lives.
I looked at my watch.
Tick, tick, tick.
Within a matter of seconds, you started to see people on the forums showing up like this one:
I have downloaded IE 8 and with it some bugs. Some of my websites like “HP” are very difficult to read as the whole page is very very small… The speed of my Internet has also been reduced on some occasions. Whe I use Google Maps, there are overlays everywhere, enough so it makes it ackward to use!
Mmhmm. All you smug idealists are laughing at this newbie/idjit. The consumer is not an idiot. She’s your wife. So stop laughing. 98% of the world will install IE8 and say, “It has bugs and I can’t see my sites.” They don’t give a flicking flick about your stupid religious enthusiasm for making web browsers which conform to some mythical, platonic “standard” that is not actually implemented anywhere. They don’t want to hear your stories about messy hacks. They want web browsers that work with actual web sites.
So you see, we have a terrific example here of a gigantic rift between two camps.
The web standards camp seems kind of Trotskyist. You’d think they’re the left wing, but if you happened to make a website that claims to conform to web standards but doesn’t, the idealists turn into Joe Arpaio, America’s Toughest Sheriff. “YOU MADE A MISTAKE AND YOUR WEBSITE SHOULD BREAK. I don’t care if 80% of your websites stop working. I’ll put you all in jail, where you will wear pink pajamas and eat 15 cent sandwiches and work on a chain gang. And I don’t care if the whole county is in jail. The law is the law.”
On the other hand, we have the pragmatic, touchy feely, warm and fuzzy engineering types. “Can’t we just default to IE7 mode? One line of code … Zip! Solved!”
Secretly? Here’s what I think is going to happen. The IE8 team is going to tell everyone that IE8 will use web standards by default, and run a nice long beta during which they beg people to test their pages with IE8 and get them to work. And when they get closer to shipping, and only 32% of the web pages in the world render properly, they’ll say, “look guys, we’re really sorry, we really wanted IE8 standards mode to be the default, but we can’t ship a browser that doesn’t work,” and they’ll revert to the pragmatic decision. Or maybe they won’t, because the pragmatists at Microsoft have been out of power for a long time. In which case, IE is going to lose a lot of market share, which would please the idealists to no end, and probably won’t decrease Dean Hachamovitch’s big year-end bonus by one cent.
You see? No right answer.
As usual, the idealists are 100% right in principle and, as usual, the pragmatists are right in practice. The flames will continue for years. This debate precisely splits the world in two. If you have a way to buy stock in Internet flame wars, now would be a good time to do that.
Not loving your job? Visit the Joel on Software Job Board: Great software jobs, great people.
March 11, 2008
11:33
The summer internships at Fog Creek are already full, but Seth Godin, my marketing rebbe, has marketing internships. Paid! “The idea is to find a diverse group of motivated young people who want to join together to create a few really neat projects. The tools used will range from online video to blogs to copywriting to design. Topics might include politics or Squidoo or book promotion or inventing a new kind of web interaction...no scut work, no cold calling, stapling, collating or errands.”
March 7, 2008
11:10
The recent release of FogBugz 6.0 has, approximately, doubled our sales, and, while I fully agree that often small teams can accomplish a lot more than large teams, we have a lot of interesting work to be done here and there never seem to be enough people to do it, so we've looked for some areas where adding more people would not necessarily slow things down.
We came up with three fairly interesting new positions that might be a perfect fit for you or someone you know. It's relatively rare for Fog Creek to hire full time engineers; many of the people we've hired in the past have been former summer interns, the company is quite small (20 people), and we don't hire lightly, so this is a rare opportunity to get in the door and take advantage of the fact that Fog Creek was designed from the ground up to be the kind of company where the best software developers want to work (about Fog Creek).
The first position is in system administration. Normally, I'm pretty happy to hire inexperienced but bright people and let them learn on the job. Even for fairly important jobs, like, say, President of the United States.
But system administration is one of those things where experience is really important. You don't want your new system administrator to learn about how to create secure and robust online services by building something insecure and unrobust and learning from experience. So for our first system administrator, we hired Michael Gorsuch, because he knew how to operate our systems well on day one.
This is a dilemma for smart people who want to learn how to be world class system administrators. If everybody is asking for x years of experience, how are you supposed to get that experience? You can take an entry-level job in a big company as, say, junior DNS administrator, typing changes into DNS configuration files, but you won't learn very much.
Here's where Fog Creek comes in. Michael and I talked about this and decided that our second hire in that department could be totally inexperienced at system administration, as long as they were smart, got things done, and had the personal characteristics to become a great system administrator (attention to detail, insane curiosity, constant need to be learning new things, strong ability to stay levelheaded and organized even in the most chaotic of situations, doesn't soil pants in fear when presented with a command prompt, thinks "rtfm" is a great answer, etc.) This is a once-in-a-lifetime chance to learn the field and gain substantial experience on an interesting, mixed environment including Unix and Windows, desktops and servers, internet hosting and internetworking, open source and Microsoft, with all kinds of interesting moving parts. And you'll be learning from a real master, one on one, in a great environment with zero corporate BS, management that trusts you to order equipment you need without going through some kind of 6 month budget committee process during which a shrill corporate attorney who has been somehow promoted to "head of the capital infrastructure committee" is nervous about using open source hippie software because it seems kind of "communistic," and she had a terrible experience on a commune in the 60s when this really gross guy who never bathed and wore flip flops even in the winter... well, anyway, I'm getting off the subject. At Fog Creek when you need equipment you order it. That's really all there is to it.
Interested? System administrator at Fog Creek
Our next interesting position is for a Chief Linux Guru. This is a hybrid position for somebody who really loves Linux, wants to do a lot of coding, but also wants a more diverse problem-solving kind of job.
Here's the theory behind this position. Our main product, FogBugz, is a server product, available for Windows, Linux, and Mac. On Windows servers, everybody has pretty much the same minimal stuff. So our setup program usually works off the shelf.
On Linux, though, there's a lot more diversity. People have different distros, they have different versions of different important components like MySQL, PHP, and Mono, and they're not all instantly compatible. A lot of Linux administrators went through their server when they first set it up removing things that they didn't think they'd need for "security" reasons ("if you're not going to use /bin/ls, delete it--it's just a security hole waiting to be found", they said), and now, here it is, three years later, and they're installing FogBugz, and they don't get why ls isn't working. Bottom line: it takes a little bit of hammering to get FogBugz to work on many Linux systems.
So this position is for a Linux coder who will also be responsible to get FogBugz working on our customers' systems. My pet theory is that if the person who takes the call when a customer is missing, say, the Pear Mail module, if this person is the same person who maintains the setup code, then they will eventually get sick of sshing into customers' servers and typing "pear install Mail" for them and they'll just fix it in the setup code once and for all. And I think a lot of people would find a job that combines problem solving with new software development is going to be pretty interesting, especially if, as I said, you love Linux.
On the development side, you've also got to handle all the Linux-specific code. Right now, that's a mix of PHP, Mono, and various scripting languages. Most of FogBugz is written in our own portable language, Wasabi. You'll be responsible to maintain the Linux-specific parts of the code, and you'll be working on keeping Wasabi for Linux at the same level as Wasabi for Windows.
Interested? Linux guru at Fog Creek
Finally, we could use an extreme Windows Internals guru. I don't mean an "Access/VB" kind of guru. I mean a Win 32, COM, .Net, GDI programming, low level Windows systems stuff in C++ and C# kind of guru. And when I say ".Net" I don't mean "Ooh look I made an ASP.NET website with a GridView that shows a list of customers." Uh-uh. Leave that bush league stuff to the boss (me). For this job, you'll be working directly on a native .NET programming language, generating CLR bytecode and integrating with the Visual Studio debugger. You'll be resolving obscure threading model problems in Other People's Code. You'll be hacking GDI to improve the performance of our remote desktop service, Copilot. You'll be figuring out why trivial things that used to work don't work any more in 64 bit Vista. This is the perfect job for the kind of developer who has been doing API level Windows programming for years, who has been reading MSDN Magazine since it was called MSJ, who actually understands what Don Box is talking about, who can explain how to instantiate a COM object from a DLL without touching the registry, and who can figure out, from crappy Microsoft documentation, how to play the first four bars of Gaudeamus Igitur on a computer without a sound card.
Interested? Windows internals guru at Fog Creek
Don't small teams get more things done than big teams? Didn't The Mythical Man Month prove conclusively that you should have the smallest team possible? Don't startups with two kids run circles around the big companies? Isn't Fog Creek getting big and bureaucratic? Why hire more people?
No, no, no, and no. It's a little bit more complicated than that. At 20 people we still fit around one lunch table and we're far from not being able to get things done. And what MMM claimed was only that adding people to a late project makes the project later. The more people you have, the more communication you need, which counterbalances the added productivity of the extra people--that's the MMM conclusion--and so when we add people we always try to find a way to do it in a way that's efficient. But the bottom line is that we have a long list of things that we want to do, and our too-small team is forced to do things in serial that we could do in parallel with a couple more people. So in the long run I think we'll continue hiring carefully and discretely, keeping each of the core teams small (our biggest dev team right now is, um, three people), and I think we're still a ways off from worrying about Fog Creek being bogged down in bureaucracy.
February 28, 2008
11:47
March's Inc. column is online: “Don't tell your star salespeople to take the bus and stay with relatives when they make that call in St. Louis, even though that's what you did when you started the company.”
February 25, 2008
16:34
From my February column in Inc. Magazine: “The bureaucrats in Washington had forgotten Newton's first law: An object in motion tends to remain in motion.”
February 23, 2008
February 22, 2008
09:05
Some readers were kind enough to point out in the comments that there are a couple of open source projects that integrate with Remote Desktop. I guess news analysis that appears within hours of a story breaking is never very good. The whole business of remote desktop access is an entire industry, and this is just one tiny fraction of the protocols that were published by Microsoft yesterday, so Charny's story was the moral equivalent of trying to decide who is going to win the next American election by talking to a couple of truckers at a bar. The problem wasn't the accuracy in reporting (although that was sorely lacking), the problem was that the story was too ambitious. When Charny called me I should have said, "The Schleswig-Holstein question is so complicated, only three men in Europe have ever understood it. One was Prince Albert, who is dead. The second was a German professor who became mad. I am the third and I have forgotten all about it."
PS here are the Microsoft Open Protocol Specifications.
February 21, 2008
21:16
A well-meaning, but rushed, journalist named Ben Charny interviewed me this morning about Microsoft's interoperability press release.
He made a ridiculous number of mistakes. That's the nature of the wire reporters on deadline. Here are some corrections.
- "For want of a few key details from Microsoft Corp. (MSFT), Joel Spolsky tried but failed more than 10 years ago to make a better version of Microsoft's remote office desktop-computer feature."
Not true. Fog Creek Copilot was developed less than three years ago and has been under continuous development since then. It has been a profitable product and we're still developing new versions. I believe that what we built is better than Microsoft's Remote Desktop in many ways (it works through firewalls, supports Macintosh and Windows, and is easier to set up for ad-hoc tech support) and it's worse in some ways (it's slower, and uses JPG compression as an optimization which can make the screen blurry). So it's debatable whether I "failed." There was no need for "a few key details from Microsoft" because we don't interoperate with Remote Desktop, we use the open source VNC protocol (incidentally, the client code for Copilot is freely available under the GPL).
- "Now Project Co-Pilot has gotten new life."
OK, it's not "Project Co-Pilot", it's Fog Creek Copilot, and, like I say, we've been working on it continuously for almost three years and are about to release a major new version, so nothing about Microsoft's announcement granted it "new life." That part is just a fabrication.
- "On Thursday Spolsky finally located those elusive lines of code tucked inside 30,000 pages made public Thursday by Microsoft. Before they were available only under trade-secrets licenses."
Huh? Lines of code? Ok, I understand, tech journalists may not understand the difference between "lines of code" and protocol specifications. When the press release came out, I was curious to see if it included a spec for the Remote Desktop protocol. We've always known how the protocol works, and how it transmits GDI commands for performance reasons, which is neither rocket science nor a trade secret. I don't know if the concept is patented, but the X Window server worked this way before Windows was even invented, so if there is a patent for this GDI business, it ain't Microsoft's, but that's neither here nor there.
Copilot doesn't use the remote desktop protocol, full stop, and we don't plan to. I just happened to look for that in the spec (in the 15 minutes between Microsoft's press release and the time the journalist called me) because I was curious to see what kind of stuff was in there, and this was an area I had wondered about.
- "By sharing more technical information about its key products on Thursday, Microsoft has jump-started a wave of development destined to unleash software that will compete with many more of Microsoft's own products. One of the first appears to be remote desktop software, which uses a secured Internet connection to remotely access files and features stored on office-desktop computers. Currently, there's no third-party version of the software, save for Citrix Systems Inc. (CTXS), which has a cross-licensing deal with Microsoft. But Microsoft's existing product is a very basic one, making it rife for improvements. Its relative simplicity is demonstrative of how there's been no competitive offerings that would have forced Microsoft to make a better widget."
This part is what he got mostly right, and it's what I said. As far as I know, there are no competitive clients for Windows Remote Desktop (formerly called Terminal Services) except for Citrix's cross-licensed implementation, presumably because the protocol was never publicized. As a result, if you want to use Windows Remote Desktop, you are stuck with the rudimentary clients Microsoft gives you. There are LOTS of great competitive remote desktop solutions that include both the client and the server; besides our own Copilot, there's Bomgar, LogMeIn, GotoMyPC, and the granddaddy PCAnywhere, and another dozen or so options. But I'm pretty sure none of them interoperate with Remote Desktop because the spec has not been available. Everybody, including Copilot, has their own protocol, usually a variation of the RFB protocol [PDF spec].
So I thought this was one example of an area where Microsoft actually stood to gain from publicizing their protocols. It's bound to open up lots of opportunities in hundreds of areas (remote desktop is just a tiny example) where third-party developers will be able to develop better drop-in versions of various pieces of the Microsoft software stack, which helps the Microsoft ecosystem more than it detracts from their business. Microsoft isn't making a dime off of RDC because it's free and built into Windows... a few competitive options with more features can only help the Windows business in the long run.
(Updated 2/22) A correction to the correction.
February 19, 2008
04:45
Last week, Microsoft published the binary file formats for Office. These formats appear to be almost completely insane. The Excel 97-2003 file format is a 349 page PDF file. But wait, that’s not all there is to it! This document includes the following interesting comment:
Each Excel workbook is stored in a compound file.
You see, Excel 97-2003 files are OLE compound documents, which are, essentially, file systems inside a single file. These are sufficiently complicated that you have to read another 9 page spec to figure that out. And these “specs” look more like C data structures than what we traditionally think of as a spec. It's a whole hierarchical file system.
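One thing you can check cheaply is whether a file is a compound file at all, before worrying about the file system inside it. A minimal sketch, assuming the 8-byte signature that OLE compound files start with; the function name is made up for illustration, and passing this check says nothing about whether the rest of the file is well formed.

```python
# Signature bytes found at the very start of an OLE compound file.
COMPOUND_FILE_SIGNATURE = bytes.fromhex("D0CF11E0A1B11AE1")


def looks_like_compound_file(data):
    """Return True if the buffer starts with the compound file signature."""
    return data[:8] == COMPOUND_FILE_SIGNATURE
```

In practice you would read the first eight bytes of the .xls file and pass them in.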
If you started reading these documents with the hope of spending a weekend writing some spiffy code that imports Word documents into your blog system, or creates Excel-formatted spreadsheets with your personal finance data, the complexity and length of the spec probably cured you of that desire pretty darn quickly. A normal programmer would conclude that Office’s binary file formats:
- are deliberately obfuscated
- are the product of a demented Borg mind
- were created by insanely bad programmers
- and are impossible to read or create correctly.
You’d be wrong on all four counts. With a little bit of digging, I’ll show you how those file formats got so unbelievably complicated, why it doesn’t reflect bad programming on Microsoft’s part, and what you can do to work around it.
The first thing to understand is that the binary file formats were designed with very different design goals than, say, HTML.
They were designed to be fast on very old computers. For the early versions of Excel for Windows, 1 MB of RAM was a reasonable amount of memory, and an 80386 at 20 MHz had to be able to run Excel comfortably. There are a lot of optimizations in the file formats that are intended to make opening and saving files much faster:
- These are binary formats, so loading a record is usually a matter of just copying (blitting) a range of bytes from disk to memory, where you end up with a C data structure you can use. There’s no lexing or parsing involved in loading a file. Lexing and parsing are orders of magnitude slower than blitting.
- The file format is contorted, where necessary, to make common operations fast. For example, Excel 95 and 97 have something called “Simple Save” which they use sometimes as a faster variation on the OLE compound document format, which just wasn’t fast enough for mainstream use. Word had something called Fast Save. To save a long document quickly, 14 out of 15 times, only the changes are appended to the end of the file, instead of rewriting the whole document from scratch. On the hard drives of the day, this meant saving a long document took one second instead of thirty. (It also meant that deleted data in a document was still in the file. This turned out to be not what people wanted.)
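To make the first point concrete, here is a small sketch of what "loading by blitting" looks like compared to parsing. The record layout (row, column, value) is hypothetical and is not taken from the Excel spec; the point is only that a fixed binary layout can be turned into usable values without any lexing.

```python
import struct

# Hypothetical fixed-size cell record: uint16 row, uint16 column, float64 value,
# little-endian with no padding (12 bytes per record).
CELL = struct.Struct("<HHd")


def load_cells(buf):
    """Reinterpret a byte buffer as a sequence of (row, col, value) records.

    The buffer length must be a whole number of records.
    """
    return list(CELL.iter_unpack(buf))


def save_cells(cells):
    """Write records back out in the same fixed layout."""
    return b"".join(CELL.pack(row, col, value) for row, col, value in cells)
```

Compare that with a text format, where every number has to be tokenized and converted from decimal digits before you can use it.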
They were designed to use libraries. If you wanted to write a from-scratch binary importer, you’d have to support things like the Windows Metafile Format (for drawing things) and OLE Compound Storage. If you’re running on Windows, there’s library support for these that makes it trivial... using these features was a shortcut for the Microsoft team. But if you’re writing everything on your own from scratch, you have to do all that work yourself.
Office has extensive support for compound documents; for example, you can embed a spreadsheet in a Word document. A perfect Word file format parser would also have to be able to do something intelligent with the embedded spreadsheet.
They were not designed with interoperability in mind. The assumption, and a fairly reasonable one at the time, was that the Word file format only had to be read and written by Word. That means that whenever a programmer on the Word team had to make a decision about how to change the file format, the only thing they cared about was (a) what was fast and (b) what took the fewest lines of code in the Word code base. The idea of things like SGML and HTML—interchangeable, standardized file formats—didn’t really take hold until the Internet made it practical to interchange documents in the first place; this was a decade later than the Office binary formats were first invented. There was always an assumption that you could use importers and exporters to exchange documents. In fact Word does have a format designed for easy interchange, called RTF, which has been there almost since the beginning. It’s still 100% supported.
They have to reflect all the complexity of the applications. Every checkbox, every formatting option, and every feature in Microsoft Office has to be represented in file formats somewhere. That checkbox in Word’s paragraph menu called “Keep With Next” that causes a paragraph to be moved to the next page if necessary so that it’s on the same page as the paragraph after it? That has to be in the file format. And that means if you want to implement a perfect Word clone that can correctly read Word documents, you have to implement that feature. If you’re creating a competitive word processor that has to load Word documents, it may only take you a minute to write the code to load that bit from the file format, but it might take you weeks to change your page layout algorithm to accommodate it. If you don’t, customers will open their Word files in your clone and all the pages will be messed up.
They have to reflect the history of the applications. A lot of the complexities in these file formats reflect features that are old, complicated, unloved, and rarely used. They’re still in the file format for backwards compatibility, and because it doesn’t cost anything for Microsoft to leave the code around. But if you really want to do a thorough and complete job of parsing and writing these file formats, you have to redo all that work that some intern did at Microsoft 15 years ago. The bottom line is that there are thousands of developer years of work that went into the current versions of Word and Excel, and if you really want to clone those applications completely, you’re going to have to do thousands of years of work. A file format is just a concise summary of all the features an application supports.
Just for kicks, let’s look at one tiny example in depth. An Excel worksheet is a bunch of BIFF records of different types. I want to look at the very first BIFF record in the spec. It’s a record called 1904.
The Excel file format specification is remarkably obscure about this. It just says that the 1904 record indicates “if the 1904 date system is used.” Ah. A classic piece of useless specification. If you were a developer working with the Excel file format, and you found this in the file format specification, you might be justified in concluding that Microsoft is hiding something. This piece of information does not give you enough information. You also need some outside knowledge, which I’ll fill you in on now. There are two kinds of Excel worksheets: those where the epoch for dates is 1/1/1900 (with a leap-year bug deliberately created for 1-2-3 compatibility that is too boring to describe here), and those where the epoch for dates is 1/1/1904. Excel supports both because the first version of Excel, for the Mac, just used that operating system’s epoch because that was easy, but Excel for Windows had to be able to import 1-2-3 files, which used 1/1/1900 for the epoch. It’s enough to bring you to tears. At no point in history did a programmer ever not do the right thing, but there you have it.
Both 1900 and 1904 file types are commonly found in the wild, usually depending on whether the file originated on Windows or Mac. Converting from one to another silently can cause data integrity errors, so Excel won’t change the file type for you. To parse Excel files you have to handle both. That’s not just a matter of loading this bit from the file. It means you have to rewrite all of your date display and parsing code to handle both epochs. That would take several days to implement, I think.
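To make the two epochs concrete, here is a minimal Python sketch of turning a whole-number date serial into a calendar date under either system. It is not taken from the Excel spec. The names serial_to_date and date1904 are made up for illustration; the flag stands for whatever you read out of the 1904 record. The sketch assumes the usual numbering, where serial 1 is January 1, 1900 in the 1900 system and serial 0 is January 1, 1904 in the 1904 system, and it simply rejects serial 60, the nonexistent February 29, 1900 left over from the 1-2-3 bug.

```python
from datetime import date, timedelta


def serial_to_date(serial: int, date1904: bool) -> date:
    """Convert a whole-number Excel date serial to a calendar date.

    date1904 is a hypothetical flag standing for the value of the 1904 record.
    """
    if date1904:
        if serial < 0:
            raise ValueError("this sketch only handles serials >= 0")
        # 1904 system: serial 0 is January 1, 1904.
        return date(1904, 1, 1) + timedelta(days=serial)

    if serial < 1:
        raise ValueError("this sketch only handles serials >= 1")
    if serial == 60:
        # The 1900 system counts a February 29, 1900 that never existed.
        raise ValueError("serial 60 is the fictitious February 29, 1900")
    if serial < 60:
        # Before the phantom leap day: serial 1 is January 1, 1900.
        return date(1899, 12, 31) + timedelta(days=serial)
    # After the phantom leap day every serial is one day "too high".
    return date(1899, 12, 30) + timedelta(days=serial)
```

The same number means two different days depending on the flag: the systems are 1,462 days apart, which is why silently switching a file from one to the other corrupts every date in it. And this is only the conversion; display, parsing, and date arithmetic all have to respect the flag too.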
Indeed, as you work on your Excel clone, you'll discover all kinds of subtle details about date handling. When does Excel convert numbers to dates? How does the formatting work? Why is 1/31 interpreted as January 31 of this year, while 1/50 is interpreted as January 1st, 1950? All of these subtle bits of behavior cannot be fully documented without writing a document that has the same amount of information as the Excel source code.
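Here is one plausible guess at the rule behind the 1/31 versus 1/50 example, as a short Python sketch. It is an assumption, not documented Excel behavior: if the second number is a valid day of the month, treat the entry as month/day in the current year; otherwise treat it as month/two-digit-year, with years below 30 landing in the 2000s. The function name, the pivot constant, and the rule itself are all illustrative.

```python
import calendar
from datetime import date

# Assumed pivot: two-digit years below this go to 20xx, the rest to 19xx.
TWO_DIGIT_YEAR_PIVOT = 30


def interpret_slash_entry(text: str, today: date) -> date:
    """Guess what a typed 'a/b' entry means, using the heuristic above."""
    first, second = (int(part) for part in text.split("/"))
    # monthrange raises a ValueError subclass if first is not a month 1..12.
    days_in_month = calendar.monthrange(today.year, first)[1]
    if 1 <= second <= days_in_month:
        # Looks like month/day: use the current year.
        return date(today.year, first, second)
    # Not a valid day, so read it as month/year and use the first of the month.
    century = 2000 if second < TWO_DIGIT_YEAR_PIVOT else 1900
    return date(century + second, first, 1)
```

Even this toy version raises questions it does not answer (locale-dependent day/month order, what happens with 2/29 in a non-leap year, three-part dates), which is rather the point: the real rules live in the Excel source code.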
And this is only the first of hundreds of BIFF records you have to handle, and one of the simplest. Most of them are complicated enough to reduce a grown programmer to tears.
The only possible conclusion is this. It's very helpful of Microsoft to release the file formats for Microsoft Office, but it's not really going to make it any easier to import or save to the Office file formats. These are insanely complex and rich applications, and you can’t just implement the most popular 20% and expect 80% of the people to be happy. The binary file specification is, at most, going to save you a few minutes reverse engineering a remarkably complex system.
OK, I promised some workarounds. The good news is that for almost all common applications, trying to read or write the Office binary file formats is the wrong decision. There are two major alternatives you should seriously consider: letting Office do the work, or using file formats that are easier to write.
Let Office do the heavy work for you. Word and Excel have extremely complete object models, available via COM Automation, which allow you to programmatically do anything. In many situations, you are better off reusing the code inside Office rather than trying to reimplement it. Here are a few examples.
- You have a web-based application that needs to output existing Word files in PDF format. Here’s how I would implement that: a few lines of Word VBA code load a file and save it as a PDF using the built-in PDF exporter in Word 2007. You can call this code directly, even from ASP or ASP.NET code running under IIS. It’ll work. The first time you launch Word it’ll take a few seconds. The second time, Word will be kept in memory by the COM subsystem for a few minutes in case you need it again. It’s fast enough for a reasonable web-based application.
- Same as above, but your web hosting environment is Linux. Buy one Windows 2003 server, install a fully licensed copy of Word on it, and build a little web service that does the work. Half a day of work with C# and ASP.NET.
- Same as above, but you need to scale. Throw a load balancer in front of any number of boxes built as in the previous example. No code required.
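The Word-to-PDF idea above can be sketched in a few lines. The text talks about VBA; this version drives the same COM object model from Python instead, using the third-party pywin32 package, so it only runs on a Windows machine with Word installed. Treat it as a sketch under those assumptions: the function name word_to_pdf is made up, and 17 is the value of Word's wdFormatPDF constant.

```python
import os

WD_FORMAT_PDF = 17    # value of Word's wdFormatPDF constant
WD_ALERTS_NONE = 0    # value of Word's wdAlertsNone constant


def word_to_pdf(src_path: str, pdf_path: str) -> None:
    """Open a Word document through COM Automation and save it as a PDF."""
    # Imported here because pywin32 exists only on Windows.
    import win32com.client

    word = win32com.client.Dispatch("Word.Application")
    # Tell Word it is not running interactively: no window, no prompts.
    word.Visible = False
    word.DisplayAlerts = WD_ALERTS_NONE
    try:
        # COM wants absolute paths; relative ones resolve unpredictably.
        doc = word.Documents.Open(os.path.abspath(src_path), ReadOnly=True)
        try:
            doc.SaveAs(os.path.abspath(pdf_path), FileFormat=WD_FORMAT_PDF)
        finally:
            doc.Close(False)  # close without saving changes
    finally:
        word.Quit()
```

This version quits Word after every conversion to keep the sketch simple. A real service would more likely keep one instance around and would need to think about timeouts and concurrent requests.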
This kind of approach would work for all kinds of common Office types of applications you might perform on your server. For example:
- Opening an Excel workbook, storing some data in input cells, recalculating, and pulling some results out of output cells
- Using Excel to generate charts in GIF format
- Pulling just about any kind of information out of any kind of Excel worksheet without spending a minute thinking about file formats
- Converting Excel file formats to CSV tabular data (another approach is to use Excel ODBC drivers to suck data out using SQL queries).
- Editing Word documents
- Filling out Word forms
- Converting files between any of the many file formats supported by Office (there are importers for dozens of word processor and spreadsheet formats)
In all of these cases, there are ways to tell the Office objects that they’re not running interactively, so they shouldn’t bother updating the screen and they shouldn’t prompt for user input. By the way, if you go this route, there are a few gotchas, and it's not officially supported by Microsoft, so read their knowledge base article before you get started.
Use a simpler format for writing files. If you merely have to produce Office documents programmatically, there’s almost always a better format than the Office binary formats that you can use which Word and Excel will open happily, without missing a beat.
- If you simply have to produce tabular data for use in Excel, consider CSV.
- If you really need worksheet calculation features that CSV doesn’t support, the WK1 format (Lotus 1-2-3) is a heck of a lot simpler than Excel, and Excel will open it fine.
- If you really, really have to generate native Excel files, find an extremely old version of Excel… Excel 3.0 is a good choice, before all the compound document stuff, and save a minimum file containing only the exact features you want to use. Use this file to see the exact minimum BIFF records that you have to output and just focus on that part of the spec.
- For Word documents, consider writing HTML. Word will open those fine, too.
- If you really want to generate fancy formatted Word documents, your best bet is to create an RTF document. Everything that Word can do can be expressed in RTF, but it’s a text format, not binary, so you can change things in the RTF document and it’ll still work. You can create a nicely formatted document with placeholders in Word, save as RTF, and then using simple text substitution, replace the placeholders on the fly. Now you have an RTF document that every version of Word will open happily.
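The RTF placeholder trick in the last item can be sketched like this. The <<NAME>> placeholder syntax and both function names are invented for illustration. The one thing "simple text substitution" glosses over is that the inserted values need escaping: backslashes and braces are RTF syntax, and non-ASCII characters are written here as \uN? escapes, where N is a signed 16-bit number.

```python
import re
from typing import Mapping


def rtf_escape(text: str) -> str:
    """Escape plain text so it can be dropped into an RTF document body."""
    out = []
    for ch in text:
        if ch in "\\{}":
            out.append("\\" + ch)      # RTF control characters
        elif ch == "\n":
            out.append("\\line ")      # line break inside a paragraph
        elif ord(ch) < 128:
            out.append(ch)
        else:
            # \uN? takes signed 16-bit UTF-16 code units; '?' is the fallback.
            data = ch.encode("utf-16-le")
            for i in range(0, len(data), 2):
                unit = int.from_bytes(data[i:i + 2], "little", signed=True)
                out.append(f"\\u{unit}?")
    return "".join(out)


def fill_template(rtf: str, values: Mapping[str, str]) -> str:
    """Replace <<KEY>> placeholders in one pass; unknown keys are left alone."""
    def substitute(match):
        key = match.group(1)
        if key not in values:
            return match.group(0)
        return rtf_escape(values[key])

    return re.sub(r"<<(\w+)>>", substitute, rtf)
```

One gotcha to watch for: Word sometimes splits a run of text across formatting groups when it saves, so a placeholder you typed as one word may not appear contiguously in the RTF. Typing the placeholder in one go, with uniform formatting, and checking the saved file in a text editor usually sorts that out.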
Anyway, unless you’re literally trying to create a competitor to Office that can read and write all Office files perfectly, in which case, you’ve got thousands of years of work cut out for you, chances are that reading or writing the Office binary formats is the most labor intensive way to solve whatever problem it is that you’re trying to solve.
Not loving your job? Visit the Joel on Software Job Board: Great software jobs, great people.
February 6, 2008
January 31, 2008
13:45
You know what I really like? TripIt.com. It's amazingly simple. You take all those travel confirmation emails that you get from your travel agent, hotels, car rental agencies, etc, and you just forward them to plans@tripit.com. That's all you have to do. You don't have to sign up for an account. You don't have to log on. You just forward those emails. You can do it right now.
You get a link back by email, with a beautifully organized itinerary, showing all your travel data plus maps, weather reports, and all the confirmation numbers for your flights and address for your hotels and so on.
It's kind of magical. You don't have to fill out lots of little fields with all the details, because they've done a lot of work to parse those confirmation emails correctly... it worked flawlessly for my upcoming trip to Japan.
Think of it this way. Suppose you want to enter a round trip flight on your calendar. The minimum information you need to enter is probably:
- the airline
- the flight number
- four times (departure and arrival, there and back)
- four time zones (or else your phone will tell you that your flight is at 5pm when it's really at 2pm)
- a confirmation number (for when the airline denies that you exist)
- where you're going
All in all it takes a few minutes and is very error prone. Whereas, with TripIt, you just take that email from the airline or Orbitz, Ctrl+F, type plans@tripit.com, and send. Done.
TripIt is a beautiful example of the Figure It Out school of user interface design. Why should you need to register? TripIt figures out who you are based on your email address. Why should you parse the schedule data? Everyone gets email from the same 4 online travel agencies, 100-odd airlines, 15 hotel chains, 5 car rental chains... it's pretty easy to just write screen scrapers for each of those to parse out the necessary data.
Anyway, it's a shame I have to say this, but I have no connection whatsoever to tripit.com.
January 29, 2008
21:23
Here's how Microsoft says, “SQL Server 2008 will be late:”
“We want to provide clarification on the roadmap for SQL Server 2008. Over the coming months, customers and partners can look forward to significant product milestones for SQL Server. Microsoft is excited to deliver a feature complete CTP during the Heroes Happen Here launch wave and a release candidate (RC) in Q2 calendar year 2008, with final Release to manufacturing (RTM) of SQL Server 2008 expected in Q3. Our goal is to deliver the highest quality product possible and we simply want to use the time to meet the high bar that you, our customers, expect.”
What? Can you understand that? “A feature complete CTP during the Heroes Happen Here launch wave?” What on earth does that mean?
The guy who wrote this, Francois Ajenstat, ought to be ashamed of himself. Have some guts. Just say it's late. We really don't care that much. SQL Server 2005 is fine. As Judge Judy says, “Don't piss on my leg and tell me it's raining.”
Phil Factor explains.
14:35
Google: “2D barcodes are an especially exciting part of this because they allow readers to "click" on interesting print ads with their cellphones and seamlessly connect to relevant online content.”
Years ago, I went out on a limb and dismissed a similar scheme thus: “The number of dumb things going on here exceeds my limited ability to grok all at once. I'm a bit overwhelmed with what a feeble business idea this is.”
OK, more than seven years have passed. Things have changed. People have camera phones with web browsers now. Some things are still the same: typing URLs is not hard, this is a monumental chicken and egg problem, and this doesn't provide any value to the consumers who are expected to install new software on their phones to go along with this ridonculous scheme.
Sometimes when the elders say to the youngsters, "don't do that, we tried that, it failed," it's just because they're failing to notice that the world has changed. But sometimes the elders are right, and the youngsters really are too young to know the history of the idea they think that they've just invented.
I guess we'll get to watch to see whether the oldsters or the youngsters will win this one.
Still, it doesn't say much for the quality of those 150 people Google hires every week that they're now chasing some of the worst of the bad ideas of the fin de siecle. What's next, GooglePetFood.com?
January 25, 2008
15:22
Remember Fog Creek Copilot? The app that our 2005 interns built that lets you remote control someone's computer over the Internet to help them with technical problems?
Well, recently we figured out that we're paying for a lot of bandwidth over the weekends that we don't need, so we decided to make Copilot absolutely free on weekends. Yep, that's right... free as in zero dollars, free, no cost, no credit card, no email address, nothing.
How it works: You go to https://www.copilot.com, enter your name, and get an invitation code. You then download and run a tiny piece of software. Tell your friend the invitation code, they go to copilot.com and enter it, and they download a tiny piece of software. Now you're controlling their computer. Works with Windows or Macintosh, through almost any firewall.
Details: Weekend = 8pm EST (GMT-5) Friday night to 2am EST Monday morning. Copilot subscribers can use Copilot free on weekends, too.
ALSO! The Copilot team is still hard at work; Copilot 3.0 is just starting to enter testing. Tyler and Ben want to hire a Summer intern in marketing for the Copilot team. If you're a smart college student that's more interested in marketing than software development, please apply by emailing your resume to jobs@fogcreek.com.