Three Rules of Solving Problems


The First Rule

If there is only one thing you remember from this essay, let it be this; I have found it to be completely essential in actually solving anything. It seems simple and obvious, but following it can be surprisingly challenging.

If you want to get somewhere, first figure out where you want to go, then figure out how to get there.

The key word in this, the one that isn’t as obvious as it looks, is “then.” There is an order to these steps and it is vitally important to not conflate them.

When you are trying to solve a problem, it’s perpetually tempting to consider where you are now (all of the current situations, the tradeoffs you’ve historically made, the things you’ve discovered you’re good at, the things your customers have historically loved) and ask “how can I adjust what I’m doing to make things better?” And this seems like an obviously right thing to do.

The problem with this is that, if you plan this way, you are also implicitly planning around all of the bad things in where you are today. You’re looking at the best small changes from where you are, which may not lead you anywhere close to where you actually want to be. It’s like walking around in a maze: the turn that takes you most toward the exit isn’t necessarily the turn that will actually get you to the exit. Much more likely, it will take you down a chain of successively better approximations that finish at a dead-end.
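
The maze analogy can be made concrete with a small sketch. Everything in it is a hypothetical illustration rather than anything from a real project: the grid, the names `greedy_walk` and `planned_route`, and the choice of Manhattan distance as the measure of "closer to the exit" are all assumptions. The greedy walker only ever takes a step that brings it nearer the exit; the planner first fixes the destination and then searches for a route to it.

```python
from collections import deque

# Hypothetical maze: S = start, E = exit, # = wall, . = open floor.
MAZE = [
    "S..#E",
    ".#.#.",
    ".....",
]


def find(ch):
    """Return the (row, col) of the first cell containing ch."""
    for r, row in enumerate(MAZE):
        c = row.find(ch)
        if c != -1:
            return (r, c)
    raise ValueError(f"{ch!r} not in maze")


def neighbors(cell):
    """Yield the open cells directly right, below, left of, and above cell."""
    r, c = cell
    for dr, dc in ((0, 1), (1, 0), (0, -1), (-1, 0)):
        nr, nc = r + dr, c + dc
        if 0 <= nr < len(MAZE) and 0 <= nc < len(MAZE[0]) and MAZE[nr][nc] != "#":
            yield (nr, nc)


def distance(a, b):
    """Manhattan distance: the walker's notion of 'closer to the exit'."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def greedy_walk(start, goal):
    """Take the most-improving step each time; stop when no step improves."""
    path = [start]
    while path[-1] != goal:
        here = path[-1]
        best = min(neighbors(here), key=lambda n: distance(n, goal))
        if distance(best, goal) >= distance(here, goal):
            break  # every available move looks worse: a dead end
        path.append(best)
    return path


def planned_route(start, goal):
    """Breadth-first search: find a route to a destination fixed in advance."""
    came_from = {start: None}
    queue = deque([start])
    while queue:
        here = queue.popleft()
        if here == goal:
            break
        for n in neighbors(here):
            if n not in came_from:
                came_from[n] = here
                queue.append(n)
    if goal not in came_from:
        return []
    route, node = [], goal
    while node is not None:
        route.append(node)
        node = came_from[node]
    return route[::-1]
```

In this toy grid, `greedy_walk(find("S"), find("E"))` stops two steps in, pressed against a wall right next to the exit, while `planned_route` reaches the exit by first moving away from it.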

In engineering projects, I’ve seen this over and over: teams make a series of small changes, not realizing that the system they’ve built is no longer serving even the basic needs of their customers, and those customers who are staying are only doing so because they themselves fear the cost of change — something which they keep doing until one day, they don’t. And then the customers leave, and you have no way to get them back because you haven’t made any investments in getting to where you actually need to be.

The way you avoid this is to clearly distinguish the two steps. First, figure out where you want to be: What are the problems that need solving? Who cares that you solve them? What do they actually want out of a solution? What would a good solution look like? The counterintuitive trick, the thing that feels like you’re cheating or even “doing engineering wrong,” is to do a clean-sheet design, completely ignoring what you have, constraining yourself only by what you are capable of doing without reference to what you have. The output of that is a clear picture of where you want to be. Only once you have that picture clear in your head do you ask how you should move from here to there — which is now a much more concrete question, and often quite different from how you would have moved if you were just tweaking your current solution.

Being more detailed, there are specific steps I usually expect of teams in each of these phases. Whenever considering what to do next, first we do the clean-sheet design phase. In this phase, there are three key questions to ask:

1. What is the problem we are trying to solve?
2. Who are the people who care that we do (or don’t!) solve it? Do they agree with our framing of the problem?
3. What would a good (but achievable) solution look like? Do these people agree that having that would actually be a good solution to the problem? Are there things it misses?

You ask them in this order and put particular emphasis on these checks for agreement. Do these steps in writing, not just orally: it is far too easy, in my experience, for two people to talk about what they’re imagining and impose their imagination of what the other person is thinking on the conversation without making it explicit. Showing people the answer in a static form, where there’s no ambiguity about what’s being discussed, is irreplaceable.

(The worst organizational mistakes I have ever seen made boil down to having done this wrong. In the better cases, teams have invested years of work and hundreds of millions of dollars in building a solution that nobody actually wanted; in the worse ones, people did actively destructive things because everyone had a sort of fuzzy agreement and nobody realized what was actually about to be done. If there is one thing you do to improve your process as an organization, it is to make sure that when you are trying to solve a problem, you always start with these three questions, always in writing, and always iterate until all parties agree on what’s being said here.)

Once you have an answer to these three questions, and you know where you want to go, you move on to the planning phase. I’ll write in more depth about this stage another time, but the key idea is to separate this work into milestones.

The defining feature of a good milestone is not intuitive: “launching a product” is not a good milestone. Instead, your goal is that each milestone captures real value. If you were to stop the entire project after achieving any milestone, you would feel that little if any time was wasted — the things you achieved at that milestone, whether they be a better understanding of the problem or a solution to some aspect of it, are valuable in and of themselves.

Why? Because by the time you reach any particular milestone, the situation may have changed, priorities may have shifted, or the entire problem may no longer be the thing you want to focus on. There are plenty of good reasons why it might be wise to abandon the project. If each milestone marks the successful capture of value, though, the work done so far is not a waste.

The Second Rule

This possibility of abandonment brings us to the second rule of design, which again seems simple but actually goes against some very deep instincts.

The systems you build are means towards ends; every one of them is temporary and will one day be the obstructive garbage that you are fighting against. Do not fall in love with your systems; fall in love with the problems you are solving.

It is tempting, especially when we have spent years of effort, to fall in love with the systems we have built, and to feel defensive at any suggestion that they should be replaced. But this is a confusion of past and future: those systems were loved because they made things better, because they genuinely improved the world. If they have reached the point where they are ready to be replaced, it does not mean that they failed. Far from it: it means that they have succeeded, they have finished their run, and it is time for them to receive an honorable retirement. Their successors are their intellectual children, created out of all of the things we learned from making them and using them; we want their successors to be better than they were, just as we want our own children to someday surpass us.

It’s funny to think about, but one of the happiest days of my career so far was the day when a system that I had built (a system that served 19 out of 20 of the documents that showed up in Google Search for over a decade) was retired. That system had done an amazing job; it was incredibly subtle to build and maintain, forced changes in our understanding of how search worked, and drove all sorts of changes in both software and hardware architecture. And it was being replaced by a new system, built by the successor to the original team, which took everything we had learned from that decade, from that system and others, and built a new system that solved a new generation of problems in an even better way. That new system was damned beautiful. And I have no doubt that this system is even today approaching its own replacement, for the same reason. This day was not a funeral; it was a graduation, a moment of summation, to reflect on these accomplishments and look forward to the future.

Because it had achieved its goal: how would we search an Internet that was a hundred times larger than “classical” search engines could handle? It had solved all the problems of its day, and the new system would solve the next generation of problems. The thing we had dedicated ourselves to was solving the problem of being able to find things and synthesize knowledge; the tool we built was a key step in solving that, but the problem continues.

To be very honest, most problems continue, not just in engineering. Rabbi Tarfon used to say, “it is not yours to finish the work; but neither are you free to set it aside.” The problems most worth solving in life are often infinite in duration, and will not be solved during our lifetimes, nor in the lifetimes of our children. This means that what we can do is build steps towards their solution, solve the aspects of them that surface in our time, and continue towards an ultimate goal that may (inshallah!) be achieved by our descendants.

The Third Rule

Arthur C. Clarke famously said that “any sufficiently advanced technology is indistinguishable from magic.” There are many meanings to what he said, but I would like to draw your attention to its contrapositive:

Any technology distinguishable from magic is insufficiently advanced.

This line, sometimes referred to as Benford’s Law, at first seems like nothing more than a bon mot, a clever bit of wordplay telling us to “do better!” But it has a real meaning, which becomes clear when we ask ourselves what the word “magic” actually means in this context, and what it means to differ from that.

While there are many definitions of the word, the key aspect of “magic” in this sentence is clearly not its supernatural origin — if anything, the whole point of Clarke’s statement was that “magic” can have a decidedly natural origin and still serve the same purposes. Instead, I would ask about what makes magic desirable in the first place: the ability to command the world, so to speak, to make it take the form you wish.

This concept is actually embedded in the famous magic word “abracadabra,” which is not (contrary to popular belief) gibberish; it is Aramaic, meaning “let it come to pass as I have spoken.” The core aspect of magic, from this perspective, is that it can translate one’s inner vision of how the world ought to be directly into physical reality.

This is the sense in which this statement is important. A technology becomes “distinguishable from magic” when one’s inner vision isn’t translated directly into reality but requires laborious steps on the part of the user to convert. My visual imagination is considerably more vivid than my artistic talent, for example; I can imagine many images that I do not, even remotely, have the skill to put on paper. I could probably learn these skills, given a few decades of hard work, and then spend days, months, or years creating a painting, but this is decidedly different from magic.

For a technology to be truly “magical,” it needs to do a few things:

1. It should let you describe what you are imagining in the same language that you conceive it in;
2. It should let you see the current state of the world in the same language as you use to describe its desired state; and
3. It should let you manipulate the state of the world in the same language, saying “make it like this.”

The reason this gets a place in my three rules of design, although it is much narrower and more specific than the other two, is that it speaks to how we think about the problems we solve. Going back to the first rule, when we execute its third step — describing the good solution — this rule becomes key to defining “good.” Even more importantly, it makes it much more possible to have a conversation with one's stakeholders to determine whether this actually solves their problem or not, because a system that requires much thinking and abstraction to use may or may not do the exact thing you are imagining, and during these conversations, both parties may not actually understand exactly what the system will be capable of if it does not have this property.

The separation of the three properties of a magical design is also intentional. Describing, seeing, and manipulating are three distinct actions that need to share a language — and in terms of defining milestones, the ability to do any one of them to a problem is already a major advance. In my experience, a system that can simply describe existing reality in an easily comprehensible language is already a massive improvement to people’s lives, and if it then adds the ability to change that reality, so much the better.

It is also very important to hew closely to the text of those properties. “See the current state,” for example, is not the same thing as “see the state that the system has been set to.” This was the lesson of Three Mile Island: the operators of the nuclear reactor had gauges that showed the current setting of the system, which were extremely useful until the day that a valve stuck and the system’s actual state became different from its set state. (For more details on this, see chapter 9 of Mahaffey’s Atomic Accidents, a detailed postmortem of every major nuclear accident since the discovery of the atom. My friend and colleague Lea Kissner has been recommending this book to young engineers for quite some time, and I heartily second it as a way to learn about how engineering can go wrong.)
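
The difference between "set state" and "current state" can be sketched in a few lines. This is a toy under stated assumptions, not a model of any real plant or control system: the `Valve` class, its fields, and the helpers `commanded_state`, `observed_state`, and `drift` are all hypothetical names invented for illustration. The desired state, the commanded state, and the observed state are all written in the same small vocabulary (a dictionary with a `valve_open` key), so they can be compared directly.

```python
from dataclasses import dataclass


@dataclass
class Valve:
    """Hypothetical valve that tracks what it was told and what it is doing."""
    commanded_open: bool = True
    actually_open: bool = True
    stuck: bool = False

    def command(self, open_: bool) -> None:
        """Record the new setting; the hardware follows only if not stuck."""
        self.commanded_open = open_
        if not self.stuck:
            self.actually_open = open_


def commanded_state(valve: Valve) -> dict:
    """What a 'set state' gauge reports: the last command sent."""
    return {"valve_open": valve.commanded_open}


def observed_state(valve: Valve) -> dict:
    """What a 'current state' gauge reports: what is physically true."""
    return {"valve_open": valve.actually_open}


def drift(desired: dict, reported: dict) -> dict:
    """Map each key that differs to a (wanted, reported) pair."""
    return {
        key: (want, reported.get(key))
        for key, want in desired.items()
        if reported.get(key) != want
    }


# Usage: ask a stuck valve to close, then read both kinds of gauge.
valve = Valve(stuck=True)
valve.command(False)
desired = {"valve_open": False}
print(drift(desired, commanded_state(valve)))  # the set-state gauge sees no problem
print(drift(desired, observed_state(valve)))   # the current-state gauge shows the gap
```

The first `drift` call returns an empty dictionary because the gauge only echoes the command; the second reports that the valve is still open. Only the second gauge satisfies "see the current state."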

In conclusion, a story about fencing

These three rules are individually simple, but actually executing them carefully is much harder than it seems. The best explanation of this challenge is one I learned from another friend and colleague, Nelia Mann, who used to be a high-level competitive fencer. She explained to me that every fencing student goes through three stages in their career:

1. First, they are taught strict rules and try hard to follow them;
2. Then, they reach a stage where they realize that by bending these rules they can get a bunch of wins, and find all the ways to bend them;
3. Finally, as they approach mastery, they return to following the same rules they had been taught as beginners; only this time, they’re following them correctly.

The difference between the first stage and the third is the deep understanding of why the rules are there, and what it feels like when they are being done correctly or incorrectly. You learn that some rules are actually more like guidelines — things you normally do, but might diverge from in a range of cases, and you can explain why they do or don’t apply at any moment — while others are actually iron-clad rules, things you would never consider breaking in your line of work.

My experience in the years since hearing this story is that these three stages are true not just of fencing, but of every skill. In software engineering, the skill I’ve been learning professionally for the past few decades, you see both this pattern and the non-obviousness of which is which: for example, the “rule” about the partial ordering of mutexes is one which you occasionally do violate, although with extremely copious code comments to explain to future readers what is going on and why, while the rules about commenting your code, documenting ownership transfer of resources, or following code style guidelines turn into things you would no more think of violating than you would of typing all of your code backward. The rules that seem to the beginner more basic and less obviously important actually turn out to be the most important to always follow perfectly.
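
For readers who have not met the mutex-ordering rule, here is a minimal sketch of what following it can look like. It assumes the simplest case, where every lock gets a fixed numeric rank (a total order, which is a special case of the partial order mentioned above). The names `RankedLock`, `hold_all`, and `transfer` are hypothetical and not taken from any particular codebase. Because both threads acquire locks in rank order, neither can hold one lock while waiting for a lock the other already holds.

```python
import threading
from contextlib import ExitStack, contextmanager


class RankedLock:
    """A lock with a fixed position (rank) in the agreed acquisition order."""

    def __init__(self, name, rank):
        self.name = name
        self.rank = rank
        self._lock = threading.Lock()

    def __enter__(self):
        self._lock.acquire()
        return self

    def __exit__(self, *exc):
        self._lock.release()
        return False


@contextmanager
def hold_all(*locks):
    """Acquire every lock in ascending rank order, whatever order was passed."""
    with ExitStack() as stack:
        for lock in sorted(locks, key=lambda l: l.rank):
            stack.enter_context(lock)
        yield


def transfer(src, dst, amount, balances):
    """Move amount between two accounts while holding both of their locks."""
    with hold_all(src, dst):
        balances[src.name] -= amount
        balances[dst.name] += amount
```

Two threads calling `transfer(a, b, ...)` and `transfer(b, a, ...)` at the same time would risk deadlock if each simply locked its `src` first; sorting by rank removes that possibility.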

The three rules I’ve posited above, I list because they all fall into this second category: these aren’t rules that you follow approximately, but ones you spend time and effort perfecting your technique at, until they become second nature to you. The framing above is far from the first way I understood them; it itself is an output of spending years attempting to refine and improve my understanding of them, their meaning, and their application. I fully expect that these framings will continue to evolve and improve over time, and (in full accord with the second rule) that in time they may even be replaced by better ones.

I look forward to that improvement, and hope that in the meantime, these rules prove useful to you.