This is the first post in my series on Nick Bostrom’s recent book, Superintelligence: Paths, Dangers, Strategies. In this entry, I take a look at Bostrom’s orthogonality thesis. As we shall see, this thesis is central to his claim that superintelligent AIs could pose profound existential risks to human beings. But what does the thesis mean and how plausible is it?
I actually looked at Bostrom’s defence of the orthogonality thesis before. I based that earlier discussion on an article he wrote a couple of years back. From what I can tell, there is little difference between the arguments presented in the book and the arguments presented in that article. Nevertheless, it will be useful for me to revisit those arguments at the outset of this series. This is partly to refresh my own memory and partly to ease myself back into the intellectual debate about superintelligent AIs after having ignored it for some time. Who knows? I may even have something new to say.
I should add that, since the publication of Bostrom’s original defence of the orthogonality thesis, his colleague Stuart Armstrong has produced a longer and more technical defence of it in the journal Analysis and Metaphysics. Unfortunately, I have not read that defence. Thus, I am conscious of the fact that what I deal with below may be the “second-best” defence of the orthogonality thesis. This is something readers should keep in mind.
1. What is the Orthogonality Thesis and Why Does it Matter?
One thing that proponents of AI risk often warn us against is our tendency to anthropomorphise intelligent machines. Just because we humans think in a particular way, and have certain beliefs and desires, does not mean that an intelligent machine, particularly a superintelligent machine, will do the same. (Except for the fact that we will be the ones programming the decision-making routines and motivations of the machine…more on this, and whether it can help to address the problem of AI risk, in future entries.) We need to realise that the space of possible minds is vast, and that the minds of every human being that ever lived only occupy a small portion of that space. Superintelligences could take up residence in far more alien, and far more disturbing, regions.
The orthogonality thesis is a stark reminder of this point. We like to think that “intelligent” agents will tend to share a certain set of beliefs and motivations, and that with that intelligence will come wisdom and benevolence. This, after all, is our view of “intelligent” humans. But if we understand intelligence as the ability to engage in sophisticated means-end reasoning, then really there is no guarantee of this. Almost any degree of intelligence, so understood, is compatible with almost any set of goals or motivations. This is the orthogonality thesis. As Bostrom puts it:
Orthogonality Thesis: Intelligence and final goals are orthogonal: more or less any level of intelligence could in principle be combined with more or less any final goal.
We need to unpack this definition in a little more detail.
We’ll start with the concept of “intelligence”. As noted, Bostrom does not mean to invoke any normatively thick or value-laden form of rationality; he simply means to invoke efficiency and skill at means-end reasoning. Philosophers and economists have long debated these definitional issues. Philosophers sometimes think that judgments of intelligence or rationality encompass the assessment of motivations. Thus, for a philosopher, a person who greatly desires to count all the blades of grass in the world would be “irrational” or “mentally deficient” in some important respect. Economists generally have a thinner sense of what intelligence or rationality requires. They do not assess motivations. The blade-of-grass-counter is just as rational as anyone else. All that matters is whether they maintain consistent (e.g. transitive) rankings of their motivations and act in accordance with those rankings. Bostrom’s view of intelligence is closer to the economists’ sense of rationality, except that it also encompasses great skill in getting what you want. This skill must, one presumes, include an ability to acquire true (or reasonably true) beliefs about the structure of the world around you, so that you can manipulate that world to reliably get what you want.
The second concept we need to unpack is that of a “final goal”. As far as I can tell, Bostrom never defines this in his book, but the idea is relatively straightforward. It is that an agent can have certain goals which are their raison d’être, which they fundamentally and necessarily aim at achieving, and others that are merely instrumental to the pursuit of those final goals. In other words, final goals are those toward whose achievement everything else the agent does is tailored. For Bostrom, the supposition seems to be that a superintelligent AI could be programmed so that it has a set of final goals that are dynamically stable and overwhelming (note: the use of “could be” is significant). This is important because Bostrom appeals to the possibility of overwhelming and dynamically stable final goals when responding to possible criticisms of the orthogonality thesis.
The third thing we need to unpack is the “more or less” qualifier. Bostrom acknowledges that certain goals may not be consistent with certain levels of intelligence. For example, complex goals might require a reasonably complex cognitive architecture. Similarly, there may be dynamical constraints on the kinds of motivations that a highly intelligent system could have. Perhaps the system is programmed with the final goal of making itself stupider. In that case, its final goal is not consistent with a high level of intelligence. These qualifications should not, however, detract from the larger point: that pretty much any level of intelligence is consistent with pretty much any final goal.
So that’s the orthogonality thesis in a nutshell. The thesis is important for the likes of Bostrom because, when understood properly, it heightens our appreciation of AI risk. If a superintelligent machine could have pretty much any final goal, then it could do things that are deeply antithetical to our own interests. That could lead to existential catastrophe. (We’ll discuss the argument for this conclusion in a later entry.)
2. Is the Orthogonality Thesis Plausible?
At first glance, the orthogonality thesis seems pretty plausible. For example, the idea of a superintelligent machine whose final goal is to maximise the number of paperclips in the world (the so-called paperclip maximiser) seems to be logically consistent. We can imagine — can’t we? — a machine with that goal and with an exceptional ability to utilise the world’s resources in pursuit of that goal. Nevertheless, there is at least one major philosophical objection to it.
We can call it the motivating belief objection. It works something like this:
Motivating Belief Objection: There are certain kinds of true belief about the world that are necessarily motivating, i.e. as soon as an agent believes a particular fact about the world they will be motivated to act in a certain way (and not motivated to act in other ways). If we assume that the number of true beliefs goes up with intelligence, it would then follow that there are certain goals that a superintelligent being must have and certain others that it cannot have.
A particularly powerful version of the motivating belief objection would combine it with a form of moral realism. Moral realism is the view that there are moral facts “out there” in the world waiting to be discovered. A sufficiently intelligent being would presumably acquire more true beliefs about those moral facts. If those facts are among the kind that are motivationally salient — as several moral theorists are inclined to believe — then it would follow that a sufficiently intelligent being would act in a moral way. This could, in turn, undercut claims about a superintelligence posing an existential threat to human beings (though that depends, of course, on what the moral truth really is).
The motivating belief objection is itself vulnerable to many objections. For one thing, it goes against a classic philosophical theory of human motivation: the Humean theory. This comes from the philosopher David Hume, who argued that beliefs are motivationally inert. If the Humean theory is true, the motivating belief objection fails. Of course, the Humean theory may be false, and so Bostrom wisely avoids relying on it in his defence of the orthogonality thesis. Instead, he makes three points. First, he claims that orthogonality would still hold if final goals are overwhelming, i.e. if they trump the motivational effect of motivating beliefs. Second, he argues that intelligence (as he defines it) may not entail the acquisition of such motivating beliefs. This is an interesting point. Earlier, I assumed that the better an agent is at means-end reasoning, the more likely it is that its beliefs are going to be true. But maybe this isn’t necessarily the case. After all, what matters for Bostrom’s definition of intelligence is whether the agent is getting what it wants, and it’s possible that an agent doesn’t need true beliefs about the world in order to get what it wants. A useful analogy here might be with Plantinga’s evolutionary argument against naturalism. Evolution by natural selection is a means-end process par excellence: the “end” is survival of the genes, and anything that facilitates this is the “means”. Plantinga argues that there is nothing about this process that entails the evolution of cognitive mechanisms that track true beliefs about the world. It could be that certain false beliefs increase the probability of survival. Something similar could be true in the case of a superintelligent machine. The third point Bostrom makes is that a superintelligent machine could be created with no functional analogues of what we call “beliefs” and “desires”. This would also undercut the motivating belief objection.
What do we make of these three responses? They are certainly intriguing. My feeling is that the staunch moral realist will reject the first one. He or she will argue that moral beliefs are most likely to be motivationally overwhelming, so any agent that acquired true moral beliefs would be motivated to act in accordance with them (regardless of their alleged “final goals”). The second response is more interesting. Plantinga’s evolutionary objection to naturalism is, of course, hotly contested. Many argue that there are good reasons to think that evolution would create truth-tracking cognitive architectures. Could something similar be argued in the case of superintelligent AIs? Perhaps. The case seems particularly strong given that humans would be guiding the initial development of AIs and would, presumably, ensure that they were inclined to acquire true beliefs about the world. But remember Bostrom’s point isn’t that superintelligent AIs would never acquire true beliefs. His point is merely that high levels of intelligence may not entail the acquisition of true beliefs in the domains we might like. This is a harder claim to defeat. As for the third response, I have nothing to say. I have a hard time imagining an AI with no functional analogues of a belief or desire (especially since what counts as a functional analogue of those things is pretty fuzzy), but I guess it is possible.
One other point I would make is that — although I may be inclined to believe a certain version of the moral motivating belief objection — I am also perfectly willing to accept that the truth value of that objection is uncertain. There are many decent philosophical objections to motivational internalism and moral realism. Given this uncertainty, and given the potential risks involved with the creation of superintelligent AIs, we should probably proceed for the time being “as if” the orthogonality thesis is true.
That brings us to the end of the discussion of the orthogonality thesis. To recap, the thesis holds that intelligence and final goals are orthogonal to one another: pretty much any level of intelligence is consistent with pretty much any final goal. This gives rise to the possibility of superintelligent machines with final goals that are deeply antithetical to our own. There are some philosophical objections to this thesis, but their soundness is sufficiently uncertain that we should not discount the orthogonality thesis completely. Indeed, given the potential risks at stake, we should probably proceed “as if” it is true.
In the next post, we will look at the instrumental convergence thesis. This follows on from the orthogonality thesis by arguing that even if a superintelligence could have pretty much any final goal, it is still likely to converge on certain instrumentally useful sub-goals. These sub-goals could, in turn, be particularly threatening to human beings.