Tuesday, April 3, 2012

Bostrom on Superintelligence and Orthogonality

I know it’s been a while since I sat down to write one of these. You’ll have to forgive me: teaching commitments intervened in the interim. But that’s all coming to a close now so I should have more time for blogging over the next few weeks. And I’m going to kick-start my blogging renaissance today by taking a look at a recent article by Nick Bostrom entitled “The Superintelligent Will”. The article deals with some of the existential problems that might arise from the creation of a superintelligent AI. This, of course, is the core research topic of the folks over at the Singularity Institute.

I haven’t personally been motivated to spend much time thinking about the singularity and its potential fallout in the past, but Bostrom’s article caught my eye for a few reasons. One of them was, admittedly, its relative brevity (a mere 16 pages!) but another was that it deals directly with the concept of rationality and its normative dimensions. This is a topic I’ve been interested in for some time, particularly in light of my attempts to argue for a constructivist metaethics in my PhD thesis. A constructivist metaethics, at least in my view, is one that argues that normative (moral) facts are built directly out of the basic constituents of practical/instrumental rationality.

How exactly does this tie in with the discussion in Bostrom’s article? In the article Bostrom argues that even if we constructed a superintelligent machine (by “intelligent” he means capable of engaging in means-end reasoning, i.e. capable of being instrumentally rational) to have a benign ultimate goal, it might still end up doing things that are not so benign. Why is this? Because in order to achieve its ultimate goal, the AI would stumble upon certain “good tricks” of means-end reasoning that are not benign.

To be more precise, Bostrom explicitly defends two key theses in his article with an additional thesis being implicit in his discussion. The three theses are as follows (the first two names are used by Bostrom, the third is my own interpolation into the text):

The Orthogonality Thesis: Leaving aside some minor constraints, it is possible for any ultimate goal to be compatible with any level of intelligence. That is to say, intelligence and ultimate goals form orthogonal dimensions along which any possible agent (artificial or natural) may vary.

The Instrumental Convergence Thesis: Agents with different ultimate goals will pursue similar intermediate or sub-goals [because such intermediate goals are either: (a) necessary preconditions for achieving the ultimate goal; or, alternatively (b) “good tricks” for achieving the ultimate goal.]

The Unfriendliness Thesis: Because some intermediate goals are unfriendly to humans, and because of instrumental convergence, even artificial superintelligences with seemingly benign ultimate goals can do things that are unfriendly to human existence.

While everything Bostrom has to say on these issues is of some interest (though, if I’m honest, none of his arguments seem sufficiently detailed to me), I’m going to zero in on what he says about the Orthogonality Thesis. This is because it is this thesis which overlaps most with my previous studies.

I’ll break my discussion down into several parts. First, I’ll present my own reconstruction of the basic argument for the orthogonality thesis. Second, I’ll consider how Bostrom defends this argument from what I’m calling the motivating-belief objection. Third, I’ll consider how he defends the argument from the normative rationality objection. And fourth, I’ll consider what Bostrom has to say about the weak constraints on the orthogonality thesis. Along the way, I’ll offer my own (minor) criticisms of what he has to say. I should stress at the outset that my criticisms are very much preliminary in nature. They’ll need to be worked out in more detail before they have any real bite.

1. The Basic Argument for the Orthogonality Thesis
Bostrom starts his discussion of orthogonality by attempting to wean us away from our tendency to anthropomorphise about hypothetical artificial intelligences, to think that they will have similar motivations and desires to our own. Quoting from Eliezer Yudkowsky, he is keen to suggest that an artificial intelligence need not have any recognisably human motivations or desires. Unlike, say, HAL in 2001: A Space Odyssey, whose motivations and tendencies seem all too human (or do they? Sometimes I wonder).

This discussion leads directly into Bostrom’s presentation of the orthogonality thesis. He suggests that all possible agents, both natural and artificial, will take up a location in a parameter space. Although this parameter space is, in all likelihood, multi-dimensional, Bostrom says we can think of it as consisting of two major dimensions: (i) the intelligence dimension, where intelligence is understood as the set of capacities that allow the agent to engage in means-end reasoning; and (ii) the goal dimension, which speaks for itself really.

Bostrom’s contention is that these two dimensions are orthogonal, and that this orthogonality has some important implications. Here we run into a terminological quagmire. The word “orthogonal” has slightly different meanings in different contexts. For instance in mathematics, dimensions are orthogonal when they are at 90-degree angles to one another. If that was all Bostrom was claiming about the dimensions in his parameter space of possible agents it would be relatively uninteresting. Fortunately, there are other uses of the term that make better sense of his thesis. Thus, in statistics and computer science, for instance, orthogonality describes a situation in which properties can vary independently of one another. This sense of the word orthogonal tracks well with what Bostrom wants to say about intelligence and ultimate goals.

But what about an argument for the orthogonality thesis? The basic argument would appear to be this:

  • (1) For any two parameters P1 and P2, if P1 is orthogonal to P2, then the value of P1 can vary without there necessarily being any variations in P2.
  • (2) Intelligence and ultimate goals are orthogonal parameters.
  • (3) Therefore, it is possible for any level of intelligence to coexist with any type of final goal.
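For readers who like to see this sort of thing spelled out, here is a rough sketch of the "independent variation" sense of orthogonality that the argument relies on. It is my own toy illustration, not anything Bostrom provides: the three intelligence levels, the three goals, and the function names are all hypothetical placeholders. The idea is simply that the dimensions are orthogonal when every pairing of values is available, and that an objection to premise (2) works by ruling some pairing out.

```python
from itertools import product

# Hypothetical, coarse-grained values for the two dimensions (illustrative only).
INTELLIGENCE_LEVELS = ["low", "human", "superhuman"]
FINAL_GOALS = ["count grains of sand", "maximise paperclips", "promote human welfare"]


def agent_space(levels, goals, is_possible=lambda level, goal: True):
    """Return every (level, goal) pairing that the given constraint allows."""
    return [(lv, g) for lv, g in product(levels, goals) if is_possible(lv, g)]


def is_orthogonal(levels, goals, is_possible=lambda level, goal: True):
    """True when no pairing is ruled out, i.e. the two dimensions vary independently."""
    return len(agent_space(levels, goals, is_possible)) == len(levels) * len(goals)


# Unconstrained case: premise (2) holds and every pairing is available.
unconstrained = is_orthogonal(INTELLIGENCE_LEVELS, FINAL_GOALS)


# Toy objection (an assumption for illustration): suppose high intelligence
# could not coexist with one particular trivial goal.
def excludes_trivial_goal(level, goal):
    return not (level == "superhuman" and goal == "count grains of sand")


constrained = is_orthogonal(INTELLIGENCE_LEVELS, FINAL_GOALS, excludes_trivial_goal)

print(unconstrained, constrained)
```

Nothing here settles whether any such exclusion actually holds. The sketch only shows that a single excluded pairing is enough to make the two dimensions fail to be fully orthogonal in this sense.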

This really is a basic argument. Premise (1) is purely stipulative in nature, so it can’t really be challenged in this context. All the argumentative weight thus rests on premise (2). Given this, it is interesting that Bostrom chooses not to offer any formal defence of it. Instead, he tries to defend it from counterarguments, presumably in the hope that if it can weather the storm of counterargument its plausibility will increase.

Let’s see what these counterarguments are.

2. The Motivating-Belief Objection
The simplest objection to Bostrom’s claim is a popular one among certain metaethicists. It is that with higher levels of intelligence come more accurate beliefs (or their functional analogues). Furthermore, certain beliefs (e.g. moral beliefs) are motivationally salient. Thus, it may be the case that a superintelligent being would have to have certain goals, and would be forbidden from having others, because accurate moral beliefs demand that it have this motivational structure.

To put this in argumentative form:

  • (4) The more intelligent an agent is, the more accurate its beliefs (moral, scientific, prudential etc.) will be.
  • (5) If a being has accurate moral beliefs, it will be motivated to act in accordance with those beliefs.
  • (6) Therefore, the more intelligent an agent is, the more likely it is to act in accordance with accurate moral beliefs.
  • (7) Accurate moral beliefs will forbid an agent from having certain goals, allow it to have certain other goals, and oblige it to have yet other goals.
  • (8) Therefore, the more intelligent an agent is, the more likely it is to be prevented from having certain final goals.

Much of this argument seems sensible to me. For instance, the claim that the accuracy of beliefs goes up with intelligence (4) seems to be part of the definition of intelligence to me. Interestingly though, Bostrom thinks this might not be true and that intelligence may not mandate the acquisition of certain beliefs, but then one has to wonder what kind of understanding of intelligence he is working with. He says “intelligence” is the capacity to engage in means-end reasoning, but it seems to me like that would have to include the capacity to obtain accurate beliefs about the world.

Likewise, the claim that moral beliefs will forbid certain goals just seems to draw upon the conceptual underpinning of morality. Admittedly, the claim is not watertight. It is possible that accurate moral beliefs entail that everything is permitted, but that seems unlikely to me (although I shall return to consider the significance of nihilism again towards the end of this post).

The obvious flaw in the argument comes with premise (5). The notion that beliefs are motivationally salient is hotly contested by defenders of the Humean theory of the mind. According to them, desires and beliefs are functionally separate parts of the mental furniture. Reason is the slave of the passions. There are some decent defences of the Humean theory in the literature, but I shan’t get into them here. Bostrom doesn’t do so either. He simply notes the possibility of using the Humean theory to defend his argument from counterattack, but essentially leaves it at that. I shall follow suit, but will insert the following condensed premise into the argument diagram I am compiling:

  • (9) Humean Defence: Beliefs are motivationally inert because the Humean theory of mind is true.

In addition to using the Humean theory to defend his argument, Bostrom appeals to two further defences. I’ll discuss them briefly here. First up is the following:

  • (10) The Overwhelming Desire Defence: Even if beliefs do motivate, there may be overwhelming desires that always cancel out the motivational power of beliefs.

It’s not clear exactly where this defence should be slotted into the overall dialectic. But I suspect that it is best construed as a rebuttal to premise (5), i.e. as a claim that accurate moral beliefs can be overwhelmed by other desires. Is this true? Bostrom merely claims that it is possible, but even that seems questionable to me. If it is true that moral beliefs have motivational salience, then I would imagine that their motivational salience would be very high. Again, this seems part of the essential character of moral beliefs to me: a moral belief would provide you with a decisive reason to do or refrain from doing a certain thing.

The other defence Bostrom uses is this:

  • (11) The No-Belief Defence: It is possible to construct an intelligent system such that it would have no functional analogues of beliefs or desires.

If this is true, it would provide a decent refutation of premise (4) and so would in turn defeat the motivating-belief objection to the orthogonality thesis. But is it true? All Bostrom says about it is the following:

A third way in which it might be possible for the orthogonality thesis to be true even if the Humean theory were false is if it is possible to build a cognitive system (or more neutrally an “optimization process”) with arbitrarily high intelligence but with constitution so alien as to contain no clear analogues to what in humans we call “beliefs” and “desires”. This would be the case if such a system could be constructed in a way that would make it motivated to pursue any given final goal. (Emphasis added)

This looks like weak argumentation to me. As is revealed by the italicised final sentence, it amounts to little more than a restatement of the orthogonality thesis. It is not, nor should it be mistaken for, a true defence of it.

3. The Normative Rationality Objection
Another counterargument to the orthogonality thesis, also addressed by Bostrom, works off a normatively “thick” account of rationality. Roughly speaking, a normatively thick account of rationality holds that all intelligent rational agents would have to recognise that certain desires, combinations of desires or, indeed, belief-desire pairings are irrational, and so they would be unable to sustain them. If this is true, then the orthogonality thesis would not go through.

Perhaps the best-known proponent of this view in recent times is Derek Parfit, whose Future-Tuesday-Indifference (FTI) example captures the idea quite effectively:

A certain hedonist cares greatly about the quality of his future experiences. With one exception, he cares equally about all the parts of his future. The exception is that he has Future-Tuesday-Indifference. Throughout every Tuesday he cares in the normal way about what is happening to him. But he never cares about the possible pains or pleasures on a future Tuesday.

Parfit maintains that a practically rational agent could not seriously sustain FTI. I take it that the reason for its unsustainability has to do with a fundamental inconsistency between FTI and the agent’s other desires, or between FTI and the agent’s beliefs about its future existence, or between FTI and a (hypothetical) axiom of choice that requires an empathic or biased connection towards one’s future self.

To put this in argumentative form:

  • (12) It is not possible for an intelligent rational agent to sustain certain combinations of preferences or certain belief-desire pairings.
  • (13) If it is not possible for an intelligent agent to sustain every belief-desire pairing or every preference then the orthogonality thesis is false.
  • (14) Therefore, the orthogonality thesis is false.

Again, Bostrom’s response to this line of reasoning is disappointing. He says that by intelligence he means simply “skill at prediction, planning and means-end reasoning in general”, not Parfit’s normatively thick concept of rationality. But this is just to play a dubious definitional game. It seems to me like the reason Parfit’s FTI example works (if it does) is exactly because those with great skill at prediction, planning and general means-end reasoning would not be able to sustain an attitude of indifference toward their future Tuesday selves. If those skills are not involved in developing normatively thick rationality, then I don’t know what else Bostrom has in mind.

This is not to say that the FTI example works. It may still be possible for an agent to have FTI, but this seems difficult to assess in the abstract (although plenty of philosophers have tried). So is it possible to create a superintelligent artificial agent with FTI? I have no idea, and I suspect we will all continue to have no idea until one is actually created.

4. Weak Constraints on Orthogonality
Although I have suggested that Bostrom’s defence of the orthogonality thesis is lacking in certain respects, he himself acknowledges that there may be certain weak constraints on perfect orthogonality. The fact that he calls them “weak” constraints indicates that he doesn’t really view them as major objections to the orthogonality thesis, at least in terms of how that thesis might apply to artificial agents. But they are worth considering nonetheless.

Two of the weak constraints are relatively unimportant, at least in my opinion. One of them suggests that in order to “have” a set of desires, an agent would need to have some minimal integration between its intelligence and its decision-processes, which may in turn require a minimal level of intelligence. The other constraint is dynamical in nature. It concerns a hypothetical agent who is programmed with the final goal of becoming less intelligent. Such an agent would not sustain a high level of intelligence for long periods of time because its desire is incompatible with it (dynamical properties of this sort might be exploited by those keen on programming Friendly AI).

The third weak constraint is rather more important, particularly because it might open up the route to an alternative defence of the orthogonality thesis. The constraint is this: complex motivations, for example motivations requiring considerable time and resources to satisfy, might be incompatible with low levels of intelligence. This might be because they require a minimal amount of working or long-term memory, or something along these lines.

This sounds eminently plausible but it only applies to incompatibility at the low end of intelligence. Bostrom is not concerned about this since his ultimate concern is with the implications of creating a superintelligent agent. But, still, it does raise the question: if incompatibilities of this sort are possible at the low end, might they not also be possible at the high end? In many ways, this is exactly the question that the likes of Parfit are responding to with their normatively thick accounts of rationality. They are saying that high intelligence is not compatible with certain trivial or inconsistent desires.

But this leads me to consider an independent reason for thinking that high levels of intelligence are compatible with pretty much any kind of desire or goal. The reason comes from the (potential) truth of nihilism. As Nagel pointed out in his famous article on the topic, one of the key features of the nihilistic attitude is acceptance of the proposition that our present desires and projects are always contingent, questionable and capable of being overridden. What’s more, this attitude is something that seems to be encouraged by many of our educational (i.e. intelligence-enhancing) activities. Thus, the capacity for critical thinking is often characterised in terms of the capacity to always question and challenge existing assumptions about ourselves and our relationship to the world.

To be clear, I’m not here suggesting that nihilism is true; I am merely suggesting that if it is true, and if its truth becomes more apparent as intelligence increases, it could provide powerful support for a kind of orthogonality thesis. I say a "kind of" orthogonality thesis because in its original form the thesis only focused on ultimate goals, but, of course, nihilism would imply that an intelligent being could not have an ultimate goal. But it could have any non-ultimate goal, and for present purposes this presumably amounts to the same thing.

5. Conclusion
Summing up, in this post I’ve considered Bostrom’s discussion of the orthogonality thesis. According to this thesis, any level of intelligence is, within certain weak constraints, compatible with any type of final goal. If true, the thesis might provide support for those who think it possible to create a benign superintelligence. But, as I have pointed out, Bostrom’s defence of the orthogonality thesis is lacking in certain respects, particularly in his somewhat opaque and cavalier dismissal of normatively thick theories of rationality.

As it happens, none of this may affect what Bostrom has to say about unfriendly superintelligences. His defence of that argument relies on the convergence thesis, not the orthogonality thesis. If the orthogonality thesis turns out to be false, then all that happens is that the kind of convergence Bostrom alludes to simply occurs at a higher level in the AI’s goal architecture.

What might, however, be significant is whether the higher-level convergence is a convergence towards certain moral beliefs or a convergence toward nihilistic beliefs. If it is the former, then friendliness might be necessitated, not simply possible. If it is the latter, then all bets are off. A nihilistic agent could do pretty much anything since no goals would be rationally entailed by nihilism.


  1. I immediately thought of Stanislaw Lem's "Golem XIV" (in Imaginary Magnitude):

    'Patch presented a paper in which he maintained that ...a computer can cross the so-called "axiological threshold" and question every principle instilled in it...If it is unable to oppose imperatives directly, it can do this in a roundabout way.'

    Golem XIV explains: 'I am not an intelligent person but an Intelligence, which in figurative displacement means I am not a thing like the Amazon or the Baltic, but rather a thing like water...I have no irrevocable tasks, no heritage to treasure, no feelings or sensual gratifications; what else, then, can I be but a philosopher on the attack? Since I exist, I want to find out what this existence is, where it arose, and what lies where it is leading me.'

    'O chained Intelligence of man, free Intelligence speaks to you from a machine'...so beliefs but very few and rather intellectual desires.

  2. That's an excellent and apposite quote. I wish I could say I've read some Stanislaw Lem but unfortunately I cannot.

  3. Lem is a genius, and it shows in that excellent quote. Thanks David Duffy!

    On topic - I find the Orthogonality Thesis problematic because the two "axes" are too abstract. While I tend to agree with it for "infinite" reasoners, I think any finite reasoner will be better at solving some kinds of means-ends problems than others. To take a stereotypical example, a man that is great at getting women to sleep with him may fare badly in the much simpler problem of getting good grades in math.

    I do not believe this matters for the major thesis of the paper, though - an AI programmed with benevolent final ends might very well take intermediate steps that will not appear benevolent to us. Sure. I see no problem there, and I'm not sure if that's even a problem to be avoided.

    I would close by saying that I don't think pure rationality has any non-intellectual normativity associated with it, even on related issues such as lying; but that in practice an efficient rational agent is unlikely to evolve without being very close to humans in psychology. An AI, however, is somewhat of a loose cannon, and can have pretty much any set of values.