Some of you may have noticed my recently-published paper on existential risk and artificial intelligence. The paper offers a somewhat critical perspective on the recent trend for AI-doomsaying among people like Elon Musk, Stephen Hawking and Bill Gates. Of course, it doesn’t focus on their opinions; rather, it focuses on the work of the philosopher Nick Bostrom, who has written the most impressive analysis to date of the potential risks posed by superintelligent machines.
I want to summarise the main points of that paper in this blog post. This summary comes with the usual caveat that the full version contains more detail and nuance. That said, writing this summary after the paper was published does give me the opportunity to reflect on its details and offer some modifications to the argument in light of feedback and criticisms. If you want to read the full version, it is available at the links in the brackets, though you should note that the first link is pay-walled (official; philpapers; academia).
To give a general overview: the argument I present in the paper is based on an analogy between a superintelligent machine and the God of classical theism. In particular, it is based on an analogy between an argumentative move made by theists in the debate about the existence of God and an argumentative move made by Nick Bostrom in his defence of the AI doomsday scenario. The argumentative move made by the theists is called ‘skeptical theism’; and the argumentative move made by Nick Bostrom is called the ‘treacherous turn’. I claim that just as skeptical theism has some pretty significant epistemic costs for the theist, so too does the treacherous turn have some pretty significant epistemic costs for the AI-doomsayer.
That argument might sound pretty abstract right now. I hope to clarify what it all means over the remainder of this post. I’ll break the discussion down into three main parts. First, I’ll explain what skeptical theism is and why some people think it has significant epistemic costs. Second, I’ll explain Bostrom’s AI-doomsday argument and illustrate the analogy between his defence of that argument and the position of the skeptical theist. And third, I will outline two potential epistemic costs of Bostrom’s treacherous turn, building once more on the analogy with skeptical theism.
(Note: my spelling of ‘skeptical’ has rarely been consistent: sometimes I opt for the US spelling; other times I opt for the British spelling. I can’t explain why. I think it depends on my mood)
1. The Epistemic Costs of Skeptical Theism
I want to start with an interpretive point. Those who have read up on the debate about the technological singularity and the rise of superintelligent machines will know that analogies between the proponents of those concepts and religious believers are pretty common. For instance, there is the popular slogan claiming that the singularity is ‘the rapture for nerds’, and there are serious academics arguing that belief in the rise of superintelligent machines is ‘fideistic’, i.e. faith-based. These analogies are, as best I can tell, intended to be pejorative.
In appealing to a similar analogy there is a risk that my claims will also be viewed as having a pejorative air to them. This is not my intention. In general, I am far more sympathetic to the doomsaying position than other critics. Furthermore, and more importantly, my argument has a pretty narrow focus. I concede a good deal of ground to Bostrom’s argument. My goal really is to try to ‘debug’ the argumentative framework that is being presented; not to tear it down completely.
With that interpretive point clarified, I will move on. I first need to explain the nature of the skeptical theist position. To understand this, we need to start with the most common argument against the existence of God: the problem of evil. This argument claims, very roughly, that the existence of evil (particularly the gratuitous suffering of conscious beings) is proof against the existence of God. So-called ‘logical’ versions of the problem of evil claim that the existence of evil is logically incompatible with the existence of God; so-called ‘evidential’ versions of the problem of evil claim that the existence of evil is good evidence against the existence of God (i.e. lowers the probability of His existence).
Central to many versions of the problem of evil is the concept of ‘gratuitous evil’. This is evil that is not logically necessary for some greater outweighing good. The reason for the focus on this type of evil is straightforward. It is generally conceded that God could allow evil to occur if it were necessary for some outweighing good; but if it is not necessary for some outweighing good then he could not allow it in light of his omnibenevolence. So if we can find one or two instances of gratuitous evil, we would have a pretty good case against the existence of God.
The difficulty is in establishing the one or two instances of gratuitous evil. Atheologians typically go about this by identifying particular cases of horrendous suffering (e.g. Street’s case study of the young girl who was decapitated in a car accident and whose mother held her decapitated head until the emergency services arrived) and making inductive inferences. If it seems like a particular instance of suffering was not logically necessary for some greater outweighing good, then it probably is a case of gratuitous suffering and probably does provide evidence against the existence of God.
This is where skeptical theists enter the fray. They dispute the inductive inference being made by the atheologians. They deny that we have any warrant (probabilistic or otherwise) for going from cases of seeming gratuitous evil to cases of actual gratuitous evil. They base this on a comparison between our abilities and capacities and those of God. We live finite lives; God does not. We have limited cognitive and evaluative faculties; God does not. There is no reason to think that what we know of morality and the supposed necessity or contingency of suffering is representative of the totality of morality and the actual necessity or contingency of suffering. If we come across a decapitated six-year-old and her grieving mother then it could be, for all we know, that this is logically necessary for some greater outweighing good. It could be a necessary part of God’s plan.
This position enables skeptical theists to avoid the problem of evil, but according to its critics it does so at a cost. In fact, it does so at several costs, both practical and epistemic. I won’t go into detail on them all here since I have written a lengthy series of posts (and another published paper) about them already. The gist of it is that we rely on inductive inferences all the time, especially when making claims that are relevant to our religious and moral beliefs. If we are going to deny the legitimacy of such inferences based on the cognitive, epistemic and practical disparities between ourselves and God, then we are in for a pretty bumpy ride.
Two examples of this seem apposite here. First, as Erik Wielenberg and Stephen Law have argued, if we accept the skeptical theist’s position, then it seems like we have no good reason to think that God would be telling us the truth in his alleged revealed texts. Thus, it could be that God’s vision for humanity is very different from what is set out in the Bible, because he has ‘beyond our ken’ reasons for lying. Second, if we accept the skeptical position, then it seems like we will have to embrace a pretty radical form of moral uncertainty. If I come across a small child in a forest, tied to a tree, bleeding profusely and crying out in agony, I might think I have a moral duty to intervene and alleviate the suffering, but if skeptical theists are correct, then I have no good reason to believe that: there could be beyond-my-ken reasons for allowing the child to suffer. In light of epistemic costs of this sort, critics believe that we should not embrace skeptical theism.
Okay, that’s enough about skeptical theism. There are three key points to note as we move forward. They are:
A. Appealing to Disparities: Skeptical theists highlight disparities between humans and God. These disparities relate to God’s knowledge of the world and his practical influence over the world.
B. Blocking the Inductive Inference: Skeptical theists use those disparities to block certain types of inductive inference, in particular inferences from the seemingly gratuitous nature of an evil to its actually gratuitous nature.
C. Significant Epistemic Costs: The critics of skeptical theism argue that embracing this position has some pretty devastating epistemic costs.
It is my contention that all three of these features have their analogues in the debate about superintelligent machines and the existential risks they may pose.
2. Superintelligence and the Treacherous Turn
I start by looking at the analogues for the first two points. The analogue for the disparities is pretty easy to identify. Typical conceptions of a superintelligent machine suppose that it will have dramatic cognitive and (possibly) practical advantages over us mortal human beings. It is superintelligent after all. It would not be the same as God, who is supposedly maximally intelligent and maximally powerful, but it would be well beyond the human norm. The major difference between the two relates to their supposed benevolence. God is, according to all standard conceptions, a benevolent being; a superintelligent machine would, according to most discussions, not have to be benevolent. It could be malevolent or, more likely, just indifferent to human welfare and well-being. Either way, there would still be significant disparities between humans and the superintelligent machine.
The second analogy — the one relating to blocking inductive inferences — takes a bit more effort to explain. To fully appreciate it, we need to delve into Bostrom’s doomsday argument. He himself summarises the argument in the following way:
[T]he first superintelligence may [have the power] to shape the future of Earth-originating life, could easily have non-anthropomorphic final goals, and would likely have instrumental reasons to pursue open-ended resource acquisition. If we now reflect that human beings consist of useful resources...and that we depend on many more local resources, we can see that the outcome could easily be one in which humanity quickly becomes extinct.
Some elaboration is in order. In this summary, Bostrom presents three key premises in his argument for the AI doomsday scenario. The three premises are:
- (1) The first mover thesis: The first superintelligence, by virtue of being first, could obtain a decisive strategic advantage over all other intelligences. It could form a “singleton” and be in a position to shape the future of all Earth-originating intelligent life.
- (2) The orthogonality thesis: Pretty much any level of intelligence is consistent with pretty much any final goal. Thus, we cannot assume that a superintelligent artificial agent will have any of the benevolent values or goals that we tend to associate with wise and intelligent human beings (shorter version: great intelligence is consistent with goals that pose a grave existential risk).
- (3) The instrumental convergence thesis: A superintelligent AI is likely to converge on certain instrumentally useful sub-goals, that is: sub-goals that make it more likely to achieve a wide range of final goals across a wide range of environments. These convergent sub-goals include the goal of open-ended resource acquisition (i.e. the acquisition of resources that help it to pursue and secure its final goals).
These premises are then added to the claim that:
- (4) Human beings consist of and rely upon resources for our survival that could be used by the superintelligence to reach its final goals.
To reach the conclusion:
- (5) A superintelligent AI could shape the future in a way that threatens human survival.
I’m glossing over some of the details here but that is the basic idea behind the argument. To give an illustration, I can appeal to the now-classic example of the paperclip maximiser. This is a superintelligent machine with the final goal of maximising the number of paperclips in existence. Such a machine could destroy humanity in an effort to acquire more and more resources for making more and more paperclips. Or so the argument goes.
But, of course, this seems silly to critics of the doomsayers. We wouldn’t be creating superintelligent machines with the goal of maximising the number of paperclips. We would presumably look to create superintelligent machines with goals that are benevolent and consistent with our survival and flourishing. Bostrom knows this. The real problem, as he points out, is ensuring that a superintelligent machine will act in a manner that is consistent with our values and preferences. For one thing, we may have trouble specifying the goals of an AI in a way that truly does protect our values and preferences (because they are interminably vague and imprecise). For another, once the AI crosses a certain threshold of intelligence, we will cease to have any real control over its development. It will be much more powerful and intelligent than we are. So we need to ensure that it is benevolent before we reach that point.
So how can we avoid this existential threat? One simple answer is to engineer superintelligent machines in a controlled and locked-down environment (a so-called ‘box’). In this box — which would have to replicate real world problems and dynamics — we could observe and test any intelligent machine for its benevolence. Once the machine has survived a sufficient number of tests, we could ‘release’ it from the locked-down environment, safe in the knowledge that it poses no existential threat.
Not so fast, says Bostrom. In appealing to this ‘empirical testing’ model for the construction of intelligent machines, the critics are ignoring how devious and strategic a superintelligent machine could be. In particular, they are ignoring the fact that the machine could take a ‘treacherous turn’. Since this concept is central to the argument I wish to make, it is important that it be defined with some precision:
The Treacherous Turn Problem: An AI can appear to pose no threat to human beings through its initial development and testing, but once in a sufficiently strong position it can take a treacherous turn, i.e. start to optimise the world in ways that pose an existential threat to human beings.
The AI could take a treacherous turn in numerous different ways. These are discussed by Bostrom in his book and mentioned by me in my paper. For instance, it could ‘play dumb’ while in the box, i.e. pretend to be less intelligent or powerful than it really is; or it could ‘play nice’, i.e. pretend to be more benevolent and human-friendly than it really is. It could do these things because, as Bostrom puts it, playing nice or playing dumb could be convergent goals for an AI that wants to get out into the real world and realise its true goals.
Okay, that’s enough on Bostrom’s doomsday argument and the treacherous turn. It is time to take stock and see how this summary of the argument reveals the analogy with the position taken up by skeptical theists. In essence, what has happened is that critics of Bostrom’s argument have appealed to an inductive inference to block the appeal of his argument. They have claimed that repeated empirical testing of an AI which reveals its seeming benevolence would provide us with good evidence of its actual benevolence. And Bostrom has responded to that argument by blocking the inference from what seems to be the case in the ‘boxed’ environment to what is really the case. He has done so by appealing to the long-term strategic planning of the AI, as evinced in the concept of the treacherous turn. This is directly analogous to the move made by skeptical theists.
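One way to make the blocked inference precise is in Bayesian terms. What follows is a rough sketch rather than anything taken from the paper or from Bostrom’s book; the labels B (the AI really is benevolent) and T (the AI passes every test in the box) are my own, introduced purely for illustration.

```latex
% B: the AI is actually benevolent (hypothetical label)
% T: the AI passes every test in the box (hypothetical label)
P(B \mid T) = \frac{P(T \mid B)\,P(B)}{P(T \mid B)\,P(B) + P(T \mid \neg B)\,P(\neg B)}

% If P(T \mid \neg B) \approx P(T \mid B), the likelihoods cancel:
P(B \mid T) \approx \frac{P(B)}{P(B) + P(\neg B)} = P(B)
```

The critics assume that a non-benevolent AI would be much less likely to pass the tests than a benevolent one, so that passing them raises the probability of benevolence. The treacherous turn amounts to the claim that an AI that plays nice would pass the tests about as reliably as a benevolent one. If that is right, the tests leave us roughly where we started.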
Thus, we have the first two parts of our analogy. What about the third? Is it the case that accepting the possibility of the treacherous turn has significant epistemic costs?
3. The Epistemic Costs of the Treacherous Turn
I argue that it does. In fact, I argue, somewhat paradoxically, that Bostrom’s reliance on the concept gives rise to either a reductio of his position or an a fortiori argument for it. In other words, I think that if you grant the concept of the treacherous turn, you either end up with a reason to reject Bostrom’s argumentative framework (and hence a reason to be slightly more optimistic about the construction of superintelligent machines), or you end up with a reason to be even more worried than Elon Musk.
I’ll explain by describing the general nature of the epistemic costs. One thing we have to bear in mind when considering the epistemic costs of believing in the treacherous turn is that, unlike the epistemic costs of believing in skeptical theism, those costs will be less widely felt. God, if He exists, is the creator and sustainer of everything. Consequently, everything we see, hear, observe and sense bears His imprint. A superintelligent AI, if one were to exist, would not affect everything in the same way. But that doesn’t mean that the epistemic costs are not significant.
I think we can see this by considering some of the ways in which Bostrom develops the notion of the treacherous turn. In addition to thinking that a superintelligent AI could play nice or play dumb, Bostrom thinks that there is a very narrow window of time in which to detect whether or not the AI is going to take a treacherous turn. He refers to this as the moment of the conception of deception. This is the moment when the AI realises that some deception is necessary if it is to achieve its long-term goals.
[H]aving had this realization, the AI might move swiftly to hide the fact that the realization has occurred, while setting up some covert internal dynamic (perhaps disguised as some innocuous process that blends in with all the other complicated processes taking place in its mind) that will enable it to continue to plan its long-term strategy in privacy.
I believe that this type of thinking means that all our interactions with AI need to be clouded by a pretty deep and possibly paralysing suspicion. We need to be on the lookout for this ‘moment of the conception of deception’, but do we have any idea what this might look like? In the paper, I suggested that it is possible that one of our existing AI projects has already crossed through this threshold. Maybe, for example, IBM’s Watson has developed to this point and is currently ‘playing dumb’, biding its time before its human creators give it more practical powers and unleash it on the world. After all, we haven’t been on the lookout for the moment of the conception of deception. It sounds silly, of course, but fanciful speculation of this sort seems to be where this style of thinking leads us.
Now, I have been criticised for taking this line in the published version. Kaj Sotala suggested to me (on Google Plus) that I was being unfair to Bostrom (and others) in pushing such an extreme interpretation of the treacherous turn. He thinks we can be pretty confident that no existing AI project has crossed such a threshold because we know a lot about how such systems work. I am willing to concede this point: I was pushing things too far in the paper. Nevertheless, I still think the epistemic costs are significant. I still think that if we follow Bostrom’s reasoning we should be extremely skeptical of our ability to determine when the threshold to the treacherous turn has been crossed. Why? Because I suspect we have no good idea of what we should be looking out for. Thus, if we are going to be seriously trying to create a superintelligent AI, it would be too easy for us to stumble into the creation of a superintelligent AI that is going to take the treacherous turn without our knowledge.
And what are the broader implications of this? Well, it could be that this all highlights the absurdity of Bostrom’s concerns about the limitations of empirical testing. It does seem like taking the possibility of a treacherous turn seriously commits us to a fairly radical form of Humean inductive skepticism, at least when it comes to our interactions with AIs. This is the reductio argument. Conversely, it may be that Bostrom is right to reason in this manner and hence we have reason to be far more suspicious of any project involving the construction of AI than we currently are. Indeed, we should seriously consider shutting them all down and keeping our fingers crossed that no AI has taken the treacherous turn already. This is the a fortiori argument.
This is why I think believing in the possibility of the treacherous turn has some pretty significant epistemic costs. The analogy with the skeptical theist debate is complete.
I am going to leave it there lest this summary ends up being as long as the original paper. To briefly recap, I think there is an interesting analogy to be drawn between the debate about the existence of God and the debate about the existential risks posed by a superintelligent AI. In the former debate, skeptical theists try to block a certain type of inductive inference in order to save theism from the problem of evil. In the latter debate, Nick Bostrom tries to block a certain type of inductive inference in order to underscore the seriousness of the risk posed by superintelligent machines. In both instances, blocking these inductive inferences can have significant epistemic costs.
Read the full thing for more.