I want to begin with the problem.
A couple of weeks back — the 24th of December to be precise — I published a post offering a general framework for understanding and participating in (philosophical) debates about the technological singularity. The rationale behind the post was simple: I wanted to organise the dialectical terrain surrounding the technological singularity into a number of core theses, each of which could be supported or rejected by a number of arguments. Analysing, developing and defending the premises of those arguments could then be viewed as the primary role of philosophical research about this topic. Obviously, I was aware that people are engaged in this kind of research anyway, but I had hoped that the framework might provide some utility for those who are new to the area, while also enabling existing researchers to see how their work connects with that of others.
The framework I came up with centred on three main theses: (i) the explosion thesis, which was about the likelihood of an intelligence explosion of AI; (ii) the unfriendliness thesis, which was about the likelihood that any superintelligent AI would be antithetical to human values and interests; and (iii) the inevitability thesis, which was about the avoidability (or evitability) of the creation of superintelligent, unfriendly AI. In the earlier post, I suggested that each of these theses represented a point on a spectrum of possible theses, with the other points along the spectrum being made up of weaker or contrary theses.
The post attracted a number of comments, some of which were not directly related to the substance of the post, and some of which expanded (in very useful ways) on the various ideas contained therein. One comment, however, from Eliezer Yudkowsky was somewhat critical of the framework I had proffered. Upon reflection, I think that a significant portion of this criticism was on target. The main problems with the framework as I had formulated it were: (a) it conflated notions of possibility and probability when it came to the discussion of AI development; and (b) it made the inevitability thesis seem separate from the other two when in reality it was a sub-part of both. That led to an overly complicated framework.
You can see the problem more clearly by looking at the three theses as I originally defined them (note: the explosion thesis appears in abbreviated form here):
The Explosion Thesis: There will be an intelligence explosion, i.e. there will be AI++.
The Unfriendliness Thesis: It is highly likely that any AI+ or AI++ will be unfriendly to us. That is: will have goals that are antithetical to those of us human beings, or will act in a manner that is antithetical to our goals.
The Inevitability Thesis: The creation of an unfriendly AI+ or AI++ is inevitable. (A number of related theses were developed which suggested that it might be strongly or weakly inevitable, or maybe even “evitable” or avoidable).
Here, both the Explosion Thesis and the Unfriendliness Thesis are phrased in terms of what is going to happen, or in terms of what is likely to happen. And yet, despite this, the Inevitability Thesis (and the associated evitability theses) is phrased in terms of what can or cannot be avoided. But isn’t that simply to say that certain outcomes are either likely or unlikely to occur? And thus isn’t that already included in the definition of the other two theses? A framework which conflates and blends theses in this manner is likely to do more harm than good; to confuse rather than enlighten.
That is the problem.
1. Correcting the Problem: Thoughts and Methodology
The obvious solution to the problem is to provide a better overview and framework, and that’s what I want to try to do in the remainder of this post. But before I do so, I want to say a few things about the methodology I’m adopting and the limitations of my approach.
The methodology remains committed to the same goals as last time round. First, I want to organise the dialectical terrain surrounding the technological singularity into a set of theses, each of which can be supported by one or more arguments, the analysis, development and evaluation of which is the role of philosophical research in this area. Second, I want to ensure that the organisation is simple without being simplistic; that it succinctly covers the key topics in such a way that the debates can be isolated in a short space of time, while also retaining some reductive complexity for those who want to think about it in more detail. Finally, I want the framework to work in such a way that those who accept particular combinations of theses will end up with distinct and recognisable positions on the technological singularity.
Those three things — identification of core theses, simplicity with reductive complexity, and the ability to locate distinct positions or views — form my self-imposed conditions of success for the organisational framework. But obviously, any such organisational exercise starts with certain biases on the part of the organiser. To be more precise, the exercise starts with the organiser having a fuzzy sense of what ought to be included within the framework, which is hopefully rendered less fuzzy, and less arbitrary in the process of formulating and refining the framework. In this instance, my own fuzzy sense of what ought to be included within the framework is threefold.
First, I think some characterisation of the technological singularity is necessary. In other words, at least one of the theses has to say something about the possibility of AI creating AI+ and AI++. That, after all, is what the “singularity” is supposed to be. This is unsurprising given that the whole terrain is already grouped under that particular heading, and this is what the Explosion Thesis was supposed to capture last time out. Second, at least one of the theses should cover the risks and rewards associated with the creation of AI+. Again, this is unsurprising given that concerns about the safety of AI technologies, along with hopes as to their beneficial potential, are what provide most of the motivation for researchers in this area. This is what the Unfriendliness Thesis was supposed to capture. And third, some characterisation of the psychological, social, and political factors influencing the development of AI++ — what I will call the “strategic infrastructure of AI development” — is necessary. These factors are distinct from those relating to the pure technological possibility of AI+, but, assuming possibility, they will obviously play an important role in its future creation. The Inevitability Thesis was supposed to capture this aspect of the debate, but I now think it failed to do so appropriately.
So this suggests the shape that the new framework ought to take. In essence, all three of these topics ought still to be covered, but they should not necessarily be represented as distinct theses. At least, not as distinct theses in the way they were before. Instead, the framework will be reduced to two core theses — the Explosion Thesis and the Unfriendliness Thesis — each of which comes in two versions: (i) a technical version, which relates to conceptual and technological possibility; and (ii) a strategic version, which relates to all-things-considered likelihood, given either the current or plausible forms that the strategic infrastructure could take. Arising from these two theses will be two decision trees, the endpoints of which will represent different positions one can take on the technological singularity. The decision tree format flows naturally out of the two versions of the theses, since the strategic version depends on the technical version. That is to say: technological possibility must be addressed before we consider the strategic implications.
The end result is a framework that (I think) corrects for the primary defects in the previous version, while still satisfying the three conditions of success mentioned above. Nevertheless, the framework still has its limitations. In addition to flaws I have not spotted myself, it will not be to everyone’s tastes since it works from my own intuitive biases about what is important in this area. It also ignores some interesting topics and runs the risk of constraining speculation in this area. Consequently, despite the fact that I’m dedicating two blog posts to the exercise, I don’t want to treat it too seriously. If this version of the framework is flawed — as it no doubt will turn out to be — then so be it. I won’t waste time perfecting it since it would probably be more worthwhile to evaluate and improve upon what has been written in this area.
Anyway, onto the revised framework itself. Remember: two theses, each coming in two flavours, which can then be co-opted into a pair of decision trees, the endpoints of which represent recognisable positions one can take on the technological singularity.
2. The Explosion Thesis
We start with the Explosion Thesis, which in its technical form looks like this:
The Explosion ThesisTV: It is possible (conceptually & technically) for there to be an intelligence explosion. That is: a state of affairs could arise in which for every AIn that is created, AIn will create a more intelligent AIn+1 (where the intelligence gap between AIn+1 and AIn increases) up to some limit of intelligence or resources.
This is roughly the same as the version of the Explosion Thesis presented in the previous post but there are two major changes. First, in line with what I said above, this version now focuses on what is technologically possible, not on what is likely to happen. Second, I’ve added the important caveat here that in order for there to be a true intelligence explosion, the intelligence gap between AIn and AIn+1 must be increasing (at least initially). If the gap were merely incremental, there wouldn’t really be an explosion (h/t Yudkowsky’s comment on my previous post). The actual speed of the “takeoff” is left open. Following Chalmers, I’ll use the abbreviation AI+ to refer to any AI with greater-than-human intelligence, and the abbreviation AI++ to refer to the kind of AI one might expect to arise following an intelligence explosion.
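The “increasing gap” condition can be sketched in a few lines of purely illustrative code. Here intelligence is treated as a hypothetical scalar assigned to each AI generation — an assumption made only for the sake of the sketch, not something the thesis itself commits to:

```python
def is_explosive(levels):
    """Check whether a sequence of intelligence levels (one per AI
    generation) satisfies the 'increasing gap' condition: every successor
    exceeds its predecessor, and by a strictly larger margin each time.
    Illustrative only -- a scalar notion of intelligence is assumed."""
    gaps = [b - a for a, b in zip(levels, levels[1:])]
    strictly_positive = all(g > 0 for g in gaps)
    accelerating = all(g2 > g1 for g1, g2 in zip(gaps, gaps[1:]))
    return strictly_positive and accelerating

# A merely incremental series does not count as an explosion:
# is_explosive([1, 2, 3, 4])  -> False (constant gap of 1)
# An accelerating series does:
# is_explosive([1, 2, 4, 8])  -> True (gaps 1, 2, 4)
```

The contrast between the two example series is the point of Yudkowsky’s caveat: steady improvement alone is not an explosion; the improvements themselves must be growing.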
A couple of other points are worth mentioning here. First, the notion of “intelligence” is somewhat fuzzy, and likely to be contested by researchers. In order to have a fully-functional version of the explosion thesis, one would need to pin down the concept of intelligence with more precision. However, in line with my goal of formulating an overarching framework for philosophical research on the singularity, I’m leaving things a bit fuzzy and general for now. Second, the Explosion Thesis continues to represent a point along a spectrum of theses about the possibility of AI, AI+ or AI++. The lower end of the spectrum is represented by complete skepticism about the possibility of AI, with the middle of the spectrum being occupied by the AI thesis (i.e. AI is possible but nothing more) and the AI+ thesis (i.e. AI+ is possible, but AI++ is not). My own feeling is that the risk-related concerns are somewhat similar for AI+ and AI++ (though, obviously, more pronounced for the latter), hence I tend to lump those two theses together. In other words, I think that if you accept the possibility of AI+, you should ask yourself the same kinds of questions you would if you accepted AI++.
I won’t talk too much about the arguments for and against the Explosion Thesis in this post. I mentioned a few last time, and they remain roughly the same. The classic argument is that of I.J. Good, and Chalmers’s lengthy philosophical analysis of the singularity presents a couple of arguments too. I’m going to look at those in a later post.
Moving on then to the strategic version of the Explosion Thesis, which looks like this:
The Explosion ThesisSV: It is highly likely (assuming possibility) that there will be AI++ (or, at least, AI+).
In essence, this version of the thesis is claiming that the strategic infrastructure surrounding AI development is such that, if AI++ is possible, it is likely that it will, at some stage, be created. People will be motivated to create it, and the necessary resources will be available for doing so. Although this thesis is phrased in a parsimonious manner, hopefully it can be seen how it covers issues surrounding political, social and psychological incentives, not issues of pure technological possibility. Note, as well, how in this version of the thesis I have explicitly lumped the possibility of AI++ and AI+ together. This is in keeping with what I just said about the risks of both.
To defend the strategic version of the thesis, one would need to show the absence of motivational and situational defeaters for the development of AI+ or AI++. The terminology is borrowed from Chalmers, who defined the former in terms of the psychological and social incentives, and the latter in terms of resource limitations and the potential for self-destruction associated with the creation of AI++. I’ll talk about those some other time.
How does one construct a decision tree out of these two theses? Very simply, one takes the first node of the tree to represent a choice between accepting or not accepting the technical version of the Explosion Thesis, with subsequent nodes representing choices between weaker views about the possibility of AI, and, of course, the strategic version of the Explosion Thesis. As follows:
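For readers who prefer code to diagrams, the tree can be sketched as a simple function. The labels on the rejection branches are my own glosses added for the sketch, not names given in the text:

```python
def explosion_position(ai_possible, ai_plus_possible,
                       explosion_possible, explosion_likely=False):
    """Walk the Explosion Thesis decision tree. The first three answers
    concern the technical version (is AI / AI+ / an intelligence explosion
    possible?); the last concerns the strategic version (is AI++, or at
    least AI+, all-things-considered likely?). Branch labels are
    illustrative glosses, not the post's own terminology."""
    if not ai_possible:
        return "AI skepticism"          # lower end of the spectrum
    if not ai_plus_possible:
        return "AI thesis only"         # AI is possible, but nothing more
    if not explosion_possible:
        return "AI+ thesis only"        # AI+ possible, AI++ is not
    # Technical version accepted: the strategic question now arises.
    if explosion_likely:
        return "Explosion likely"       # strategic version accepted
    return "Explosion possible but unlikely"
```

The ordering of the questions mirrors the point made above: the strategic version only becomes a live issue once the technical version has been accepted.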
3. The Unfriendliness Thesis
The technical version of the Unfriendliness Thesis can be defined in the following manner:
The Unfriendliness ThesisTV: It is possible (conceptually and technically) for an AI++ or AI+ to have values and goals that are antithetical to our own, or to act in a manner that is antithetical to those values and goals.
First things first, as I noted the last time, “unfriendliness” is probably not the best term to use here. If we were being more philosophically precise, we might define this thesis in terms of objective and subjective values (or, more simply, “good” and “bad”) since the effect of AI+ or AI++ on those values is what really matters, not their “friendliness” in the colloquial sense. Furthermore, the “us” in the definition is problematic. Who are “we” and why do our values matter so much? By using “us”, the thesis runs the risk of adopting an overly species-relative conception of value. Nevertheless, part of me likes using the label “unfriendliness”. One reason is that it maps onto the pre-existing terms of Friendly and Unfriendly AI. Another reason is that it is more catchy than alternatives like the “Detrimental to Objective Value Thesis”. So I’m going to stick with it for now.
But why is this thesis interesting? Surely what matters when it comes to AI risks is their likelihood, not their possibility, particularly given how philosophers speculate about all kinds of bizarre possibilities (Twin Earths and the like). This is where the qualifier of “technical” possibility becomes important. The focus of the technical version of the Unfriendliness Thesis is not on bizarre and far-fetched possibilities, but on what is possible given what we currently know about AI architectures and the relationship between intelligence and morality. Furthermore, it is important to consider these possibilities up front because, as has been acknowledged by Bostrom and Armstrong, philosophers might be inclined to head off discussion of AI risk by arguing that intelligence and morality are positively correlated: an increase in one leads to an increase in the other. Now, I’m not suggesting that these philosophers would be right or wrong to make those conceptual and technical arguments; I’m just suggesting that those arguments should be considered independently.
That leads nicely to the strategic version of the Unfriendliness Thesis, which reads as follows:
The Unfriendliness ThesisSV: It is highly likely (assuming possibility) that any AI+ or AI++ that is created will have values and goals that are antithetical to our own, or will act in a manner that is antithetical to those values and goals.
In other words, given existing incentive structures and methodologies within AI research and development, and given the conceptual possibilities, any AI+ or AI++ that is created is likely to be unfriendly. There are lots of arguments one could offer in favour of this thesis, some of which were mentioned the last time (e.g. the vastness of mindspace argument, or the complexity of human values argument). To those could be added the argument that Yudkowsky mentioned in his comment to my previous post: the fragility of human values argument (though I’m not sure how different that is from the others). Opposing arguments have been presented by the likes of Alexander Kruel, Richard Loosemore and Ben Goertzel. Both Alexander and Richard commented on my earlier post. They tend to feel (if I may paraphrase) that current approaches to AI construction are unlikely to yield any serious form of unfriendliness (or any unfriendliness).
Even if one thinks it is highly likely that an AI+ or AI++ will be unfriendly, one could still be cautiously optimistic. One could think that with the right changes to the existing strategic infrastructure, safe AI+ becomes more likely. This, I presume, is what research centres like Oxford’s FHI, Cambridge’s CSER and the Singularity Institute would like to see happen, and helping to achieve it is a large part of their rationale for existence (though they do not, of course, ignore the other theses I have mentioned).
The decision tree associated with this thesis follows a similar pattern to that associated with the Explosion Thesis. The first node in the tree represents the choice between accepting (or not) the technical version of the Unfriendliness Thesis. The other nodes trace out other possible combinations of views. The endpoints this time are given names. “Optimism” is the name given to the view of those who reject the technical version of the Unfriendliness Thesis because they think that as intelligence goes up, so too does morality. “Deflation” is the name given to those who, although accepting the possibility of unfriendly AI, deem it highly unlikely, given current incentives (etc). The view is called “Deflation” because it seeks to burst the bubble of those who think that various doomsday scenarios are likely unless we make decisive changes to the strategic infrastructure. “Cautious Optimism” is the name given to those who think that unfriendly AI is probable given the current infrastructure, but avoidable if the right changes are made. And “Doom” is the name given to those who think unfriendly AI is unlikely to be avoided.
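The four named endpoints can likewise be sketched as a short decision procedure. This is a sketch under a simplifying assumption noted in the code: the tree is walked with binary answers, whereas in reality one would attach graded probabilities to each thesis:

```python
def unfriendliness_position(technically_possible, likely_now, avoidable):
    """Walk the Unfriendliness Thesis decision tree to one of its four
    named endpoints.

    technically_possible: is unfriendly AI+/AI++ conceptually and
        technically possible? (the technical version)
    likely_now: is unfriendly AI+/AI++ probable under the *current*
        strategic infrastructure? (the strategic version)
    avoidable: would the right infrastructural changes make safe AI+
        the more likely outcome?

    Simplification: binary answers stand in for a spectrum of
    probabilities one could attach to each thesis."""
    if not technically_possible:
        return "Optimism"           # morality rises with intelligence
    if not likely_now:
        return "Deflation"          # possible, but highly unlikely
    if avoidable:
        return "Cautious Optimism"  # probable now, avoidable with changes
    return "Doom"                   # unlikely to be avoided
```

Reading the parameters left to right reproduces the order of the nodes in the tree: the technical question first, then the strategic questions.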
A final note: the binary representation of possible responses to the later questions in this decision tree is obviously flawed since there is a spectrum of probabilities one could attach to the various theses. Hopefully, the reader can make the necessary adjustments for themselves so that the decision tree better represents how they think about the probabilities.
So there it is, my revised version of the framework and overview. Once again, I’ve tried to organise the dialectical terrain around a few core theses. In this case there are two — the Explosion Thesis and the Unfriendliness Thesis — but each comes in two distinct versions — a technical version and a strategic version. One’s attitude toward the various theses can be represented as choices one makes on a decision tree, the endpoints of which constitute distinct and recognisable views about the technological singularity and its associated risks. My hope is that this improves on the previous version of the framework, but I accept it may not.