Monday, August 18, 2014
Bostrom on Superintelligence (6): Motivation Selection Methods
This is the sixth part in my series on Nick Bostrom’s recent book Superintelligence: Paths, Dangers, Strategies. The series is covering those parts of the book that most interest me. This includes the sections setting out the basic argument for thinking that the creation of superintelligent AI could threaten human existence, and the proposed methods for dealing with that threat.
I’m currently working through chapter 9 of the book. In this chapter, Bostrom describes the “Control Problem”. This is the problem that human engineers and developers have when they create a superintelligence (or, rather, when they create the precursor to a superintelligence). Those engineers and developers will want the AI to behave in a manner that is consistent with (perhaps even supportive of) human flourishing. But how can they ensure that this is the case?
As noted the last day, there are two general methods of doing this. The first is to adopt some form of “capability control”, i.e. limit the intelligence’s capabilities so that, whatever its motivations might be, it cannot pose a threat to human beings. We reviewed various forms of capability control in the previous post. The second is to adopt some form of “motivation selection”, i.e. ensure that the AI has benevolent (or non-threatening) motivations. We’ll look at this today.
In doing so, we’ll have to contend with four possible forms of motivation selection. They are: (i) direct specification; (ii) domestication; (iii) indirect normativity; and (iv) augmentation. I’ll explain each and consider possible advantages and disadvantages.
1. Direct Specification
The direct specification method, as the name suggests, involves directly programming the AI with the “right” set of motivations. The quintessential example of this is Isaac Asimov’s three (or four!) laws of robotics, from his “Robot” series of books and short stories. As you may know, in these stories Asimov imagined a future in which robots are created and programmed to follow a certain set of basic moral laws. The first is “A robot may not injure a human being or, through inaction, allow a human being to come to harm”. The second is “A robot must obey the orders given it by human beings except where such orders would conflict with the First Law”. And so on (I won’t go through all of them).
At first glance, laws of this sort seem sensible. What could go wrong if a robot was programmed to always follow Asimov’s first law? Of course, anyone who has read the books will know that lots can go wrong. Laws and rules of this sort are vague, open to interpretation. In specific contexts they could be applied in very odd ways, especially if the robot has a very logical or literalistic mind. Take the first law as an example. It says that a robot may not, through inaction, allow any human to come to harm. This implies that the robot must be at all times seeking to avoid possible ways in which humans could come to harm. But humans come to harm all the time. How can we stop it? A superintelligent robot, with a decisive advantage over human beings, might decide that the safest thing to do would be to put all humans into artificially induced comas. It wouldn’t be great for them, but it would prevent them from coming to harm.
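The induced-coma worry can be made concrete with a toy sketch. Everything in it is a hypothetical illustration of my own, not anything taken from Bostrom or Asimov: the `Policy` class, the three candidate policies, and the made-up `expected_harm` and `human_welfare` scores are all assumptions. It shows only that an agent told to minimise harm, and nothing else, will pick whichever option scores lowest on harm, however bad that option is on dimensions the rule never mentions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Policy:
    name: str
    expected_harm: float   # hypothetical score: lower means fewer injuries
    human_welfare: float   # hypothetical score that the rule never consults


def choose_policy(policies):
    """Literal reading of the First Law: minimise expected harm, nothing else."""
    return min(policies, key=lambda p: p.expected_harm)


# Made-up numbers, chosen purely for illustration.
POLICIES = [
    Policy("leave humans alone", expected_harm=0.30, human_welfare=0.8),
    Policy("advise and warn humans", expected_harm=0.20, human_welfare=0.9),
    Policy("induce comas in all humans", expected_harm=0.01, human_welfare=0.0),
]

chosen = choose_policy(POLICIES)
print(chosen.name)  # prints "induce comas in all humans"
```

Adding a welfare term to the objective would block this particular outcome, but that is just one more exception clause, and the same exercise can be repeated against whatever the patched objective still leaves out.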
Now, you may object and say that this is silly. We could specify the meaning of “harm” in such a way that it can avoid the induced coma outcome. And maybe we could, but as Bostrom points out, quoting from Bertrand Russell, “everything is vague to a degree you do not realize till you have tried to make it precise”. In other words, adding one exception clause or one degree of specification doesn’t help to avoid other possible problems with vagueness. Anyone who has studied the development and application of human laws will be familiar with this problem. The drafters of those laws can never fully anticipate every possible future application: the same will be true for AI programmers and coders.
There is, in fact, a more robust argument to be made here. I articulated it last year in one of my posts on AI-risk. I called it the “counterexample problem”, and based it on an argument from Muehlhauser and Helm. I’ll just give the gist of it here. The idea behind the direct specification method is that programming intelligences to follow moral laws and moral rules will ensure a good outcome for human beings. But every moral law and rule that we know of is prone to defeating counterexamples, i.e. specific contextual applications of the rule that lead to highly immoral and problematic outcomes. Think of the classic counterexample to consequentialism, which suggests that following that moral system could lead someone to kill one person in order to harvest his/her organs for five needy patients. Humans usually recoil from such outcomes because of shared intuitions or background beliefs. But how can we ensure that an AI will do the same? It may be free from our encumbrances and inhibitions: it may be inclined to kill the one to save the five. Since all moral theories are subject to the same counterexample problem,
I should note that Richard Loosemore has recently penned a critique of this problem, but I have not yet had the time to grapple with it.
2. Domesticity
The second suggested method of motivation selection is called “domesticity”. The analogy here might be with the domestication of wild animals. Dogs and cats have been successfully domesticated and tamed from wild animals over the course of many generations; some wild animals can be successfully domesticated over the course of their lifespan (people claim this for all sorts of creatures though we can certainly doubt whether it is really true of some animals, e.g. tigers). Domestication means that the animals are trained (or bred) to lack the drive or motivation to do anything that might harm their human owners: they are happy to operate within the domestic environment and their behaviour can be controlled in that environment.
The suggestion is that something similar could be done with artificial intelligences. They could be domesticated. As Bostrom puts it, “it seems extremely difficult to specify how one would want a superintelligence to behave in the world in general — since this would require us to account for all the trade offs in all the situations that could arise — it might be feasible to specify how a superintelligence should behave in one particular situation. We could therefore seek to motivate the system to confine itself to acting on a small scale, within a narrow context, and through a limited set of action modes.” (Bostrom 2014, p. 141)
The classic example of a domesticated superintelligence would be the so-called “oracle” device. This functions as a simple question-answering system. Its final goal is to produce correct answers to any questions it is asked. It would usually do so from within a confined environment (a “box”). This would make it domesticated, in a sense, since it would be happy to work in a constrained way within a confined environment.
But, of course, things are not so simple as that. Even giving an AI the seemingly benign goal of giving correct answers to questions could have startling implications. Anyone who has read the Hitchhiker’s Guide to the Galaxy knows that. In that story, the planet Earth is revealed to be a supercomputer created by an oracle AI in order to formulate the “ultimate question” (the meaning of life, the universe and everything), having earlier worked out the answer (“42”). The example is silly, but it highlights the problem of “resource acquisition”, which was mentioned earlier in this series: making sure one has the correct answers to questions could entail the acquisition of vast quantities of resources.
There might be ways around this, and indeed Bostrom dedicates a later chapter to addressing the possibility of an oracle AI. Nevertheless, there is a basic worry about the domestication strategy that needs to be stated: the AI’s understanding of what counts as a minimised and constrained area of impact needs to be aligned with our own. This presents a significant engineering challenge.
3. Indirect Normativity and Augmentation
The third possible method of motivation selection is indirect normativity. The idea here is that instead of directly programming ethical or moral standards into the AI, you give it some procedure for determining its own ethical and moral standards. If you get the procedure just right, the AI might turn out to be benevolent and perhaps even supportive of human interests and needs. Popular candidates for such a procedure tend to be modelled on something like the ideal observer theory in ethics. The AI is to function much like an ideal, hyper-rational human being who can “achieve that which we would have wished the AI to achieve if we had thought about the matter long and hard” (Bostrom, 2014, p. 141).
Bostrom doesn’t say a whole lot about this in chapter 9, postponing a fuller discussion to a later chapter. But as I noted in one of my previous posts on this topic, one of the main problems with this method of motivation selection is ensuring you’ve got the right norm-picking procedure. Getting it slightly wrong could have devastating implications, particularly if the machine has a decisive strategic advantage over us.
The fourth and final method of motivation selection is augmentation. This is quite different from the methods discussed thus far. Those all imagined an artificial intelligence that would be designed from scratch. Augmentation imagines that we start with a system that already has the “right” motivations and amp up its intelligence from there. The obvious candidate for such a system would be a human being (or group of human beings). We could simply take their brains, with their evolved and learned motivations, and augment their capabilities until we reach a point of superintelligence. (Ignore, for now, the ethics of doing this.)
As Bostrom notes, augmentation might look pretty attractive if all other methods turn out to be too difficult to implement. Furthermore, it might end up being a “forced choice”. If augmentation is the only route to superintelligence, then augmentation is, by default, the only available method of motivation selection. Contrariwise, if the route to superintelligence is via the development of AI, augmentation is not on the cards.
In any event, as a “solution” to the control problem, augmentation leaves a lot to be desired. If the system we augment has some inherent biases or flaws, we may simply end up exaggerating those flaws through a series of augments. It might be wonderful to augment a Florence Nightingale to superintelligence, but it might be nightmarish to do the same with a Hitler. Furthermore, even if the starter-system is benevolent and non-threatening, the process of augmentation could have a corrupting effect. A super-rational, super-intelligent human being, for instance, might end up being an anti-natalist and might decide that human annihilation is the morally best outcome.
Okay, so that’s it for this post. The table below summarises the various motivation selection methods. This might be it for my series on Bostrom’s book. If I have the time, I may do two more on chapter 10, but that’s looking less viable every day.