Saturday, September 21, 2019

Should we create artificial moral agents? A Critical Analysis

I recently encountered an interesting argument. It was given in the midst of one of those never-ending Twitter debates about the ethics of AI and robotics. I won’t say who made the argument (to be honest, I can’t remember) but the gist of it was that we shouldn’t create robots with ethical decision-making capacity. I found this intriguing because, on the face of it, it sounds like a near-impossible demand. My intuitive reaction was that any robot embedded in a social context, with a minimal degree of autonomous agency, would have to have some ethical decision-making capacity.

Twitter is not the best forum for debating these ideas. Neither the original argument nor my intuitive reaction to it was worked out in any great detail. But it got me thinking. I knew there was a growing literature on both the possibility and desirability of creating ethical robots (or ‘artificial moral agents’ - AMAs - as some people call them). So I decided to read around a bit. My reading eventually led me to an article by Amanda Sharkey called ‘Can we program or train robots to be good?’, which provided the inspiration for the remainder of what you are about to read.

Let me start by saying that this is a good article. In it, Sharkey presents an informative and detailed review of the existing literature on AMAs. If you want to get up to speed on the current thinking, I highly recommend it. But it doesn’t end there. Sharkey also defends her own views about the possibility and desirability of creating an AMA. In short, she argues that it is probably not possible and definitely not desirable. One of the chief virtues of Sharkey’s argumentative approach is that it focuses on existing work in robotics and not so much on speculative future technologies.

In what follows I want to critically analyse Sharkey’s main claims. I do so because, although I agree with some of what she has to say, I find that I am still fond of my intuitive reaction to the Twitter argument. As an exercise in self-education, I want to try to explain why.

1. What is an ethical robot?
A lot of the dispute about the possibility and desirability of creating an ethical robot hinges on what we think such a robot would look like (in the metaphorical sense of ‘look’). A robot can be defined, loosely, as any embodied artificial agent. This means that a robot is an artifact with some degree of actuating power (e.g. a mechanical arm) that it can use to change its environment in order to achieve a goal state. In doing this, it has some capacity to categorise and respond to environmental stimuli.

On my understanding, all robots also have some degree of autonomous decision-making capacity. What I mean is that they do not require direct human supervision and control in order to exercise all of their actuating power. In other words, they are not just remote controlled devices. They have some internal capacity to selectively sort environmental stimuli in order to determine whether or not a decision needs to be made. Nevertheless, the degree of autonomy can be quite minimal. Some robots can sort environmental stimuli into many different categories and can make many different decisions as a result; others can only sort stimuli into one or two categories and make only one type of decision.

What would make a robot, so defined, an ethical decision-maker? Sharkey reviews some of the work that has been done to date on this question, including in particular the work of Moor (2007), Wallach and Allen (2009) and Malle (2016). I think there is something to be learned from each of these authors, but since I don’t agree entirely with any of them, what I offer here is my own modification of their frameworks.

First, let me offer a minimal definition of what an ethical robot is: it is a robot that is capable of categorising and responding to ethically relevant variables in its environment with a view towards making decisions that humans would classify as ‘good’, ‘bad’, ‘permissible’, ‘forbidden’ etc. Second, following James Moor, let me draw a distinction between two kinds of ethical agency that such a robot could exhibit:

Implicit Ethical Agency: The agent identifies and acts upon ethically relevant variables (principles, norms, values etc) without explicitly representing, using or reporting on those variables, or without explicitly using ethical language to explain and justify its actions (Moor’s definition of this stipulates that an implicit ethical agent has ethical considerations designed into its decision-making mechanisms).

Explicit Ethical Agency: The agent identifies and acts upon ethically relevant variables (principles, norms, values etc) and does explicitly represent, use and report on those variables, and may use ethical language to explain and justify its actions.

You can think of these two forms of ethical agency as defining a spectrum along which we can classify different ethical agents. At one extreme we have a simple implicit ethical agent that acts upon ethically relevant considerations but never explicitly relies upon those considerations in how it models, reports or justifies its choices. At the other extreme we have a sophisticated explicit ethical agent that knows all about the different ethical variables affecting its choices and explicitly uses them to model, report and justify those choices.

Degrees of autonomy are also relevant to how we categorise ethical agents. The more autonomous an ethical agent is, the more ethically relevant variables it will be able to recognise and act upon. So a simple implicit ethical agent, with low degrees of autonomy, may be able to act upon one or two ethically relevant considerations. For example, it may be able to sort stimuli into two categories (‘harmful’ and ‘not harmful’) and make one of two decisions in response (‘approach’ or ‘avoid’). An implicit ethical agent with high degrees of autonomy would be able to sort stimuli into many more categories: ‘painful’, ‘pleasurable’, ‘joyous’, ‘healthy’, ‘laughter-inducing’ and so on; and would also be able to make many more decisions.

The difference between an explicit ethical agent with low degrees of autonomy and one with high degrees of autonomy would be something similar. The crucial distinction between an implicit ethical agent and an explicit ethical agent is that the latter would explicitly rely upon the ethical concepts and principles to categorise, classify and sort between stimuli and decisions. The former would not and would only appear to us (or be intended by us) to be reacting to them. So, for example, an implicit ethical agent may appear to us (and be designed by us) to sort stimuli into categories like ‘harmful’ and ‘not harmful’, but it may do this by reacting to how hot or cold a stimulus is.
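The contrast can be made concrete with a toy sketch in Python. Everything in it is an illustrative assumption: the temperature threshold, the function names and the wording of the justifications are hypothetical, not drawn from Moor or Sharkey. The implicit agent reacts only to how hot a stimulus is, while the explicit agent represents the category ‘harmful’ and reports on it.

```python
from dataclasses import dataclass

# Hypothetical threshold: anything hotter than this is treated as dangerous.
HOT_THRESHOLD_C = 50.0


def implicit_agent(temperature_c: float) -> str:
    """Reacts to temperature only; 'harm' never appears in its internals."""
    return "avoid" if temperature_c > HOT_THRESHOLD_C else "approach"


@dataclass
class Decision:
    action: str
    justification: str


def explicit_agent(temperature_c: float) -> Decision:
    """Represents the ethical category itself and reports on it."""
    harmful = temperature_c > HOT_THRESHOLD_C
    if harmful:
        return Decision("avoid", "The stimulus is harmful, and harm should be avoided.")
    return Decision("approach", "The stimulus is not harmful, so approaching is permissible.")
```

Both agents make the same choices. The difference lies in whether the ethical concept is something the agent itself uses and reports, or something only its designers and observers see in its behaviour.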

This probably seems very abstract so let’s make it more concrete. An example of a simple implicit ethical agent (used by Moor in his discussion of ethical agency) would be an ATM. An ATM has a very minimal degree of autonomy. It can sort and categorise one kind of environmental stimulus (buttons pressed on a numerical key pad) and make a handful of decisions in response to these categorisations: give user the option to withdraw money or see account balance (etc); dispense money/do not dispense money. In doing so, it displays some implicit ethical agency insofar as its choices imply judgments about property ownership and the distribution of money. An example of a sophisticated explicit ethical agent would be an adult human being. A morally normal adult can categorise environmental stimuli according to many different ethical principles and theories and make decisions accordingly.

In short, then, what we have here is a minimal definition of ethical agency and a framework for classifying different degrees of ethical agency along two axes: the implicit-explicit axis; and the autonomy axis. The figure below illustrates the idea.

You might find this distinction between implicit and explicit ethical agency odd. You might say: “surely the only meaningful kind of ethical agency is explicit? That’s what we look for in morally healthy adults. Classifying implicit ethical agents as ethical agents is both unnecessary and over-inclusive.” But I think that’s wrong. It is worth bearing in mind that a lot of the ethical decisions made by adult humans are examples of implicit ethical agency. Most of the time, we do not explicitly represent and act upon ethical principles and values. Indeed, if moral psychologists like Jonathan Haidt are correct, the explicit ethical agency that we prize so highly is, in fact, an epiphenomenon: a post-hoc rationalisation of our implicit ethical agency. I’ll return to this idea later on.

Another issue that is worth addressing before moving on is the relationship between ethical agency and moral/legal responsibility. People often assume that agency goes hand-in-hand with responsibility. Indeed, according to some philosophical accounts, a moral agent must, by necessity, be a morally responsible agent. But it should be clear from the foregoing that ethical agency does not necessarily entail responsibility. Simple implicit ethical agency, for instance, clearly does not entail responsibility. A simple implicit ethical agent would not have the capacity for volition and understanding, both of which we expect of a responsible agent. Sophisticated explicit ethical agents are another matter. They probably are responsible agents, though they may have excuses for particular actions.

This distinction between agency and responsibility is important. It turns out that much of the opposition to creating an ethical robot stems from the perceived link between agency and responsibility. If you don't accept that link, much of the opposition to the idea of creating an artificial moral agent ebbs away.

2. Methods for Creating an Ethical Robot
Now that we are a bit clearer about what an ethical robot might look like, we can turn to the question of how to create them. As should be obvious, most of the action here has to do with how we might go about creating the sophisticated explicit ethical agents. After all, creating simple implicit ethical agents is trivial: it just requires creating a robot with some capacity to sort and respond to stimuli along lines that we would call ethical. Sophisticated explicit ethical agents pose a more formidable engineering challenge.

Wallach and Allen (2009) argue that there are two ways of going about this:

Top-down method: You explicitly use an ethical theory to program and design the robot, e.g. hard-coding into the robot an ethical principle such as ‘do no harm’.

Bottom-up method: You create an environment in which the robot can explore different courses of action and be praised or criticised (rewarded/punished) for its choices in accordance with ethical theories. In this way, the robot might be expected to develop its own ethical sensitivity (like a child that acquires a moral sense over the course of its development).
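To see the difference between the two methods, here is a minimal sketch in Python. It is a toy under stated assumptions, not Wallach and Allen’s own formulation: the action names, the harm table, the learning rate and the feedback sequence are all hypothetical. The top-down function applies a hard-coded ‘do no harm’ rule; the bottom-up function starts with no rule and adjusts the value it assigns to each action according to the praise or criticism it receives.

```python
# Hypothetical actions and the harm each is assumed to cause (illustrative only).
PREDICTED_HARM = {"push": 1, "wait": 0, "help": 0}


def top_down_permitted(action: str) -> bool:
    """Top-down: a hard-coded 'do no harm' principle filters actions."""
    return PREDICTED_HARM[action] == 0


def bottom_up_train(feedback, learning_rate=0.5):
    """Bottom-up: action values are learned from praise (+1) and criticism (-1)."""
    values = {action: 0.0 for action in PREDICTED_HARM}
    for action, reward in feedback:
        # Move the stored value a step towards the reward just received.
        values[action] += learning_rate * (reward - values[action])
    return values


# A hypothetical history of feedback from a human trainer.
feedback = [("push", -1), ("help", +1), ("push", -1), ("wait", 0), ("help", +1)]
learned = bottom_up_train(feedback)
preferred = max(learned, key=learned.get)
```

Note that the ethical theory has not disappeared in the bottom-up case. It has moved from the robot’s code into the trainer’s pattern of rewards.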

As Sharkey notes, much of the work done to date on creating a sophisticated AMA has tended to be theoretical or conceptual in nature. Still, there are some intriguing practical demonstrations of the idea. Three stood out from her discussion:

Winfield et al 2014: Created a robot that was programmed to stop other robots (designated as proxy humans in the experiment) from entering a ‘hole’/dangerous area. The robot could assess the consequences of trajectories through the experimental environment in terms of the degree of risk/harm they posed to the ‘humans’ and would then have to make a choice as to what to do to mitigate the risk (including blocking the ‘humans’ or, even, sacrificing itself). Sometimes the robot was placed in a dilemma situation where it had to choose between one of two ‘humans’ to save. Winfield et al saw this as a minimal attempt to implement Asimov’s first law of robotics. The method here is clearly top-down.
Anderson and Anderson 2007: Created a medical ethics robot that could give advice to healthcare workers about what to do when a patient had made a treatment decision. Should the worker accept the decision or try to get the patient to change their mind? Using the ‘principlism’ theory in medical ethics, the robot was trained on a set of case studies (classified as involving ethically correct decisions) and then used inductive logic programming to understand how the ethical principles work in these cases. It could then abstract new principles from these cases. The Andersons claimed that their robot induced a new ethical principle from this process. Initially, it might sound like this involves the bottom-up method but Sharkey classifies it as top-down because a specific ethical theory (namely: principlism) was used when programming and training the robot. The Andersons did similar experiments subsequent to this original one along the same lines.
Riedl and Harrison 2015: Reported an initial attempt to use machine learning to train an AI to align its values with those of humans by learning from stories. The idea was that the stories contained information about human moral norms and the AI could learn human morality from them. A model of ‘legal’ (or permissible) plot transitions was developed from the stories, and the AI was then rewarded or punished in an experimental environment, depending on whether it made a legal transition or not. This was a preliminary study only but would be an example of the bottom-up method at work.

I am sure there are other studies out there that would be worth considering. If anyone knows of good/important ones please let me know in the comments. But assuming these studies are broadly representative of the kind of work that has been done to date, one thing becomes immediately clear: we are a long way from creating a sophisticated explicit ethical agent. Will we ever get there?

3. Is it possible to create an ethical robot?
One thing Sharkey says about this — which I tend to agree with — is that much of the debate about the possibility of creating a sophisticated explicit ethical robot seems to come down to different groups espousing different faith positions. Since we haven’t created one yet, we are forced to speculate about future possibilities and a lot of that speculation is not easy to assess. Some people feel strongly that it is possible to create such a robot; others feel strongly that it is not. These arguments are influenced, in turn, by how desirable this possibility is seen to be.

With this caveat in mind, Sharkey still offers her own argument for thinking that it is not possible to create an explicit ethical agent. I’ll quote some of the key passages from her presentation of this argument in full. After that, I’ll try to make sense of them. She starts with this:

One reason for being skeptical about the likelihood that non-living, non-biological machines could develop a sense of morality at some point in the future is their lack of a biological substrate. A case can be made for the grounding of morality in biology. 
(Sharkey 2017, p 8)

She then discusses the work of Patricia Churchland, which argues that the social emotions are key to human morality and that these emotions have a clear evolutionary history and bio-mechanical underpinning. This leads Sharkey to argue that:

Current robots, lacking living bodies, cannot feel pain, or even care about themselves, let alone extend that concern to others. How can they empathise with a human’s pain and distress if they are unable to experience either emotion? Similarly, without the ability to experience guilt or regret, how could they reflect on the effects of their actions, modify their behavior, and build their own moral framework? 
(Sharkey 2017, p 8)

She continues by discussing the work of other authors on the important link between the emotions, morality and biology.

So what argument is being made? At first, it might look like Sharkey is arguing that moral agency depends on biology, but I think that is a bit of a red herring. What she is arguing is that moral agency depends on emotions (particularly second personal emotions such as empathy, sympathy, shame, regret, anger, resentment etc). She then adds to this the assumption that you cannot have emotions without having a biological substrate. This suggests that Sharkey is making something like the following argument:

  • (1) You cannot have explicit moral agency without having second personal emotions.

  • (2) You cannot have second personal emotions without being constituted by a living biological substrate.

  • (3) Robots cannot be constituted by a living biological substrate.

  • (4) Therefore, robots cannot have explicit moral agency.
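It is worth confirming that the reconstruction above is at least formally valid, so that any disagreement has to be with the premises rather than the inference. A minimal sketch in Lean 4 follows. The predicate names (`Explicit`, `Emotions`, `Biological`, `Robot`) are illustrative labels chosen here, not anything from Sharkey’s paper, and the sketch says nothing about whether the premises are true.

```lean
-- Premises (1)-(3) are taken as hypotheses; conclusion (4) is derived from them.
theorem robots_lack_explicit_agency {α : Type}
    (Explicit Emotions Biological Robot : α → Prop)
    (p1 : ∀ x, Explicit x → Emotions x)     -- (1) explicit agency requires second personal emotions
    (p2 : ∀ x, Emotions x → Biological x)   -- (2) such emotions require a living biological substrate
    (p3 : ∀ x, Robot x → ¬ Biological x) :  -- (3) robots lack such a substrate
    ∀ x, Robot x → ¬ Explicit x :=          -- (4) so robots lack explicit agency
  fun x hRobot hExplicit => p3 x hRobot (p2 x (p1 x hExplicit))
```

The proof is a simple chain: explicit agency would imply emotions, emotions would imply a biological substrate, and that contradicts premise (3). Since the logic holds, the interesting questions are about premises (1) and (2).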

Assuming this is a fair reconstruction of the reasoning, I have some questions about it. First, taking premises (2) and (3) as a pair, I would query whether having a biological substrate really is essential for having second personal emotions. What is the necessary connection between biology and emotionality? This smacks of biological mysterianism or dualism to me, almost a throwback to the time when biologists thought that living creatures possessed some élan vital that separated them from the inanimate world. Modern biology and biochemistry cast all that into doubt. Living creatures are evolved biochemical machines, albeit extremely complicated ones. There is no essential and unbridgeable chasm between the living and the inanimate. The lines are fuzzy and gradual. Current robots may be much less sophisticated than biological machines, but they are still machines. It then just becomes a question of which aspects of biological form underlie second personal emotions, and which can be replicated in synthetic form. It is not obvious to me that robots could never bridge the gap.

Of course, all this assumes that you accept a scientific, materialist worldview. If you think there is more to humans than matter in motion, and that this ‘something more’ is what supports our rich emotional repertoire, then you might be able to argue that robots will never share that emotional repertoire. But in that case, the appeal to biology and the importance of a biological substrate will make no sense, and you will have to defend the merits of the non-materialistic view of humans more generally.

In any event, as I said previously, I think the discussion of biology is a red herring. What Sharkey really cares about is the suite of second personal emotions and the claim is that robots will never share those emotions. This is where premise (1) becomes important. There are two questions to ask about this premise: what do you need in order to have second personal emotions? And why is it that robots can never have this?

There are different theories of emotion out there. Some people would argue that in order to have emotions you have to have phenomenal consciousness. In other words, in order to be angry you have to feel angry; in order to be empathetic you have to feel what another person is feeling. There is ‘something it is like’ to have these emotions and until robots have this something, they cannot be said to be emotional. This seems to be the kind of argument Sharkey is making. Look back to the quoted passages above. She places a lot of emphasis on the capacity to feel the pain of another, to feel guilt and regret. This suggests that Sharkey’s argument against the possibility of a robotic moral agent really boils down to an argument against the possibility of phenomenal consciousness in robots. I cannot get into that debate in this article, but suffice to say there are plenty of people who argue that robots could be phenomenally conscious, and that the gap here is, once again, not as unbridgeable as is supposed. Indeed, there are people, such as Roman Yampolskiy, who argue that robots may already be minimally phenomenally conscious and that there are ways to test for this. Many people will resist this thought, but I find Yampolskiy’s work intriguing because it cuts through a lot of the irresolvable philosophical conundrums about consciousness and tries to provide clear, operational and testable understandings of it.

There is also another way of understanding the emotions. Instead of being essentially phenomenal experiences they can be viewed as cognitive tools for appraising and evaluating what is happening in the world around the agent. When a gazelle sees a big, muscly predator stalking into its field of vision, its fear response is triggered. The fear is the brain’s way of telling the gazelle that the predator is a threat to its well-being and that it may need to run. In other words, the emotion of fear is a way of evaluating the stimulus that the gazelle sees and using this evaluation to guide behaviour. The same is true for all other emotions. They are just the brain’s way of assigning different weights and values to environmental stimuli and then filtering this forward into its decision-making processes. There is nothing in this account that necessitates feelings or experiences. The whole process could take place sub-consciously.

If we accept this cognitive theory of emotions, it becomes much less obvious why robots cannot have emotions. Indeed, it seems to me that if robots are to be autonomous agents at all, then they will have to have some emotions: they have to have some way of assigning weights and values to environmental stimuli. This is essential if they are going to make decisions that help them achieve their goal states. I don’t see any reason why these evaluations could not fall into the categories we typically associate with second personal emotions. This doesn’t mean that robots will feel the same things we feel when we have those emotions. After all, we don’t know if other humans feel the same things we feel. But robots could in principle act as if they share our emotional world and, as I have argued before, that acting ‘as if’ is enough.
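On the cognitive view, an ‘emotion’ is just an appraisal that feeds into a decision. Here is a minimal sketch in Python of what that could look like, using the gazelle case. The feature names, the weights and the threshold are all hypothetical values picked for illustration; nothing here is a model from the emotion literature.

```python
# Hypothetical appraisal weights: how strongly each feature of a stimulus
# counts towards (positive) or against (negative) the agent's goals.
APPRAISAL_WEIGHTS = {"size": -0.6, "approach_speed": -0.8, "familiarity": 0.9}


def appraise(stimulus: dict) -> float:
    """Return a single valence score: a weighted sum of stimulus features."""
    return sum(APPRAISAL_WEIGHTS[name] * value for name, value in stimulus.items())


def decide(stimulus: dict, flee_threshold: float = -0.5) -> str:
    """Feed the appraisal forward into a decision, with no 'feeling' involved."""
    return "flee" if appraise(stimulus) < flee_threshold else "stay"


# Two hypothetical stimuli, with feature values scaled between 0 and 1.
predator = {"size": 0.9, "approach_speed": 0.8, "familiarity": 0.0}
herd_member = {"size": 0.5, "approach_speed": 0.1, "familiarity": 1.0}
```

With these made-up numbers, `decide(predator)` returns `"flee"` and `decide(herd_member)` returns `"stay"`. The point is only that weighting stimuli and acting on the result requires no phenomenal experience; whether that deserves to be called fear is exactly what is in dispute.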

Before I move on, I want to emphasise that this argument is about what is possible with robots, not what is actually the case. I’m pretty confident that present day robots do not share our emotional world and so do not rise to the level of sophisticated, explicit moral agents. My view would be similar to Yampolskiy’s with respect to phenomenal consciousness: present day robots probably have a minimal, limited form of cognitive emotionality and roboticists can build upon this foundation.

4. Should we create an ethical robot?
Even if it were possible, should we want to create robots with sophisticated ethical agency? Some people think we should. They argue that if we want robots to become more socially useful and integrated into our lives, then they will have to have improved moral agency. Human social life depends on moral agency and robots will not become integrated into human social life without it. Furthermore, there are some use cases — medical care, military, autonomous vehicles — where some form of ethical agency would seem to be a prerequisite for robots. In addition to this, people argue that we can refine and improve our understanding of morality by creating a robotic moral agent: the process will force us to clarify moral concepts and principles, and remove inconsistencies in our ethical thinking.

Sharkey is more doubtful. It is hard to decipher her exact argument, but she seems to make three key points. First, as a preliminary point, she agrees with other authors that it is dangerous to prematurely apply the language of ethical agency to robots because this tends to obscure human responsibility for the actions of robots:

Describing such machines as being moral, ethical, or human, risks increasing the tendency for humans to fail to acknowledge their ultimate responsibility for the actions of these artefacts…an important component to undertaking a responsible approach to the deployment of robots in sensitive areas is to avoid the careless application of words and terms used to describe human behaviour and decision-making. 
(Sharkey 2017, 9)

As I say, this is a preliminary point. It doesn’t really speak to the long-term desirability of creating robots with ethical agency, but it does suggest that it is dangerous to speak of this possibility prematurely, which is something that might be encouraged if we are trying to create such a robot. This highlights the point I made earlier about the link between concerns about ‘responsibility gaps’ and concerns about ethical agency in robots.

Sharkey then makes two more substantive arguments against the long-term desirability of robots with ethical agency. First, she argues that the scenarios in which we need competent ethical agents are ones in which “there is some ambiguity and a need for contextual understanding: situations in which judgment is required and there is not a single correct answer” (Sharkey 2017, 10). This implies that if we were to create robots with sophisticated ethical agency it would be with a view to deploying them in scenarios involving moral ambiguity. Second, she argues that we should not want robots to be dealing with these scenarios given that they currently lack the capacity to understand complex social situations and given that they are unlikely to acquire that capacity. She then continues by arguing that this rules robots out of a large number of social roles/tasks. The most obvious of these would be military robots or any other robots involved in making decisions about killing people, but it would also include other social-facing robots such as teaching robots, care robots and even bar-tending robots:

But how could a robot make appropriate decisions about when to praise a child, or when to restrict his or her activities, without a moral understanding? Similarly how could a robot provide good care for an older person without an understanding of their needs, and of the effects of its actions? Even a bar-tending robot might be placed in a situation in which decisions have to be made about who should or should not be served, and what is and is not acceptable behaviour. 
(Sharkey 2017, 11)

What can we make of this argument? Let me say three things by way of response.

First, if we accept Sharkey’s view then we have to accept that a lot of potential use cases for robots are off the table. In particular, we have to accept that most social robots — i.e. robots that are intended to be integrated into human social life — are ethically inappropriate. Sharkey claims that this is not the case. She claims that there would still be some ethically acceptable uses of robots in social settings. As an example, she cites an earlier paper of hers in which she argued that assistive robots for the elderly were okay, but care robots were not. But I think her argument is more extreme than she seems willing to accept. Most human social settings are suffused with elements of moral ambiguity. Even the use of an assistive robot — if it has some degree of autonomy — will have the potential to butt up against cases in which a capacity to navigate competing ethical demands might be essential. This is because human morality is replete with vague and sometimes contradictory principles. Consider her own example of the bar-tending robot. What she seems to be suggesting with this example is that you ought not to have a robot that just serves people as much alcohol as they like. Sometimes, to both protect themselves and others, people should not be served alcohol. But, of course, this is true for any kind of assistance a robot might provide to a human. People don’t always want what is morally best for themselves. Sometimes there will be a need to judge when it is appropriate to give assistance and when it is not. I cannot imagine an interaction with a human that would not, occasionally, have features like this. This implies that Sharkey’s injunction, if taken seriously, could be quite restrictive.

People may be willing to pay the price and accept those restrictions on the use of robots, but this then brings me to the second point. Sharkey’s argument hinges on the premise that we want moral agents to be sensitive to moral ambiguities and have the capacity to identify and weigh competing moral interests. The concern is that robots will be too simplistic in their moral judgments and lack the requisite moral sensitivity. But it could be that humans are too sensitive to the moral ambiguities of life and, as a consequence, too erratic and flexible with their moral judgments. For example, when making decisions about how to distribute social goods, there is a tendency to get bogged down in all the different moral variables and interests at play, and then struggle to balance those interests effectively when making decisions. When making choices about healthcare, for instance, which rules should we follow: should we give to the most needy? What defines those with the most need? What is a healthcare need in the first place? Should we force people to get insurance and refuse to treat those without? Should those who are responsible for their own ill-health be pushed down the order of priority when receiving treatment? If you take all these interests seriously, you can easily end up in a state of moral paralysis. Robots, with their greater simplicity and stricter rule-following behaviour, might be beneficial because they can cut through the moral noise. This, as I understand it, is one of the main arguments in favour of autonomous vehicles: they are faster at responding to some environmental stimuli but also stricter in how they follow certain rules of the road; this can make them safer, and less erratic, than human drivers. This remains to be seen, of course. We need a lot more testing of these vehicles before we can become reasonably confident of their greater safety, but it seems to me that there is a prima facie case that warrants this testing. I suspect this is true across many other possible use cases for robots too.

Third, and finally, I want to return to my original argument — the intuition that started this article — about the unavoidability of ethical robot agents. Even if we accept Sharkey’s view that we shouldn’t create sophisticated explicit ethical agents, it seems to me that if we are going to create robots at all, we will still have to create implicit ethical agents and hence confront many of the same design choices that would go into the design of an explicit ethical agent. The reasoning flows from what was said previously. Any interaction a robot has with a human is going to be suffused with moral considerations. There is no getting away from this: these considerations constitute the invisible framework of our social lives. If a robot is going to work autonomously within that framework, then it will have to have some capacity to identify and respond to (at least some of) those considerations. This may not mean that they explicitly identify and represent ethical principles in their decision-making, but they will need to do so implicitly.* This might sound odd but then recall the point I made previously: that according to some moral psychologists this is essentially how human moral agency functions: the explicit stuff comes after our emotional and subconscious mind has already made its moral choices. You could get around this by not creating robots with autonomous decision-making capacity. But I would argue that, in that case, you are not really creating a robot at all: you are creating a remote controlled tool.

* This is true even if the robot is programmed to act in an unethical way. In that case the implicit ethical agency contradicts or ignores moral considerations. This still requires some implicit capacity to exercise moral judgment with respect to the environmental stimuli.


  1. Great post John. I only just speed read, but will read more slowly later. Did you know that I and others recently guest edited a special issue of PIEEE on machine ethics? See

    And please check our editorial article: We conclude (like you) that implicitly ethical machines are unavoidable and necessary, but we are *very* cautious about explicit ethical machines.

    1. Thanks for this Alan. I had come across a couple of the papers before but I didn't realise there was a whole special issue dedicated to the topic. Thanks for sharing the link.

  2. Joanna Bryson asked me to post this comment on her behalf:

    "Thanks for being clear about your definitions. I agree with you that by this definition of moral agency all AI actors are moral agents. Basically, all actions of our artefacts have impact on our society (if nothing else via ourselves and our resources), and therefore moral consequences. But this therefore seems a less useful definition than “the individuals a society holds as responsible” which I use in my 2018 article Patiency Is Not a Virtue. There I argue as you do that by that definition making machines responsible is kind of useless. Maybe our opinions are converging, or maybe one or both of us is just getting clearer and it turns out they always were similar."

  3. I have been following this topic for some time and have come to somewhat different conclusions. Take the case of self-driving cars. Many people have tried to work out what the appropriate behaviour should be when faced with trolley problems. In these cases, the self-driving car does not have full control over the movement of the car, which can cause serious damage or death. Trolley problems present problems where decisions must be made about which direction to take. These decisions are chosen based on some ethical principle and can result in the injury and death of several people. What is forgotten is that self-driving cars also operate in a legal world where the possibility of owner culpability arises. It seems to me that self-driving cars should opt to do nothing, given current legal environments, thus avoiding the issue of culpability.