Tuesday, October 15, 2019

Assessing the Moral Status of Robots: A Shorter Defence of Ethical Behaviourism






[This is the text of a lecture that I delivered at Tilburg University on the 24th of September 2019. It was delivered as part of the 25th Anniversary celebrations for TILT (Tilburg Institute for Law, Technology and Society). My friend and colleague Sven Nyholm was the discussant for the evening. The lecture is based on my longer academic article ‘Welcoming Robots into the Moral Circle: A Defence of Ethical Behaviourism’ but was written from scratch and presents some key arguments in a snappier and clearer form. I also include a follow up section responding to criticisms from the audience on the evening of the lecture. My thanks to all those involved in organizing the event (Aviva de Groot, Merel Noorman and Silvia de Conca in particular). You can download an audio version of this lecture, minus the reflections and follow ups, here or listen to it above]


1. Introduction
My lecture this evening will be about the conditions under which we should welcome robots into our moral communities. Whenever I talk about this, I am struck by how much my academic career has come to depend upon my misspent youth for its inspiration. Like many others, I was obsessed with science fiction as a child, and in particular with the representation of robots in science fiction. I had two favourite fictional robots. The first was R2D2 from the original Star Wars trilogy. The second was Commander Data from Star Trek: The Next Generation. I liked R2D2 because of his personality - courageous, playful, disdainful of authority - and I liked Data because the writers of Star Trek used him as a vehicle for exploring some important philosophical questions about emotion, humour, and what it means to be human.

In fact, I have to confess that Data has had an outsized influence on my philosophical imagination and has featured in several of my academic papers. Part of the reason for this was practical. When I grew up in Ireland we didn’t have many options to choose from when it came to TV. We had to make do with what was available and, as luck would have it, Star Trek: TNG was on every day when I came home from school. As a result, I must have watched each episode of its 7-season run multiple times.

One episode in particular has always stayed with me. It was called 'The Measure of a Man'. In it, a scientist from the Federation visits the Enterprise because he wants to take Data back to his lab to study him. Data, you see, is a sophisticated human-like android, created by a lone scientific genius, under somewhat dubious conditions. The Federation scientist wants to take Data apart and see how he works with a view to building others like him. Data, unsurprisingly, objects. He argues that he is not just a machine or piece of property that can be traded and disassembled to suit the whims of human beings. He has his own, independent moral standing. He deserves to be treated with dignity.

But how does Data prove his case? A trial ensues and evidence is given on both sides. The prosecution argue that Data is clearly just a piece of property. He was created, not born. He doesn't think or see the world like a normal human being (or, indeed, other alien species). He even has an 'off switch'. Data counters by giving evidence of the rich relationships he has formed with his fellow crew members and eliciting testimony from others regarding his behaviour and the interactions they have with him. Ultimately, he wins the case. The court accepts that he has moral standing.

Now, we can certainly lament the impact that science fiction has on the philosophical debate about robots. As David Gunkel observes in his 2018 book Robot Rights:

“[S]cience fiction already — and well in advance of actual engineering practice — has established expectations for what a robot is or can be. Even before engineers have sought to develop working prototypes, writers, artists, and filmmakers have imagined what robots do or can do, what configurations they might take, and what problems they could produce for human individuals and communities.”
(Gunkel 2018, 16)

He continues, noting that this is a “potential liability” because:

“science fiction, it is argued, often produces unrealistic expectations for and irrational fears about robots that are not grounded in or informed by actual science.” 
(Gunkel 2018, 18)

I certainly heed this warning. But, nevertheless, I think the approach taken by the TNG writers in the episode 'The Measure of a Man' is fundamentally correct. Even if we cannot currently create a being like Data, and even if the speculation is well in advance of the science, they still give us the correct guide to resolving the philosophical question of when to welcome robots into our moral community. Or so, at least, I shall argue in the remainder of this lecture.


2. Tribalism and Conflict in Robot Ethics
Before I get into my own argument, let me say something about the current lay of the land when it comes to this issue. Some of you might be familiar with the famous study by the social psychologist Muzafer Sherif. It was done in the early 1950s at a summer camp in Robbers Cave, Oklahoma. Suffice to say, it is one of those studies that wouldn't get ethics approval nowadays. Sherif and his colleagues were interested in tribalism and conflict. They wanted to see how easy it would be to get two groups of 11-year-old boys to divide into separate tribes and go to war with one another. It turned out to be surprisingly easy. By arbitrarily separating the boys into two groups, giving them nominal group identities (the 'Rattlers' and the 'Eagles'), and putting them into competition with each other, Sherif and his research assistants sowed the seeds for bitter and repeated conflict.

The study has become a classic, repeatedly cited as evidence of how easy it is for humans to get trapped in intransigent group conflicts. I mention it here because, unfortunately, it seems to capture what has happened with the debate about the potential moral standing of robots. The disputants have settled into two tribes. There are those that are ‘anti’ the idea; and there are those that are ‘pro’ the idea. The members of these tribes sometimes get into heated arguments with one another, particularly on Twitter (which, admittedly, is a bit like a digital equivalent of Sherif’s summer camp).

Those that are ‘anti’ the idea would include Noel Sharkey, Amanda Sharkey, Deborah Johnson, Aimee van Wynsberghe and the most recent lecturer in this series, Joanna Bryson. They cite a variety of reasons for their opposition. The Sharkeys, I suspect, think the whole debate is slightly ridiculous because current robots clearly lack the capacity for moral standing, and debating their moral standing distracts from the important issues in robot ethics - namely stopping the creation and use of robots that are harmful to human well-being. Deborah Johnson would argue that since robots can never experience pain or suffering they will never have moral standing. Van Wynsberghe and Bryson are maybe a little different and lean more heavily on the idea that even if it were possible to create robots with moral standing — a possibility that Bryson at least is willing to concede — it would be a very bad idea to do so because it would cause considerable moral and legal disruption.

Those that are 'pro' the idea would include Kate Darling, Mark Coeckelbergh, David Gunkel, Erica Neely, and Daniel Estrada. Again, they cite a variety of reasons for their views. Darling probably defends the weakest version of the pro position. She focuses on humans and thinks that even if robots themselves lack moral standing we should treat them as if they had moral standing because that would be better for us. Coeckelbergh and Gunkel are more provocative, arguing that in settling questions of moral standing we should focus less on the intrinsic capacities of robots and more on how we relate to them. If those relations are thick and meaningful, then perhaps we should accept that robots have moral standing. Erica Neely proceeds from a principle of moral precaution, arguing that even if we are unsure of the moral standing of robots we should err on the side of over-inclusivity rather than under-inclusivity when it comes to this issue: it is much worse to exclude a being with moral standing than to include one without. Estrada is almost the polar opposite of Bryson, welcoming the moral and legal disruption that embracing robots would entail because it would loosen the stranglehold of humanism on our ethical code.

To be clear, this is just a small sample of those who have expressed an opinion about this topic. There are many others that I just don’t have time to discuss. I should, however, say something here about this evening’s discussant, Sven and his views on the matter. I had the fortune of reading a manuscript of Sven’s forthcoming book Humans, Robots and Ethics. It is an excellent and entertaining contribution to the field of robot ethics and in it Sven shares his own views on the moral standing of robots. I’m sure he will explain them later on but, for the time being, I would tentatively place him somewhere near Kate Darling on this map: he thinks we should be open to the idea of treating robots as if they had moral standing, but not because of what the robots themselves are but because of what respecting them says about our attitudes to other humans.

And what of myself? Where do I fit in all of this? People would probably classify me as belonging to the pro side. I have argued that we should be open to the idea that robots have moral standing. But I would much prefer to transcend this tribalistic approach to the issue. I am not an advocate for the moral standing of robots. I think many of the concerns raised by those on the anti side are valid. Debating the moral standing of robots can seem, at times, ridiculous and a distraction from other important questions in robot ethics; and accepting them into our moral communities will, undoubtedly, lead to some legal and moral disruption (though I would add that not all disruption is a bad thing). That said, I do care about the principles we should use to decide questions of moral standing, and I think that those on the anti side of the debate sometimes use bad arguments to support their views. This is why, in the remainder of this lecture, I will defend a particular approach to settling the question of the moral standing of robots. I do so in the hope that this can pave the way to a more fruitful and less tribalistic debate.

In this sense, I am trying to return to what may be the true lesson of Sherif's famous experiment on tribalism. In her fascinating book The Lost Boys: Inside Muzafer Sherif's Robbers Cave Experiment, Gina Perry has revealed the hidden history behind Sherif's work. It turns out that Sherif tried to conduct the exact same experiment one year before Robbers Cave, at Middle Grove, New York. It didn't work out. No matter what the experimenters did to encourage conflict, the boys refused to get sucked into it. Why was this? One suggestion is that at Middle Grove, Sherif didn't sort the boys into two arbitrary groups as soon as they arrived. They were given the chance to mingle and get to know one another before being segregated. This initial intermingling may have inoculated them against tribalism. Perhaps we can do the same thing with philosophical dialogue? I live in hope.


3. In Defence of Ethical Behaviourism
The position I wish to defend is something I call ‘ethical behaviourism’. According to this view, the behavioural representations of another entity toward you are a sufficient ground for determining their moral status. Or, to put it slightly differently, how an entity looks and acts is enough to determine its moral status. If it looks and acts like a duck, then you should probably treat it like you treat any other duck.

Ethical behaviourism works through comparisons. If you are unsure of the moral status of a particular entity — for present purposes this will be a robot, but it should be noted that ethical behaviourism has broader implications — then you should compare its behaviours to those of another entity that is already agreed to have moral status — a human or an animal. If the robot is roughly performatively equivalent to that other entity, then it too has moral status. I say "roughly" since no two entities are ever perfectly equivalent. If you compared two adult human beings you would spot performative differences between them, but this wouldn't mean that one of them lacks moral standing as a result. The equivalence test is an inexact one, not an exact one.
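To make the comparative logic concrete, here is a minimal sketch of the 'rough equivalence' test in Python. Everything in it is my own illustrative assumption: the lecture does not say how behaviours should be catalogued or what counts as 'rough' equivalence, so the behavioural features and the similarity threshold below are invented for the example. The threshold is simply doing the work that the word 'roughly' does in the prose.

```python
# Illustrative sketch only: the behavioural features and the 0.75 threshold are
# invented assumptions, not part of the ethical behaviourism argument itself.

def roughly_performatively_equivalent(candidate: dict, benchmark: dict,
                                      threshold: float = 0.75) -> bool:
    """Inexact comparison: does the candidate's observed behaviour match the
    benchmark entity's behaviour on most of the dimensions we can observe?"""
    shared = set(candidate) & set(benchmark)
    if not shared:
        return False
    matches = sum(1 for b in shared if candidate[b] == benchmark[b])
    return matches / len(shared) >= threshold


# Benchmark: an entity already agreed to have moral status (here, a dog).
dog = {"reacts_to_pain_stimuli": True, "seeks_social_contact": True,
       "avoids_harm": True, "plays": True}

# Candidate: a robot whose observed behaviour we want to assess.
robot = {"reacts_to_pain_stimuli": True, "seeks_social_contact": True,
         "avoids_harm": True, "plays": False}

if roughly_performatively_equivalent(robot, dog):
    print("Treat the robot as having at least the dog's moral standing.")
else:
    print("No warrant (yet) for ascribing that standing on this evidence.")
```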

There is nothing novel in ethical behaviourism. It is, in effect, just a moral variation of the famous Turing Test for machine intelligence. Where Turing argued that we should assess intelligence on the basis of behaviour, I am arguing that we should determine moral standing on the basis of behaviour. It is also not a view that is original to me. Others have defended similar views, even if they haven’t explicitly labelled it as such.

Despite the lack of novelty, ethical behaviourism is easily misunderstood and frequently derided. So let me just clarify a couple of points. First, note that it is a practical and epistemic thesis about how we can settle questions of moral standing; it is not an abstract metaphysical thesis about what it is that grounds moral standing. So, for example, someone could argue that the capacity to feel pain is the metaphysical grounding for moral status and that this capacity depends on having a certain mental apparatus. The ethical behaviourist can agree with this. They will just argue that the best evidence we have for determining whether an entity has the capacity to feel pain is behavioural. Furthermore, ethical behaviourism is agnostic about the broader consequences of its comparative tests. To say that one entity should have the same moral standing as another entity does not mean both are entitled to a full set of legal and moral rights. That depends on other considerations. A goat could have moral standing, but that doesn't mean it has the right to own property. This is important because in arguing that we should apply this approach to robots I am not thereby endorsing the broader claim that we should grant robots legal rights or treat them like adult human beings. That depends on who or what the robot is being compared to.

So what’s the argument for ethical behaviourism? I have offered different formulations of this but for this evening’s lecture I suggest that it consists of three key propositions or premises.


  • (P1) The most popular criteria for moral status are dependent on mental states or capacities, e.g. theories focused on sentience, consciousness, having interests, agency, and personhood.

  • (P2) The best evidence — and oftentimes the only practicable evidence — for the satisfaction of these criteria is behavioural.

  • (P3) Alternative alleged grounds of moral status, or criteria for determining moral status, fail to trump or dislodge the sufficiency of the behavioural evidence.


Therefore, ethical behaviourism is correct: behaviour provides a sufficient basis for settling questions of moral status.

I take it that the first premise of this argument is uncontroversial. Even if you think there are other grounds for moral status, I suspect you agree that an entity with sentience or consciousness (etc) has some kind of moral standing. The second premise is more controversial but is, I think, undeniable. It’s a trite observation but I will make it anyway: We don’t have direct access to one another’s minds. I cannot crawl inside your head and see if you really are experiencing pain or suffering. The only thing I have to go on is how you behave and react to the world. This is true, by the way, even if I can scan your brain and see whether the pain-perceiving part of it lights up. This is because the only basis we have for verifying the correlations between functional activity in the brain and mental states is behavioural. What I mean is that scientists ultimately verify those correlations by asking people in the brain scanners what they are feeling. So all premise (2) is saying is that if the most popular theories of moral status are to work in practice, it can only be because we use behavioural evidence to guide their application.

That brings us to premise (3): that all other criteria fail to dislodge the importance of behavioural evidence. This is the most controversial one. Many people seem to passionately believe that there are other ways of determining moral status and indeed they argue that relying on behavioural evidence would be absurd. Consider these two recent Twitter comments on an article I wrote about ethical behaviourism and how it relates to animals and robots:

First comment: “[This is] Errant #behaviorist #materialist nonsense…Robots are inanimate even if they imitate animal behavior. They don’t want or care about anything. But knock yourself out. Put your toaster in jail if it burns your toast.”

Second comment: “If I give a hammer a friendly face so some people feel emotionally attached to it, it still remains a tool #AnthropomorphicFallacy”



These are strong statements, but they are not unusual. I encounter this kind of criticism quite frequently. But why? Why are people so resistant to ethical behaviourism? Why do they think that there must be something more to how we determine moral status? Let’s consider some of the most popular objections.


4. Objections and Replies
In a recent paper, I suggested that there were seven (or more, depending on how you count) major objections to ethical behaviourism. I won't review all seven here, but I will consider four of the most popular ones. Each of these objections should be understood as an attempt to argue that behavioural evidence by itself cannot suffice for determining moral standing. Other evidence matters as well and can 'defeat' the behavioural evidence.


(A) The Material Cause Objection
The first objection is that the ontology of an entity makes a difference to its moral standing. To adopt the Aristotelian language, we can say that the material cause of an entity (i.e. what it is made up of) matters more than behaviour when it comes to moral standing. So, for example, someone could argue that robots lack moral standing because they are not biological creatures. They are not made from the same ‘wet’ organic components as human beings or animals. Even if they are performatively equivalent to human beings or animals, this ontological difference scuppers any claim they might have to moral standing.

I find this objection unpersuasive. It smacks to me of biological mysterianism. Why exactly does being made of particular organic material make such a crucial difference? Imagine if your spouse, the person you live with everyday, was suddenly revealed to be an alien from the Andromeda galaxy. Scientists conduct careful tests and determine that they are not a carbon-based lifeform. They are made from something different, perhaps silicon. Despite this, they still look and act in the same way as they always have (albeit now with some explaining to do). Would the fact that they are made of different stuff mean that they no longer warrant any moral standing in your eyes? Surely not. Surely the behavioural evidence suggesting that they still care about you and still have the mental capacities you used to associate with moral standing would trump the new evidence you have regarding their ontology. I know non-philosophers dislike thought experiments of this sort, finding them to be slightly ridiculous and far-fetched. Nevertheless, I do think they are vital in this context because they suggest that behaviour does all the heavy lifting when it comes to assessing moral standing. In other words, behaviour matters more than matter. This is also, incidentally, one reason why it is wrong to say that ethical behaviourism is a ‘materialist’ view: ethical behaviourism is actually agnostic regarding the ontological instantiation of the capacities that ground moral status; it is concerned only with the evidence that is sufficient for determining their presence.

All that said, I am willing to make one major concession to the material cause objection. I will concede that ontology might provide an alternative, independent ground for determining the moral status of an entity. Thus, we might accept that an entity that is made from the right biological stuff has moral standing, even if they lack the behavioural sophistication we usually require for moral standing. So, for example someone in a permanent coma might have moral standing because of what they are made of, and not because of what they can do. Still, all this shows is that being made of the right stuff is an independent sufficient ground for moral standing, not that it is a necessary ground for moral standing. The latter is what would need to be proved to undermine ethical behaviourism.


(B) The Efficient Cause Objection
The second objection is that how an entity comes into existence makes a difference to its moral standing. To continue the Aristotelian theme, we can say that the efficient cause of existence is more important than the unfolding reality. This is an objection that the philosopher Michael Hauskeller hints at in his work. Hauskeller doesn’t focus on moral standing per se, but does focus on when we can be confident that another entity cares for us or loves us. He concedes that behaviour seems like the most important thing when addressing this issue — what else could caring be apart from caring behaviour? — but then resiles from this by arguing that how the being came into existence can undercut the behavioural evidence. So, for example, a robot might act as if it cares about you, but when you learn that the robot was created and manufactured by a team of humans to act as if it cares for you, then you have reason to doubt the sincerity of its behaviour.

It could be that what Hauskeller is getting at here is that behavioural evidence can often be deceptive and misleading. If so, I will deal with this concern in a moment. But it could also be that he thinks that the mere fact that a robot was programmed and manufactured, as opposed to being evolved and developed, makes a crucial difference to moral standing. If that is what he is claiming, then it is hard to see why we should take it seriously. Again, imagine if your spouse told you that they were not conceived and raised in the normal way. They were genetically engineered in a lab and then carefully trained and educated. Having learned this, would you take a new view of their moral standing? Surely not. Surely, once again, how they actually behave towards you — and not how they came into existence — would be what ultimately mattered. We didn’t deny the first in vitro baby moral standing simply because she came into existence in a different way from ordinary human beings. The same principle should apply to robots.

Furthermore, if this is what Hauskeller is arguing, it would provide us with an unstable basis on which to make crucial judgments of moral standing. After all, the differences between humans and robots with respect to their efficient causes are starting to break down. Increasingly, robots are not being programmed and manufactured from the top down to follow specific rules. They are instead given learning algorithms and then trained on different datasets, with the process sometimes being explicitly modeled on evolution and childhood development. Similarly, humans are increasingly being designed and programmed from the top down, through artificial reproduction, embryo selection and, soon, genetic engineering. You may object to all this tinkering with the natural processes of human development and conception. But I think you would be hard pressed to deny a human who came into existence as a result of these processes the moral standing you ordinarily give to other human beings.


(C) The Final Cause Objection
The third objection is that the purposes an entity serves, and how it is expected to fulfil those purposes, make a difference to its moral standing. This is an objection that Joanna Bryson favours in her work. In several papers, she has argued that because robots will be designed to fulfil certain purposes on our behalf (i.e. they will be designed to serve us) and because they will be owned and controlled by us in the process, they should not have moral standing. Now, to be fair, Bryson is more open to the possibility of robot moral standing than most. She has said, on several occasions, that it is possible to create robots that have moral standing. She just thinks that this should not happen, in part because they will be owned and controlled by us, and because they will be (and perhaps should be) designed to serve our ends.

I don’t think there is anything in this that dislodges or upsets ethical behaviourism. For one thing, I find it hard to believe that the fact that an entity has been designed to fulfil a certain purpose should make a crucial difference to its moral standing. Suppose, in the future, human parents can genetically engineer their offspring to fulfil certain specific ends. For example, they can select genes that will guarantee (with the right training regime) that their child will be a successful athlete (this is actually not that dissimilar to what some parents try to do nowadays). Suppose they succeed. Would this fact alone undermine the child’s claim to moral standing? Surely not, and surely the same standard should apply to a robot. If it is performatively equivalent to another entity with moral standing, then the mere fact that it has been designed to fulfil a specific purpose should not affect its moral standing.

Related to this, it is hard to see why the fact that we might own and control robots should make a critical difference to their moral standing. If anything, this inverts the proper order of moral justification. The fact that a robot looks and acts like another entity that we believe to have moral standing should cause us to question our approach to ownership and control, not vice versa. We once thought it was okay for humans to own and control other humans. We were wrong to think this because it ignored the moral standing of those other humans.

That said, there are nuances here. Many people think that animals have some moral standing (i.e. that we need to respect their welfare and well-being) but that it is not wrong to own them or attempt to control them. The same approach might apply to robots if they are being compared to animals. This is the crucial point about ethical behaviourism: the ethical consequences of accepting that a robot is performatively equivalent to another entity with moral standing depends, crucially, on who or what that other entity is.


(D) The Deception Objection
The fourth objection is that ethical behaviourism cannot work because it is too easy to be deceived by behavioural cues. A robot might look and act like it is in pain, but this could just be a clever trick, used by its manufacturer, to foster false sympathy. This is, probably, the most important criticism of ethical behaviourism. It is what I think lurks behind the claim that ethical behaviourism is absurd and must be resisted.

It is well-known that humans have a tendency toward hasty anthropomorphism. That is, we tend to ascribe human-like qualities to features of our environment without proper justification. We anthropomorphise the weather, our computers, the trees and the plants, and so forth. It is easy to ‘hack’ this tendency toward hasty anthropomorphism. As social roboticists know, putting a pair of eyes on a robot can completely change how a human interacts with it, even if the robot cannot see anything. People worry, consequently, that ethical behaviourism is easily exploited by nefarious technology companies.

I sympathise with the fear that motivates this objection. It is definitely true that behaviour can be misleading or deceptive. We are often misled by the behaviour of our fellow humans. To quote Shakespeare, someone can ‘smile and smile and be a villain’. But what is the significance of this fact when it comes to assessing moral status? To me, the significance is that it means we should be very careful when assessing the behavioural evidence that is used to support a claim about moral status. We shouldn’t extrapolate too quickly from one behaviour. If a robot looks and acts like it is in pain (say) that might provide some warrant for thinking it has moral status, but we should examine its behavioural repertoire in more detail. It might emerge that other behaviours are inconsistent with the hypothesis that it feels pain or suffering.

The point here, however, is that we are always using other behavioural evidence to determine whether the initial behavioural evidence was deceptive or misleading. We are not relying on some other kind of information. Thus, for example, I think it would be a mistake to conclude that a robot cannot feel pain, even though it performs as if it does, because the manufacturer of the robot tells us that it was programmed to do this, or because some computer engineer can point to some lines of code that are responsible for the pain performance. That evidence by itself — in the absence of other countervailing behavioural evidence — cannot undermine the behavioural evidence suggesting that the robot does feel pain. Think about it like this: imagine if a biologist came to you and told you that evolution had programmed the pain response into humans in order to elicit sympathy from fellow humans. What's more, imagine if a neuroscientist came to you and told you she could point to the exact circuit in the brain that is responsible for the human pain performance (and maybe even intervene in and disrupt it). What they say may well be true, but it wouldn't mean that the behavioural evidence suggesting that your fellow humans are in pain can be ignored.

This last point is really the crucial bit. This is what is most distinctive about the perspective of ethical behaviourism. The tendency to misunderstand it, ignore it, or skirt around it, is why I think many people on the ‘anti’ side of the debate make bad arguments.


5. Implications and Conclusions
That’s all I will say in defence of ethical behaviourism this evening. Let me conclude by addressing some of its implications and heading off some potential misunderstandings.

First, let me re-emphasise that ethical behaviourism is about the principles we should apply when assessing the moral standing of robots. In defending it, I am not claiming that robots currently have moral standing or, indeed, that they will ever have moral standing. I think this is possible, indeed probable, but I could be wrong. The devil is going to be in the detail of the behavioural tests we apply (just as it is with the Turing test for intelligence).

Second, there is nothing in ethical behaviourism that suggests that we ought to create robots that cross the performative threshold to moral standing. It could be, as people like Bryson and Van Wynsberghe argue, that this is a very bad idea: that it will be too disruptive of existing moral and legal norms. What ethical behaviourism does suggest, however, is that there is an ethical weight to the decision to create human-like and animal-like robots that may be underappreciated by robot manufacturers.

Third, while acknowledging those potential risks, we should note that there are also potential benefits to creating robots that cross the performative threshold. Ethical behaviourism can help to reveal a value to relationships with robots that is otherwise hidden. If I am right, then robots can be genuine objects of moral affection, friendship and love, under the right conditions. In other words, just as there are ethical risks to creating human-like and animal-like robots, there are also ethical rewards, and these tend to be ignored, ridiculed or sidelined in the current debate.

Fourth, and related to this previous point, the performative threshold that robots have to cross in order to unlock the different kinds of value might vary quite a bit. The performative threshold needed to attain basic moral standing might be quite low; the performative threshold needed to say that a robot can be a friend or a partner might be substantially higher. A robot might have to do relatively little to convince us that it should be treated with moral consideration, but it might have to do a lot to convince us that it is our friend.

These are topics that I have explored in greater detail in some of my papers, but they are also topics that Sven has explored at considerable length. Indeed, several chapters of his forthcoming book are dedicated to them. So, on that note, it is probably time for me to shut up and hand over to him and see what he has to say about all of this.



Reflections and Follow Ups 

After I delivered the above lecture, my colleague and friend Sven Nyholm gave a response and there were some questions and challenges from the audience. I cannot remember every question that was raised, but I thought I would respond to a few that I can remember.


1. The Randomisation Counterexample
One audience member (it was Nathan Wildman) presented an interesting counterexample to my claim that other kinds of evidence don't defeat or undermine the behavioural evidence for moral status. He argued that we could cook up a possible scenario in which our knowledge of the origins of certain behaviours did cause us to question whether the behavioural evidence was sufficient for moral status.

He gave the example of a chatbot that was programmed using a randomisation technique. The chatbot would generate text at random (perhaps based on some source dataset). Most of the time the text is gobbledygook, but on maybe one occasion it just happens to have a perfectly intelligible conversation with you. In other words, whatever is churned out by the randomisation algorithm happens to perfectly coincide with what would be intelligible in that context (like picking up a meaningful book in Borges's Library of Babel). This might initially cause you to think it has some significant moral status, but if the computer programmer came along and told you about the randomisation process underlying the programming you would surely change your opinion. So, on this occasion, it looks like information about the causal origins of the behaviour makes a difference to moral status.
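For readers who find the scenario hard to picture, here is a toy version of the randomisation set-up in Python. Every detail is my own invention rather than part of Wildman's example: the bot emits characters at random, so almost everything it produces is gobbledygook, but nothing in the mechanism rules out a run that happens to read as a perfectly intelligible reply.

```python
# Toy illustration of a "randomisation" chatbot; all details are invented.
import random
import string

def random_reply(length=40, rng=None):
    """Emit a string of characters chosen uniformly at random."""
    rng = rng or random.Random()
    alphabet = string.ascii_lowercase + " "
    return "".join(rng.choice(alphabet) for _ in range(length))

rng = random.Random(42)
for _ in range(3):
    # Almost certainly noise; with vanishingly small probability, a
    # perfectly coherent sentence (the Library of Babel point).
    print(random_reply(rng=rng))
```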

Response: This is a clever counterexample but I think it overlooks two critical points. First, it overlooks the point I make about avoiding hasty anthropomorphisation towards the end of my lecture. I think we shouldn't extrapolate too much from just one interaction with a robot. We should conduct a more thorough investigation of the robot's (or in this case the chatbot's) behaviours. If the intelligible conversation was just a one-off, then we will quickly be disabused of our belief that it has moral status. But if it turns out that the intelligible conversation was not a one-off, then I don't think the evidence regarding the randomisation process would have any such effect. The computer programmer could shout and scream as much as he/she likes about the randomisation algorithm, but I don't think this would suffice to undermine the consistent behavioural evidence. This links to a second, and perhaps deeper, metaphysical point I would like to make: we don't really know what the true material instantiation of the mind is (if it is indeed material). We think the brain and its functional activity is pretty important, but we will probably never have a fully satisfactory theory of the relationship between matter and mind — this is the 'hard problem' of consciousness. Given this, it doesn't seem wise or appropriate to discount the moral status of this hypothetical robot that is built on a randomisation algorithm. Indeed, if such a robot existed, it might give us reason to think that randomisation was one of the ways in which a mind could be functionally instantiated in the real world.

I should say that this response ignores the role of moral precaution in assessing moral standing. If you add a principle of moral precaution to the mix, then it may be wrong to favour a more thorough behavioural test. This is something I discuss a bit in my article on ethical behaviourism.


2. The Argument confuses how we know X is valuable with what makes X actually valuable
One point that Sven stressed in his response, and which he makes elsewhere too, is that my argument elides or confuses two separate things: (i) how we know whether something is of value and (ii) what it is that makes it valuable. Another way of putting it: I provide a decision-procedure for deciding who or what has moral status but I don’t thereby specify what it is that makes them have moral status. It could be that the capacity to feel pain is what makes someone have moral standing and that we know someone feels pain through their behaviour, but this doesn’t mean that they have moral standing because of their behaviour.

Response: This is probably a fair point. I may on occasion elide these two things. But my feeling is that this is a ‘feature’ rather than a ‘bug’ in my account. I’m concerned with how we practically assess and apply principles of moral standing in the real world, and not so much with what it is that metaphysically undergirds moral standing.


3. Proxies for Behaviour versus Proxies for Mind
Another comment (and I apologise for not remembering who gave it) is that on my theory behaviour is important but only because it is a proxy for something else, namely some set of mental states or capacities. This is similar to the point Sven is making in his criticism. If that's right, then I am wrong to assume that behaviour is the only (or indeed the most important) proxy for mental states. Other kinds of evidence serve as proxies for mental states. The example was given of legal trials where the prosecution is trying to prove what the mental state of the defendant was at the time of an offence. They don't just rely on behavioural evidence. They also rely on other kinds of forensic evidence to establish this.

Response: I don’t think this is true and this gets to a deep feature of my theory. To take the criminal trial example, I don’t think it is true to say that we use other kinds of evidence as proxies for mental states. I think we use them as proxies for behaviour which we then use as proxies for mental states. In other words, the actual order of inference goes:


  • Other evidence → behaviour → mental state


And not:


  • Other evidence → mental state


This is the point I was getting at in my talk when I spoke about how we make inferences from functional brain activity to mental state. I believe that when we draw a link between brain activity and mental state, what we are really doing is this:


  • Brain state → behaviour → mental state


And not


  • Brain state → mental state.


Now, it is, of course, true to say that sometimes scientists think we can make this second kind of inference. For example, purveyors of brain based lie detection tests (and, indeed, other kinds of lie detection test) try to draw a direct line of inference from a brain state to a mental state, but I would argue that this is only because they have previously verified their testing protocol by following the “brain state → behaviour → mental state” route and confirming that it is reliable across multiple tests. This gives them the confidence to drop the middle step on some occasions, but ultimately this is all warranted (if it is, in fact, warranted – brain-based lie detection is controversial) because the scientists first took the behavioural step. To undermine my view, you would have to show that it is possible to cut out the behavioural step in this inference pattern. I don’t think this can be done, but perhaps I can be proved wrong.

This is perhaps the most metaphysical aspect of my view.


4. Default Settings and Practicalities
Another point that came up in conversation with Sven, Merel Noorman and Silvia de Conca had to do with the default assumptions we are likely to have when dealing with robots, and how this impacts on the practicalities of robots being accepted into the moral circle. In other words, even if I am right in some abstract, philosophical sense, will anyone actually follow the behavioural test I advocate? Won't there be a lot of resistance to it in reality?

Now, as I mentioned in my lecture, I am not an activist for robot rights or anything of the sort. I am interested in the general principles we should apply when settling questions of moral status; not with whether a particular being, such as a robot, has acquired moral status. That said, implicit views about the practicalities of applying the ethical behaviourist test may play an important role in some of the arguments I am making.

One example of this has to do with the 'default' assumption we have when interpreting the behaviour of humans/animals vis-à-vis robots. We tend to approach humans and animals with an attitude of good faith, i.e. we assume that each of their outward behaviours is a sincere representation of their inner state of mind. It's only if we receive contrary evidence that we will start to doubt the sincerity of the behaviour.

But what default assumption do we have when confronting robots? It seems plausible to suggest that most people will approach them with an attitude of bad faith. They will assume that their behaviours are representative of nothing at all and will need a lot of evidence to convince them that they should be granted some weight. This suggests that (a) not all behavioural evidence is counted equally and (b) it might be very difficult, in practice, for robots to be accepted into the moral circle.


Response: I don’t see this as a criticism of ethical behaviourism but, rather, a warning to anyone who wishes to promote it. In other words, I accept that people will resist ethical behaviourism and may treat robots with greater suspicion than human or animal agents. One of the key points of this lecture and the longer academic article I wrote about the topic was to address this suspicion and skepticism. Nevertheless, the fact that there may be these practical difficulties does not mean that ethical behaviourism is incorrect. In this respect, it is worth noting that Turing was acutely aware of this problem when he originally formulated his 'Imitation Game' test. The reason why the test was purely text-based in its original form was to prevent human-centric biases affecting its operation.






Thursday, October 10, 2019

Escaping Skinner's Box: AI and the New Era of Techno-Superstition




[The following is the text of a talk I delivered at the World Summit AI on the 10th October 2019. The talk is essentially a nugget taken from my new book Automation and Utopia. It's not an excerpt per se, but does look at one of the key arguments I make in the book]

The science fiction author Arthur C. Clarke once formulated three “laws” for thinking about the future. The third law states that “any sufficiently advanced technology is indistinguishable from magic”. The idea, I take it, is that if someone from the Paleolithic was transported to the modern world, they would be amazed by what we have achieved. Supercomputers in our pockets; machines to fly us from one side of the planet to another in less than a day; vaccines and antibiotics to cure diseases that used to kill most people in childhood. To them, these would be truly magical times.

It’s ironic then that many people alive today don’t see it that way. They see a world of materialism and reductionism. They think we have too much knowledge and control — that through technology and science we have made the world a less magical place. Well, I am here to reassure these people. One of the things AI will do is re-enchant the world and kickstart a new era of techno-superstition. If not for everyone, then at least for most people who have to work with AI on a daily basis. The catch, however, is that this is not necessarily a good thing. In fact, it is something we should worry about.

Let me explain by way of an analogy. In the late 1940s, the behaviorist psychologist B.F. Skinner — famous for his experiments on animal learning — got a bunch of pigeons and put them into separate boxes. Now, if you know anything about Skinner you'll know he had a penchant for this kind of thing. He seems to have spent his adult life torturing pigeons in boxes. Each box had a window through which a food reward would be presented to the bird. Inside the box were different switches that the pigeons could press with their beaks. Ordinarily, Skinner would set up experiments like this in such a way that pressing a particular sequence of switches would trigger the release of the food. But for this particular experiment he decided to do something different. He decided to present the food at random intervals, completely unrelated to the pressing of the switches. He wanted to see what the pigeons would do as a result.

The findings were remarkable. Instead of sitting idly by and waiting patiently for their food to arrive, the pigeons took matters into their own hands. They flapped their wings repeatedly, they danced around in circles, they hopped on one foot, convinced that their actions had something to do with the presentation of the food reward. Skinner and his colleagues likened what the pigeons were doing to the ‘rain dances’ performed by various tribes around the world: they were engaging in superstitious behaviours to control an unpredictable and chaotic environment.
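To make the structure of the situation easier to see, here is a toy simulation in Python. The action set, reward rate, and credit-assignment rule are my own invented assumptions rather than Skinner's actual protocol; the point is simply that when reward arrives independently of behaviour, a learner that credits whatever it happened to be doing will still drift into arbitrary 'rituals'.

```python
# Toy simulation, not Skinner's protocol: parameters are invented for illustration.
import random

random.seed(1)
actions = ["peck_left", "peck_right", "flap_wings", "turn_circle"]
weights = {a: 1.0 for a in actions}   # the pigeon's (superstitious) preferences

for _ in range(5_000):
    # The pigeon favours whatever has (apparently) worked before...
    action = random.choices(actions, weights=[weights[a] for a in actions])[0]
    # ...but food actually arrives at random, regardless of the action taken.
    if random.random() < 0.05:
        weights[action] += 1.0        # the action nonetheless gets the credit

print(weights)  # one or two arbitrary actions end up strongly "preferred"
```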

It’s important that we think about this situation from the pigeon’s perspective. Inside the Skinner box, they find themselves in an unfamiliar world that is deeply opaque to them. Their usual foraging tactics and strategies don’t work. Things happen to them, food gets presented, but they don’t really understand why. They cannot cope with the uncertainty; their brains rush to fill the gap and create the illusion of control.

Now what I want to argue here is that modern workers, and indeed all of us, in an environment suffused with AI, can end up sharing the predicament of Skinner's pigeons. We can end up working inside boxes, fed information and stimuli by artificial intelligence. And inside these boxes, stuff can happen to us, work can get done, but we are not quite sure if or how our actions make a difference. We end up resorting to odd superstitions and rituals to make sense of it all and give ourselves the illusion of control. One of the things I worry about, in particular, is that a lot of the current drive for transparent or explainable AI will reinforce this phenomenon.



This might sound far-fetched, but it’s not. There has been a lot of talk in recent years about the ‘black box’ nature of many AI-systems. For example, the machine learning systems used to support risk assessments in bureaucratic, legal and financial settings. These systems all work in the same way. Data from human behaviour gets fed into them, and they then spit out risk scores and recommendations to human decision-makers. The exact rationale for those risk scores — i.e. the logic the systems use — is often hidden from view. Sometimes this is for reasons intrinsic to the coding of the algorithm; other times it is because it is deliberately concealed or people just lack the time, inclination or capacity to decode the system.

The metaphor of the black box, useful though it is, is, however, misleading in one crucial respect: It assumes that the AI is inside the box and we are the ones trying to look in from the outside. But increasingly this is not the case. Increasingly, it is we who are trapped inside the box, being sent signals and nudges by the AI, and not entirely sure what is happening outside.



Consider the way credit-scoring algorithms work. Many times neither the decision-maker (the human in the loop) nor the person affected knows why they get the score they do. The systems are difficult to decode and often deliberately concealed to prevent gaming. Nevertheless, the impact of these systems on human behaviour is profound. The algorithm constructs a game in which humans have to act within the parameters set by the algorithm to get a good score. There are many websites dedicated to helping people reverse engineer these systems, often giving dubious advice about behaviours and rituals you must follow to improve your scores. If you follow this advice, it is not too much of a stretch to say that you end up like one of Skinner's pigeons - flapping your wings to maintain some illusion of control.
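A small sketch may help to show what this looks like from inside the box. Nothing here corresponds to any real credit-scoring model: the features, weights, and approval threshold are invented. The structural point is only that the person being scored sees the output, never the logic, and so adjusts their behaviour against a box they cannot inspect.

```python
# Invented example: not any real credit-scoring model or API.

def opaque_risk_score(applicant: dict) -> float:
    # The logic below is hidden from the applicant (and often from the
    # human decision-maker relying on the score as well).
    weights = {"credit_utilisation": -0.6,
               "recent_applications": -0.3,
               "years_of_history": 0.4}
    return sum(w * applicant.get(feature, 0.0) for feature, w in weights.items())

def decision(applicant: dict) -> str:
    return "approved" if opaque_risk_score(applicant) > -0.8 else "declined"

applicant = {"credit_utilisation": 0.9, "recent_applications": 4,
             "years_of_history": 2}
print(decision(applicant))            # "declined" - but no reason is given

# The applicant follows folk advice found online ("pay down your balances")
# and tries again, never knowing which change, if any, moved the score.
applicant["credit_utilisation"] = 0.3
print(decision(applicant))            # "approved" - the ritual seems to have worked
```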

Some of you might say that this is an overstatement. The opaque nature of AI is a well-known problem and there are now a variety of technical proposals out there for making it less opaque and more “explainable” [some of which have been discussed here today]. These technical proposals have been accompanied by increased legal safeguards that mandate greater transparency. But we have to ask ourselves a question: will these solutions really work? Will they help ordinary people to see outside the box and retain some meaningful control and understanding of what is happening to them?

A recent experiment by Ben Green and Yiling Chen from Harvard tried to answer these questions. It looked at how human decision-makers interact with risk assessment algorithms in criminal justice and finance (specifically in making decisions about the pretrial release of defendants and the approval of loan applications). Green and Chen created their own risk assessment systems, based on some of the leading commercially available models. They then got a group of experimental subjects (recruited via Amazon's Mechanical Turk) to use these algorithms to make decisions under a number of different conditions. I won't go through all the conditions here, but I will describe the four most important. In the first condition, the experimental subjects were just given the raw score provided by the algorithm and asked to make a decision on foot of this; in the second they were asked to give their own prediction initially and then update it after being given the algorithm's prediction; in the third they were given the algorithm's score, along with an explanation of how that score was derived, and asked to make a choice; and in the fourth they were given the opportunity to learn how accurate the algorithm was based on real-world results (did someone default on their loan or not; did they show up to their trial or not). The question was: how would the humans react to these different scenarios? Would giving them more information improve the accuracy, reliability and fairness of their decision-making?

The findings were dispiriting. Green and Chen found that using algorithms did improve the overall accuracy of decision-making across all conditions, but this was not because adding information and explanations enabled the humans to play a more meaningful role in the process. On the contrary, adding more information often made the human interaction with the algorithm worse. When given the opportunity to learn from the real-world outcomes, the humans became overconfident in their own judgments, more biased, and less accurate overall. When given explanations, they could maintain accuracy but only to the extent that they deferred more to the algorithm. In short, the more transparent the system was made to its human users, the more they either made its decisions worse or preserved accuracy only by deferring to the algorithm and limiting their own agency.

It is important not to extrapolate too much from one study, but the findings here are consistent with what has been found in other cases of automation in the workplace: humans are often the weak link in the chain. They need to be kept in check. This suggests that if we want to reap the benefits of AI and automation, we may have to create an environment that is much like that of the Skinner box, one in which humans can flap their wings, convinced they are making a difference, but prevented from doing any real damage. This is the enchanted world of techno-superstition: a world in which we adopt odd rituals and habits (explainable AI; fair AI etc) to create an illusion of control.



Now, the original title of my talk promised five reasons for pessimism about AI in the workplace. But what we have here is one big reason that breaks down into five sub-reasons. Let me explain what I mean. The problem of techno-superstition stems from two related problems: (i) a lack of understanding/knowledge of how the world (in this case the AI system) works and (ii) the illusion of control over that system.

These two problems combine into a third problem: the erosion of the possibility of achievement. One reason why we work is so that we can achieve certain outcomes. But when we lack understanding and control it undermines our sense of achievement. We achieve things when we use our reason to overcome obstacles to problem-solving in the real world. Some people might argue that a human collaborating with an AI system to produce some change in the world is achieving something through the combination of their efforts. But this is only true if the human plays some significant role in the collaboration. If humans cannot meaningfully make a difference to the success of AI or accurately calibrate their behaviour to produce better outcomes in tandem with the AI, then the pathway to achievement is blocked. This seems to be what happens, even when we try to make the systems more transparent.

Related to this is the fourth problem: that in order to make AI systems work effectively with humans, the designers and manufacturers have to control human attention and behaviour in a way that undermines human autonomy. Humans cannot be given free rein inside the box. They have to be guided, nudged, manipulated and possibly even coerced, to do the right thing. Explanations have to be packaged in a way that prevents the humans from undermining the accuracy, reliability and fairness of the overall system. This, of course, is not unusual. Workplaces are always designed with a view to controlling and incentivising behaviour, but AI enables a rapidly updating and highly dynamic form of behavioural control. The traditional human forms of resistance to outside control cannot easily cope with this new reality.

This all then culminates in the fifth and final problem: the pervasive use of AI in the workplace (and society more generally) undermines human agency. Instead of being the active captains of our fates, we become the passive recipients of technological benefits. This is a tragedy because we have built so much of our civilisation and sense of self-worth on the celebration of agency. We are supposed to be agents of change, responsible to ourselves and to one another for what happens in the world around us. This is why we value the work we do and why we crave the illusion of control. What happens if agency can no longer be sustained?

As per usual, I have left the solutions to the very end — to the point in the talk where they cannot be fully fleshed out and where I cannot be reasonably criticised for failing to do so — but it seems to me that we face two fundamental choices when it comes to addressing techno-superstition. The first is to tinker with what's presented to us inside the box: we can add more bells and whistles to our algorithms, more levers and switches. These will give humans either genuine understanding and control over the systems or the illusion of understanding and control. The problem with the former is that it frequently involves tradeoffs or compromises to the system's efficacy; the problem with the latter is that it involves greater insults to the agency of the humans working inside the box. The second choice is to stop flapping our wings and get out of the box altogether: leave the machines to do what they are best at while we do something else. Increasingly, I have come to think we should do the latter; to do so would acknowledge the truly liberating power of AI. This is the argument I develop further in my book Automation and Utopia.

Thank you for your attention.



Tuesday, September 24, 2019

Automation and Utopia is Now Available!




[Amazon.com] [Amazon.co.uk] [Book Depository] [Harvard UP] [Indiebound] [Google Play]

"Armed with an astonishing breadth of knowledge, John Danaher engages with pressing public policy issues in order to lay out a fearless exposition of the radical opportunities that technology will soon enable. With the precision of analytical philosophy and accessible, confident prose, Automation and Utopia demonstrates yet again why Danaher is one of our most important pathfinders to a flourishing future.”  
James Hughes, Institute for Ethics and Emerging Technologies

After 10 years, over 1000 blog posts, 50+ academic papers, and 60+ podcasts, I have finally published my first solo-authored book Automation and Utopia: Human Flourishing in a World Without Work (Harvard University Press 2019). I'm excited to finally share it with you all.

The book tries to present a rigorous case for techno-utopianism and a post-work future. I wrote it partly as a result of my own frustration with techno-futurist non-fiction. I like books that present provocative ideas about the future, but I often feel underwhelmed by the strength of the arguments they use to support these ideas. I don't know if you are like me, but if you are then you don't just want to be told what someone thinks about the future; you want to be shown why (and how) they think about the future and be able to critically assess their reasoning. If I got it right, then Automation and Utopia will allow you to do this. You may not agree with what I have to say in the end, but you should at least be able to figure out where I have gone wrong.

The book defends four propositions:


  • Proposition 1 - The automation of work is both possible and desirable: work is bad for most people most of the time, in ways that they don’t always appreciate. We should do what we can to hasten the obsolescence of humans in the arena of work.

  • Proposition 2 - The automation of life more generally poses a threat to human well-being, meaning, and flourishing: automating technologies undermine human achievement, distract us, manipulate us and make the world more opaque. We need to carefully manage our relationship with technology to limit those threats.

  • Proposition 3 - One way to mitigate this threat would be to build a Cyborg Utopia, but it’s not clear how practical or utopian this would really be: integrating ourselves with technology, so that we become cyborgs, might reverse the march toward human obsolescence outside of work, but will also carry practical and ethical risks that make it less desirable than it first appears.

  • Proposition 4 - Another way to mitigate this threat would be to build a Virtual Utopia: instead of integrating ourselves with machines in an effort to maintain our relevance in the “real” world, we could retreat to “virtual” worlds that are created and sustained by the technological infrastructure that we have built. At first glance, this seems tantamount to giving up, but there are compelling philosophical and practical reasons for favouring this approach.


If you have ever enjoyed anything I've written, and if you have any interest in technology, the future of work, human flourishing, utopianism, virtual reality, cyborgs, transhumanism, autonomy, anti-work philosophy, economics, philosophy, techno-optimism and, indeed, techno-pessimism, please consider getting a copy.

If you want to whet your appetite for the contents of the book, please check out my earlier blog series on technological unemployment and the value of work. Below is a short trailer with additional context and information.






Saturday, September 21, 2019

Should we create artificial moral agents? A Critical Analysis




I recently encountered an interesting argument. It was given in the midst of one of those never-ending Twitter debates about the ethics of AI and robotics. I won’t say who made the argument (to be honest, I can’t remember) but the gist of it was that we shouldn’t create robots with ethical decision-making capacity. I found this intriguing because, on the face of it, it sounds like a near-impossible demand. My intuitive reaction was that any robot embedded in a social context, with a minimal degree of autonomous agency, would have to have some ethical decision-making capacity.

Twitter is not the best forum for debating these ideas. Neither the original argument nor my intuitive reaction to it was worked out in any great detail. But it got me thinking. I knew there was a growing literature on both the possibility and desirability of creating ethical robots (or ‘artificial moral agents’ - AMAs - as some people call them). So I decided to read around a bit. My reading eventually led me to an article by Amanda Sharkey called ‘Can we program or train robots to be good?’, which provided the inspiration for the remainder of what you are about to read.

Let me start by saying that this is a good article. In it, Sharkey presents an informative and detailed review of the existing literature on AMAs. If you want to get up to speed on the current thinking, I highly recommend it. But it doesn’t end there. Sharkey also defends her own views about the possibility and desirability of creating an AMA. In short, she argues that it is probably not possible and definitely not desirable. One of the chief virtues of Sharkey’s argumentative approach is that it focuses on existing work in robotics and not so much on speculative future technologies.

In what follows I want to critically analyse Sharkey’s main claims. I do so because, although I agree with some of what she has to say, I find that I am still fond of my intuitive reaction to the Twitter argument. As an exercise in self-education, I want to try to explain why.


1. What is an ethical robot?
A lot of the dispute about the possibility and desirability of creating an ethical robot hinges on what we think such a robot would look like (in the metaphorical sense of ‘look’). A robot can be defined, loosely, as any embodied artificial agent. This means that a robot is an artifact with some degree of actuating power (e.g. a mechanical arm) that it can use to change its environment in order to achieve a goal state. In doing this, it has some capacity to categorise and respond to environmental stimuli.

On my understanding, all robots also have some degree of autonomous decision-making capacity. What I mean is that they do not require direct human supervision and control in order to exercise all of their actuating power. In other words, they are not just remote controlled devices. They have some internal capacity to selectively sort environmental stimuli in order to determine whether or not a decision needs to be made. Nevertheless, the degree of autonomy can be quite minimal. Some robots can sort environmental stimuli into many different categories and can make many different decisions as a result; others can only sort stimuli into one or two categories and make only one type of decision.
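To make that loose definition a little more tangible, here is a minimal sketch in Python of what a robot with a very low degree of autonomy might amount to. Everything in it (the class name, the obstacle threshold, the category labels) is invented for illustration rather than taken from any real robotics framework; the point is just to show the sense-categorise-decide loop running without a human in it.

# A minimal, invented sketch of the sense-categorise-decide loop described above.

class MinimalRobot:
    """A robot with a very low degree of autonomy: it sorts stimuli into
    two categories and makes only one type of decision in response."""

    def categorise(self, distance_to_obstacle: float) -> str:
        # Only two categories of environmental stimulus are available to it.
        return "blocked" if distance_to_obstacle < 0.5 else "clear"

    def decide(self, category: str) -> str:
        # One decision follows from each category.
        return "turn" if category == "blocked" else "advance"

    def step(self, distance_to_obstacle: float) -> str:
        # The whole loop runs without direct human supervision: that is the
        # minimal sense of 'autonomy' used in the text.
        return self.decide(self.categorise(distance_to_obstacle))

robot = MinimalRobot()
print(robot.step(0.2))  # -> "turn"
print(robot.step(3.0))  # -> "advance"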

What would make a robot, so defined, an ethical decision-maker? Sharkey reviews some of the work that has been done to date on this question, including in particular the work of Moor (2007), Wallach and Allen (2009) and Malle (2016). I think there is something to be learned from each of these authors, but since I don’t agree entirely with any of them, what I offer here is my own modification of their frameworks.

First, let me offer a minimal definition of what an ethical robot is: it is a robot that is capable of categorising and responding to ethically relevant variables in its environment with a view towards making decisions that humans would classify as ‘good’, ‘bad’, ‘permissible’, ‘forbidden’ etc. Second, following James Moor, let me draw a distinction between two kinds of ethical agency that such a robot could exhibit:

Implicit Ethical Agency: The agent identifies and acts upon ethically relevant variables (principles, norms, values etc) without explicitly representing, using or reporting on those variables, or without explicitly using ethical language to explain and justify its actions (Moor’s definition of this stipulates that an implicit ethical agent has ethical considerations designed into its decision-making mechanisms).

Explicit Ethical Agency: The agent identifies and acts upon ethically relevant variables (principles, norms, values etc) and does explicitly represent, use and report on those variables, and may use ethical language to explain and justify its actions.

You can think of these two forms of ethical agency as defining a spectrum along which we can classify different ethical agents. At one extreme we have a simple implicit ethical agent that acts upon ethically relevant considerations but never explicitly relies upon those considerations in how it models, reports or justifies its choices. At the other extreme you have a sophisticated explicit ethical agent, which knows all about the different ethical variables affecting its choices and explicitly uses them to model, report and justify those choices.

Degrees of autonomy are also relevant to how we categorise ethical agents. The more autonomous an ethical agent is, the more ethically relevant variables it will be able to recognise and act upon. So, for example, a simple implicit ethical agent, with low degrees of autonomy, may be able to act upon only one or two ethically relevant considerations: it may be able to sort stimuli into two categories — ‘harmful’ and ‘not harmful’ — and make one of two decisions in response — ‘approach’ or ‘avoid’. An implicit ethical agent with high degrees of autonomy would be able to sort stimuli into many more categories: ‘painful’, ‘pleasurable’, ‘joyous’, ‘healthy’, ‘laughter-inducing’ and so on; and would also be able to make many more decisions.

The difference between an explicit ethical agent with low degrees of autonomy and one with high degrees of autonomy would be something similar. The crucial distinction between an implicit ethical agent and an explicit ethical agent is that the latter would explicitly rely upon the ethical concepts and principles to categorise, classify and sort between stimuli and decisions. The former would not and would only appear to us (or be intended by us) to be reacting to them. So, for example, an implicit ethical agent may appear to us (and be designed by us) to sort stimuli into categories like ‘harmful’ and ‘not harmful’, but it may do this by reacting to how hot or cold a stimulus is.

This probably seems very abstract so let’s make it more concrete. An example of a simple implicit ethical agent (used by Moor in his discussion of ethical agency) would be an ATM. An ATM has a very minimal degree of autonomy. It can sort and categorise one kind of environmental stimulus (buttons pressed on a numerical keypad) and make a handful of decisions in response to these categorisations: give the user the option to withdraw money or see their account balance (etc.); dispense money/do not dispense money. In doing so, it displays some implicit ethical agency insofar as its choices imply judgments about property ownership and the distribution of money. An example of a sophisticated explicit ethical agent would be an adult human being. A morally normal adult can categorise environmental stimuli according to many different ethical principles and theories and make decisions accordingly.
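To put the implicit/explicit contrast in similarly concrete terms, here is a hypothetical Python sketch of my own (the temperature proxy and the threshold are invented, and nothing here is drawn from Moor or Sharkey). The implicit agent only ever reacts to a non-moral variable that we happen to interpret ethically; the explicit agent represents the ethical category itself and can report it when justifying its choice.

# A hypothetical contrast between implicit and explicit ethical agency.
# The temperature proxy and the threshold are invented for illustration.

class ImplicitEthicalAgent:
    """Behaves in a way we would describe as avoiding 'harm', but internally
    it only reacts to temperature; it has no ethical vocabulary at all."""

    def act(self, temperature: float) -> str:
        return "avoid" if temperature > 60 else "approach"

class ExplicitEthicalAgent:
    """Represents the ethical category itself and can report the ethical
    ground of its decision."""

    def classify(self, temperature: float) -> str:
        return "harmful" if temperature > 60 else "not harmful"

    def act(self, temperature: float) -> str:
        return "avoid" if self.classify(temperature) == "harmful" else "approach"

    def justify(self, temperature: float) -> str:
        return f"I chose to {self.act(temperature)} because the stimulus is {self.classify(temperature)}."

print(ImplicitEthicalAgent().act(80))      # -> "avoid" (no ethical report available)
print(ExplicitEthicalAgent().justify(80))  # -> "I chose to avoid because the stimulus is harmful."

On this way of carving things up, the two agents behave identically; the difference lies entirely in whether the ethical category is represented and reportable inside the system.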

In short, then, what we have here is a minimal definition of ethical agency and a framework for classifying different degrees of ethical agency along two axes: the implicit-explicit axis and the autonomy axis. The figure below illustrates the idea.




You might find this distinction between implicit and explicit ethical agency odd. You might say: "surely the only meaningful kind of ethical agency is explicit? That’s what we look for in morally healthy adults. Classifying implicit ethical agents as ethical agents is both unnecessary and over-inclusive." But I think that’s wrong. It is worth bearing in mind that a lot of the ethical decisions made by adult humans are examples of implicit ethical agency. Most of the time, we do not explicitly represent and act upon ethical principles and values. Indeed, if moral psychologists like Jonathan Haidt are correct, the explicit ethical agency that we prize so highly is, in fact, an epiphenomenon: a post-hoc rationalisation of our implicit ethical agency. I’ll return to this idea later on.

Another issue that is worth addressing before moving on is the relationship between ethical agency and moral/legal responsibility. People often assume that agency goes hand-in-hand with responsibility. Indeed, according to some philosophical accounts, a moral agent must, by necessity, be a morally responsible agent. But it should be clear from the foregoing that ethical agency does not necessarily entail responsibility. Simple implicit ethical agency, for instance, clearly does not entail responsibility. A simple implicit ethical agent would not have the capacity for volition and understanding, both of which we expect of a responsible agent. Sophisticated explicit ethical agents are another matter. They probably are responsible agents, though they may have excuses for particular actions.

This distinction between agency and responsibility is important. It turns out that much of the opposition to creating an ethical robot stems from the perceived link between agency and responsibility. If you don't accept that link, much of the opposition to the idea of creating an artificial moral agent ebbs away.


2. Methods for Creating an Ethical Robot
Now that we are a bit clearer about what an ethical robot might look like, we can turn to the question of how to create one. As should be obvious, most of the action here has to do with how we might go about creating sophisticated explicit ethical agents. After all, creating simple implicit ethical agents is trivial: it just requires creating a robot with some capacity to sort and respond to stimuli along lines that we would call ethical. Sophisticated explicit ethical agents pose a more formidable engineering challenge.

Wallach and Allen (2009) argue that there are two ways of going about this:

Top-down method: You explicitly use an ethical theory to program and design the robot, e.g. hard-coding into the robot an ethical principle such as ‘do no harm’ (a rough sketch of this idea follows below).

Bottom-up method: You create an environment in which the robot can explore different courses of action and be praised or criticised (rewarded/punished) for its choices in accordance with ethical theories. In this way, the robot might be expected to develop its own ethical sensitivity (like a child that acquires a moral sense over the course of its development).
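As a rough illustration of what the top-down method might amount to in practice, consider the following sketch (as flagged above). The candidate actions, harm estimates and threshold are all invented for the example; the point is simply that an ethical principle such as ‘do no harm’ is hard-coded as a filter over whatever the robot might otherwise do.

# An invented illustration of the top-down method: the principle 'do no harm'
# is hard-coded as a filter over candidate actions before anything else is
# considered. The actions, harm estimates and threshold are made up.

candidate_actions = {
    "hand_object_to_patient":     {"harm_estimate": 0.0, "task_value": 0.6},
    "move_quickly_through_crowd": {"harm_estimate": 0.7, "task_value": 0.9},
    "wait_for_corridor_to_clear": {"harm_estimate": 0.0, "task_value": 0.3},
}

HARM_THRESHOLD = 0.1  # the designer's operationalisation of 'do no harm'

def choose_action(actions: dict) -> str:
    # Top-down step: discard any action that violates the hard-coded principle...
    permissible = {name: info for name, info in actions.items()
                   if info["harm_estimate"] <= HARM_THRESHOLD}
    # ...then pick the most useful of the remaining actions.
    return max(permissible, key=lambda name: permissible[name]["task_value"])

print(choose_action(candidate_actions))  # -> "hand_object_to_patient"

The bottom-up method would invert this: instead of the designer filtering actions with an explicit principle, the robot would be rewarded or punished for its choices and left to work out the pattern for itself.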

As Sharkey notes, much of the work done to date on creating a sophisticated AMA has tended to be theoretical or conceptual in nature. Still, there are some intriguing practical demonstrations of the idea. Three stood out from her discussion:

Winfield et al 2014: Created a robot that was programmed to stop other robots (designated as proxy humans in the experiment) from entering a ‘hole’/dangerous area. The robot could assess the consequences of trajectories through the experimental environment in terms of the degree of risk/harm they posed to the ‘humans’ and would then have to make a choice as to what to do to mitigate the risk (including blocking the ‘humans’ or, even, sacrificing itself). Sometimes the robot was placed in a dilemma situation where it had to choose between one of two ‘humans’ to save. Winfield et al saw this as a minimal attempt to implement Asimov’s first law of robotics. The method here is clearly top-down.
Anderson and Anderson 2007: Created a medical ethics robot that could give advice to healthcare workers about what to do when a patient has made a treatment decision. Should the worker accept the decision or try to get the patient to change their mind? Using the ‘principlism’ theory in medical ethics, the robot was trained on a set of case studies (classified as involving ethically correct decisions) and then used inductive logic programming to work out how the ethical principles applied in those cases. It could then abstract new principles from them. The Andersons claimed that their robot induced a new ethical principle from this process. Initially, it might sound like this involves the bottom-up method, but Sharkey classifies it as top-down because a specific ethical theory (namely: principlism) was used when programming and training the robot. The Andersons subsequently ran further experiments along the same lines.
Riedl and Harrison 2015: Reported an initial attempt to use machine learning to train an AI to align its values with those of humans by learning from stories. The idea was that the stories contained information about human moral norms and the AI could learn human morality from them. A model of ‘legal’ (or permissible) plot transitions was developed from the stories, and the AI was then rewarded or punished in an experimental environment depending on whether it made a legal transition or not (a toy sketch of this reward signal appears after this list). This was a preliminary study only, but it is an example of the bottom-up method at work.
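Here is the promised toy sketch of the reward signal in the Riedl and Harrison study. It is not their implementation; the states and the set of permissible transitions are invented for the example. It just shows the bottom-up idea of rewarding behaviour that fits learned norms rather than programming the norms in directly.

# A toy sketch of a reward signal of the kind described above, not the
# authors' actual system. Transitions that fit the learned model of 'legal'
# (permissible) behaviour are rewarded; others are punished. The states and
# the transition set are invented for the example.

legal_transitions = {
    ("enter_pharmacy", "wait_in_queue"),
    ("wait_in_queue", "pay_for_medicine"),
    ("pay_for_medicine", "leave_pharmacy"),
}

def reward(previous_state: str, next_state: str) -> int:
    # The agent is never handed an explicit ethical theory; it only receives
    # feedback on whether its behaviour fits the norms extracted from stories.
    return 1 if (previous_state, next_state) in legal_transitions else -1

print(reward("wait_in_queue", "pay_for_medicine"))  # -> 1
print(reward("enter_pharmacy", "leave_pharmacy"))   # -> -1 (e.g. leaving without paying)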

I am sure there are other studies out there that would be worth considering. If anyone knows of good/important ones please let me know in the comments. But assuming these studies are broadly representative of the kind of work that has been done to date, one thing becomes immediately clear: we are a long way from creating a sophisticated explicit ethical agent. Will we ever get there?


3. Is it possible to create an ethical robot?
One thing Sharkey says about this — which I tend to agree with — is that much of the debate about the possibility of creating a sophisticated explicit ethical robot seems to come down to different groups espousing different faith positions. Since we haven’t created one yet, we are forced to speculate about future possibilities and a lot of that speculation is not easy to assess. Some people feel strongly that it is possible to create such a robot; others feel strongly that it is not. These arguments are influenced, in turn, by how desirable this possibility is seen to be.

With this caveat in mind, Sharkey still offers her own argument for thinking that it is not possible to create an explicit ethical agent. I’ll quote some of the key passages from her presentation of this argument in full. After that, I’ll try to make sense of them. She starts with this:

One reason for being skeptical about the likelihood that non-living, non-biological machines could develop a sense of morality at some point in the future is their lack of a biological substrate. A case can be made for the grounding of morality in biology. 
(Sharkey 2017, p 8)

She then discusses the work of Patricia Churchland, which argues that the social emotions are key to human morality and that these emotions have a clear evolutionary history and bio-mechanical underpinning. This leads Sharkey to argue that:

Current robots, lacking living bodies, cannot feel pain, or even care about themselves, let alone extend that concern to others. How can they empathise with a human’s pain and distress if they are unable to experience either emotion? Similarly, without the ability to experience guilt or regret, how could they reflect on the effects of their actions, modify their behavior, and build their own moral framework? 
(Sharkey 2017, p 8)

She continues by discussing the work of other authors on the important link between the emotions, morality and biology.

So what argument is being made? At first, it might look like Sharkey is arguing that moral agency depends on biology, but I think that is a bit of a red herring. What she is arguing is that moral agency depends on emotions (particularly second personal emotions such as empathy, sympathy, shame, regret, anger, resentment etc). She then adds to this the assumption that you cannot have emotions without having a biological substrate. This suggests that Sharkey is making something like the following argument:


  • (1) You cannot have explicit moral agency without having second personal emotions.

  • (2) You cannot have second personal emotions without being constituted by a living biological substrate.

  • (3) Robots cannot be constituted by a living biological substrate.

  • (4) Therefore, robots cannot have explicit moral agency.



Assuming this is a fair reconstruction of the reasoning, I have some questions about it. First, taking premises (2) and (3) as a pair, I would query whether having a biological substrate really is essential for having second personal emotions. What is the necessary connection between biology and emotionality? This smacks of biological mysterianism or dualism to me, almost a throwback to the time when biologists thought that living creatures possessed some élan vital that separated them from the inanimate world. Modern biology and biochemistry cast all that into doubt. Living creatures are — admittedly extremely complicated — evolved biochemical machines. There is no essential and unbridgeable chasm between the living and the inanimate. The lines are fuzzy and gradual. Current robots may be much less sophisticated than biological machines, but they are still machines. It then just becomes a question of which aspects of biological form underlie second personal emotions, and which can be replicated in synthetic form. It is not obvious to me that robots could never bridge the gap.

Of course, all this assumes that you accept a scientific, materialist worldview. If you think there is more to humans than matter in motion, and that this ‘something more’ is what supports our rich emotional repertoire, then you might be able to argue that robots will never share that emotional repertoire. But in that case, the appeal to biology and the importance of a biological substrate will make no sense, and you will have to defend the merits of the non-materialistic view of humans more generally.

In any event, as I said previously, I think the discussion of biology is a red herring. What Sharkey really cares about is the suite of second personal emotions, and the claim is that robots will never share those emotions. This is where premise (1) becomes important. There are two questions to ask about this premise: what do you need in order to have second personal emotions? And why is it that robots can never have this?

There are different theories of emotion out there. Some people would argue that in order to have emotions you have to have phenomenal consciousness. In other words, in order to be angry you have to feel angry; in order to be empathetic you have to feel what another person is feeling. There is ‘something it is like’ to have these emotions and until robots have this something, they cannot be said to be emotional. This seems to be the kind of argument Sharkey is making. Look back to the quoted passages above. She places a lot of emphasis on the capacity to feel the pain of another, to feel guilt and regret. This suggests that Sharkey’s argument against the possibility of a robotic moral agent really boils down to an argument against the possibility of phenomenal consciousness in robots. I cannot get into that debate in this article, but suffice to say there are plenty of people who argue that robots could be phenomenally conscious, and that the gap here is, once again, not as unbridgeable as is supposed. Indeed, there are people, such as Roman Yampolskiy, who argue that robots may already be minimally phenomenally conscious and that there are ways to test for this. Many people will resist this thought, but I find Yampolskiy’s work intriguing because it cuts through a lot of the irresolvable philosophical conundrums about consciousness and tries to provide clear, operational and testable understandings of it.

There is also another way of understanding the emotions. Instead of being essentially phenomenal experiences, they can be viewed as cognitive tools for appraising and evaluating what is happening in the world around the agent. When a gazelle sees a big, muscly predator stalking into its field of vision, its fear response is triggered. The fear is the brain’s way of telling it that the predator is a threat to its well-being and that it may need to run. In other words, the emotion of fear is a way of evaluating the stimulus that the gazelle sees and using this evaluation to guide behaviour. The same is true for all other emotions. They are just the brain’s way of assigning different weights and values to environmental stimuli and then filtering this forward into its decision-making processes. There is nothing in this account that necessitates feelings or experiences. The whole process could take place sub-consciously.

If we accept this cognitive theory of emotions, it becomes much less obvious why robots cannot have emotions. Indeed, it seems to me that if robots are to be autonomous agents at all, then they will have to have some emotions: they have to have some way of assigning weights and values to environmental stimuli. This is essential if they are going to make decisions that help them achieve their goal states. I don’t see any reason why these evaluations could not fall into the categories we typically associate with second personal emotions. This doesn’t mean that robots will feel the same things we feel when we have those emotions. After all, we don’t know if other humans feel the same things we feel. But robots could, in principle, act as if they share our emotional world and, as I have argued before, that acting ‘as if’ is enough.
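A deliberately simple sketch may help to show what this cognitive view amounts to computationally. The stimulus labels, emotion labels and weights below are all invented; the point is only that an ‘emotion’, on this reading, is a weighting of stimuli that feeds into decision-making, with no claim made about felt experience.

# An invented sketch of the cognitive/appraisal view of emotion: an 'emotion'
# here is just a weighting of stimuli that biases decision-making. No claim
# is made about felt experience.

appraisals = {
    "predator_detected": {"emotion": "fear",        "action_bias": "flee",   "weight": 0.9},
    "human_in_distress": {"emotion": "sympathy",    "action_bias": "assist", "weight": 0.8},
    "goal_blocked":      {"emotion": "frustration", "action_bias": "replan", "weight": 0.4},
}

def evaluate(stimuli):
    # Choose the action bias attached to the most heavily weighted appraisal.
    relevant = [appraisals[s] for s in stimuli if s in appraisals]
    if not relevant:
        return "continue"
    return max(relevant, key=lambda a: a["weight"])["action_bias"]

print(evaluate(["goal_blocked", "human_in_distress"]))  # -> "assist"

Nothing in this loop feels anything; it simply assigns weights to stimuli and filters them forward into a choice, which is all the cognitive theory requires.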

Before I move on, I want to emphasise that this argument is about what is possible with robots, not what is actually the case. I’m pretty confident that present day robots do not share our emotional world and so do not rise to the level of sophisticated, explicit moral agents. My view would be similar to Yampolskiy’s with respect to phenomenal consciousness: present day robots probably have a minimal, limited form of cognitive emotionality, and roboticists can build upon this foundation.


4. Should we create an ethical robot?
Even if it were possible, should we want to create robots with sophisticated ethical agency? Some people think we should. They argue that if we want robots to become more socially useful and integrated into our lives, then they will have to have improved moral agency. Human social life depends on moral agency and robots will not become integrated into human social life without it. Furthermore, there are some use cases — medical care, military, autonomous vehicles — where some form of ethical agency would seem to be a prerequisite for robots. In addition to this, people argue that we can refine and improve our understanding of morality by creating a robotic moral agent: the process will force us to clarify moral concepts and principles, and remove inconsistencies in our ethical thinking.

Sharkey is more doubtful. It is hard to decipher her exact argument, but she seems to make three key points. First, as a preliminary point, she agrees with other authors that it is dangerous to prematurely apply the language of ethical agency to robots because this tends to obscure human responsibility for the actions of robots:

Describing such machines as being moral, ethical, or human, risks increasing the tendency for humans to fail to acknowledge their ultimate responsibility for the actions of these artefacts…an important component to undertaking a responsible approach to the deployment of robots in sensitive areas is to avoid the careless application of words and terms used to describe human behaviour and decision-making. 
(Sharkey 2017, 9)

As I say, this is a preliminary point. It doesn’t really speak to the long-term desirability of creating robots with ethical agency, but it does suggest that it is dangerous to speak of this possibility prematurely, which is something that might be encouraged if we are trying to create such a robot. This highlights the point I made earlier about the link between concerns about ‘responsibility gaps’ and concerns about ethical agency in robots.

Sharkey then makes two more substantive arguments against the long-term desirability of robots with ethical agency. First, she argues that the scenarios in which we need competent ethical agents are ones in which “there is some ambiguity and a need for contextual understanding: situations in which judgment is required and there is not a single correct answer” (Sharkey 2017, 10). This implies that if we were to create robots with sophisticated ethical agency it would be with a view to deploying them in scenarios involving moral ambiguity. Second, she argues that we should not want robots to be dealing with these scenarios, given that they currently lack the capacity to understand complex social situations and are unlikely to acquire that capacity. She then continues by arguing that this rules robots out of a large number of social roles/tasks. The most obvious of these would be military robots or any other robots involved in making decisions about killing people, but it would also include other social-facing robots such as teaching robots, care robots and even bar-tending robots:

But how could a robot make appropriate decisions about when to praise a child, or when to restrict his or her activities, without a moral understanding? Similarly how could a robot provide good care for an older person without an understanding of their needs, and of the effects of its actions? Even a bar-tending robot might be placed in a situation in which decisions have to be made about who should or should not be served, and what is and is not acceptable behaviour. 
(Sharkey 2017, 11)

What can we make of this argument? Let me say three things by way of response.

First, if we accept Sharkey’s view then we have to accept that a lot of potential use cases for robots are off the table. In particular, we have to accept that most social robots — i.e. robots that are intended to be integrated into human social life — are ethically inappropriate. Sharkey claims that this is not the case. She claims that there would still be some ethically acceptable uses of robots in social settings. As an example, she cites an earlier paper of hers in which she argued that assistive robots for the elderly were okay, but care robots were not. But I think her argument is more extreme than she seems willing to accept. Most human social settings are suffused with elements of moral ambiguity. Even the use of an assistive robot — if it has some degree of autonomy — will have the potential to butt up against cases in which a capacity to navigate competing ethical demands might be essential. This is because human morality is replete with vague and sometimes contradictory principles. Consider her own example of the bar-tending robot. What she seems to be suggesting with this example is that you ought not to have a robot that just serves people as much alcohol as they like. Sometimes, to both protect themselves and others, people should not be served alcohol. But, of course, this is true for any kind of assistance a robot might provide to a human. People don’t always want what is morally best for themselves. Sometimes there will be a need to judge when it is appropriate to give assistance and when it is not. I cannot imagine an interaction with a human that would not, occasionally, have features like this. This implies that Sharkey’s injunction, if taken seriously, could be quite restrictive.

People may be willing to pay the price and accept those restrictions on the use of robots, but this then brings me to the second point. Sharkey’s argument hinges on the premise that we want moral agents to be sensitive to moral ambiguities and have the capacity to identify and weigh competing moral interests. The concern is that robots will be too simplistic in their moral judgments and lack the requisite moral sensitivity. But it could be that humans are too sensitive to the moral ambiguities of life and, as a consequence, too erratic and flexible with their moral judgments. For example, when making decisions about how to distribute social goods, there is a tendency to get bogged down in all the different moral variables and interests at play, and then to struggle to balance those interests effectively. When making choices about healthcare, for instance, which rules should we follow: should we give to the most needy? What defines those with the most need? What is a healthcare need in the first place? Should we force people to get insurance and refuse to treat those without? Should those who are responsible for their own ill-health be pushed down the order of priority when receiving treatment? If you take all these interests seriously, you can easily end up in a state of moral paralysis. Robots, with their greater simplicity and stricter rule-following behaviour, might be beneficial because they can cut through the moral noise. This, as I understand it, is one of the main arguments in favour of autonomous vehicles: they are faster at responding to some environmental stimuli but also stricter in how they follow certain rules of the road; this can make them safer, and less erratic, than human drivers. This remains to be seen, of course. We need a lot more testing of these vehicles before we can become reasonably confident of their greater safety, but it seems to me that there is a prima facie case that warrants this testing. I suspect this is true across many other possible use cases for robots too.

Third, and finally, I want to return to my original argument — the intuition that started this article — about the unavoidability of ethical robot agents. Even if we accept Sharkey’s view that we shouldn’t create sophisticated explicit ethical agents, it seems to me that if we are going to create robots at all, we will still have to create implicit ethical agents and hence confront many of the same design choices that would go into the design of an explicit ethical agent. The reasoning flows from what was said previously. Any interaction a robot has with a human is going to be suffused with moral considerations. There is no getting away from this: these considerations constitute the invisible framework of our social lives. If a robot is going to work autonomously within that framework, then it will have to have some capacity to identify and respond to (at least some of) those considerations. This may not mean that they explicitly identify and represent ethical principles in their decision-making, but they will need to do so implicitly.* This might sound odd but then recall the point I made previously: that according to some moral psychologists this is essentially how human moral agency functions: the explicit stuff comes after our emotional and subconscious mind has already made its moral choices. You could get around this by not creating robots with autonomous decision-making capacity. But I would argue that, in that case, you are not really creating a robot at all: you are creating a remote controlled tool.

* This is true even if the robot is programmed to act in an unethical way. In that case the implicit ethical agency contradicts or ignores moral considerations. This still requires some implicit capacity to exercise moral judgment with respect to the environmental stimuli.