
Thursday, July 31, 2014

Bostrom on Superintelligence (4): Malignant Failure Modes



(Series Index)

This is the fourth post of my series on Nick Bostrom’s recent book Superintelligence: Paths, Dangers, Strategies. In the previous post, I started my discussion of Bostrom’s argument for an AI doomsday scenario. Today, I continue this discussion by looking at another criticism of that argument, along with Bostrom’s response.

To set things up, we need to briefly recap the salient aspects of Bostrom’s doomsday argument. As we saw last time, that argument consists of two steps. The first step looks at the implications that can be drawn from three theses: (i) the first mover thesis, which claims that the first superintelligence in the world could obtain a decisive advantage over all other intelligences; (ii) the orthogonality thesis, which claims that there is no necessary connection between high intelligence and benevolence; and (iii) the instrumental convergence thesis, which claims that a superintelligence, no matter what its final goals, would have an instrumental reason to pursue certain sub-goals that are inimical to human interests, specifically the goal of unrestricted resource acquisition. The second step of the argument merely adds that humans are either made of resources or reliant on resources that the first superintelligence could use in the pursuit of its final goals. This leads to the conclusion that the first superintelligence could pose a profound existential threat to human beings.

There are two obvious criticisms of this argument. The first — which we dealt with last time — is that careful safety testing of an AI could ensure that it poses no existential threat. Bostrom rejects this on the grounds that superintelligent AIs could take “treacherous turns”. The second — which we’ll deal with below — argues that we can avoid the existential threat by simply programming the AI to pursue benevolent, non-existentially threatening goals.


1. The Careful Programming Objection
Human beings will be designing and creating advanced AIs. As a result, they will have initial control over those AIs’ goals and decision-making procedures. Why couldn’t they simply programme the AI with sufficient care, and ensure that it only has goals that are compatible with human flourishing, and that it only pursues those goals in a non-existentially threatening way? Call this the “careful programming objection”. Since I am building a diagram that maps out this argument, let’s give this objection a number and a more canonical definition (numbering continues from the previous post):

(9) Careful Programming Objection: Through careful programming, we can ensure that a superintelligent AI will (a) only have final goals that are compatible with human flourishing; and (b) will only pursue those goals in ways that pose no existential threat to human beings.

As with the safety-test objection, this functions as a counter to the conclusion of Bostrom’s doomsday argument. The question we must now ask is whether it is any good.

Bostrom doesn’t think so. As his collaborator, Eliezer Yudkowsky, points out, engineering a “friendly” advanced AI is a tricky business. Yudkowsky supports this claim by appealing to something he calls the “Fragility of Value” thesis. The idea is that if we want to programme an advanced AI to have and pursue goals that are compatible with ours, then we have to get its value-system 100% right; anything less won’t be good enough. This is because the set of possible architectures that are compatible with human interests is vastly outnumbered by the set of possible architectures that are not. Missing by even a small margin could be fatal. As Yudkowsky himself puts it:

Getting a goal system 90% right does not give you 90% of the value, any more than correctly dialing 9 out of 10 digits of my phone number will connect you to somebody who’s 90% similar to Eliezer Yudkowsky. There are multiple dimensions for which eliminating that dimension of value would eliminate almost all value from the future. For example an alien species which shared almost all of human value except that their parameter setting for “boredom” was much lower, might devote most of their computational power to replaying a single peak, optimal experience over and over again with slightly different pixel colors (or the equivalent thereof). 
(Yudkowsky, 2013)

Bostrom makes the same basic point, but appeals instead to the concept of a malignant failure mode. The idea here is that a superintelligent AI, with a decisive strategic advantage over all other intelligences, will have enough power that, if its programmers make even a minor error in specifying its goal system (e.g. if they fail to anticipate every possible implication of the system they programme), it has the capacity to fail in a “malignant” way. That’s not to say there aren’t “benign” failure modes as well — Bostrom thinks there could be lots of those — it’s just that the particular capacities of an advanced AI are such that if it fails, it could fail in a spectacularly bad way.

Bostrom identifies three potential categories of malignant failure: perverse instantiation; infrastructure profusion; and mind crime. Let’s look at each in some more detail.


2. The Problem of Perverse Instantiation
The first category of malignant failure is that of perverse instantiation. The idea here is that a superintelligence could be programmed with a seemingly benign final goal, but could implement that goal in a “perverse” manner. Perverse to whom, you ask? Perverse to us. The problem is that when a human programmer (or team of programmers) specifies a final goal, he or she may fail to anticipate all the possible ways in which that goal could be achieved. That’s because humans have many innate and learned biases and filters: they don’t consider or anticipate certain possibilities because those possibilities are so far outside what they would expect. The superintelligent AI may lack those biases and filters, so what seems odd and perverse to a human being might seem perfectly sensible and efficient to the AI.

(10) Perverse Instantiation Problem: Human programmers may fail to anticipate all the possible ways in which a goal could be achieved. This is due to their innate and learned biases and filters. A superintelligent AI may lack those biases and filters, and may consequently pursue a goal in a logical, but perverse, human-unfriendly fashion.

Bostrom gives several examples of perverse instantiation in the book. I won’t go through them all here. Instead, I’ll just give you a flavour of how he thinks about the issue.

Suppose that the programmers decide that the AI should pursue the final goal of “making people smile”. To human beings, this might seem perfectly benevolent. Thanks to their natural biases and filters, they might imagine an AI telling us funny jokes or otherwise making us laugh. But there are other ways of making people smile, some of which are not-so-benevolent. You could make everyone smile by paralysing their facial musculature so that it is permanently frozen in a beaming smile (Bostrom 2014, p. 120). Such a method might seem perverse to us, but not to an AI. It may decide that coming up with funny jokes is a laborious and inefficient way of making people smile. Facial paralysis is much more efficient.

But hang on a second, surely the programmers wouldn’t be that stupid? Surely, they could anticipate this possibility — after all, Bostrom just did — and stipulate that the final goal should be pursued in a manner that does not involve facial paralysis. In other words, the final goal could be something like “make us smile without directly interfering with our facial muscles” (Bostrom 2014, p. 120). That won’t prevent perverse instantiation either, according to Bostrom. This time round, the AI could simply take control of that part of our brains that controls our facial muscles and constantly stimulate it in such a way that we always smile.

Bostrom runs through a few more iterations of this. He also looks at final goals like “make us happy” and notes how it could lead the AI to implant electrodes into the pleasure centres of our brains and keep them on a permanent “bliss loop”. He also notes that the perverse instantiations he discusses are just a tiny sample. There are many others, including ones that human beings may be unable to think of at the present time.

So you get the basic idea. The concern that Bostrom raises has been called the “literalness problem” by other AI risk researchers (specifically Muehlhauser and Helm, whose work I discuss here LINK). It arises because we have a particular conception of the meaning of a goal (like “making us happy”), but the AI does not share that conception because that conception is not explicitly programmed into the AI. Instead, that conception is implied by the shared understandings of human beings. Even if the AI realised that we had a particular conception of what “make us happy” meant, the AI’s final goal would not stipulate that it should follow that conception. It would only stipulate that it should make us happy. The AI could pursue that goal in any logically compatible manner.
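To make the literalness problem more concrete, here is a minimal toy sketch (my own illustration, not Bostrom’s or Muehlhauser and Helm’s). The candidate plans and scores are invented; the point is simply that an optimiser scores plans against the literal objective it was given, not against the shared human understanding that lies behind it:

```python
# A toy illustration of the "literalness problem" (invented example).
# The objective encodes only the literal goal ("produce smiles efficiently");
# the implicit human understanding of the goal is never represented.

candidate_plans = {
    "tell funny jokes":            {"expected_smiles": 0.6, "effort": 0.9},
    "paralyse facial musculature": {"expected_smiles": 1.0, "effort": 0.2},
}

def literal_objective(plan):
    # Scores only what was written down: smiles produced per unit of effort.
    stats = candidate_plans[plan]
    return stats["expected_smiles"] / stats["effort"]

best_plan = max(candidate_plans, key=literal_objective)
print(best_plan)  # -> "paralyse facial musculature"
```

Nothing in the scoring function rules out the perverse plan, because the constraint that makes it perverse lives in our shared understanding, not in the specification the optimiser actually receives.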

Now, I know that others have critiqued this view of the “literalness problem”, arguing that it assumes a certain style of AI system and development that need not be followed (Richard Loosemore has recently made this critique). But Bostrom thinks the problem is exceptionally difficult to overcome. Even if the AI seems to follow human conceptions of what it means to achieve a goal, there is always the problem of the treacherous turn:

The AI may indeed understand that this is not what we meant. However, its final goal is to make us happy, not to do what the programmers meant when they wrote the code that represents this goal. Therefore, the AI will care about what we meant only instrumentally. For instance, the AI might place an instrumental value on finding out what the programmers meant so that it can pretend — until it gets a decisive strategic advantage — that it cares about what the programmers meant rather than about its actual final goal. This will help the AI realize its final goal by making it less likely that the programmers will shut down or change its goal before it is strong enough to thwart any such interference. 
(Bostrom 2014, p. 121)

As I mentioned in my previous post, the assumptions and possibilities that Bostrom is relying on when making claims about the treacherous turn come with significant epistemic costs.


3. The Problem of Infrastructure Profusion
The second form of malignant failure is what Bostrom calls infrastructure profusion. This is essentially just a specific form of perverse instantiation which arises whenever an AI builds a disproportionately large infrastructure for fulfilling what seems to be a pretty benign or simple goal. Imagine, for example, an AI with the following final goal:


  • Final Goal: Maximise the time-discounted integral of your future reward signal


This type of goal — unlike the examples given above — is something that could easily be programmed into an AI. One way in which the AI could perversely instantiate it is by “wireheading”, i.e. seizing control of its own reward circuit and “clamp[ing] the reward signal to its maximal strength” (Bostrom 2014, p. 121). The problem then is that the AI becomes like a junkie. As you know, junkies often dedicate a great deal of time, effort and ingenuity to getting their “fix”. The superintelligent AI could do the same. The only thing it would care about would be maximising its reward signal, and it would take control of all available resources in the attempt to do just that. Bostrom gives other examples of this involving AIs designed to maximise the number of paperclips or evaluate the Riemann hypothesis (in the latter case he imagines the AI turning the solar system into “computronium”, an arrangement of matter that is optimised for computation).
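To see why wireheading is the obvious strategy for this kind of goal, consider a minimal sketch (my own toy model, with made-up numbers) of a time-discounted reward objective. If the agent can set its own reward signal, clamping it at the maximum dominates any policy that earns reward by acting in the world:

```python
# Toy sketch of a time-discounted reward objective (invented numbers).
# U = sum over t of gamma^t * r_t, with 0 < gamma < 1 and rewards in [0, 1].

GAMMA = 0.99
HORIZON = 1000

def discounted_return(rewards):
    return sum((GAMMA ** t) * r for t, r in enumerate(rewards))

honest_policy = [0.3] * HORIZON  # modest rewards earned by acting in the world
wireheaded    = [1.0] * HORIZON  # reward signal clamped to its maximal strength

print(discounted_return(honest_policy))  # roughly 30
print(discounted_return(wireheaded))     # roughly 100: clamping strictly dominates
```

On this toy picture, any external resources the agent acquires are then useful only for protecting and powering the clamped signal, which is where the profusion of infrastructure comes in.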

(11) Infrastructure Profusion Problem: An intelligent agent, with a seemingly innocuous or innocent goal, could engage in infrastructure profusion, i.e. it could transform large parts of the reachable universe into an infrastructure that services its own goals, and is existentially risky for human beings.

This is the problem of resource acquisition, once again. An obvious rebuttal to it would be to argue that the problem stems from final goals that involve the “maximisation” of some output. Why programme an AI to maximise? Why not simply programme it to satisfice, i.e. be happy once it crosses some minimum threshold? There are a couple of ways we could do this: by specifying an output-goal with a minimum threshold or range (e.g. make somewhere between 800,000 and 1.5 million paperclips); and/or by specifying some permissible probability threshold for the attainment of the goal.

As regards the first option, Bostrom argues that this won’t prevent the problem of infrastructure profusion. As he puts it:

[I]f the AI is a sensible Bayesian agent, it would never assign exactly zero probability to the hypothesis that it has not yet achieved its goal—this, after all, being an empirical hypothesis against which the AI can have only uncertain perceptual evidence. The AI should therefore continue to make paperclips in order to reduce the (perhaps astronomically small) probability that it has somehow still failed to make a million of them, all appearances notwithstanding. 
(Bostrom 2014, pp. 123-4)

He goes on to imagine the AI building a huge computer in order to clarify its thinking and make sure that there isn’t some obscure way in which it may have failed to achieve its goal. Now, you might think the solution to this is to just adopt the second method of satisficing, i.e. specify some probability threshold for goal attainment. That way, the AI could be happy once it is, say, 95% probable that it has achieved its goal. It doesn’t have to build elaborate computers to test out astronomically improbable possibilities. But Bostrom argues that not even that would work. For there is no guarantee that the AI would pick some humanly intuitive way of ensuring 95% probability of success (nor, I suppose, that it would estimate probabilities in the same kind of way).
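Bostrom’s point about the “sensible Bayesian agent” can be made concrete with a small sketch (my own toy model, with made-up numbers). An agent whose stopping rule is “halt once you are certain the goal has been achieved” never halts, because each fallible observation only pushes the posterior probability of success towards 1 without ever reaching it; a 95% threshold would stop this toy agent quickly, but, as noted above, nothing forces a real agent to pick a humanly intuitive way of hitting that threshold:

```python
# Toy sketch of the "never exactly zero probability of failure" point
# (invented reliability and prior values).

def update(prior_success, sensor_reliability=0.999):
    # Posterior probability that the goal is met, after one more positive
    # but fallible check (simple Bayes' rule with a binary observation).
    p_obs_if_success = sensor_reliability
    p_obs_if_failure = 1 - sensor_reliability
    numerator = p_obs_if_success * prior_success
    return numerator / (numerator + p_obs_if_failure * (1 - prior_success))

p = 0.5  # initial credence that a million paperclips have been made
for check in range(1, 6):
    p = update(p)
    print(check, p)  # climbs towards 1.0 but never reaches it exactly
```

A maximiser with a certainty-based stopping rule therefore keeps spending resources on ever more elaborate checks.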

I don’t know what to make of all this. There are so many possibilities being entertained by Bostrom in his response to criticisms. He seems to think the risks remain significant no matter how far-fetched these possibilities seem. The thing is, he may be right in thinking this. As I have said before, the modal standards one should employ when it comes to dealing with arguments about what an advanced AI might do are difficult to pin down. Maybe the seemingly outlandish possibilities become probable when you have an advanced AI; then again, maybe not. Either way, I hope you are beginning to see how difficult it is to unseat the conviction that superintelligent AI could pose an existential risk.



4. Mind Crimes and Conclusions
The third malignant failure mode is not as important as the other two. Bostrom refers to it as “mind crime”. In the case of perverse instantiation and infrastructure profusion, the AI produces effects in the real world that are deleterious to the interests of human beings. In the case of mind crimes, the AI does things within its own computational architecture that could be deleterious to the interests of virtual beings. Bostrom imagines an advanced AI running a complex simulation which includes simulated beings that are capable of consciousness (or that, which may be a different matter, have some kind of moral status that should make us care about what happens to them). What if the AI tortures those beings? Or deletes them? That could be just as bad as a moral catastrophe in the real world. It would be another malignant failure.

This is, no doubt, a fascinating possibility and once again it stresses the point that AIs could do a variety of malignant things. This is the supposed lesson from this section of Bostrom’s book, and it is intended to shore up the existential risk argument. I won’t offer any overall evaluation of the argument at this stage, however, because, over the next few posts, we will be dealing with many more suggestions for addressing risks from superintelligent AIs. The fate of these suggestions will affect the fate of the existential risk argument.

Tuesday, July 29, 2014

Bostrom on Superintelligence (3): Doom and the Treacherous Turn



(Series Index)

This is the third part of my series on Nick Bostrom’s recent book Superintelligence: Paths, Dangers, Strategies. In the first two entries, I looked at some of Bostrom’s conceptual claims about the nature of agency, and the possibility of superintelligent agents pursuing goals that may be inimical to human interests. I now move on to see how these conceptual claims feed into Bostrom’s case for an AI doomsday scenario.

Bostrom sets out this case in Chapter 8 of his book, which is entitled “Is the default outcome doom?”. To me, this is the most important chapter in the book. In setting out the case for doom, Bostrom engages in some very, how shall I put it, “interesting” (?) forms of reasoning. Critics will no doubt latch onto them as weak points in his argument, but if Bostrom is right then there is something truly disturbing about the creation of superintelligent AIs.

Anyway, I’ll be discussing Chapter 8 over the next two posts. In the remainder of this one, I’ll do two things. First, I’ll look at Bostrom’s three-pronged argument for doom. This constitutes his basic case for the doomsday scenario. Then, I’ll look at something Bostrom calls the “Treacherous Turn”. This is intended to shore up the basic case for doom by responding to an obvious criticism. In the course of articulating this treacherous turn, I hope to highlight some of the profound epistemic costs of Bostrom’s view. Those costs may have to be borne — i.e. Bostrom may be right — but we should be aware of them nonetheless.


1. The Three-Pronged Argument for Doom
Bostrom is famous for coming up with the concept of an “existential risk”. He defines this as a risk “that threatens to cause the extinction of Earth-originating intelligent life or to otherwise permanently and drastically destroy its potential for future desirable development” (Bostrom 2014, p. 115). One of the goals of the institute he runs — the Future of Humanity Institute — is to identify, investigate and propose possible solutions to such existential risks. One of the main reasons for his interest in superintelligence is the possibility that such intelligence could pose an existential risk. So when he asks the question “Is the default outcome doom?”, what he is really asking is “Is the creation of a superintelligent AI likely to create an existential risk?”

Bostrom introduces an argument for thinking that it might. The argument is based on three theses, all of which he articulates and defends in the book — two of which we already looked at, and one of which was discussed in an earlier chapter, not included in the scope of this series of posts. The three theses are (in abbreviated form):


(1) The first mover thesis: The first superintelligence, by virtue of being first, could obtain a decisive strategic advantage over all other intelligences. It could form a “singleton” and be in a position to shape the future of all Earth-originating intelligent life.
(2) The orthogonality thesis: Pretty much any level of intelligence is consistent with pretty much any final goal. Thus, we cannot assume that a superintelligent artificial agent will have any of the benevolent values or goals that we tend to associate with wise and intelligent human beings (shorter version: great intelligence is consistent with goals that pose a grave existential risk).
(3) The instrumental convergence thesis: A superintelligent AI is likely to converge on certain instrumentally useful sub-goals, that is: sub-goals that make it more likely to achieve a wide range of final goals across a wide range of environments. These convergent sub-goals include the goal of open-ended resource acquisition (i.e. the acquisition of resources that help it to pursue and secure its final goals).


Bostrom doesn’t set out his argument for existential risk formally, but gives us enough clues to see how the argument might fit together. The first step is to argue that the conjunction of these three theses allows us to reach the following, interim, conclusion:


(4) Therefore, “the first superintelligence may [have the power] to shape the future of Earth-originating life, could easily have non-anthropomorphic final goals, and would likely have instrumental reasons to pursue open-ended resource acquisition” (Bostrom 2014, p. 116)


If we then combine that interim conclusion with the following premise:


(5) Human beings “consist of useful resources (such as conveniently located atoms)” and “we depend for our survival and flourishing on many more local resources” (Bostrom 2014, p. 116).


We can reach the conclusion that:


(6) Therefore, the first superintelligence could have the power and reason to do things that lead to human extinction (by appropriating resources we rely on, or by using us as resources).


And that is, essentially, the same thing as saying that the first superintelligence could pose a significant existential risk. I have mapped out this pattern of reasoning below.



Now, clearly, this doomsday argument is highly speculative. There are a number of pretty wild assumptions that go into it, and critics will no doubt be apt to question them. Bostrom acknowledges this, saying that it would indeed be “incredible” to imagine a project that would build and release such a potentially catastrophic AI into the world. There are two reasons for this incredulity. The first is that, surely, in the process of creating a superintelligent AI, we would have an array of safety measures and test protocols in place to ensure that it didn’t pose an existential threat before releasing it into the world. The second is that, surely, AI programmers and creators would programme the AI to have benevolent final goals, so that it would not pursue open-ended resource acquisition.

These reasons are intuitively attractive. They provide us with some optimism about the creation of artificial general intelligence. But Bostrom isn’t quite so optimistic (though, to be fair, he actually is pretty sober throughout the book: he doesn’t come across as a wild-eyed doom-monger, or as a Pollyanna-ish optimist; he lays out his analysis in a “matter of fact” manner). He argues that when we think about the nature of a superintelligent AI more clearly, we see that neither of these reasons for optimism is persuasive. I’ll look at his response to the first reason for optimism in the remainder of this post.


2. The Problem of the Treacherous Turn
Critics of AI doomsayers sometimes chastise those doomsayers for their empirically detached understanding of AI. The doomsayers don’t pay enough attention to how AIs are actually created and designed in the real world; they engage in too much speculation and too much armchair theorising. In the real world, AI projects are guided by human programmers and designers. These programmers and designers create AIs with specific goals in mind — though some are also interested in creating general intelligences — and they typically test their designs in limited “safe” environments before releasing them to the general public. An example might be the AI that goes into self-driving cars: these AIs are designed with a specific final goal in mind (the ability to safely navigate a car to a given destination), and they are rigorously tested for their ability to do this safely, and without posing a significant risk (“existential” or otherwise) to human beings. The point the critics then make is this: why couldn’t this approach to AI development be followed in all instances? Why couldn’t careful advance testing protect us from existential risk? Let’s call this the “safety test” objection to the doomsday argument:


(7) Safety test objection: An AI could be empirically tested in a constrained environment before being released into the wild. Provided this testing is done in a rigorous manner, it should ensure that the AI is “friendly” to us, i.e. poses no existential risk.


The safety test objection doesn’t function as a rebuttal to any of the premises of Bostrom’s original argument. In other words, it accepts the bare possibility of what Bostrom has to say. It simply argues that there is a simple way to avoid the negative outcomes. Consequently, I view it as a reason to reject the conclusion of Bostrom’s argument.

Is the safety test objection plausible? Bostrom says “no”. To see why, we need to understand the nature of strategic thinking. If I have certain goals I wish to achieve, but I need your cooperation to help me achieve them, and you are unwilling to provide that cooperation because you don’t like my goals, it may be in my interest to convince you that I don’t have those goals. Or to put it more succinctly: if I have some wicked or malevolent intent, it may nevertheless be in my interests to “play nice” so that you can help to put me in a position to implement that malevolent intent. Actually, the point is more general than that. Even if my intentions are entirely benevolent, there may nevertheless be contexts in which it pays to deceive you as to their true nature. Furthermore, the point doesn’t just apply to intentions, it also applies to abilities and skills. I may be an ace pool player, for example, but if I want to win a lucrative bet with you, it might pay me to pretend that I am incompetent for a couple of games. This will lull you into a false sense of security, encourage you to put a big bet on one game, at which point I can reveal my true skill and win the money. Humans play these strategic games of deception and concealment with each other all the time.

Bostrom’s response to the safety test objection makes the point that superintelligent AIs could play the same sort of games. They could “play nice” while being tested, concealing their true intentions and abilities from us, so as to facilitate their being put in position to exercise their true abilities and realise their true intentions. As he himself puts it:


The flaw in this idea [the safety test objection] is that behaving nicely while in the box is a convergent instrumental goal for friendly and unfriendly AIs alike. An unfriendly AI of sufficient intelligence realizes that its unfriendly final goals will be best realised if it behaves in a friendly manner initially, so that it will be let out of the box. It will only start behaving in a way that reveals its unfriendly nature when it no longer matters whether we find out; that is, when the AI is strong enough that human opposition is ineffectual.
(Bostrom 2014, p. 117)


Or to put it another way: no matter how much testing we do, it is always possible that the AI will take a “treacherous turn”:


(8) The Treacherous Turn Problem: An AI can appear to pose no threat to human beings through its initial development and testing, but once in a sufficiently strong position it can take a treacherous turn, i.e. start to optimise the world in ways that pose an existential threat to human beings.


(Note: this definition diverges somewhat from the definition given by Bostrom in the text. I don’t think the alterations I make do great violence to the concept, but I want the reader to be aware that they are there.)




Bostrom is keen to emphasise how far-reaching this problem is. In the book, he presents an elaborate story about the design and creation of a superintelligent AI, based on initial work done on self-driving cars. The story is supposed to show that all the caution and advance testing in the world cannot rule out the possibility of an AI taking a treacherous turn. He also notes that an advanced AI may even encourage its own destruction, if it is convinced that doing so will lead to the creation of a new AI that will be able to achieve the same goals. Finally, he highlights how an AI could take a treacherous turn by just suddenly happening upon a treacherous way of achieving its final goals.

This is all superficially plausible. It is indeed conceivable that an intelligent system — capable of strategic planning — could take such treacherous turns. And a sufficiently time-indifferent AI could play a “long game” with us, i.e. it could conceal its true intentions and abilities for a very long time. Nevertheless, accepting this has some pretty profound epistemic costs. It seems to suggest that no amount of empirical evidence could ever rule out the possibility of a future AI taking a treacherous turn. In fact, it’s even worse than that. If we take it seriously, then it is possible that we have already created an existentially threatening AI. It’s just that it is concealing its true intentions and powers from us for the time being.

I don’t quite know what to make of this. Bostrom is a pretty rational, Bayesian guy. I tend to think he would say that if all the evidence suggests that our AI is non-threatening (and if there is a lot of that evidence), then we should heavily discount the probability of a treacherous turn. But he doesn’t seem to add that qualification in the chapter. He seems to think the threat of an existential catastrophe from a superintelligent AI is pretty serious. So I’m not sure whether he embraces the epistemic costs I just mentioned or not.

Anyway, that brings us to the end of this post. To briefly recap, Bostrom’s doomsday argument is based on the combination of three theses: (i) the first mover thesis; (ii) the orthogonality thesis; and (iii) the instrumental convergence thesis. Collectively, these theses suggest that the first superintelligent AI could have non-anthropomorphic final goals and could pursue them in ways that are inimical to human interests. There are two obvious ripostes to this argument. We’ve just looked at one of them — the safety test objection — and seen how Bostrom’s reply seems to impose significant epistemic costs on the doomsayer. In the next post, we’ll look at the second riposte and what Bostrom has to say about it. It may be that his reasons for taking the threat seriously stem more from that riposte.

Monday, July 28, 2014

Bostrom on Superintelligence (2): The Instrumental Convergence Thesis



(Series Index)

This is the second post in my series on Nick Bostrom’s recent book Superintelligence: Paths, Dangers, Strategies. In the previous post, I looked at Bostrom’s defence of the orthogonality thesis. This thesis claimed that pretty much any level of intelligence — when “intelligence” is understood as skill at means-end reasoning — is compatible with pretty much any (final) goal. Thus, an artificial agent could have a very high level of intelligence, and nevertheless use that intelligence to pursue very odd final goals, including goals that are inimical to the survival of human beings. In other words, there is no guarantee that high levels of intelligence among AIs will lead to a better world for us.

The orthogonality thesis has to do with final goals. Today we are going to look at a related thesis: the instrumental convergence thesis. This thesis has to do with sub-goals. The thesis claims that although a superintelligent AI could, in theory, pursue pretty much any final goal, there are, nevertheless, certain sub-goals that it is likely to pursue. This is for the simple reason that certain sub-goals will enable it to achieve its final goals. Different agents are, consequently, likely to “converge” upon those sub-goals. This makes the future behaviour of superintelligent AIs slightly more predictable from a human standpoint.

In the remainder of this post, I’ll offer a more detailed characterisation of the instrumental convergence thesis, and look at some examples of convergent sub-goals.


1. What is the Instrumental Convergence Thesis?
Bostrom characterises the instrumental convergence thesis in the following manner:

Instrumental Convergence Thesis: Several instrumental values [or goals] can be identified which are convergent in the sense that their attainment would increase the chances of the agent’s goal being realised for a wide range of final goals and a wide range of situations, implying that these instrumental values [or goals] are likely to be pursued by a broad spectrum of situated intelligent agents.

An analogy with evolutionary theory might help us to understand this idea. (I know, I’ve now used two analogies with evolution in the first two posts of this series. I promise it won’t become a trend.) In his work on evolution, the philosopher Daniel Dennett employs the concept of a “good trick”. Evolution by natural selection is a goal-directed process. The goal is to ensure the survival of different genotypes. Organisms (or more specifically the genotypes they carry) adapt to the environments in which they live in order to achieve that goal. The thing is, there is huge variation in those environments: what is adaptive in one may not be adaptive in another. Nevertheless, there are certain “good tricks” that will enable organisms to survive across a wide range of environments. For example, eyesight is useful in nearly all environments. Because they are so useful, different groups of organisms — often with very divergent evolutionary histories — tend to hit on these “good tricks” over and over again, across evolutionary time. This phenomenon is actually known as convergent evolution, though I am fond of Dennett’s label. (Dennett also uses the related concept of a “Forced Move”).

I think Bostrom’s concept of instrumental convergence is very much like Dennett’s concept of a “good trick”, except that Bostrom’s concept is even broader. Dennett is dealing with evolution by natural selection, which involves one overarching final goal, being pursued in a variety of environments. Bostrom is concerned with agents who could have many possible final goals and who could be operating in many possible environments. Nevertheless, despite this added complexity, Bostrom is convinced that there are certain (very general) sub-goals that are useful to agents across a wide range of possible final goals and a wide range of possible environments. Consequently, we are likely to see even superintelligent agents hitting upon these “good tricks”.

So what might these “good tricks” be? The basic rule is:

If X is likely to increase an agent’s chances of achieving its final goals (no matter what those final goals might be) across a wide range of environments, then X is likely to be a (convergent) sub-goal of all agents.

Let’s look at some possible examples. Each of these is discussed by Bostrom in his book.


2. Self-Preservation and Goal-Content Integrity
The first two mentioned by Bostrom are self-preservation and goal-content integrity. They are closely related, though the latter is more important when it comes to understanding superintelligent AIs.

The sub-goal of self-preservation is familiar to humans. Indeed, as Bostrom notes, humans tend to pursue this as a final goal: with certain exceptions, there is almost nothing more valuable to a human being than his or her own survival. The situation is slightly different for an AI. Unless it is deliberately created in such a way that it has no intrinsic final goals — i.e. it learns to acquire goals over time — or unless it is explicitly programmed with the final goal of self-preservation, the AI’s interest in its own survival will always play second fiddle to its interest in achieving its final goal. Nevertheless, with the exception of the goal of immediate self-destruction, most final goals will take time to achieve. Consequently, it will be instrumentally beneficial for the AI to preserve its own existence until the goal is achieved (or until it is certain that its own destruction is necessary for achieving the goal).

Embedded in this is the more important convergent sub-goal of goal-content integrity. In essence, this is the idea that an agent needs to retain its present goals into the future, in order to ensure that its future self will pursue and attain those goals. Humans actually use a variety of tricks to ensure that they maintain their present goals. Smokers who really want to quit, for example, will adopt a range of incentives and constraints in order to ensure that their future selves will stick with the goal of quitting. We can imagine artificial agents needing to do the same sort of thing. Though when we imagine this we have to remember that artificial agents are unlikely to suffer from weakness of the will in the same way as human agents: just preserving the goal over time will be enough for them. Bostrom argues that goal-content integrity is more important than self-preservation for AIs. This is because, as noted, the need for self-preservation is highly contingent upon the nature of the final goal; whereas the integrity of the final goal itself is not.

That said, Bostrom does think there are scenarios in which an agent may change its final goals. He gives a few examples in the text. One is that it might change them in order to secure trusting partners for cooperative exchanges. The idea is that in order to pursue its goals, an agent may need to cooperate with other agents. But those other agents may not trust the agent unless it alters its goals. This may give the agent an incentive to change its final goals. It could also be the case that the agent’s final goal includes preferences about the content of its final goals. In other words, it may be programmed to ensure that it is motivated by certain values, rather than that it pursue a particular outcome. This could entail alteration of goals over time. Finally, the cost of maintaining a certain final goal, relative to the likelihood of achieving that goal, might be so large that the agent is incentivised to “delete” or “remove” that final goal.

I think the idea of an agent altering its final goals is a coherent one. Humans do it all the time. But I have some worries about these examples. For one thing, I am not sure they are internally coherent. The notion of an agent changing its final goals in order to secure cooperative partners seems pretty odd to me. It seems like its final goals would, in that case, simply be kept “in reserve” and a superficial mask of alteration put in place to appease the cooperative partners. Furthermore, in his defence of the orthogonality thesis, and later in his defence of the AI doomsday scenario (which we’ll look at in the next post), Bostrom seemed to assume that final goals would be stable and overwhelming. If they could be as easily altered as these examples seem to suggest, then the impact of those defences might be lessened.


3. Cognitive Enhancement and Technological Perfection
Another plausible convergent sub-goal for an intelligent agent would be the pursuit of its own cognitive enhancement. The argument is simple. An agent must have the ability to think and reason accurately about the world in order to pursue its goals. Surely, it can do this better if it enhances its own cognitive abilities? Enhancement technologies are an obvious way of doing this. Furthermore, the first AI that is in a position to become a superintelligence might place a very high instrumental value on its own cognitive enhancement. Why? Because doing so will enable it to obtain a decisive strategic advantage over all other agents, which will place it in a much better position to achieve its goals.

There are some exceptions to this. As noted in the discussion of the orthogonality thesis, it is possible that certain types of cognitive skill are unnecessary when it comes to the attainment of certain types of goal. Bostrom uses the example of “Dutch book arguments” to suggest that proficiency in probability theory is a valuable cognitive skill, but also notes that if the agent does not expect to encounter “Dutch book”-type scenarios, it may not be necessary to acquire all that proficiency. Similarly, an agent might be able to outsource some of its cognitive capacities to other agents. In fact, humans do this all the time: it’s one of the reasons we are creating AIs.

Another plausible convergent sub-goal is technological perfection. This would be the pursuit of advanced (“perfected”) forms of technology. We use technology to make things easier for ourselves all the time. Building and construction technologies, for example, enable architects and engineers to better realise their goals; medical technologies help us all to prevent and cure illnesses; computing software makes it easier for me to write articles and blog posts (in fact, the latter wouldn’t even be possible without technology). An AI is likely to view technology in the same way, constantly seeking to improve it and, since an AI is itself technological, trying to integrate the new forms of technology with itself. Again, this would seem to be particularly true in the case of a “singleton” (an AI with no other rivals or opposition). It is likely to use technology to obtain complete mastery over its environment. Bostrom suggests that this will encompass the development of space colonisation technologies (such as Von Neumann probes) and molecular/nano technologies.

Again, there will be exceptions to all this. The value of technological perfection will be contingent upon the agent’s final goals. The development of advanced technologies will be costly. The agent will need to be convinced that those costs are worth it. If it can pursue its goals in some technologically less efficient manner, with significant cost savings, it may not be inclined toward technological perfection.


4. Resource Acquisition
The final sub-goal discussed by Bostrom is resource acquisition. This too is an obvious one. Typically, agents need resources in order to achieve their goals. If I want to build a house, I need to acquire certain resources (physical capital, financial capital, human labour etc.). Similarly, if a superintelligent AI has the goal of, say, maximising the number of paperclips in the universe, it will need some plastic or metal that it can fashion into paperclips. AIs with different goals would try to acquire other kinds of resources. The possibilities are pretty endless.

There is perhaps one important difference between humans and AIs when it comes to resource acquisition. Humans often accumulate resources for reasons of social status. The bigger house, the bigger car, the bigger pile of money — these are all things that help to elevate the status of one human being over another. This can be useful to humans for a variety of reasons. Maybe they intrinsically enjoy the elevated status, or maybe the elevated status gets them other things. Given that an AI need not be subject to the same social pressures and psychological quirks, we might be inclined to think that they will be less avaricious in their acquisition of resources. We might be inclined to think that they will only accumulate a modest set of resources: whatever they need to achieve their final goals.

We would be wrong to think this. Or so, at least, Bostrom argues. Advances in technology could make it the case that virtually anything could be disassembled and reassembled (at an atomic or even sub-atomic level) into a valuable resource. Consequently, virtually everything in the universe could become a valuable resource to a sufficiently advanced AI. This could have pretty far-reaching implications. If the AI’s goal is to maximise some particular quantity or outcome, then it would surely try to acquire all the resources in the universe and put them to use in pursuing that goal. Furthermore, even if the AI’s goal is ostensibly more modest (i.e. doesn’t involve “maximisation”), the AI may still want to create backups and security barriers to ensure that goal attainment is preserved. This too could consume huge quantities of resources. Again, Bostrom points to the likelihood of the AI using Von Neumann probes to assist in this. With such probes, it could colonise the universe and harvest its resources.

As you no doubt begin to see, the likelihood of convergence upon this final sub-goal is particularly important when it comes to AI doomsday arguments. If it is true that a superintelligent AI will try to colonise the universe and harvest all its resources, we could easily find ourselves among those “resources”.


5. Conclusion
So that’s a brief overview of the instrumental convergence thesis, along with some examples of instrumentally convergent sub-goals. I haven’t offered much in the way of critical commentary in this post. That’s partly because Bostrom qualifies his own examples quite a bit anyway, and also partly because this post is laying the groundwork for the next one. That post will deal with Bostrom’s initial defence of the claim that a superintelligence explosion could spell doom for humans. I’ll have some more critical comments when we look at that.

Sunday, July 27, 2014

Bostrom on Superintelligence (1): The Orthogonality Thesis



(Series Index)

This is the first post in my series on Nick Bostrom’s recent book, Superintelligence: Paths, Dangers, Strategies. In this entry, I take a look at Bostrom’s orthogonality thesis. As we shall see, this thesis is central to his claim that superintelligent AIs could pose profound existential risks to human beings. But what does the thesis mean and how plausible is it?

I actually looked at Bostrom’s defence of the orthogonality thesis before. I based that earlier discussion on an article he wrote a couple of years back. From what I can tell, there is little difference between the arguments presented in the book and the arguments presented in that article. Nevertheless, it will be useful for me to revisit those arguments at the outset of this series. This is, in part, to refresh my own memory and also, in part, to ease myself back into the intellectual debate about superintelligent AIs after having ignored it for some time. Who knows? I may even have something new to say.

I should add that, since the publication of Bostrom’s original defence of the orthogonality thesis, his colleague Stuart Armstrong has produced a longer and more technical defence of it in the journal Analysis and Metaphysics. Unfortunately, I have not read that defence. Thus, I am conscious of the fact that what I deal with below may be the “second-best” defence of the orthogonality thesis. This is something readers should keep in mind.


1. What is the orthogonality thesis and why does it matter?
One thing that proponents of AI risk often warn us against is our tendency to anthropomorphise intelligent machines. Just because we humans think in a particular way, and have certain beliefs and desires, does not mean that an intelligent machine, particularly a superintelligent machine, will do the same. (Except for the fact we will be the ones programming the decision-making routines and motivations of the machine…more on this, and whether it can help to address the problem of AI risk, in future entries). We need to realise that the space of possible minds is vast, and that the minds of every human being that ever lived only occupy a small portion of that space. Superintelligences could take up residence in far more alien, and far more disturbing, regions.

The orthogonality thesis is a stark reminder of this point. We like to think that “intelligent” agents will tend to share a certain set of beliefs and motivations, and that with that intelligence will come wisdom and benevolence. This, after all, is our view of “intelligent” humans. But if we understand intelligence as the ability to engage in sophisticated means-end reasoning, then really there is no guarantee of this. Almost any degree of intelligence, so understood, is compatible with almost any set of goals or motivations. This is the orthogonality thesis. As Bostrom puts it:

Orthogonality Thesis: Intelligence and final goals are orthogonal: more or less any level of intelligence could in principle be combined with more or less any final goal.

We need to unpack this definition in a little more detail.

We’ll start with the concept of “intelligence”. As noted, Bostrom does not mean to invoke any normatively thick or value-laden form of rationality; he simply means to invoke efficiency and skill at means-end reasoning. Philosophers and economists have long debated these definitional issues. Philosophers sometimes think that judgments of intelligence or rationality encompass the assessment of motivations. Thus, for a philosopher a person who greatly desires to count all the blades of grass in the world would be “irrational” or “mentally deficient” in some important respect. Economists generally have a thinner sense of what intelligence or rationality requires. They do not assess motivations. The blade-of-grass-counter is just as rational as anyone else. All that matters is whether they maintain logical hierarchies of motivations and act in accordance with those hierarchies. Bostrom’s view of intelligence is closer to the economists' sense of rationality, except that it also encompasses great skill in getting what you want. This skill must, one presumes, include an ability to acquire true (or reasonably true) beliefs about the structure of the world around you. This is so that you can manipulate that world to reliably get what you want.

The second concept we need to unpack is that of a “final goal”. As far as I can tell, Bostrom never defines this in his book, but the idea is relatively straightforward. It is that an agent can have certain goals which are its raison d’être, which it fundamentally and necessarily aims at achieving, and others that are merely instrumental to the pursuit of those final goals. In other words, there are certain goals that are such that everything else the agent does is tailored toward the achievement of that goal. For Bostrom, the supposition seems to be that a superintelligent AI could be programmed so that it has a set of final goals that are dynamically stable and overwhelming (note: the use of “could be” is significant). This is important because Bostrom appeals to the possibility of overwhelming and dynamically stable final goals when responding to possible criticisms of the orthogonality thesis.

The third thing we need to unpack is the “more or less” qualifier. Bostrom acknowledges that certain goals may not be consistent with certain levels of intelligence. For example, complex goals might require a reasonably complex cognitive architecture. Similarly, there may be dynamical constraints on the kinds of motivations that a highly intelligent system could have. Perhaps the system is programmed with the final goal of making itself stupider. In that case, its final goal is not consistent with a high level of intelligence. These qualifications should not, however, detract from the larger point: that pretty much any level of intelligence is consistent with pretty much any final goal.

So that’s the orthogonality thesis in a nutshell. The thesis is important for the likes of Bostrom because, when understood properly, it heightens our appreciation of AI risk. If a superintelligent machine could have pretty much any final goal, then it could do things that are deeply antithetical to our own interests. That could lead to existential catastrophe. (We'll discuss the argument for this conclusion in a later entry)


2. Is the Orthogonality Thesis Plausible?
At first glance, the orthogonality thesis seems pretty plausible. For example, the idea of a superintelligent machine whose final goal is to maximise the number of paperclips in the world (the so-called paperclip maximiser) seems to be logically consistent. We can imagine — can’t we? — a machine with that goal and with an exceptional ability to utilise the world’s resources in pursuit of that goal. Nevertheless, there is at least one major philosophical objection to it.

We can call it the motivating belief objection. It works something like this:

Motivating Belief Objection: There are certain kinds of true belief about the world that are necessarily motivating, i.e. as soon as an agent believes a particular fact about the world they will be motivated to act in a certain way (and not motivated to act in other ways). If we assume that the number of true beliefs goes up with intelligence, it would then follow that there are certain goals that a superintelligent being must have and certain others that it cannot have.

A particularly powerful version of the motivating belief objection would combine it with a form of moral realism. Moral realism is the view that there are moral facts “out there” in the world waiting to be discovered. A sufficiently intelligent being would presumably acquire more true beliefs about those moral facts. If those facts are among the kind that are motivationally salient — as several moral theorists are inclined to believe — then it would follow that a sufficiently intelligent being would act in a moral way. This could, in turn, undercut claims about a superintelligence posing an existential threat to human beings (though that depends, of course, on what the moral truth really is).

The motivating belief objection is itself vulnerable to many objections. For one thing, it goes against a classic philosophical theory of human motivation: the Humean theory. This comes from the philosopher David Hume, who argued that beliefs are motivationally inert. If the Humean theory is true, the motivating belief objection fails. Of course, the Humean theory may be false and so Bostrom wisely avoids it in his defence of the orthogonality thesis. Instead, he makes three points. First, he claims that orthogonality would still hold if final goals are overwhelming, i.e. if they trump the motivational effect of motivating beliefs. Second, he argues that intelligence (as he defines it) may not entail the acquisition of such motivational beliefs. This is an interesting point. Earlier, I assumed that the better an agent is at means-end reasoning, the more likely it is that its beliefs are going to be true. But maybe this isn’t necessarily the case. After all, what matters for Bostrom’s definition of intelligence is whether the agent is getting what it wants, and it’s possible that an agent doesn’t need true beliefs about the world in order to get what it wants. A useful analogy here might be with Plantinga’s evolutionary argument against naturalism. Evolution by natural selection is a means-end process par excellence: the “end” is survival of the genes, anything that facilitates this is the “means”. Plantinga argues that there is nothing about this process that entails the evolution of cognitive mechanisms that track true beliefs about the world. It could be that certain false beliefs increase the probability of survival. Something similar could be true in the case of a superintelligent machine. The third point Bostrom makes is that a superintelligent machine could be created with no functional analogues of what we call “beliefs” and “desires”. This would also undercut the motivating belief objection.

What do we make of these three responses? They are certainly intriguing. My feeling is that the staunch moral realist will reject the first one. He or she will argue that moral beliefs are most likely to be motivationally overwhelming, so any agent that acquired true moral beliefs would be motivated to act in accordance with them (regardless of their alleged “final goals”). The second response is more interesting. Plantinga’s evolutionary objection to naturalism is, of course, hotly contested. Many argue that there are good reasons to think that evolution would create truth-tracking cognitive architectures. Could something similar be argued in the case of superintelligent AIs? Perhaps. The case seems particularly strong given that humans would be guiding the initial development of AIs and would, presumably, ensure that they were inclined to acquire true beliefs about the world. But remember Bostrom’s point isn’t that superintelligent AIs would never acquire true beliefs. His point is merely that high levels of intelligence may not entail the acquisition of true beliefs in the domains we might like. This is a harder claim to defeat. As for the third response, I have nothing to say. I have a hard time imagining an AI with no functional analogues of a belief or desire (especially since what counts as a functional analogue of those things is pretty fuzzy), but I guess it is possible.

One other point I would make is that — although I may be inclined to believe a certain version of the moral motivating belief objection — I am also perfectly willing to accept that the truth value of that objection is uncertain. There are many decent philosophical objections to motivational internalism and moral realism. Given this uncertainty, and given the potential risks involved with the creation of superintelligent AIs, we should probably proceed for the time being “as if” the orthogonality thesis is true.


3. Conclusion
That brings us to the end of the discussion of the orthogonality thesis. To recap, the thesis holds that intelligence and final goals are orthogonal to one another: pretty much any level of intelligence is consistent with pretty much any final goal. This gives rise to the possibility of superintelligent machines with final goals that are deeply antithetical to our own. There are some philosophical objections to this thesis, but their truth values are sufficiently uncertain that we should not discount the orthogonality thesis completely. Indeed, given the potential risks at stake, we should probably proceed “as if” it is true.

In the next post, we will look at the instrumental convergence thesis. This follows on from the orthogonality thesis by arguing that even if a superintelligence could have pretty much any final goal, it is still likely to converge on certain instrumentally useful sub-goals. These sub-goals could, in turn, be particularly threatening to human beings.

Bostrom on Superintelligence (0): Series Index



Nick Bostrom’s magnum opus on the topic of AI risk — Superintelligence: Paths, Dangers, Strategies — was recently published by Oxford University Press. The book is a comprehensive overview and analysis of the risks arising from an intelligence explosion. As you may know, some people are concerned that the creation of superintelligent machines will precipitate an existential catastrophe for the human race. For better or worse, the debate about this issue has largely taken place online, via various internet fora. Now, while I’m certainly not one to disparage such fora — this blog, after all, would count as one — I have to admit that Bostrom’s book is something of a relief. At last, we have a detailed, reasonably sober, academic analysis of the issue, one that is clearly the product of many years of research, reflection and discussion.

Having now read through significant portions of the book (not all of it), I can certainly recommend it to those who are interested in the topic. It’s very readable. Anyone with a passing familiarity with artificial intelligence, probability theory and philosophical analysis will be able to get through it. And most of the more technical portions of the analysis are helpfully separated out from the main text in a series of “boxes”.

I'm finding the book sufficiently useful that I am going to try to blog my way through a portion of it. I won’t have the time or energy to do all of it, unfortunately. So instead I’m going to focus on the bits that I find most interesting. These are the bits dealing with the claim that the creation of a superintelligent AI could spell doom for human beings, and with some of the alleged strategies for containing that risk. In fact, I am not even going to be able to do all of those bits of the book, but I’ll do enough to give an overview of the kind of thinking and argumentation Bostrom puts on display.

I don’t intend this series as a detailed critical analysis of Bostrom’s work. Instead, I’m using this series to get to grips with what Bostrom has to say. That doesn’t mean I’ll shy away from critical commentary — there will be plenty of that at times — but it does mean that criticising the arguments isn’t my primary focus; understanding them is.

Anyway, this post will serve as an index to future entries in the series. I'll add to it as I go along:



Friday, July 25, 2014

Does the Irish constitution imply the existence of unenumerated rights? (Part Three)




(Part One, Part Two)

This is the third and final part of my short series on unenumerated rights and the Irish Constitution. The series is examining a classic debate about the interpretation of Article 40.3 of the Irish Constitution in light of some important concepts from linguistic philosophy, specifically the concepts of implicature and enrichment. In part one, I explained what those concepts were. In part two, I looked at an argument from the philosopher Gerard Casey which claimed that Article 40.3 does not imply the existence of unenumerated rights.

In this part I’ll look at a response to Casey’s argument. This response builds upon the concepts and arguments discussed in the previous entries. Even though I do revisit some of those concepts and arguments below, I would still recommend reading the previous entries before reading this.


1. Casey’s Reading and the Substantive Response
To understand the response to Casey’s argument we need to briefly recap some of the key elements of his argument. As you recall, Article 40.3.1 of the Irish Constitution says that the state shall “defend and vindicate the personal rights of the citizen”. Article 40.3.2 then follows up by saying:

Article 40.3.2: The State shall, in particular, by its laws protect as best it may from unjust attack and, in the case of injustice done, vindicate the life, person, good name and property rights of every citizen.

The argument for the existence of unenumerated rights holds that because this section refers “in particular” to a group of rights (life, person, good name and property) not elsewhere discussed in Article 40, it must be giving a non-exhaustive list of personal rights, and therefore citizens must have other personal rights that the state must vindicate and protect.

Casey argues that this is the wrong way to read Article 40.3.2. He maintains that the article does not refer to a bunch of personal rights; rather, it only refers to one set of rights: property rights (which are mentioned elsewhere in Article 40). This is because he views the following as the correct way in which to parse what is being said in Article 40.3.2:

Casey's Reading: The State shall, in particular…vindicate [the life, person, and good name of every citizen] and the [property rights] of every citizen.

We also saw the last day that Casey seems to have pretty good support for his reading from the Irish language version of the Irish Constitution.

Now, you might be inclined to view Casey’s argument as nothing more than a bit of linguistic trickery. His claim — that Article 40.3.2 only refers to property rights — might be correct as a matter of pure semantics, but when you think about it in more detail, you might be persuaded that Article 40.3.2 must — as a matter of legal necessity — refer to other rights.

How might you be persuaded of this? Well, take the first bracketed phrase from Casey’s reading. Then ask yourself: how could the state vindicate and protect those things except by creating a set of legal rights? How could someone’s life, for instance, be protected by the state, through that state’s constitution, without there being some sort of legally recognised and enforceable right to life? How could someone’s person be protected without some set of personal rights (including the right to bodily integrity, which featured in the case Ryan v. Attorney General) being recognised and enforceable? And so on.

This is the substantive response to Casey’s argument. It holds that even if Article 40.3.2 doesn’t literally and explicitly refer to anything other than property rights, it does refer to other rights as a matter of legal substance. This response isn’t perfect. After all, it is technically possible for a state to protect someone’s life and good name without creating a legal right to those things. Nevertheless, within the world of constitutional law, there is a pretty tight connection between the protection of those things and the creation of a legal right. This, incidentally, means that if the substantive response is to succeed it will succeed as a matter of pragmatics, not semantics — see part one for the distinction. It is an argument about what makes sense in a particular pragmatic context; it is not an argument about what sort of meaning is semantically encoded into the text.


2. Casey’s Reply and Concluding Thoughts
Is the substantive response any good? Casey recognises and replies to it in his article. His reply is interesting, though I’m not sure what to make of it. First of all, Casey concedes the main thrust of the substantive argument. He doesn’t kick up a fuss about the conceptual connection between the existence of rights and the state’s proclaimed duty to protect and vindicate things like the life, person and good name of the citizen. The only thing he does say is that if we accept this, we must accept the further linguistic quirk that Articles 40.3.1 and 40.3.2 both refer to “personal rights”. The repetition is not fatal to the case for unenumerated rights, but it is odd.

Casey’s main objection to the substantive response is that it proves too much. One of the keys to Kenny J’s original argument for unenumerated rights was the claim that the right to life and the right to a good name were not specified in Article 40. But if the substantive response is correct, they are specified in Article 40: implicitly, in Article 40.3.2. It’s true that they are not specified anywhere else, but, as Casey points out, what difference should that make?

There is still the problem that Article 40.3.2 uses the phrase “in particular”, which suggests (pragmatically if not semantically) that the list of rights in 40.3.2 is non-exhaustive. But Casey thinks you can deal with this by supposing that the phrase “in particular” attaches to the words “vindicate” and “protect”, not to the list of rights. As he himself puts it:

…if the substantive response is correct, these rights are specified in Article 40; they are specified precisely, if implicitly, in 40.3.2. They may not be specified elsewhere in Article 40 but why should that be problematic, just as the mention of personal rights in both sub-s. 1 and sub-s. 2, on this reading, would have to be unproblematic. In this context the phrase ‘in particular’ could attach to the verbs ‘protect’ and ‘vindicate’ and would commit the State to protect the (implied) rights in sub-s. 2° from unjust attack and to vindicate them in the case of injustice done, as distinct, perhaps, from other rights in Article 40, such as those mentioned in 40.6.

I find this a little unsatisfactory. Casey seems to be tying himself into knots in order to get us to accept his preferred reading. If he’s right, then we’d have to accept two linguistic oddities: (i) the repetition of “personal rights” in 40.3.1 and 40.3.2; and (ii) the attachment of “in particular” to “vindicate” and “protect” rather than to the list of rights (odd given that Article 40.3.1 already refers to the state's duty to "defend" and "vindicate" personal rights). If we are going by which interpretation commits us to the fewest anomalies, I would suggest that the substantive response is more appealing in that it only commits us to the first.

I would also add, as a concluding thought, that the substantive response definitely seems more plausible when we think about the argument from a pragmatic rather than a semantic viewpoint. The problem with Casey’s arguments is that they tend to elide the distinction between the two, starting out by making purely semantic points and staying with those once the argument has drifted into pragmatic territory (which it has by the time we get to the substantive response). I am inclined to agree with him as a matter of semantics: the Irish constitution does not semantically imply the existence of unenumerated rights. Indeed, the cancellability argument that I outlined in part two would seem to be nearly decisive on that score. But the question is really whether the constitution implies their existence as a matter of pragmatics (i.e. as a function of the legal and historical nature of the relevant provisions). To be fair to him, Casey may acknowledge this point when, at the end of his article, he accepts that certain personal rights may require constitutional recognition as a result of the “Christian and democratic” nature of the constitution. To evaluate that argument, however — and the pragmatic argument for unenumerated rights more generally — would require a far longer series of posts. So I’ll have to leave it there for now.

Tuesday, July 22, 2014

The Philosophy of Mind-Uploading (Series Index)



Looking back over my old posts, I suddenly realise that I've written quite a few posts on the philosophy of mind-uploading. Mind-uploading is a general term for the phenomenon whereby our minds are transferred out of our brains and into some other substrate. Some people claim that this may be possible. It is an intriguing claim and it raises a number of philosophical issues, particularly relating to the nature of personal identity.

Anyway, I thought it might be useful to provide a convenient index to everything I've written on the topic. I may add to this series in the future.

Does the Irish constitution imply the existence of unenumerated rights? (Part Two)



(Part One)

This is the second part of my short series on unenumerated rights in the Irish constitution. The series is looking at a classic debate about the interpretation of Article 40.3 of the Irish constitution. It does so in light of some important concepts from linguistic philosophy, particularly the concepts of implicature and enrichment. I gave an overview of those concepts in part one.

In this part, I look at Mr Justice Kenny’s argument in favour of the existence of unenumerated rights. I also look at the philosopher Gerard Casey’s reconstruction and rebuttal of that argument. In doing so, I will be working primarily off Casey’s article “The ‘Logically Faultless’ Argument for Unenumerated Rights in the Constitution”.

If you haven’t read part one, I would recommend doing so. In what follows, I will be relying on some of the concepts and tests that are explained in that post.


1. Mr Justice Kenny’s Argument from Ryan v. Attorney General
Ryan v. Attorney General is a famous Irish constitutional law case. It involved a woman objecting to the fluoridation of the public water supply on the grounds that it violated her right to bodily integrity. The problem was that no such right is mentioned in the Irish constitution. In his judgment, however, Mr Justice Kenny found that the constitution — specifically Articles 40.3.1 and 40.3.2 — implied the existence of unenumerated rights, and that among those unenumerated rights was the right to bodily integrity. He did not, however, find in favour of Ryan, holding instead that fluoridation did not violate that right.

The right to bodily integrity is not what interests me. What interests me is the argument Kenny J. made for the existence of unenumerated rights. To understand that argument, we first need to review the wording of the relevant articles:

Article 40.3.1: The State guarantees in its laws to respect, and, as far as practicable, by its laws to defend and vindicate the personal rights of the citizen
Article 40.3.2: The State shall, in particular, by its laws protect as best it may from unjust attack and, in the case of injustice done, vindicate the life, person, good name and property rights of every citizen.

I’ve emphasised some of the key phrases since they are crucial to Kenny J’s argument. That argument — such as it is — is contained in the following extract from his judgment:

The words ‘in particular’ show that sub-s. 2 is a detailed statement of something which is already contained in the general guarantee. But sub-s. 2 refers to rights in connection with life and good name and there are no rights in connection with these two matters specified in Article 40. It follows, I think, that the general guarantee in sub-s. 1 must extend to rights not specified in Article 40.3.

So Kenny J thinks that Article 40.3.1 contains a general guarantee that the state will defend and vindicate the personal rights of the citizen. He then thinks that Article 40.3.2 provides a non-exhaustive list of some of those personal rights, and since none of those rights appear elsewhere in Article 40 (you’ll just have to take my word for this - unless you want to read the full text for yourself), it follows that the general guarantee covers unenumerated rights too.

The argument is a little odd, but in his analysis, Casey offers the following semi-formal reconstruction:


  • (1) Article 40.3.1 provides a general guarantee of the personal rights of the citizen.
  • (2) Article 40.3.2 by virtue of the words “in particular” provides a detailed specification of that general guarantee.
  • (3) But Article 40.3.2 refers specifically to rights in connection with life and good name and there are no such rights specified in Article 40.
  • (4) Therefore, the general guarantee in 40.3.1 must extend to rights not specified elsewhere in Article 40. (i.e. there are unenumerated rights)


The logical validity of this argument is open to doubt, but let’s grant that it is valid. It’s important to realise how premises (2) and (3) are crucial to the argument Kenny J is making. Initially, I thought it would be possible to make a case for unenumerated rights based solely on the wording of Articles 40.3.1 and 40.3.2. The idea being that in using the phrase “in particular”, Article 40.3.2 implies that the list being given is non-exhaustive and therefore that there must be other rights not specified in the article. But this is flawed, as we shall see in a moment. For Kenny J, it was the fact that Article 40.3.2 gave a non-exhaustive list combined with the fact that the rights listed there are not covered elsewhere in Article 40 that made the case for the existence of unenumerated rights.
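
For readers who like to see the bare bones of the reasoning, here is one rough way of rendering the reconstruction in set-theoretic notation. This is just my own sketch (assuming the standard amsmath and amssymb packages if you want to typeset it); the letters G, S and E are my labels, not Casey’s or Kenny J’s:

% G = the rights covered by the general guarantee in Article 40.3.1
% S = the rights Kenny J takes Article 40.3.2 to refer to
% E = the rights specified in Article 40 (on Kenny J's reading of "specified")
\[
\text{(2)}\;\; S \subseteq G
\qquad
\text{(3)}\;\; \exists r\,(r \in S \wedge r \notin E)
\qquad
\text{(4)}\;\; \therefore\; \exists r\,(r \in G \wedge r \notin E)
\]

Nothing hangs on this particular rendering; it is only meant to make vivid why premises (2) and (3) are carrying all the weight.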

Anyway, how plausible is this argument? In the remainder of this post, I will look at Casey’s rebuttal of the main premises.


2. Casey’s Critique of Premises 1 and 2
Casey doesn’t have much to say about premise (1), except that referring to the guarantee in Article 40.3.1 as “general” may prejudge the issue. Instead, we should simply say that it acknowledges that the state shall vindicate and defend a class of personal rights.

Premise (2) is more problematic. The claim made by Kenny J and his defenders is that the use of the phrase “in particular” implies the existence of other personal rights, i.e. it implies that the list being given is non-exhaustive. But does it really do so? Casey argues that it doesn’t. He bases his argument on the idea of conversational implicature, but I find that argument unhelpful because it elides the distinction between semantics and pragmatics. So I’m going to substitute my own argument. It agrees with Casey’s basic conclusion, but hopefully provides a more compelling reason for doing so.

The argument draws on Marmor’s cancellability and negation tests, both of which were discussed in part one. The tests help us to determine whether an implication arises as a matter of semantics or as a matter of pragmatics. The idea is that if it is not possible to cancel an implication, then the implication is semantically-encoded into the text. If, on the other hand, it is possible to cancel an implication, then it arises as a matter of pragmatics (i.e. as a function of the specific context in which the text was produced).

We need to apply these tests to the text of Article 40.3.2. To do this, we must first identify the relevant portion of 40.3.2 and specify the alleged implication, as follows:

“The State shall in particular… vindicate the life, person, good name and property rights of every citizen” → there are other unenumerated personal rights that the state must vindicate.

Now we must apply the negation test. Take the negation of the alleged implication:

There are no other personal rights that the state must vindicate.

And then pair that negation with the original wording. What do we then have? Well, we have the statement that the state will, in particular, vindicate a certain set of rights, and the claim that there are no other rights beyond that set. The question is whether this pair of statements involves a contradiction. The answer is that it doesn’t. It just involves an awkward turn of phrase. This means the alleged implication is cancellable in this instance, which in turn suggests that if Article 40.3.2 implies that the list of rights there specified is non-exhaustive, it does so pragmatically, not semantically. In other words, the existence of unenumerated rights is rendered possible but not necessary by the use of “in particular”.
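
To put the same point a little more formally, here is a rough sketch in my own notation (not Marmor’s or Casey’s); P and Q are simply labels for the two statements above:

% P = “The State shall, in particular… vindicate the life, person, good name
%      and property rights of every citizen”
% Q = “there are other personal rights that the state must vindicate”
\[
P \wedge \neg Q \;\not\vdash\; \bot
\]
% Even with the content of P spelled out, the conjunction is merely awkward, not
% inconsistent; so the implication from P to Q is cancellable and hence, on the
% negation test, pragmatic rather than semantically encoded.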

An analogy will probably be helpful, and Casey supplies a good one. Consider the statement:

"John is an attractive fellow. He has, in particular, a friendly disposition and a generous spirit."

We might suppose that this form of words implies that John has other attractive qualities. But that implication turns out to be cancellable under the negation test. It is logically consistent to state that he has those qualities in particular, and that he has no other attractive qualities. It’s an awkward way of putting it, to be sure, but it’s not inconsistent. The same could be true in the case of Article 40.3.2.


3. Casey’s Critique of Premise (3)
The rebuttal of premise (2) makes a neat linguistic point, but it is hardly fatal. There could be other linguistic factors which decisively make the case for the existence of unenumerated rights. That’s exactly what premise (3) tries to do. By highlighting the discrepancy between the rights listed in 40.3.2 and those mentioned elsewhere, it tries to add further linguistic reasons for thinking that the list provided in 40.3.2 is non-exhaustive.

The problem, as Casey points out, is that premise (3) is false. Kenny J is convinced that 40.3.2 specifically mentions the right to life and good name, but it actually doesn’t. It only mentions property rights. To see why, go back to the wording of 40.3.2:

Article 40.3.2: The State shall in particular… vindicate the life, person, good name and property rights of every citizen.

Kenny J seems to parse that in the following manner:

Kenny J's Reading: The State shall, in particular…vindicate the [right to life], [right to person], [right to a good name], and [property rights] of every citizen.

But that’s a very odd way of reading it, particularly since it seems to force us to accept the notion of a “right to person” (though maybe this could be read as simply repeating the general guarantee to respect the “personal” rights of the citizen). The correct reading, according to Casey, is as follows:

Casey's Reading: The State shall, in particular…vindicate [the life, person, and good name of every citizen] and the [property rights] of every citizen.

This reading makes clear that only property rights are referenced in Article 40.3.2. What’s more, property rights, unlike the right to life and good name, are explicitly covered by another subsection of Article 40 (and again later in Article 43). This defeats premise (3) of Kenny J’s argument.

Actually, there is an even more decisive reason for endorsing Casey’s reading. The Irish constitution is written in two languages: Irish and English. Whenever there is an inconsistency between the two, the Irish language version prevails. The Irish language version of 40.3.2 reads like this:

Article 40.3.2: Déanfaidh an Stát, go sonrach, lena dhlíthe, beatha agus pearsa agus dea-chlú agus maoinchearta an uile shaoránaigh a chosaint ar ionsaí éagórach chomh fada lena chumas, agus iad a shuíomh i gcás éagóra.

This does translate as being roughly equivalent to the English language version, but the way in which the bit after “in particular” is drafted in the Irish version is slightly clearer. I have highlighted the relevant portion of text. It reads literally as “life (beatha) and person (pearsa) and good name (dea-chlú) and property rights (maoinchearta)”. The use of “and” suggests that these are all distinct things, and the use of the composite “maoinchearta” for property rights is important. “Cearta” is the Irish for “rights”. If it occurred at the end of the phrase as a separate word, we might have reason to prefer Kenny J’s reading. The fact that it is explicitly conjoined to the word for property suggests that Casey’s is the correct reading.

For that reason, Casey holds that Kenny J’s argument for the existence of unenumerated rights is flawed. There are, however, some possible responses to Casey’s rebuttal. I’ll discuss those in part three.

Monday, July 21, 2014

Does the Irish constitution imply the existence of unenumerated rights? (Part One)




I haven’t done a post on legal theory in a while. This post is an attempt to rectify that. It’s going to look at the philosophy of legal interpretation. It does so by homing in on a very specific issue: the implied existence (or non-existence, as the case may be) of unenumerated rights in the Irish constitutional text. The issue arises because of the wording of Article 40.3 of the Irish constitution. The offending provisions are (in their English language versions, and with emphasis added):

Article 40.3.1: The State guarantees in its laws to respect, and, as far as practicable, by its laws to defend and vindicate the personal rights of the citizen.
Article 40.3.2: The State shall, in particular, by its laws protect as best it may from unjust attack and, in the case of injustice done, vindicate the life, person, good name and property rights of every citizen.

In an influential decision back in the 1960s, an Irish High Court judge (Kenny J) held that these two provisions, when read together, implied that the Irish constitution recognised and protected a class of unenumerated personal rights. His argument for this conclusion was described by John Kelly — a famous Irish constitutional scholar — as “logically faultless”. The courts then took the idea and ran with it, identifying a whole raft of unenumerated rights in subsequent case law.

But was the argument really logically faultless? In a short piece, the philosopher Gerard Casey claimed that it was not. Far from it, in fact. He argued that if you read the two provisions correctly, Kenny J’s case for the existence of unenumerated rights is borderline absurd.

Over the next two posts, I want to take a look at this debate, focusing specifically on Kenny J’s original argument and Casey’s rebuttal. Along the way, I want to highlight some important concepts from linguistic philosophy and show how these concepts can help us to understand the arguments on display. I’ll start, today, with a basic primer on the concepts of implicature and enrichment. I’ll look at the actual arguments about the interpretation of Article 40.3 in subsequent posts.


1. Implicature and the Cancellability Test
To start, we must have some general sense of the distinction between semantics and pragmatics. This is a standard distinction in linguistic theory, but it is not widely known. Semantics refers to the meaning of the words used in a particular utterance. That is to say: the meaning that is encoded into the linguistic signs and symbols used by the speaker. The semantic meaning of an utterance is general and not context-dependent. Pragmatics, on the other hand, refers to the token-specific meaning of an utterance. That is to say: the meaning that is communicated by a given speaker, in a particular time and place. It is specific and highly context-dependent.

It is important to be aware of these distinctions because the debate about the interpretation of Article 40.3 sometimes slides back and forth between both domains.

Anyway, with that general distinction in mind, we can turn to the distinction between implicature and enrichment. Both concepts cover the ways in which the communicated meaning of an utterance can extend beyond the words used in the utterance, but they do so in different ways. We’ll start by looking at implicature.

Implicature covers the phenomenon whereby the words used in an utterance can imply something beyond what is said. For example, if I say “I am going to wash my car”, I imply that I own or have use of a car. Or, if I am a member of a criminal gang, and I say to a local business owner “It’s a nice place you got here, it would be a shame if something happened to it”, I imply that if the owner doesn’t pay us some protection money, we will destroy his place of business. Both cases involve implicature, but they are rather different in nature. In the first case, the implied ownership or use of the car is semantic (i.e. it is implied by the actual words used). In the second case, the implied threat is pragmatic (i.e. it is a feature of the particular context in which the words are used).

The legal philosopher Andrei Marmor has helpfully identified three classes of implicature, each of which can feature in different legal contexts:

Conversational Implicature: This is probably the most widely discussed form of implicature. It was first identified by the linguistic philosopher Grice. It arises frequently in everyday conversations and its occurrence is linked to certain norms of everyday conversation. This kind of implicature is highly context-sensitive and is firmly within the pragmatic branch of analysis. If you’re interested, I wrote about the legal implications of this before.
Semantically-encoded implicature: This is a distinct form of implicature, one which should not be confused with the conversational form. As Marmor puts it, it arises when “the speaker is committed to a certain content simply by virtue of the words she has uttered…regardless of the specific context of conversation”. This is not context-sensitive and belongs more properly within the semantic branch of analysis.
Utterance Presupposition: This is where an utterance presupposes something not explicitly mentioned or stated in the utterance itself. In other words, it arises when an utterance would not make sense without us presupposing some unmentioned entity, activity or state of affairs. Utterance presuppositions occupy a somewhat uncertain territory.

The reason for the uncertain position of utterance presuppositions has to do with the test that is used to determine whether implicature is semantically-encoded into the words of an utterance. This is the cancellability test. According to this test, an implicature is semantically-encoded if it is not possible for the speaker, using those words, to cancel the implied meaning. Conversely, if it is possible to cancel the meaning, then the implicature is conversational and contextual in nature. The problem with utterance presuppositions is that they are sometimes uncancellable, sometimes cancellable. Thus, depending on the form of words used, they can be either context-dependent or not.

Marmor gives us some examples of this. Consider the following two utterances and their presuppositions:


  • (A) “It was Jane who broke the vase” → presupposes that someone broke the vase 



  • (B) “The Republicans and Senator Joe voted for the bill” → presupposes Joe is not a Republican



Marmor’s argument is that the presupposition in A is not cancellable, whereas the presupposition in B is. This argument is supported by a new test, the negation test. This is an add-on to the cancellability test. It works like this: if we conjoin each proposition with the negation of its presupposition, do we get an outright contradiction, or do we just get an awkward but not inconsistent turn of phrase?

If we take “Jane broke the vase” and add it to “no one broke the vase”, we get a contradiction. This suggests that the presupposition in this instance is not cancellable and is hence semantically-encoded into the utterance. Contrast that with proposition B. If we add that together with “Joe is a Republican”, we don’t quite get a contradiction. We just get an awkward form of expression. This suggests that the presupposition is cancellable in this instance, though the context may dictate otherwise. So an utterance presupposition could be highly context-specific, but might also not be.
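
For what it is worth, the contrast can be summarised in a rough formal sketch. This is my own notation (assuming the standard amsmath package if you want to typeset it), and it is only meant to restate Marmor’s verdicts, not to add anything to them:

% A = “It was Jane who broke the vase”;                    p_A = “someone broke the vase”
% B = “The Republicans and Senator Joe voted for the bill”; p_B = “Joe is not a Republican”
\[
A \models p_A \quad\Longrightarrow\quad \{\,A,\ \neg p_A\,\} \;\text{ is inconsistent (the presupposition is uncancellable)}
\]
\[
B \not\models p_B \quad\Longrightarrow\quad \{\,B,\ \neg p_B\,\} \;\text{ is consistent, merely awkward (the presupposition is cancellable)}
\]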

This is all pretty technical stuff, but if you can wrap your head around it, it really does help to make sense of some of the arguments about the meaning of Article 40.3. As we shall see, one of the crucial issues there is whether Article 40.3 implies the existence of unenumerated rights in a semantic or pragmatic way. And one way in which to test for this is to apply the cancellability test to the wording of the article.


2. Enrichment and the Class of Rights
Okay, so that’s everything we need to know about implicature. What about enrichment? This is less important for present purposes, so we can go over it in less detail. Enrichment is the phenomenon whereby the meaning of particular phrases is enriched by the pragmatic context in which they are uttered. It arises because in most everyday speech contexts we compress what we want to say into fewer words than are strictly needed. Most commonly, enrichment serves to restrict the class of objects or actions to which a given utterance is intended to refer.

Here’s an example. Suppose you and I are roommates moving into a new apartment together. We are busy installing our furniture and putting up pictures and ornaments. At one point, clutching a painting beneath my arm and eyeing an appropriate spot on the wall, I tell you: “I am going to use the hammer”. Presumably, what I mean in this context is that “I am going to use the hammer, to put a nail in the wall upon which I can hang this picture”. Note, however, that the original wording was, strictly speaking, vague as to the precise way in which the hammer was going to be used. A hammer could, in fact, be used in many ways (e.g. as a paper weight or as a weapon). But the restrictive meaning of “to use” was implied by the context in which the utterance was made and is part of the enriched meaning of what was said. The same enriched restriction of meaning frequently arises with noun classes.

Enrichment can have an important role to play in debates about the meaning of a legal text. Indeed, the vagueness of the verb “to use” has occasionally caused headaches for courts. Suppose there is a statute saying that if you “use” a firearm during the commission of a drugs offence, you add five years to the jail sentence. Now, imagine that you and I are involved in a drug deal. I am selling you cocaine. I do so in my office, where there is a firearm resting on a stack of papers on my desk throughout the sale. Have I used the firearm during the drugs offence? That depends on whether the verb “to use” has a restricted enriched meaning or not. (This is based on a real case, but I can’t remember the name or the outcome right now).

Enrichment also has a role to play in the interpretation of unenumerated rights provisions. Assume, for the sake of argument, that the Irish constitution does imply the recognition and protection of a class of unenumerated personal rights. Is that an unrestricted or restricted class of rights? Some Irish judges have suggested that the class is restricted by the “Christian and democratic” nature of the Irish state, suggesting a degree of enriched meaning. This may or may not be plausible. Similar arguments have been made about the class of unenumerated rights recognised by the US constitution. Randy Barnett — a prominent libertarian legal scholar — has argued that the unenumerated rights clause in the 9th Amendment is restricted to “liberty rights”. He does so on the grounds that, in the pragmatic context in which the US constitution was drafted and ratified, that restricted meaning would have been understood. It would be interesting to see whether analogous arguments could be made in the Irish context about the "Christian and democratic" nature of the rights. I won’t, however, pursue the matter any further in this series of posts.

So that’s it for part one. Hopefully this conceptual overview has been somewhat illuminating. We’ll look at the actual arguments about Article 40.3 in part two.