Thursday, March 6, 2014

Big Data and the Vices of Transparency

(Previous Posts)

Data-mining algorithms are increasingly being used to monitor and enforce governmental policies. For example, they are being used to shortlist people for tax auditing by the revenue services in several countries. They are also used by businesses to identify and target potential customers. Thanks to some high-profile cases, there is now increasing concern about their usage. Should they be restricted? Should they be used more often? Should we be concerned about their emerging omnipresence?

In an earlier set of posts, I looked at the case for transparency in relation to the use of such algorithms. Transparency advocates claim that full or partial disclosure of the methods for collecting and processing our data would be virtuous in any number of ways. For example, there are those who claim that it would promote innovation and efficiency, increase fairness, protect privacy and respect autonomy. I analysed their arguments at some length in that earlier set of posts.

Today, I want to look at the flip-side of the transparency debate. I want to consider arguments for thinking that transparency would actually be a bad thing. I look at two such arguments below. The first argument claims that transparency is bad because it thwarts legitimate government aims; the second claims that transparency is bad because it leads to increased levels of social stigmatisation and prejudice.

In writing this piece, I draw once more from Tal Zarsky’s article “Transparent Predictions”. This post is very much a companion to my earlier ones on the virtues of transparency and should be read in conjunction with them.

1. Would Transparency Thwart Legitimate Government Aims?
A simple argument for the vice of transparency holds that it would undermine legitimate government aims. Governments can use data-mining algorithms to assist them across a range of policy areas. I have already given the example of tax auditing and the attendant prevention of tax evasion. Similar examples could include combatting terrorism, enforcing aspects of criminal law, and predicting recidivism rates among convicted offenders so as to make rational parole decisions. If transparency prevented the government from doing those things, it might be lamentable.

In other words, it might be possible to make the following (abstract) type of argument:

  • (1) It is a good thing that the government pursues certain legitimate aims X1…Xn through the use of data-mining algorithms.
  • (2) Transparency would undermine the pursuit of those legitimate ends.
  • (3) Therefore, transparency would be a bad thing.

The wording of premise (1) is very important. It assumes that the government aims are legitimate, i.e. morally commendable, acceptable to rational citizens, optimal and so forth. If, for any given use of data-mining, you think the government aim is not legitimate, or that it is completely trumped by other, more important aims, then it is unlikely that you’ll be willing to entertain this argument. If, on the other hand, you think there is some degree of legitimacy to the particular government aims, or that these aims are not completely trumped but rather must be weighed carefully against other legitimate aims, then the argument could have some force. For if that’s the case you should be willing to weigh the benefits of transparency against the possible costs in order to reach a nuanced verdict about its overall desirability.

Anyway, this is just a way of saying that premise (1) is essential to the argument. Unfortunately, in this discussion, I’m not going to consider it all that closely. Instead, my focus is on premise (2). The key thing with this premise is the proposed mechanism by which transparency undermines the legitimate aims. Obviously, the details in any particular case will be fact-specific. Nevertheless, we can point to some general mechanisms that might be at play. Perhaps the most commonly-cited one is something we can call the “gaming the system”-mechanism. According to this, the big problem with transparency is that it will disclose to people the information they need in order to avoid detection by the algorithm, thereby enabling them to engage in all manner of nefarious activities.

A simple example, which has nothing to do with data-mining (at least not in the colloquially-understood sense) might help to illustrate the point. The classic polygraph lie detector may have had some ability to determine when someone was lying (however minimal). But once people were made aware of the theoretical and practical basis for the test they could avoid its detection by deploying a range of countermeasures. These are things like breathing techniques and muscle clenches that confound the results of the test. Thus, by knowing more about the nature of the test, people who really did have something to hide could avoid getting caught out by it. The concern is that something similar could happen if we disclosed all information relevant to a particular data-mining algorithm: potential terrorists, violent criminals and tax evaders (among others) could simply use the information to avoid detection.

How credible is this worry? As Zarsky notes, you have to consider how it might play out at each stage in the data-mining and prediction process. You start with the collection phase, where transparency would demand that details about the datasets used by the governmental algorithms be disclosed. These details might allow people to game the system, provided the datasets are sufficiently small and comprehensible. But if they are vast, there might be little scope for an individual to game the system. Similarly, release of the source code of the algorithm used at the processing stage would be valuable only to a limited pool of individuals with the relevant technical expertise. Zarsky argues that the release of data about the proxies used by governments to identify potential suspects (or whatever) is likely to be the most useful to those who want to game the system, but these proxies could fall into at least three different categories:

Other Illegal Acts: One thing that is often used by governmental agencies to predict certain kinds of wrongdoing is other illegal acts. Zarsky's example involves a lesser type of wrongdoing being used as a proxy for a greater one. Now, we might want to prevent the lesser type of wrongdoing anyway, so disclosure of this detail could have some positive effects (as that type of behaviour would be further disincentivised), but one could also imagine a potential terrorist capitalising on the disclosure of this information in a negative way. They will now know that they need to avoid the lesser type of wrongdoing in order to engage in the greater type of wrongdoing. That would be bad. Wouldn’t it?
Neutral Conduct: It could be that the proxies used are not other forms of wrongdoing but are instead completely neutral or positive behaviours (e.g. charitable donations could be an indicator of tax evasion). In other words, the proxies might not themselves be constitutive of bad behaviour, but they might be found to correlate with it. Disclosure of that kind of information would also seem to have negative implications for legitimate government aims. It would allow the nefarious people to game the system by avoiding those behaviours and may also encourage otherwise law-abiding people to avoid positive behaviours for fear of triggering the algorithm.
Immutable Character Traits: Another possibility is that the proxies cover immutable social or biological traits that correlate with wrongdoing. Disclosure of these proxies might not help people to game the system (assuming the traits are genuinely immutable) but it might have other deleterious effects.

This last example opens up the possible link between transparency and stereotyping. We’ll deal with this as a separate argument.

2. Would Transparency Increase Negative Stereotyping?
Another possible argument against transparency has to do with its potential role in perpetuating or generating new forms of social stereotype and prejudice. The argument is straightforward:

  • (4) It is bad to increase social prejudice and stereotyping.
  • (5) Transparency of the details associated with data-mining algorithms could increase social prejudice and stereotyping.
  • (6) Therefore, transparency of the details associated with data-mining algorithms is bad.

The value-laden terminology is important in understanding premise (4). You might object that certain forms of prejudice or stereotyping are morally justified if they accurately reflect the moral facts. For example, I have no great problem with there being some degree of prejudice against racists or homophobes (though I wouldn’t necessarily like that to manifest itself in extreme mistreatment of those groups). The assumption in this argument, however, is that prejudice and stereotyping will tend to have serious negative implications. Hence those two terms are to be read in a (negative) value-laden manner.

The key premise then, of course, is premise (5). Zarsky looks at a number of factors that speak in its favour, many of them resting on pre-existing weaknesses in human psychology. His main observation is that humans are not well-equipped to understand complex statistical inferences and so, when details of such inferences are disclosed to them, they will tend to fall back on error-laden heuristics when trying to interpret the information.

This can manifest itself in a variety of ways. Zarsky mentions two. The first is that people may fail to appreciate the domain-specificity of certain statistical inferences. Thus, if the algorithm says that law professors who write about transparency in particular settings (the example is Zarsky’s) are more likely to evade tax, the general population may think that this makes such law professors more likely to commit a whole range of crimes across a whole range of settings. The second way in which the problem could manifest itself is in people drawing conclusions about individual character traits from data that simply has to do with general correlations. Thus, for example, if an algorithm says that people from New York are more likely to evade tax, others might interpret this to be a fixed character trait of particular people from New York.
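The second error can be made concrete with a little arithmetic. What follows is only a minimal sketch in Python: the figures are invented purely for illustration (they come from neither Zarsky nor any real dataset), and the function name `evasion_rate` is my own. It shows how a group can be "twice as likely" to evade tax while the overwhelming majority of its members do no such thing.

```python
# Hypothetical figures, invented for illustration only.
# They are NOT taken from Zarsky's article or from any real dataset.

def evasion_rate(evaders: int, population: int) -> float:
    """Share of a population that evades tax."""
    return evaders / population

# Assumed: 40 evaders per 1,000 people in the flagged group,
# against 20 evaders per 1,000 people in the general population.
group_rate = evasion_rate(evaders=40, population=1_000)
baseline_rate = evasion_rate(evaders=20, population=1_000)

# The group-level correlation an algorithm might report.
relative_risk = group_rate / baseline_rate

# What that correlation says about a randomly chosen member of the group.
share_not_evading = 1 - group_rate

print(f"Flagged group is {relative_risk:.1f}x as likely to evade tax")
print(f"Share of flagged group that does not evade: {share_not_evading:.0%}")
```

On these assumed numbers the headline correlation (twice the baseline rate) is true, and yet treating any particular member of the group as a tax evader would be wrong in the vast majority of cases. That gap between the group-level statistic and the individual-level judgment is where the stereotyping worry gets its bite.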

Both of these things could increase negative forms of social prejudice and stereotyping. And it is important to realise that these increases may not simply be in relation to classically oppressed and stereotyped groups (e.g. ethnic minorities), but may also be in relation to wholly novel groups. For instance, data-mining algorithms might (for all we know) find that hipsters are more likely to evade tax, or that people whose names end in “Y” are more likely to be terrorists. Thus, we might succeed in identifying new groups as the objects of our negative social judgments. This could be particularly problematic for them insofar as these new groups may have fewer well-established organisations dedicated to defending their interests.

Assuming the stereotyping and prejudice problem is real, how might it be solved? Increased opacity is indeed one solution, but it is not the only one. As Zarsky points out, increased public education about the nature of statistical inferences, and the psychological biases of human beings, might also serve to reduce the problem. Arguably, this would be the better solution, if we accept that transparency has certain other benefits. Still, part of me thinks that the cost and effort involved would make it unattractive to many governments. Opacity may, alas, be the easier option for them.
