Sunday, January 19, 2014

Big Data, Predictive Algorithms and the Virtues of Transparency (Part Two)

(Part One)

This is the second part in a short series of posts on predictive algorithms and the virtues of transparency. The series is working off some ideas in Tal Zarsky’s article “Transparent Predictions”. The series is written against the backdrop of the increasingly widespread use of data-mining and predictive algorithms and the concerns this has raised.

Transparency is one alleged “solution” to these concerns. But why is transparency deemed to be virtuous in this context? Zarsky’s article suggests four possible rationales for transparency. This series is reviewing all four. Part one reviewed the first, according to which transparency is virtuous because it helps to promote fair and unbiased policy-making. This part will review the remaining three.

To fully understand the discussion, one important idea from part one must be kept in mind. As you’ll recall, in part one it was suggested that the predictive process — i.e. the process whereby data is mined to generate some kind of prediction — can be divided into three stages: (i) a collection stage, in which data points/sets are collated and warehoused; (ii) an analytical stage, in which the data are mined and used to generate a prediction; and (iii) a usage stage, in which the prediction generated is put to some practical use. Transparency could be relevant to all three stages or only one, depending on the rationale we adopt. This is something emphasised below.

1. Transparency as a means of promoting innovation and crowdsourcing
The use of predictive algorithms is usually motivated by some goal or objective. To go back to the example from part one, the IRS uses predictive algorithms in order to better identify potential tax cheats. The NSA (or secret service agencies more generally) do something similar in order to better identify threats to national security. Although secrecy typically reigns supreme in such organisations, there is an argument to be made that greater transparency — provided it is of the right kind — could actually help them to achieve their goals.

This, at least, is the claim lying behind the second rationale for transparency. Those who are familiar with the literature on epistemic theories of democracy will be familiar with the basic idea. The problem with small, closed groups of people making decisions is that they must rely on a limited pool of expertise, afflicted by the biases and cognitive shortcomings of its members. Drawing from a larger pool of expertise, and a more diverse set of perspectives, can often improve decision-making. The wisdom of crowds and all that jazz.

Transparency is something that can facilitate this. In internet-speak, what happens is called “crowdsourcing”: a company or institution obtains necessary ideas, goods and services from a broad, undefined group of online contributors. These contributors are better able to provide the ideas, goods and services than “in-house” employees. Or so the idea goes. One can imagine this happening in relation to predictive algorithms as well. An agency has a problem: it needs to identify potential tax cheats. It posts details about the algorithm it plans to use to an online community; solicits feedback and expertise from this community; and thereby improves the accuracy of the algorithm.

That gives us the following argument for transparency:

  • (4) We ought to ensure that our predictive protocols and policies are accurate (i.e. capable of achieving their underlying objectives).
  • (5) Transparency can facilitate this through the mechanism of crowdsourcing.
  • (6) Therefore, we ought to incorporate transparency into our predictive processes and policies.

Three things need to be said about this argument. First, the stage at which transparency is most conducive to crowdsourcing is the analytical stage. In other words, details about the actual mechanics of the data-mining process are what need to be shared in order to take advantage of the crowdsourcing effect. This is interesting because the previous rationale for transparency was less concerned with this stage of the process.

Second, this argument assumes that there is a sufficient pool of expertise outside the initiating agency or corporation. Is this assumption warranted? Presumably it is. Presumably, there are many technical experts who do not work for governmental agencies (or corporations) who could nevertheless help to improve the accuracy of the predictions.

Third, this argument is vulnerable to an obvious counterargument. Sharing secrets about the predictive process might be valuable if everybody shares the underlying goal of the algorithm. But this isn’t always going to be true. When it isn’t, transparency runs the risk of facilitating sabotage. For instance, many people don’t want to pay their taxes. It is entirely plausible to think that some such people would have the technical expertise needed to try to sabotage a predictive algorithm (though, of course, others who support the goal may be able to detect and resolve the sabotage). It is also possible that people who don’t share the goals of the algorithm will use shared information to “game the system”, i.e. avoid the scrutiny of the programme. This might be a good thing or a bad thing. It all depends on whether the objective or goal of the algorithm is itself virtuous. If the goal is virtuous, then we should presumably try to minimise the opportunity for sabotage.

Zarsky suggests that a limited form of transparency, to a trusted pool of experts, could address this problem. This seems pretty banal to me. Indeed, it is already being done, and not always with great success. After all, wasn’t Edward Snowden a (trusted/vetted?) contractor with the NSA? (Just to be clear: I’m not saying that what Snowden did was wrong or undesirable; I’m just saying that, relative to the argument being made by Zarsky, he is an interesting case study).

2. Transparency as a means of promoting privacy
Transparency and privacy are, from one perspective, antagonistic: one’s privacy cannot be protected if one’s personal information is widely shared with others. But from another perspective, privacy can provide an argument in favour of transparency. Predictive algorithms rely on the collection and analysis of personal information. Privacy rights demand that people have some level of control over their personal information. So, arguably, people have a right to know when their data is being used against them, in order to facilitate their control.

We can call this the “notice argument” for transparency:

  • (7) We ought to protect the privacy rights of those affected by our predictive policies and protocols.
  • (8) Transparency helps us to do this by putting people on “notice” as to when their personal data is being mined.
  • (9) Therefore, we ought to incorporate transparency into our predictive policies and protocols.

The kind of transparency envisaged here may, at first glance, seem to be pretty restrictive: only those actually affected by the algorithm have the right to know. But as Zarsky points out, since these kinds of algorithms can affect pretty much anyone, and since the same generic kinds of information are being collected, it could end up mandating a very broad type of transparency.

That would be fine if the argument weren’t fatally flawed. The problem is that premise (8) is false. There is no reason to think that putting people on notice as to when their personal data is being mined will help to protect their privacy rights. Notice without control (i.e. without the right to say “stop using my information”) would not protect privacy. That’s not to say that notice is devoid of value in this context. Notice could be very valuable if people are granted the relevant control. But this means that transparency is, at best, part of the solution, not the solution in and of itself.

In any event, although the control model of privacy is popular in some jurisdictions (Zarsky mentions the EU in particular), there are reasons for thinking that it is becoming less significant. As Zarsky notes, it seems to be losing ground in light of technological and social changes: people are increasingly willing to cede control over personal data to third parties.

Of course, this might just be because they don’t realise or fully appreciate the ramifications of doing so. Perhaps then there needs to be a robust set of privacy rights to protect people from the undue harm they may be doing themselves. Zarsky worries that this is too paternalistic: if people want to cede control, who are we to say that we know better? Furthermore, he thinks there is another rationale for transparency that can address the kinds of concerns implicit in this response. That is the last of our four rationales.

3. Transparency as a means of respecting autonomy
The last rationale has to do with respecting individual autonomy. It turns on certain moral principles that are fundamental to liberal political theory, and which I explored in more detail in my earlier post on the threat of algocracy, and in my article on democratic legitimacy. Zarsky, whose foundational principles seem more rooted in the US Constitution, expresses this rationale in terms of due process principles and the Fourth Amendment. I’ll stick with my own preferred vocabulary. The end point is pretty much the same.

Anyway, the idea underlying this rationale is that if predictive algorithms are going to have a tangible effect on an individual’s life, then that individual has a right to know why. The reasoning here is general: any procedure that results in a coercive measure being brought to bear on a particular individual needs to be legitimate; legitimacy depends on the instrumental and/or intrinsic properties of the procedure (e.g. does it result in substantively just outcomes? does it allow the affected party to participate? etc.); one of the key intrinsic properties of a legitimate procedure is its comprehensibility (i.e. does it explain itself to the individual affected?); transparency, it is argued, facilitates comprehensibility.

To put this in more formal terms:

  • (10) We ought to ensure that our predictive protocols and policies are legitimate.
  • (11) One of the essential ingredients of legitimacy is comprehensibility.
  • (12) Transparency facilitates comprehensibility.
  • (13) Therefore, we ought to incorporate transparency into our predictive protocols and policies.

It is possible to argue against the normative principles in this argument, particularly the claim that comprehensibility is essential to legitimacy. But we’ll set those criticisms to the side for now. The main focus is on premise (12). As it happens, I have already written a post which pinpointed lack of comprehensibility as a key concern with the increasing use of predictive algorithms.

Zarsky seems less concerned in his discussion. But he does highlight the fact that disclosure at the collection stage may not necessarily facilitate comprehensibility. Knowing which of your personal data has been selected for inclusion in the relevant dataset might give you some idea about how the process works, but it won’t necessarily tell you why you were targeted by the algorithms. Disclosure at the analytical and usage stages will be needed for this. And the complexity of the underlying technology may be a real problem here.
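Zarsky’s point can be illustrated with a toy sketch. Everything in the snippet below is hypothetical (the field names, thresholds and scoring rule are invented for illustration, not drawn from any real tax agency’s methods): disclosing only which data fields were collected (the collection stage) would not tell a taxpayer why they were flagged; for that, the decision logic itself (the analytical stage) has to be disclosed.

```python
# Hypothetical, deliberately simple audit-flagging rule. A real predictive
# model would be far more complex and correspondingly harder to explain.
def flag_for_audit(record):
    """Return True if a taxpayer record is flagged for audit."""
    score = 0
    if record["deductions"] > 0.5 * record["income"]:
        score += 2  # unusually high deduction-to-income ratio
    if record["cash_income_reported"] == 0 and record["sector"] == "hospitality":
        score += 1  # cash-heavy sector, yet no cash income reported
    return score >= 2

taxpayer = {"income": 40_000, "deductions": 25_000,
            "cash_income_reported": 0, "sector": "hospitality"}

# Collection-stage disclosure reveals only which fields were gathered
# (income, deductions, etc.); analytical-stage disclosure — the body of
# flag_for_audit — reveals *why* this particular record was flagged.
print(flag_for_audit(taxpayer))  # prints True
```

Even in this trivially interpretable case, the “why” lives in the rule, not in the data; once the rule is replaced by an opaque statistical model, disclosure at the analytical stage becomes both more important and harder to deliver.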

Zarsky responds by claiming that interpretable analytical processes are needed so that “individuals can obtain sufficient insight into the process and how it relates to their lives”. But I suspect there are serious problems with this suggestion. Although interpretable processes might be desirable, the competing pressures that drive people toward more complex, less comprehensible (but hopefully more accurate) processes may be too great to make a demand for interpretability politically and socially feasible.

Zarsky also argues that the kind of transparency required by this rationale may not be limited just to those affected by the process. On the contrary, he suggests that broad disclosure, to the general population, may be desirable. This is because a fuller form of disclosure can combat the risks associated with partial and biased disclosures. It is important to combat those risks because, once information is disclosed to one individual, it is likely to leak out into the public sphere anyway.

That said, full disclosure could pose additional risks for the individuals or groups who are targeted by the algorithms, perhaps opening them up to social discrimination and stigmatisation. This highlights the potential vices of transparency, three of which are discussed in the final sections of Zarsky’s article. With luck, I’ll talk about them at a later date.

4. Conclusion
To sum up, transparency is often uncritically accepted as a virtue of the internet age. The last two posts have asked “why?”. Tying the discussion specifically to the role of transparency in the use of predictive algorithms, they have explored four different rationales: the first claims that transparency can facilitate just and unbiased decision-making; the second that it can facilitate crowdsourcing and innovation; the third that it can help protect privacy rights; and the fourth that it can help respect autonomy.

Although I have drawn attention to some criticisms of these rationales, a more detailed appraisal of the vices of transparency is required before we can make a decision in its favour. That, sadly, is a task for another day.
