Sunday, July 26, 2015

How to Study Algorithms: Challenges and Methods


Algorithms are important. They lie at the heart of modern data-gathering and data-analysing networks, and they are fuelling advances in AI and robotics. On a conceptual level, algorithms are straightforward and easy to understand (they are step-by-step instructions for taking an input and converting it into an output), but on a practical level they can be quite complex. One reason for this is the two translation problems inherent to the process of algorithm construction. The first problem is converting a task into a series of defined, logical steps; the second problem is converting that series of logical steps into computer code. This process is value-laden, open to bias and human error, and the ultimate consequences can be philosophically significant. I explained all these issues in a recent post.

Granting that algorithms are important, it seems obvious that they should be subjected to greater critical scrutiny, particularly among social scientists who are keen to understand their societal impact. But how can you go about doing this? Rob Kitchin’s article ‘Thinking critically about and researching algorithms’ provides a useful guide. He outlines four challenges facing anyone who wishes to research algorithms, and six methods for doing so. In this post, I wish to share these challenges and methods.

Nothing I say in this post is particularly ground-breaking. I am simply summarising the details of Kitchin’s article. I will, however, try to collate everything into a handy diagram at the end of the post. This might prove to be a useful cognitive aid for people who are interested in this topic.

1. Four Challenges in Algorithm Research
Let’s start by looking at the challenges. As I just mentioned, on a conceptual level algorithms are straightforward. They are logical and ordered recipes for producing outputs. They are, in principle, capable of being completely understood. But in practice they often cannot be. There are several reasons for this: some are legal or cultural, and some are technical. Each of them constitutes an obstacle that the researcher must either avoid or, at least, be aware of.

Kitchin mentions four obstacles in particular. They are:

A. Algorithms can be black-boxed: Algorithms are oftentimes proprietary constructs. They are owned and created by companies and governments, and their precise mechanisms are often hidden from the outside world. They are consequently said to exist in a ‘black box’. We get to see their effects on the real world (what comes out of the box), but not their inner workings (what’s inside the box). The justification for this black-boxing varies: sometimes it is purely about protecting the property rights of the creators; other times it is about ensuring the continued effectiveness of the system. Thus, for example, Google are always concerned that if they reveal exactly how their PageRank algorithm works, people will start to ‘game the system’, which will undermine its effectiveness. Frank Pasquale wrote an entire book about this black-boxing phenomenon (The Black Box Society), if you want to learn more.

B. Algorithms are heterogeneous and contextually embedded: An individual could construct a simple algorithm, from scratch, to perform a single task. In such a case, the resultant algorithm might be readily decomposable and understandable. In reality, most of the interesting and socially significant algorithms are not produced by one individual or created ‘from scratch’. They are, rather, created by large teams, assembled out of pre-existing protocols and patchworks of code, and embedded in entire networks of algorithms. The result is an algorithmic system that is much harder to decompose and understand.

C. Algorithms are ontogenetic and performative: In addition to being contextually embedded, contemporary algorithms are also typically ontogenetic. This is a somewhat jargonistic term, deriving from biology. All it means is that algorithms are not static and unchanging. Once they are released into the world, they are often modified or adapted. Programmers study user-interactions and update code in response. They often experiment with multiple versions of an algorithm to see which one works best. And, what’s more, some algorithms are capable of learning and adapting themselves. This dynamic and developmental quality means that algorithms are difficult to study and research. The system you study at one moment in time may not be the same as the system in place at a later moment in time.

D. Algorithms are out of control: Once they start being used, algorithms often develop and change in uncontrollable ways. The most obvious way for this to happen is if algorithms have unexpected consequences or if they are used by people in unexpected ways. This creates a challenge for the researcher insofar as generalisations about the future uses or effects of an algorithm can be difficult to make if one cannot extrapolate meaningfully from past uses and effects.

These four obstacles often compound one another, creating more challenges for the researcher.

2. Six Methods of Algorithm Research
Granting that there are challenges, the social and technical importance of algorithms is, nevertheless, such that research is needed. How can the researcher go about understanding the complex and contextual nature of algorithm-construction and usage? It is highly unlikely that a single research method will do the trick. A combination of methods may be required.

Kitchin identifies six possible methods in his article, each of which has its advantages and disadvantages. I’ll briefly describe these in what follows:

1. Examining Pseudo-Code and Source Code: The first method is the most obvious. It is to study the code from which the algorithm was constructed. As noted in my earlier post, there are two parts to this. First, there is the ‘pseudo-code’, which is a formalised set of human-language rules into which the task is translated (pseudo-code follows some of the conventions of programming languages but is intended for human reading). Second, there is the ‘source code’, which is the computer language into which the human-language ruleset is translated. Studying both can help the researcher understand how the algorithm works. Kitchin mentions three more specific variations on this research method:
1.1 Deconstruction: Where you simply read through the code and associated documentation to figure out how the algorithm works.
1.2 Genealogical Mapping: Where you ‘map out a genealogy of how an algorithm mutates and evolves over time as it is tweaked and rewritten across different versions of code’ (Kitchin 2014). This is important where the algorithm is dynamic and contextually embedded.
1.3 Comparative Analysis: Where you see how the same basic task can be translated into different programming languages and implemented across a range of operating systems. This can often reveal subtle and unanticipated variations.
There are problems with these methods: code is often messy and requires a great deal of work to interpret; the researcher will need some technical expertise; and focusing solely on the code means that some of the contextual aspects of algorithm construction and usage are missed.
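To make the genealogical mapping idea (1.2) a little more concrete, here is a minimal sketch of how one might compare two versions of the same piece of code and list what changed between them. It uses Python's standard difflib module. The two versions of the score function, and the fields links_in and flagged_spam, are hypothetical and invented purely for illustration; they are not taken from Kitchin's article or from any real system.

```python
import difflib

# Two hypothetical versions of the same scoring rule, stored as source text.
# Neither comes from a real system; both are invented for illustration.
VERSION_1 = '''\
def score(page):
    return page["links_in"]
'''

VERSION_2 = '''\
def score(page):
    base = page["links_in"]
    penalty = 5 if page["flagged_spam"] else 0
    return base - penalty
'''


def changed_lines(old_source, new_source):
    """Return (added, removed) lines between two versions of source text."""
    diff = difflib.unified_diff(
        old_source.splitlines(),
        new_source.splitlines(),
        fromfile="v1",
        tofile="v2",
        lineterm="",
    )
    added, removed = [], []
    for line in diff:
        if line.startswith("+++") or line.startswith("---"):
            continue  # skip the two file-header lines of the diff
        if line.startswith("+"):
            added.append(line[1:])
        elif line.startswith("-"):
            removed.append(line[1:])
    return added, removed


added, removed = changed_lines(VERSION_1, VERSION_2)
print("Removed:", removed)
print("Added:", added)
```

A diff like this only tells you what changed in the text of the code. It says nothing about why the spam penalty was introduced or who decided on it, which is exactly the contextual gap the later methods try to fill.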

2. Reflexively Producing Code: The second method involves sitting down and figuring out how you might convert a task into code yourself. Kitchin calls this ‘auto-ethnography’, which sounds apt. Such auto-ethnographies can be more or less useful. Ideally, the researcher should critically reflect on the process of converting a task into a ruleset and a computer language, and think about the various social, legal and technical frameworks that shape how they go about doing this. There are obvious limitations to all this. The process is inherently subjective and prone to individual biases and shortcomings. But it can nicely complement other research methods.

3. Reverse-engineering: The third method requires some explanation. As mentioned above, one of the obstacles facing the researcher is that many algorithms are ‘black-boxed’. This means that, in order to figure out how the algorithm works, you will need to reverse engineer what is going on inside the black box. You need to study the inputs and outputs of the algorithm, and perhaps experiment with different inputs. People often do this with Google’s PageRank, usually in an effort to get their own webpages higher up the list of search results. This method is also, obviously, limited in that it provides incomplete and imperfect knowledge of how the algorithm works.
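As a rough illustration of the ‘experiment with different inputs’ idea, here is a minimal sketch of probing a black box by nudging one input at a time and watching how the output moves. Everything in it is an assumption made for the example: black_box is a toy stand-in whose body the researcher is pretending not to see, and the input names (links_in, page_age_days, keyword_hits) and weights are invented. It is not a model of PageRank or of any real ranking system.

```python
def black_box(links_in, page_age_days, keyword_hits):
    """Hypothetical hidden scoring rule; pretend this body cannot be read."""
    return 3 * links_in + 0.5 * keyword_hits - 0.01 * page_age_days


def probe_sensitivity(system, baseline, step=1.0):
    """Estimate how the output changes when each input is nudged in isolation."""
    base_output = system(**baseline)
    effects = {}
    for name, value in baseline.items():
        probe = dict(baseline)       # copy so the baseline stays untouched
        probe[name] = value + step   # change exactly one input
        effects[name] = (system(**probe) - base_output) / step
    return effects


# A hypothetical starting point chosen by the researcher.
baseline = {"links_in": 10, "page_age_days": 100, "keyword_hits": 4}
effects = probe_sensitivity(black_box, baseline)
for name, effect in effects.items():
    print(f"{name}: {effect:+.2f} per unit")
```

The probe recovers the hidden weights here only because the toy rule is linear and never changes. A real system may combine inputs in non-linear ways, and may be updated between one probe and the next, which is why this method yields incomplete and imperfect knowledge.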

4. Interviews and Ethnographies of Coding Teams: The fourth method helps to correct for the lack of contextualisation inherent in some of the preceding methods. It involves interviewing or carefully observing coding teams (in the style of a cultural anthropologist) as they go about constructing an algorithm. These methods help the researcher to identify the motivations behind the construction, and some of the social and cultural forces that shaped the engineering decisions. Gaining access to such coding teams may be a problem, though Kitchin notes one researcher, Takhteyev, who conducted a study while he was himself part of an open-source coding team.

5. Unpacking the full socio-technical assemblage: The fifth method is described, again, in somewhat jargonistic terms. The ‘socio-technical assemblage’ is the full set of legal, economic, institutional, technological, bureaucratic, political (etc) forces that shape the process of algorithm construction. Interviews and ethnographies of coding teams can help us to understand some of these forces, but much more is required if we hope to fully ‘unpack’ them (though, of course, we can probably never fully understand a phenomenon). Kitchin suggests that studies of corporate reports, legal frameworks, government policy documents, financing, biographies of key power players and the like are needed to facilitate this kind of research.

6. Studying the effects of algorithms in the real world: The sixth method is another obvious one. Instead of focusing entirely on how the algorithm is produced, and the forces affecting its production, you also need to study its effects in the real world. How does it impact upon the users? What are its unanticipated consequences? There are a variety of research methods that could facilitate this kind of study. User experiments, user interviews and user ethnographies are some of the possibilities. Good studies of this sort should focus on how algorithms change user behaviour, and also how users might resist or subvert the intended functioning of algorithms (e.g. how users try to ‘game’ Google’s PageRank system).

Again, no one method is likely to be sufficient. Combinations will be needed. But in these cases one is always reminded of the old story about the blind men and the elephant. Each method, like each of the blind men, touches a different part, but they are all studying the same underlying phenomenon.
