Philosophical Disquisitions: The Moral Problem of Grading: An Extended Analysis

[This, admittedly quite long, post is a sample chapter from a book I may end up writing about the ethics of academia. I'm interested in feedback on it. Would people be interested in an entire book examining the moral dilemmas faced by the typical academic? Is this analysis of grading any good? Let me know]

Grading is the bane of most academics’ lives. Several times a year the working academic will be required to grade the students in their classes. Academics often complain about this process — begrudging both the time it takes and the mind-numbing nature of the task* — but rarely think about its ethics. Most see it as an inevitable and essential part of their jobs. If they didn’t grade students’ exams and assignments then what would be the point of all that teaching? It seems so obvious that grading is the natural denouement of teaching. It’s always been done and if it wasn’t done it would be weird. Students would complain and the general public would start to wonder what people are doing in universities. So, instead of subjecting the practice to close ethical scrutiny, most academics prefer to view it with ironic detachment. They laugh about it and then they get on with it.

A famous illustration of this ironic detachment is Daniel Solove’s article about the ‘Staircase Method’ of grading. Solove, a law professor at George Washington University, first wrote about the method in the lead-up to ‘marking season’ at his university. He realised that many of his colleagues would soon be sharing the pain of grading stacks of 100+ papers. He thought he could lighten their load by offering a new method of grading. Instead of sitting down and actually reading through all those exam scripts and assignments, why not take the stack to the top of the nearest staircase and give it a good heave over the edge? If you do it right, the papers should fall to the ground at different points along the staircase. You can then assign letter grades to the papers depending on where they land. This, of course, throws up something of a dilemma: what kind of grading rubric should you use? Should the papers that land closest to you (near the top step) get the highest grades and the ones that land furthest away get the lowest grades? Or should it be the other way around? Solove assures us that there is an obvious answer to this question:

“While many professors still practice the top-higher-grade approach, the leading authorities subscribe to the bottom-higher-grade theory, despite its counterintuitive appearance. The rationale for this view is that the exams that fall lower on the staircase have more heft and have traveled farther. The greater distance traveled indicates greater knowledge of the subject matter. The bottom higher-grade approach is clearly the most logical and best-justified approach.”

Solove also defends the practice from its critics, pointing out that most grading systems are riven with subjective biases and inconsistencies. The staircase method is perfectly objective. Every professor can view the same distribution of papers along the staircase and agree on the grade (though there are some tricky cases such as when a paper hangs over the edge of a step). Furthermore, it is not a purely arbitrary system. It takes some skill to throw the papers in the right way:

“The key to this method is a good toss. Without a good toss, it is difficult to get a good spread for the grading curve. It is also important to get the toss correct on the first try. Exams can get crumpled if tossed too much. They begin to look as though the professor actually read them, and this is definitely to be avoided. Additional tosses are also inefficient and expend needless time and energy”

I have discussed the staircase method with my colleagues many times over the years. Most of them find it amusing. Recently, I asked some of them if they thought it would be ethically appropriate for me to actually use it when grading my student assignments. They were aghast. It most certainly would not, I was told. But why not? What if it turned out that for all their sophistication and effort, the methods that we actually use for grading students are as unethical as Solove’s tongue-in-cheek method?

The remainder of this article will try to answer this question. In the end, I will conclude that the ethical academic should be opposed to most of our current grading practices, but that they still need to grade students anyway. They just need to be more transparent and open with students about the limitations of what they are doing.

1. What is the moral function of grading?

In order for grading to be morally justified, one of two things must be the case: (i) grading must serve some morally legitimate purpose and/or (ii) there must be some moral duty to grade that arises from the nature of the teacher-student relationship. The former, if it were true, would provide grading with a consequentialist moral justification; the latter, if it were true, would provide grading with a deontological moral justification. Later in this chapter, I will consider the idea that there might be a moral duty to grade in a little more detail. For now, I want to focus on the idea that grading serves some morally legitimate purpose. I do so because when I informally polled my colleagues on what they thought the moral justification for grading might be, they typically cited the purposes of grading in their replies.

So what are the moral purposes of grading? There are four that are worth mentioning. The first, and perhaps most obvious, is that grades motivate students to learn. Learning, we assume, is a good thing and so anything that encourages it is, all else being equal, also a good thing. So if it is true that grades motivate students to learn, they can be morally justified because they contribute to the good of learning. Another way of putting this is that grading provides an incentive to learning through a simple punishment/reward mechanism. Low grades punish bad learning and high grades reward good learning. Students will have an aversion to low grades and an attraction to high grades. By dishing out the rewards and punishments appropriately, we can shift student behaviour towards the good of learning. Just like we change the behaviour of rats in a cage.

The second potential moral purpose of grades is that grades play an important role in the allocation of distributive goods. Although this is contentious, some people argue that modern society functions (or at least should function) largely as a meritocracy. What this means is that you are (or should be) rewarded on the basis of individual merit, not on the basis of social class, race, gender or other irrelevant factors. To put it more concretely, an individual should get high paying jobs and so forth if they have the demonstrable ability to do those jobs, and not simply because they know the right people or were born into the right families. Grading plays an important role in the meritocratic allocation of social goods because grades tell us something about an individual’s abilities. For example, a student who gets high grades in their medical exams will get access to goods (career opportunities, attractive internships) that would not be accessible to a student who fails those exams. This is morally justified if grades really do track ability and thus ensure that goods flow to those who really deserve them.

Another way of putting this is to say that grades play an important communicative function in modern society. Grades are like signals. They tell people (peers, employers, other educational institutions) how good someone is at a particular subject or mode of inquiry. People then use these signals to make decisions about this person, i.e. whether to grant them a place on a prestigious course, whether to interview them for a job and so on. In this sense, grades are a lot like prices. The economist Friedrich Hayek once famously argued that the prices on open markets are signals: they tell producers and purchasers of goods and services what is worth producing and what is worth purchasing. They do this because they aggregate together lots of fragmented information about supply, production costs and consumer demand, and package it into a single, easy-to-interpret signal. Arguably, grades do the same thing, particularly average grades such as the GPA. They package together lots of fragmented information about an individual’s abilities and put it into an easy-to-interpret signal. This is morally valuable because people need that information to make rational decisions about how to allocate social goods.

The third potential moral purpose of grades is that they play an important role in certifying the competence of particular individuals, and thus help to minimise risks or harms to society at large. For example, an incompetent civil engineer would be a risk to society. An incompetent engineer might design a faulty bridge and that bridge could collapse. This would injure lots of people. We don’t want that to happen. One way of preventing it from happening is by having a system that grades the competence of wannabe engineers. If they are incompetent, they will achieve low grades and will be prevented from qualifying and practicing as engineers. This moral purpose of grading is only really a feature of grading in certain subjects or courses. I doubt that there is much risk to society from an incompetent English major (though there might be). But it is very important in certain subjects such as medicine, engineering and law. Obviously grading isn’t a perfect inoculation against incompetence. People can be incompetent for all sorts of reasons. But it is one tool we can use to protect society from the risks of incompetence.

The fourth potential moral purpose of grading is that grades give pleasure to students. Obviously this isn’t true of all grades. A student who receives low grades is unlikely to feel good about themselves. But for those who receive higher grades, grades can have this positive effect. They provide the student with some measure of their capacities and, perhaps, their relative social worth, and this can be quite a pleasurable thing to know. Think back yourself to those times when you received good grades. How did they make you feel? I’m willing to bet that they made many of you feel pretty good about yourselves, at least for a while. If we follow standard hedonistic logic, then we can use this aspect of grading as part of its moral justification.

There may be some other moral purposes of grading, but these four are the major ones. Note how each of them has a dark side. If it is true that grades play an important part in motivating students to learn, in justly allocating distributive goods, in protecting society from risk and in providing pleasure to students, then it is equally true that grading can, if done badly, be demotivating to students, facilitate the unjust allocation of distributive goods, increase the risk to society, and cause pain, anxiety and stress in students. In other words, grading is a morally fraught business. It is one of the many underappreciated moral dilemmas facing the academic. This suggests that our grading practices ought to be ethically scrupulous.

Are they?

2. What is grading anyway?

To properly evaluate the morality of grading, we need to take a step back for a moment and consider how grading systems work. As soon as we do this, we start to see some of the obstacles to developing an ethically scrupulous grading system. One of the major issues with grading, particularly at university, is that there are many different grading norms and practices at work. In their swingeing critique of higher education, Cracks in the Ivory Tower, Jason Brennan and Phillip Magness lament the fact that universities assume that professors are all ‘speaking the same language’ when they assign grades to their students, when this is not necessarily true. In fact, Brennan and Magness argue, professors could be speaking any one of nine different languages when they assign grades to students (Brennan and Magness 2019, pp. 119-120).

Consider a practical illustration of the problem. I currently teach at a university in Ireland. In Ireland we adopt the grading norms that are common in the UK higher education system (presumably a legacy of colonialism). Officially, we mark student assignments and exams on a numerical scale between 0-100. But within this numerical scale, we focus mainly on differentiating between first class grades (anything at 70% or above), higher second class grades (anything between 60-69), lower second class grades (anything between 50-59), third class grades (anything between 40-49) and fails (anything below 40).
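The band boundaries just described can be written down as a simple lookup. The following is a minimal Python sketch for illustration only: the function name `classify` and the label strings are assumptions made here, not anything official, and it assumes whole-number marks.

```python
def classify(mark: int) -> str:
    """Map a 0-100 mark to the degree-class band described in the text.

    The function name and label strings are illustrative assumptions.
    """
    if not 0 <= mark <= 100:
        raise ValueError("mark must be between 0 and 100")
    if mark >= 70:
        return "first"
    if mark >= 60:
        return "higher second"
    if mark >= 50:
        return "lower second"
    if mark >= 40:
        return "third"
    return "fail"
```

Written this way, it is easy to see how much of the nominal scale does no classifying work: every mark from 70 to 100 lands in the same band, as does every mark from 0 to 39.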

I cannot speak for everyone who uses this system, but from my own perspective the fixation on differentiating between firsts, seconds and thirds has an interesting effect on my grading. Primarily, it narrows my perception of what can be a legitimate grade. In my mind, given that the main goal is to distinguish between firsts, seconds, thirds and so on, I cannot actually mark assignments between 0-100. I can mark them between, roughly, 30-80. Anything below 30 is just generically bad, usually an indication that the student did not complete the assignment. Anything above 70 is very good, and although there are a full 30 marks above 70 to play around with when differentiating between the really good assignments, you don’t want to go too far above 70 in differentiating between them. If you gave an assignment a grade of more than 90, that would suggest it is nearly flawless and, at least when it comes to the kinds of long-form writing assignments that I tend to mark, nothing warrants that evaluation. Furthermore, I was educated in this system at a time when getting barely above 70 was considered exceptional. As a result, I’m very reluctant to even go above 75 when marking assignments (I have only once given an assignment a score of 80).

In recent years, this understanding of the grading norms has become problematic. For some reason, increasing pressure has been brought to bear on staff to use the ‘full range’ of marks when grading. In other words, we are encouraged not to be afraid to give an assignment a grade of over 80 or even 90 if we think it deserves it (presumably this logic also applies to the lower end of the spectrum but this is rarely discussed). Some of my colleagues have taken to this new proposed norm with glee. They now freely give out marks of over 80, sometimes awarding as many as five or six such grades per class per year. Much to my own surprise, I have become the stodgy traditionalist, holding on to the old norm and refusing to be dragged into this brave new world. I think I have good reasons for doing so. I think sticking to the old norm means my grading is more consistent over the long term and enables better cross comparison between grades awarded in different years. But I have to accept that sticking to this norm might be a problem for my students. If they can get much higher grades from other professors, then I may be unfairly disadvantaging them. I may also make them more reluctant to do my courses. Either way, the basic problem is clear: the language that I am speaking when I award a grade of 71 seems to be quite different from the language that some of my colleagues are speaking when they award a grade of 71. In fact, my 71 could be their 81 and vice versa. This problem is further compounded by the fact that universities don’t account for these differences when they aggregate grades together into overall averages or GPAs.
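To see why the aggregation step is troubling, consider a toy example. The Python sketch below is purely illustrative: the marks are invented, and the assumption that a lenient marker simply adds ten points to whatever a traditional marker would have given is a deliberate oversimplification of how marking norms actually differ.

```python
from statistics import mean

# Hypothetical marks a traditional marker would give to three pieces of work.
# Both students are assumed to produce work of identical quality.
quality = [71, 64, 68]

# Assumed gap between the two marking norms ("my 71 could be their 81").
LENIENT_OFFSET = 10

def awarded(mark: int, lenient: bool) -> int:
    """Mark actually recorded under the marker's norm, capped at 100."""
    return min(100, mark + LENIENT_OFFSET) if lenient else mark

# Student A is marked by traditional markers in all three modules.
student_a = [awarded(q, lenient=False) for q in quality]

# Student B happens to get a lenient marker in two of the three modules.
student_b = [awarded(q, lenient)
             for q, lenient in zip(quality, [True, True, False])]

# The university aggregates the raw marks without adjusting for the norms.
average_a = mean(student_a)
average_b = mean(student_b)
```

On these invented figures, identical work yields an average just under 68 for one student and just over 74 for the other, which is the difference between a higher second and a first.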

There is also another problem with the grading norms we use. When trying to differentiate between firsts, seconds and thirds, I believe I apply an absolute standard of evaluation, or at least something pretty close to that. In other words, I think that what distinguishes a first class assignment from a second class assignment is a set of relatively invariant criteria. These criteria usually relate to the quality of argument, the depth of research, the clarity of exposition and so on. But what differentiates a 68 from a 67? Or a 55 from a 54? I’d be hard pressed to give an answer to that. I can, however, tell you what usually happens when I assign such grades. What usually happens when I assign number grades within a particular grade band is that I compare the current assignment to one I have previously marked. If I previously gave an assignment 66, and I think the current assignment is slightly better than it, I will give the current one a 67. If not, I might go slightly lower. Say a 65 or a 64. In other words, when it comes to assigning precise numbers, I switch from an absolute standard of evaluation to a relative one. I’m comparing assignments to others within a class or module group. I’ve suddenly changed the language that I am speaking.

All of this reveals that there are some tricky questions that need to be answered in order to decide exactly what it is we are doing when we grade assignments. Without belabouring the point, here are some of the main contenders when it comes to answering those questions.

(a) Grades could be an absolute measure of competency - when grading the goal might be to determine how well a student’s assignment (or perhaps the student themselves, see below) scores relative to some invariant measures of competency. Ostensibly, this is what many grading systems attempt to do by distinguishing between different numbered or lettered grades. The marker is trying to determine where along the absolute metric of competency the particular student falls. More formally, we might say that this form of grading assumes that markers use a cardinal scale to assess the competency of student assignments. This scale allows them to make meaningful assessments of the absolute merit of those assignments.

(b) Grades could be a relative measure of competency - when grading the goal might be to determine how well a student’s assignment (or perhaps the student themselves) does relative to some defined peer group. This will usually be either the other students taking the same course or module or some year group. This form of grading dispenses with the notion that grading requires a cardinal scale and instead assumes that grading involves an ordinal ranking of students. We can say that one student is better than another, but we cannot say by how much.

(c) Grades could be an output measure - when grading the goal might simply be to assess the merits (relative or absolute) of some particular piece of work (output) that is produced by the student. This might be an essay, an exam script, a presentation or something like this. In any case, the goal of grading is to focus on the merits of that output and not any other extraneous factors (e.g. how pleasant the student is; how much work they did and so on).

(d) Grades could be, at least in part, a process measure - when grading the goal might not simply be to assess the merits of an output but also, in part, to assess the processes through which the student arrived at that output. In other words, we might try to assess the amount of work or effort the student put into producing the output, or we might give them credit for how much they have improved over a course of study. Many universities do something close to the latter by assigning greater weight to assignments in later years of study.

(e) Grading could be an entirely backward-looking task - when grading we might care only about past outputs and performances and focus solely on evaluating them.

(f) Grading could be, at least in part, a forward-looking task - when grading we might care, in part, about the future. In other words, we might see one of the functions of grading as being to nurture or develop students for the future (Weis 1995).

These different possibilities are not mutually exclusive, at least not in practice. People will often teach courses in which grading is understood in more than one of these six ways. This can happen if students have to submit multiple assignments as part of one module, each of which adopts a different mode of grading. But it can also happen within the same assignment. I already explained how I myself sometimes switch between absolute and relative measures of competence when grading individual assignments. Although this practice is normal, it does create problems when it comes to the morality of grading.

3. The Immorality of Grading

There are several different arguments one can marshal against grading. Each of these arguments speaks to the immorality of the practice, and ties back to the moral functions of grading that were discussed previously. Although there are different ways of parcelling out these arguments, I’ll focus on three main ones in what follows.

The first argument against grading is that grading is immoral because it is unfair, inconsistent and error prone. In making this argument, we should be clear at the outset that some forms of grading, in some subjects, are immune from this criticism. Mathematics and some of the hard sciences, for example, often use tests and assignments in which there are clear, right or wrong answers to questions. Depending on the grading rubric involved, this enables markers to give reasonably objective evaluations of student assignments. There is little room for disagreement or debate about the merits of those assignments. If you got three different markers to assess the same student assignment they would agree on the mark to be given. Indeed, it may even be possible to automate the marking process because of the simple, binary nature of the marking rubric, and the high level of intersubjective agreement on the marks to be given. In these cases, the marking process is fair, consistent and relatively error free (there is, of course, still some possibility of error in the marking process, either technical or human).

The problem is that a large number of university level subjects do not enable such consistency in marking. Many humanities and social science subjects test students on their capacity to complete long-form writing assignments, or other project-based work. There typically are no right or wrong answers in these assignments. Rather, the assignments are opportunities for students to demonstrate what they have learned and, most importantly, their skills in research, critical thinking, analysis and argumentation. Problems of inconsistency and unfairness arise when assessing these modules at both the individual and institutional level. For starters, problems arise when individual professors or lecturers are themselves inconsistent in how they perceive and approach the grading process. I noted this already above. A professor might view an assignment as a purely outcome-based measure or as a mix of outcome-based measures and process-based measures. They might attempt to apply an absolute measure of competence or a relative measure of competence, or a bit of both (as I openly admit I tend to do). They might factor in knowledge they have of a student, or they might adopt a strictly blinded system of marking (some institutions enforce this policy). This means that individual professors can themselves be inconsistent in grading. In other words, their grades can vary depending on how they happen to perceive the grading process at a given moment in time. There is a lot of room for subjective bias or error to creep in when marking in these disciplines. If you got three different people to mark the same student assignment in these disciplines, you would find considerable disagreement about the mark that should be awarded. Anyone who has taught in these disciplines will be familiar with this problem. I have myself been involved in marking student research dissertations in which I disagreed with another marker by as much as 20-30% (or 2 grade bands) on a particular assignment.

The problem of inconsistency and unfairness is compounded at the institutional level. Not only are professors inconsistent within themselves but they are inconsistent with one another across subjects. What a grade means in subject A might be very different from what it means in subject B. One professor might adopt a relative ranking and another might adopt an absolute ranking or some other combination of approaches. Universities then often do something strange. They aggregate the individual student grades from these different subjects together, often conveniently ignoring the different grading rubrics that may have been adopted (assuming this information is even available), to generate an overall average or GPA for the student. This then becomes the student’s degree award and is often the main bit of information that students use when signalling to others how they performed at university. This aggregated number is, in most cases, a form of voodoo mathematics (Brennan and Magness 2019).

We should not overstate the problem, of course. There is some consistency to the grading process. There often are clear differences between first class (or A) assignments and fails. Most professors could probably agree on these high-level classifications. Nevertheless, there are considerable problems when it comes to the finer-grained distinctions that academics often like to make. Furthermore, there are no easy fixes to the problem. In a widely cited article, Daryl Close (2009) argues that current grading practices are unfair because they are not ideally impartial and consistent. He goes on to offer some recommendations on how to make the system more impartial and consistent, recommendations that include ending the practices of grading on the curve and dropping the worst assignment results from a student’s overall grade. But others have pointed out that, if the goal is to achieve fairness in grading, these practices can be justified. Leslie Buckholder (2015), for example, has argued that a grading system is impartial and consistent if it satisfies a ‘swapping test’, i.e. what would happen if you traded one individual’s assignments for another’s? Would they end up with the same grade? If they would, then the system is impartial and consistent. Buckholder then points out that grading on a curve and dropping individual assessments from an overall grade can satisfy the swapping test. Add to this the fact that some people think that fairness is often served by treating different cases (different students) differently and you start to appreciate the complexity of the problem.
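One way to make the swapping test concrete is to treat it as a check on a grading scheme: hand in one student’s work under another student’s name, and vice versa, and see whether the grades travel with the work. The Python sketch below is one possible reading of the test rather than Buckholder’s own formulation. The student names, the marks and the three schemes are all invented for illustration, and ranking students by their average is only a crude stand-in for grading on a curve.

```python
from statistics import mean
from typing import Callable, Dict, List

Work = Dict[str, List[int]]   # student name -> marks on each assignment
Grades = Dict[str, float]     # student name -> overall grade

def drop_lowest(work: Work) -> Grades:
    """Average each student's marks after discarding their worst one."""
    return {s: mean(sorted(m)[1:]) for s, m in work.items()}

def curve(work: Work) -> Grades:
    """Crude curve: the overall grade is the student's rank (1 = best).

    Ties are not handled in this sketch.
    """
    order = sorted(work, key=lambda s: mean(work[s]), reverse=True)
    return {s: float(rank) for rank, s in enumerate(order, start=1)}

def favouritism(work: Work) -> Grades:
    """A deliberately partial scheme: one named student gets a bonus."""
    return {s: mean(m) + (5 if s == "Alice" else 0) for s, m in work.items()}

def passes_swapping_test(scheme: Callable[[Work], Grades], work: Work,
                         x: str, y: str) -> bool:
    """True if swapping x's and y's work swaps their grades and nothing else."""
    before = scheme(work)
    swapped = {**work, x: work[y], y: work[x]}
    after = scheme(swapped)
    others_same = all(after[s] == before[s] for s in work if s not in (x, y))
    return after[x] == before[y] and after[y] == before[x] and others_same

# Hypothetical marks for three students on three assignments.
work = {"Alice": [72, 58, 65], "Bob": [61, 70, 49], "Carol": [55, 52, 60]}
results = {f.__name__: passes_swapping_test(f, work, "Alice", "Bob")
           for f in (drop_lowest, curve, favouritism)}
```

On this toy data the first two schemes pass and the third fails, which fits the point that curving and dropping a worst result need not offend against impartiality, whereas a scheme that looks at who the student is plainly does.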

The net result is a system of grading that is morally problematic. As noted earlier on, one of the moral functions of grades is to allocate distributive goods to people on the basis of merit. This can only happen if the system of grading is consistent and free from biases and other imperfections. But it is very clear that this is not true of the grading systems we currently adopt in universities. There is considerable arbitrariness and subjectivity at play. The signals that students send to employers and others are, consequently, not as meaningful as we like to think, and whether students end up being allocated distributive goods is, at least in part, a matter of moral luck. On top of this, if there are significant biases and subjective errors in the marking process (e.g. some professors being overly generous and others being excessively harsh), grades lose their meaning as markers of competence. This could increase the risk to society at large. No ethically sensitive academic should be sanguine about this state of affairs.

The second argument against grading is that it can be coercive and thereby undermine the good of education. Recall from earlier on that one of the moral functions of grading is that it acts as an incentive to students to learn. We want students to learn because we think learning is a good. Grades give them the motivation to do this. This is an attractive line of reasoning and our common sense would suggest that grades can indeed act as motivators. Speaking from my own personal perspective, I know I was highly motivated to get good grades on my tests and assignments at college, and the desire to do well motivated me to attend classes and engage in extra-curricular reading. But my experiences may be exceptional. It’s quite possible that I would have been motivated to participate in the educational process irrespective of grading. Empirical research into the motivational effects of grading paints a mixed picture. Some research suggests that grades have a minimal, possibly counterproductive, impact on motivation (Grant and Green 2013); some research, including a recent meta-analysis by Koenka et al (2019), suggests that grades can induce anxiety rather than optimal motivation, and that written comments or feedback function as better motivators than grades (Koenka et al 2019; Chamberlin, Yasué and Chiang 2018).

Furthermore, even if it is true that grades act as motivators, the way in which they act as motivators is morally problematic insofar as they are coercive motivators. This is something that libertarian critics of compulsory education have long pointed out (Curren 1995). They argue that grades effectively function as threats to students to conform to a particular program of study or ideology of thought. By grading them, we are telling students that they must do this or else they will get a bad grade. This bad grade, in turn, can have devastating impacts on their life. It can block their access to career opportunities and other social goods. It is not a mild or trivial threat. In acting as a coercive motivator, grades thereby undermine the good of education by getting students to focus on extrinsic reasons for participating in the educational process (i.e. the desire to avoid bad outcomes) and not on the intrinsic pleasures of education. This can be counterproductive because it undermines students’ desires for self-learning, which some people argue should be the true purpose of education.

Is this argument really persuasive? Curren (1995) defends the coercive nature of grading, at least when it comes to the grading of children (as opposed to adults), on the grounds that education is a way of equipping students with the rational capacities they need to flourish as adults. We already justifiably do things to children against their will on the grounds that it serves their interests in the long run. That said, even Curren accepts that grading cannot be a justifiable form of coercion if it is unfair and inconsistent (as per the previous argument) and that it is more difficult to defend its coercive nature when it applies to adults, who are presumed to have already acquired rational competence. This is a problem for university grading since most university students are adults. It may, however, be possible to defend the coercive nature of grading on the grounds that university students freely enter into a contract with their universities, one of the terms of which is that they willingly subject themselves to the university’s grading system. This could also provide grounds for thinking that academics have a duty to grade their students, a duty that stems from the promise inherent in the educational contract.

But this is the kind of argument that really only works in the cloud cuckoo land of abstract theory. When we apply it to the real world, it stretches credulity to suggest that most students freely enter into contracts with their universities. Given the economic importance and value of a university education in the modern world, it would be more plausible to suggest that students are compelled to accept these contracts out of practical necessity. Furthermore, even if they have some choice over the university they attend or the course of study they undertake, the reality is that they rarely have the option to choose a different grading system. Most universities and most courses adopt the same set of norms. Thus, students do not voluntarily subject themselves to grading systems. The problem of coercion, and the associated undermining of the good of education, remains.

The third argument against grading is the simplest. Grading is morally problematic because, far from being a source of pleasure, it is directly harmful to students. It puts pressure on them to perform to a high standard and thus fosters a lot of stress and anxiety. The crisis of mental health, particularly the crisis of anxiety and depression, on university campuses is widely remarked upon. Systematic reviews of prevalence rates for anxiety and depression among college students suggest that they are higher than the prevalence rates in the general population (Ibrahim et al 2013; Pedrelli et al 2015). Some of this is plausibly linked to the importance of grading and the competitive nature of employment. An increasing emphasis is placed on high grades by employers. Anecdotally, in my own discipline of law, I know that employers often will no longer consider students who fail to get a 2:1 average in their degree. Indeed, sometimes employers are even suspicious of those average grades (rightly so if the previous arguments about inconsistency in grading practices are to be believed) and ask for full grade transcripts in order to see how the student did across the full range of their modules. This induces anxiety among students who now feel that their university education is pointless if they don’t get at least a 2:1.

There is a disturbing paradox in this argument. Grading is problematic because of the harm it does to students but, because of the social importance attached to grades, it can be just as harmful to not grade students. If you fail to grade them, you may be doing them a disservice and robbing them of a valuable signal that they can use to unlock social opportunities. This is another reason to think that academics might have a moral duty to grade. But because of this, and because of the potential harms of grading, many academics feel that there is an upward pressure on the grades they award: they are encouraged to err on the side of generosity and perhaps give students higher grades than they strictly feel they deserve. Although the evidence for grade inflation is not as strong as some people claim (Brennan and Magness 2019), the belief that there is such upward pressure on grades is widespread, even among the non-university population. This means that there is a general perception that grades are becoming divorced from the reality of student competence which, in turn, undermines whatever moral purpose they might serve in signalling and allocating social goods. The result is that the harm argument against grading is particularly powerful because the harmful nature of grading has ripple effects that undermine the other moral purposes of grading.

4. Can we make grading more ethical?

If the preceding arguments are correct, then grading is in a lot of trouble. The way in which the grading system currently operates in universities prevents grades from performing the morally desirable functions we would like them to perform. In fact, it is much worse than that: grades may also do direct moral harm by undermining the goods of education, unfairly blocking access to social goods, and by being psychologically damaging. Is there anything we can do to rectify these problems? Let me close out this chapter by considering four possibilities. Each of these possibilities shares the general goal of simplifying the grading process and making it more transparent.

A. The Triage Model

First, we could adopt the ‘triage model’ proposed by William Rapaport (2011). The name here is somewhat deceptive. The triage model proposes that we embrace the idea that grades are absolute measures of competence but we then abandon the attempt to make fine-grained numerical distinctions. Instead, we just offer three general grade classifications: full credit (if an assignment is substantially correct), minimal credit (if it is substantially incorrect), and partial credit (if it is somewhere in between). Rapaport’s use of the word ‘correct’ to describe how these grades work is unnecessarily limiting. In many disciplines there is no correct way of answering a question; there are, rather, more or less competent ways of doing so. Nevertheless, the basic gist of his proposal, that we should only make a few high-level distinctions between the merits of assignments, is worth taking seriously. It would certainly seem to take some of the arbitrariness and inconsistency out of the grading process. Professors who currently disagree about whether a student assignment merits 64% or 66% would probably be able to agree that it deserves full credit. There may, of course, continue to be some disputes about borderline students, but, overall, the room for subjectivity and bias to enter into the grading process is greatly reduced. Furthermore, the system could be made very transparent to students and thereby limit their tendency to appeal grades and ask for a few more marks to be added to their assignments to bump up their overall average.

Despite its appeal, the triage proposal suffers from some significant defects. The most obvious is that if we only have three grade classifications, grades lose some of their signalling value. They are no longer indicative of the precise quality or ability of the student. They are vague and imprecise markers of ability. This means that grades can no longer play a decisive role in allocating social goods according to merit. If, for example, an employer is confronted with two students, each of whom has received a grade of partial credit, how is she to decide between them? Presumably (as is already starting to be the case) she will resort to some other ranking and rating process to assist her decision-making (Erdi 2019). For example, she might place increasing reliance on commercial psychometric or critical thinking tests. This is likely to just replace the old problematic system with a new, possibly even more problematic system, and, in the process, undermine the social value of a university education. So, somewhat perversely, someone who adopts the triage model in an effort to make their grading system more transparent and consistent could end up shooting themselves in the foot.

Of course, some people might argue that we shouldn’t be allocating social goods on the basis of merit and so shouldn’t be relying so much on the signalling value of grades. Meritocracy is a myth and all alleged systems of meritocracy just perpetuate and reinforce systems of social oppression and inequality (Markovits 2019; Littler 2017). Good riddance to the lot of them. If we, as academics, can do anything to undermine them then we should. That’s all well and good, but if you want to deconstruct meritocracy or meritocratic systems for allocating social goods, then that’s something that needs to be done at a social and institutional level. If an individual academic adopts the triage marking system on their own, then they risk disadvantaging their students relative to the existing system. This reveals a common flaw in all the solutions I shall discuss in this section: though they might resolve some of the problems with grading, they are difficult to justify at an individual level.

B. The Strict Relative Ranking Approach

An alternative to the triage model, but with a similar underlying aim, has been suggested by Christopher Knapp (2007). Knapp criticises existing grading practices along similar lines to those presented in this chapter: they are unfair, inconsistent and hence unreliable signals. To resolve this problem, he suggests that we embrace a relativistic approach to grading. In other words, we should accept that it is impossible to consistently evaluate students against some absolute measure of competence and accept that the best we can do is rank students — or more precisely student assignments — relative to one another. Grading thus becomes an exercise in preparing a purely ordinal ranking of student assignments. If this is made transparent to all, then it becomes crystal clear what grades really mean, and thus they can be credibly used to perform the functions we require.

Knapp’s proposal again has the merits of simplicity and transparency. Academics are not pressured to make fine-grained distinctions between students based on arbitrary or subjective criteria. Still, there are some obvious problems with the idea of embracing a purely ordinal ranking. The most obvious is that it would again rob grades of much of their signalling value and force people to rely on other, possibly more troubling, criteria to distinguish between different students. If all a grade tells me is that a student did better than another student taking the same module, in the same year, in a particular university, then what does it really tell me? That depends on the number of students taking this module (what if 150 students took it? what if only 5 did?) as well as other assumptions I might make about the quality of the university and the cohort of students within it. If the student comes from an elite university, then I might be inclined to weigh their result highly; but if they come from a lower ranked university then I might be inclined to discount it. This could just further exacerbate elitism and inequality in the allocation of social goods. For all the flaws it might entail, treating grades as absolute measures of competence at least allows for the pretence of levelling the social playing field. A first class grade from one university should, in an ideal world, be as good as a first class grade from any other. I know I certainly like to convince myself that this is true.

In addition to all this, as Knapp himself notes, this approach would only work in practice if there was institutional reform and acceptance. An individual academic who adopted it without such institutional support might do their students a disservice and be sanctioned in the process.

C. The “No Grades” Model

A more radical solution to the grading problem is to simply abandon the practice altogether. Don’t give grades. Instead give critical comments and feedback, and enter into a dialogue with students about their work. The philosopher Robert Paul Wolff writes about this in his book The Ideal of the University (originally published in 1969). He contrasts two different approaches to assessing work: evaluation and criticism. He notes that, in many ways, criticism is the norm in academia. When academics write articles and submit them for peer review, they don’t expect to be graded. They expect to receive detailed, qualitative, critical feedback, highlighting both the merits and demerits of their work. It is somewhat odd then that when these same academics turn their attention to students, they focus largely on quantitative grading, not qualitative feedback. That’s not to say that they never provide qualitative feedback (they do, and in some countries they are being encouraged to provide more of it), but it is to say that the primary focus is on the quantitative grade. Why not ditch this and focus solely on the critical feedback?

The educational theorist Jesse Stommel adopts a similar attitude to Wolff. He proudly declares on his professional webpage that he doesn’t grade students. Instead, he focuses on dialoguing with them and providing qualitative feedback. Indeed, he goes a step further and encourages students to largely assess the merits of their own work, for themselves. He thinks this approach can save education from the tyranny of grades:

“...grades are the biggest and most insidious obstacle to education… Agency, dialogue, self-actualization, and social justice are not possible in a hierarchical system that pits teachers against students and encourages competition by ranking students against one another. Grades (and institutional rankings) are currency for a capitalist system that reduces teaching and learning to a mere transaction. Grading is a massive co-ordinated effort to take humans out of the educational process…”

There is a lot to admire in Stommel’s idealism. The critical feedback approach has been the hallmark of the Oxbridge tutorial system of education for centuries (albeit combined with a lot of quantitative and competitive grading). In addition to this, critical feedback, not numerical grading, is the language of ordinary professional life. If one of the goals of education is to prepare students for the professional world, then criticism is the way to go. Doing so would also remove much of the anxiety and competitiveness from the process.

Nevertheless, it is not a flawless solution. Numerical grades work as simple communicative signals of ability (even if they are flawed); detailed critical comments do not. Likewise, as anyone who has undergone the process of peer review will know, it is just as prone to arbitrariness and subjective bias as the numerical grading system. In fact, it may be even more prone to it. Furthermore, qualitative feedback isn’t a practical solution for many academics. It might work well in small groups but anyone teaching large groups will quickly struggle to provide meaningful qualitative feedback to all students. At the very least, it would require significant additional resources to make it worthwhile. Finally, it is often not feasible for an individual academic to abandon grades altogether since students and institutions expect them. Jesse Stommel faces this problem in his own classes. Although his emphasis is on qualitative feedback, at the end of a module he does provide grades to his students since they are required by his university. So although he may want to resist the capitalistic model of education, he cannot do it on his own. Again, we have the problem of the individual vs the institution rearing its ugly head.

D. The Moral Compromise

The last solution to the problem is not really a solution at all. It is to throw our hands up and accept that we cannot, by ourselves, create a morally perfect system of grading. We cannot address the injustices of meritocratic social allocation; we cannot remove all the anxiety and competitiveness from modern social life. The best we can do is compromise with the system we have, and make the best of it. For my own part, I think this means that academics should do three things when marking student assignments:

(i) They should accept that, for the time being, grading of some sort is inevitable: students expect it, institutions require it, and society demands it. Oftentimes it would be worse for students if you didn’t do it and it would, arguably, be a dereliction of duty.

(ii) They should be open and transparent with students about how they approach grading and what a grade means in their class, i.e. they should explain to students whether they are measuring outcomes or processes, trying to develop relativistic measures or absolute measures, and the criteria they typically use to help make evaluations.

(iii) They should acknowledge the limitations of the current grading norms and reduce their arbitrariness as much as is possible. So, for example, someone like myself, working with the numerical grading system in which the main focus is on distinguishing between different grade boundaries (1sts, 2nds, 3rds etc), should try to be fair and consistent in allocating students to the major grade categories but then avoid fooling themselves into thinking they can make lots of precise delineations between specific numerical grades.

This may not be perfect, but it is a start.

Now, if you will excuse me, I have to go throw some papers off the top of the nearest staircase.

References

Brennan, Jason and Magness, Phillip (2019). Cracks in the Ivory Tower: The Moral Mess of Higher Education. Oxford: OUP.

Burkholder, Leslie (2015). Impartial Grading Revisited. Teaching Philosophy 38(3): 261-272

Chamberlin, K., Yasué, M., & Chiang, I.-C. A. (2018). The impact of grades on student motivation. Active Learning in Higher Education. https://doi.org/10.1177/1469787418819728

Cholbi, Michael. (2015) Passing Judgment. The Philosophers’ Magazine 2nd Quarter: 71-76.

Close, Daryl. (2009). Fair Grades. Teaching Philosophy 32(4): 361-398

Curren, Randall R. (1995). Coercion and the Ethics of Grading and Testing. Educational Theory 45(4): 425-441

Erdi, Péter. (2019) Ranking: The Unwritten Rules of the Social Game We All Play. Oxford: OUP.

Ibrahim, Ahmed; Shona J. Kelly, Clive E. Adams, Cris Glazebrook (2013) A systematic review of studies of depression prevalence in university students. Journal of Psychiatric Research, 47(3): 391-400, https://doi.org/10.1016/j.jpsychires.2012.11.015

Koenka, Alison, Lisa Linnenbrink-Garcia, Hannah Moshontz, Kayla M. Atkinson, Carmen E. Sanchez & Harris Cooper (2019) A meta-analysis on the impact of grades and comments on academic motivation and achievement: a case for written feedback, Educational Psychology, DOI: 10.1080/01443410.2019.1659939

Knapp, Christopher. (2007) Assessing Grading. Public Affairs Quarterly 21(3): 275-294

Grant, D., Green, W.B. (2013) Grades as incentives. Empirical Economics 44, 1563–1592. https://doi.org/10.1007/s00181-012-0578-0

Littler, J. (2017). Against Meritocracy. Oxford: Routledge.

Markovits, D. (2019). The Meritocracy Trap. London: Allen Lane.

Pedrelli, P., Nyer, M., Yeung, A., Zulauf, C., & Wilens, T. (2015). College Students: Mental Health Problems and Treatment Considerations. Academic psychiatry : the journal of the American Association of Directors of Psychiatric Residency Training and the Association for Academic Psychiatry, 39(5), 503–511. https://doi.org/10.1007/s40596-014-0205-9

Rapaport, William J. (2011). A Triage Theory of Grading: The Good, the Bad, and the Middling. Teaching Philosophy 34(4): 347-372

Solove, Daniel. A Guide to Grading Exams. 14 April 2014, available at https://www.linkedin.com/pulse/20140414044726-2259773-a-guide-to-grading-exams/

Stommel, Jesse (2017). Why I don’t Grade. 26 October 2017, available at https://www.jessestommel.com/why-i-dont-grade/

Weis, Gregory (1995). Grading. Teaching Philosophy 18(1): 3-13

Wolff, Robert Paul (2017). The Ideal of the University. New York: Routledge - originally published in 1969 by Beacon Press.

8 comments:

Serge, February 28, 2020 at 9:15 AM
"In an autobiographical essay published in 1946, Albert Einstein reflected on his days as a student of physics some fifty years earlier. He recalled his teachers with affection but, referring to exams, said, “This coercion had such a deterring effect that after I had passed the final examination, I found the consideration of any scientific problems distasteful to me for an entire year.” In the same vein, an assessment of teaching and learning at Harvard University in 1992, based on interviews with 570 undergraduates, concluded that many students avoided taking science classes not because of the heavy workload but because of the competition for grades."

Kohn, Alfie. Punished by Rewards: The Trouble with Gold Stars, Incentive Plans, A's, Praise, and Other Bribes (p. 151). Houghton Mifflin Harcourt. Kindle Edition.

I'm currently working on a project about gender equity in science and technology education and careers, and "the gender paradox" (https://en.wikipedia.org/wiki/Gender-equality_paradox) might be explained by this very competition for grades...
Berry B, June 9, 2020 at 10:15 PM
Wow, what an excellent article, thank you John! I had my first grading duty last semester, and would have profited from reading this before. I'm lucky to be teaching programming, where grading can technically be easy (percentage of tasks participants can solve in given time through programming), even mostly automatic, yet the underlying ethics are still relevant for me to consider. Do go ahead and put this in a book!
Quinn, January 22, 2021 at 12:18 AM
This would be a great book. Don't forget about Deming! He was a huge critic of grading. Here's a few minute overview of this. Tripp also lists some other resources and authors. http://podcast.deming.org/deming-lens-34-joy-in-learning
tim quick, April 23, 2024 at 7:48 PM
Admittedly, a minor and somewhat surly point, but I had a professor at Michigan State (James Roper) give me a detailed account of the staircase method - and debates he had had with others about its finer points - in 1984. I was 18 at the time. That means Solove was 11 years old. I am not saying Roper invented it either (he didn't claim to have), but I feel like Solove should acknowledge that he is merely repeating a wide-spread academic joke - not inventing it.
tim quick, April 23, 2024 at 7:56 PM
I apologize, I really do, but I guess the other surly point I want to make is this. At the last couple of universities I have been at, it was made painfully clear to me that the vast majority of students should get an A, a few who maybe barely attended and did poorly on the tests should get a B, and anyone who did worse than that should be referred to some support office or another for help. According to the NYT, 80% of grades given at Yale last year were As. Given this level of grade inflation, some of the fine points and distinctions you offer seem irrelevant.

Thursday, February 27, 2020
