Using Data to Find Grading Bias
I love data. A lot. When a close friend was in the hospital, I calculated the exact amount of my free time I dedicated to visiting him each day and put it in a fancy set of charts and graphs. Not for any guilt-tripping purposes; I did this for the love of data alone.
When it comes to teaching, it is no surprise that I keep three different sets of notes on my classes and gather as much information as I possibly can. During assessment time, I go an extra step and use my data to look for my own biases in grading. I have biases and so does everyone reading this, because we’re human. To defeat them, we have to know about them, and data helps us do that. Here’s how I look for my own biases in grading:
First, I need to determine possible biases I could have. I’m confident that I don’t grade students differently based on their religious views, because I have no idea what those views are. However, I do know every student’s age and sex, and whether they are in a class that I love teaching or one that makes me want to pull out my hair because they just don’t care about my subject. All of these are potential sources of bias.
Next, I need to organize the data I collect: sorting the grades I give them by class. (In my case, students are in single-grade, single-sex classes so separating the grades by class is enough for my purposes.)
Now I need a good program to crunch the numbers and make the pretty charts. Excel is overwhelmingly annoying to me, so I use Numbers on a Mac. (Let’s not get into a Mac vs. PC debate, okay? I’ll use mine and you’ll use yours.)
Here’s how my data looks after administering my final assessment:
I entered the number of each grade (A, B, C, D) in each class and also calculated the percentage distribution in each class. Specific class information is on the left.
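The same tallying can be done outside a spreadsheet. Here is a minimal Python sketch of the step described above, using made-up class names and grade lists (the real data lives in the author’s Numbers tables):

```python
from collections import Counter

# Hypothetical grade lists per class -- stand-ins for the real spreadsheet data.
classes = {
    "Grade 1, girls": ["A", "A", "B", "B", "B", "C", "C", "D"],
    "Grade 1, boys":  ["A", "B", "B", "C", "C", "C", "D", "D"],
}

dists = {}
for name, grades in classes.items():
    counts = Counter(grades)          # number of each letter grade
    total = len(grades)
    # Percentage distribution of each letter grade within the class.
    dists[name] = {g: round(100 * counts[g] / total, 1) for g in "ABCD"}
    print(name, dists[name])
```

Percentages (rather than raw counts) let classes of different sizes be compared side by side.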
My second table focuses on two big potential biases: age (grade level) and sex.
Since all these numbers give me a headache, I switch to charts now. In Numbers, I can make a chart with the click of a button, and I can move the charts and tables around freely.
Here is what my grade distribution looks like when divided by sex.
Notice something? I’m not a statistician but that difference looks significant. I should look into it.
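“Looks significant” can be checked with a standard chi-square test of independence, which asks how likely a grade/sex split this lopsided would be if sex and grades were unrelated. A minimal pure-Python sketch, using hypothetical counts (not the author’s actual numbers):

```python
# Hypothetical counts: rows are sex, columns are grades A, B, C, D.
observed = {
    "girls": [30, 40, 20, 10],
    "boys":  [15, 25, 35, 25],
}

col_totals = [sum(col) for col in zip(*observed.values())]
grand_total = sum(col_totals)

# Chi-square statistic: sum of (observed - expected)^2 / expected,
# where "expected" assumes no relationship between sex and grade.
chi2 = 0.0
for row in observed.values():
    row_total = sum(row)
    for obs, col_total in zip(row, col_totals):
        expected = row_total * col_total / grand_total
        chi2 += (obs - expected) ** 2 / expected

df = (len(observed) - 1) * (len(col_totals) - 1)  # (2-1) * (4-1) = 3
CRITICAL_05 = 7.815  # chi-square critical value for df=3 at the 0.05 level
print(f"chi2 = {chi2:.2f}, df = {df}, significant: {chi2 > CRITICAL_05}")
```

A statistic above the critical value says the difference is unlikely to be chance, but, as the next paragraphs note, it says nothing about *why* the difference exists.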
Now comes the hard part. Am I giving my girls better grades because they are girls (a potential bias) or is there something else going on?
Ah, the old correlation/causation conundrum. Two things are correlated (grade distribution and sex) but that doesn’t automatically mean there is a causal relationship.
Putting my skeptical hat on, I need to look at the four possibilities:
1. A caused B.
2. B caused A.
3. A and B were both caused by C.
4. A and B are not actually related; the correlation is coincidental. (Spurious Correlations exemplifies this excellently.)
There are many factors I need to consider. For one thing, the pattern in my grading is reflected in other classes by other teachers. Maybe all the teachers have the same bias, or the girls at my school are consistently performing better than the boys for some reason.
And look at this:
In this one class, a huge proportion of the students (all boys) got a failing grade. One extreme deviation can skew the whole data set. This class is a fluke, but it affects the total numbers. (Why so many D’s here? The short of it is that their priority is definitely not my class.)
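How much one fluke class can distort the totals is easy to demonstrate. A sketch with invented numbers (class sizes and D counts are illustrative, not the author’s data):

```python
# Hypothetical per-class sizes and D counts; "Class 5" stands in for the
# one fluke class where a huge proportion of students failed.
classes = {
    "Class 1": {"size": 30, "d_count": 2},
    "Class 2": {"size": 30, "d_count": 3},
    "Class 3": {"size": 30, "d_count": 2},
    "Class 4": {"size": 30, "d_count": 3},
    "Class 5": {"size": 30, "d_count": 20},  # the outlier class
}

def d_rate(names):
    """Overall percentage of D grades across the named classes."""
    total = sum(classes[n]["size"] for n in names)
    ds = sum(classes[n]["d_count"] for n in names)
    return round(100 * ds / total, 1)

all_classes = list(classes)
print("D rate, all classes:", d_rate(all_classes))
print("D rate, outlier removed:", d_rate(all_classes[:-1]))
```

Recomputing the totals with and without the outlier is a quick way to see whether a pattern holds across the whole data set or is driven by a single extreme class.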
Finally, is there another reasonable explanation that could account for this data? In this case, yes.
Instead of doing a deep dive on South Korean school culture, I’ll give the shortest (and most reductive) version for brevity’s sake: my girls are essentially forbidden from going out at night and have to stay home and study, but my boys can go out and do non-studying activities like playing soccer or computer games (both are very much “not for girls” here**).
In South Korea, there is huge pressure on all students to do well in school. The studying is extreme (many have school for 14 hours a day, 6 days a week). However, while girls and boys face the same pressure to study, girls face extra pressure not to do certain other things. Partially as a result of this, girls consistently perform better than boys at my school. They generally behave better in class too. Having single-sex classes might also have something to do with it, considering what else happens with a room full of 16-year-old boys who’ve never shared classes with girls.
At this point, I decided that my data probably wasn’t a cause for alarm, but I should stay mindful of my grading choices. I looked at other potential biases and found similar results. My toughest class didn’t perform badly because I didn’t like them (I do like them, even though they drive me crazy sometimes). Rather, the reason they are my toughest class is because they don’t participate in class and not participating made them unprepared for the test, which resulted in low grades. (Possibility #3, A and B were both caused by C.) Other lines of evidence support this conclusion, drawn from my extensive hand-written teaching notes.
Of course, looking for biases in my grading would be largely unnecessary if my test could be anonymous. While I do think that anonymous grading is an ideal to aim for, it’s impossible in my case since I give a face-to-face speaking test. I also think some teachers are kidding themselves with “anonymous” grading: covering up a student’s name (a practice I’ve often seen) doesn’t do much if you can recognize the handwriting. Biases are there whether we like it or not, and data provides a good way to spot them. We also need to be very mindful of the multiple possible reasons for a correlation to exist, with or without a known cause.
**Here, in this case, is my specific location. I am not claiming this is true for all of South Korea, but I can speak for my neighborhood and school specifically. I have spoken with hundreds of my students about this very subject quite recently.