Use Microsoft Excel to be More Data Driven (Part II)
This is Part 2 of a piece about visualizing assessment data, designed to help teachers see more information about an assessment than just the class average. If you missed Part 1, stop now and go read it first. I will assume that you have assessment data in a tidy format described in the previous post.
My goal here is to demonstrate how I use Excel to give me detailed information about class performance in a time frame that allows me to do something meaningful with the information. Last week I mentioned measures of central tendency (like mean and median), but I intimated that also knowing the variance of the data would be more useful in describing how well the class, and individual students performed on a given assessment.
Statistics allows us to calculate things like range, variance, and standard deviation, but I find this numbers to be opaque in helping me to learn something about assessment performance. Rather, I prefer to visualize the distribution using tools in Microsoft Excel. I find these visualizations to be more intuitive, and that students comprehend these visual data more easily.
I will use the assessment data created last week to demonstrate two visualizations. One is simple; it uses color to show where in the distribution of scores a student falls. The second is called a histogram, which tells me something about the frequency of each grade (A through F). From these two visualizations, I can identify how well the class did over all by the shape of the distribution, and I can identify students that are performing well below their peers and might require some intervention.
Creating a Space for Visualization
It gets messy quickly trying to visualize data on the same page where the data is stored. Instead, we will set up a second sheet, that will pull the relevant information (student and score for a given assessment). Even cleaner would be to create an entirely separate file, but that is beyond the scope of this demonstration.
At the bottom of the Excel window, there are tabs labeled “Sheet 1,” “Sheet 2,” and “Sheet 3.” Your assessment data should be on Sheet 1, which I have relabeled “Assessment Data.” Click on “Sheet 2,” and you can relabel it if you like by clicking on it again.
On this sheet we want a new table. Each row will be a student, rather than a particular assessment score, so we will copy and paste the class roster into the first column, labeled “Name.”
Each other Column will represent an assessment. Here, I have labeled Column B as “Quiz 1,” because it refers to the first quiz in the assessment data. We will fill the data in a moment.
First, we want to freeze the left column, so that when we scroll left to right, the names will always be visible. This is a similar process from last week, but this time we select
View -> Freeze Panes -> Freeze First Column
Next, we want to input the Adjusted Percentage from the first sheet. To do this, sort the data on Sheet 1 ascending by name, assessment type, and assessment unit. This should be the data in a format that is easy to find, copy, and paste. Make sure the Names column on the Visualization Sheet is also sorted alphabetically. Then, select Adjusted Percentage cells that correspond to all of the scores for that given assessment, and copy (CTRL-C).
Finally, right click on the cell that represents the first entry under “Quiz 1;” this should be cell B2. “Paste Special” -> “Values” -> OK. The keyboard shortcuts to do the same thing: Select the cells to be copied, then CTRL-C. Select the cell where these values will be pasted (B2 on Visualization Sheet), then CTRL-ALT-V, then V, then ENTER. If we just “paste” instead of “paste special,” by default Excel will paste the formulas, and not the values.
First Visualization: Color
We are now ready to do the first visualization. We will use a color scale to visually identify the center of the distribution and how far (up or down) from the center each score is. To do this, we will use a powerful tool in Excel called “Conditional Formatting.”
Select the Score data (for me this is B2:B20) -> Home Ribbon -> Conditional Formatting -> Color Scale -> Green-Yellow-Red Color Scale. You should get something that looks like this:
Already, some things stand out. We have one really low outlier (55%). To me, this identifies a student who needs an intervention. However, in its current format, I can’t easily see how well the class did as a whole. To do this, we want to sort the scores in descending order.
Select Cell A1 -> Data Ribbon -> Filter
Then sort the current assessment in descending order:
My data artificial, so it looks symmetric. To the right are two other distributions that would be skewed right (overall poor performance) and skewed left (overall great performance)
In the second column (Quiz 2.1), you can see that a lot of the values are green, and relatively few are red, while in the third column (Quiz 2.2), you can see the reverse is true. If a saw this exact picture, I would be concerned that I had not taught the material assessed by Quiz 2.2 well, and some review should happen before the unit test. Further, I can make a note to try alternate approaches the next time I teach the course. This assumes that the quiz is a valid assessment of the material.
Second Visualization: Histogram
The conditional formatting has the benefit of being quick and easy, but it has some limitations. While it’s great for identifying outliers, the shape of the distribution is represented by color rather than by geometry, which is an abstraction. A histogram will show us the shape of the distribution more intuitively. To do this, we need to bin the scores. An obvious choice is to count the A, B, C, D, and F scores. Since we’re going to be writing some Excel functions, we might as well go ahead and compute some summary statistics. I will do mean, median, and standard deviation in addition to the binning.
Counting grades (Bins):
We can use the COUNTIF() function to calculate the frequency of each grade within the distribution. For those who are familiar with statistics, you will note that these will not be equally spaced bins. In my district the grading scale is:
A: (93 – 100)
B: (85 – 93]
C: (77 – 85]
D: (70 – 77]
F: (0 – 70]
In column A, I am going to reproduce these labels. I am going to do it in reverse order, and in a way that will look clean on a graph. This will make the actual graphing process easier later. In addition, I am also going to add “Mean”, “Median,” and “Standard Deviation.”
We are breaking some tidy data rules here, but this sheet is really meant to be processed and presented in Excel (as opposed to some other programming language like Python or R). Notice also that as you scroll left to right, you will always be able to see Column A. This is a nice feature late in the semester as you are looking back over assessment data.
Now, we will use Excel Functions to compute the data for column B, and we can just copy-paste to columns C, D, … Z. In Cell B22 (the cell corresponding to the count of grade F for Quiz 1), type “=COUNTIF(“, and you will see that it asks for range and criteria arguments. The range will be the values for Quiz 1: “B2:B20”. You can select these rows by dragging over them with your mouse, or you can manually type them (no double quotes). The criteria is a logical statement used to evaluate if the value represents an F: in double quotes “<0.70”). In the picture below, you can see all of it put together:
Calculating the range for F is straightforward, because there is no lower bound. For D, we need the values that are greater than or equal to 70 AND less than 77 (i.e. 0.7 <= D < 0.77). Unfortunately, the criteria argument in Excel is not up to the task. Instead, we will count all of the values that are not F (>=0.7), and subtract from that number the number of values that are C, B, or A (>=0.77). In cell B23: “=COUNTIF(B2:B20, “>= 0.7”) – COUNTIF(B2:B20, “>=0.775”)”
You are telling Excel to count the cells in range B2:B20 that are greater than or equal to 70%, and then count and subtract the number of cells in range B2:B20 that are greater than or equal to 77.5%.
Repeat for C and B:
C -> “=COUNTIF(B2:B20, “>=0.775”) – COUNTIF(B2:B20, “>=0.85”)
B -> “=COUNTIF(B2:B20, “>=0.85”) – COUNTIF(B2:B20, “>=0.925”)
And finally for A, we can just define the lower boundary:
You’ll notice that I adjusted the D/C boundary and the B/A boundary. This is because of the way we adjusted the 10 point scale grading system to a 7 point scale grading system. You can mess with the boundaries as you like (they’re all arbitrary anyway). Finally, you want to format the cells so that the values are a number with 0 decimal points. Again, this will make graphing easier later.
Select Cells B22:B26 -> Right Click -> Format Cells… -> Number -> Decimal Places = 0
Here is a screen shot of the completed functions. I have the Grade B cell selected, so that you can see the formula in the formula bar
We will graph the histogram in a moment, but first let’s complete the summary statistic functions. These will be trivial compared to binning our scores into grades.
Standard Deviation: “=STDEV(B2:B20)”
You may need to format the standard deviation cell, so that it is a percentage with one decimal place.
Now that these functions are in place, we never have to type them again, we can just copy and paste across the columns for each new assessment
Select the cells with all the functions (B22:B30) -> CTRL-C -> Select cells C22:D22 -> CTRL-V
Plotting the Histogram
Now that the summary statistics are computed, and the bins created, we are ready to plot a histogram of the grade distribution.
Hold down CTRL and select with your mouse the Bin labels (A22:A26), then select the counts for an assessment, for Quiz 2.2 (D22:D26). By holding down CTRL, you can select any assessment distribution, even if it is not directly next to the labels in column A.
Insert Ribbon -> Under “Charts,” Column -> 2-D Column -> Clustered Column
You should get this:
Notice that the x-axis is already labeled the way we want, because of the way we layed out the bin labels (F-A). You can change these labels by modifying the cells directly.
Now, I will show you how to do some MODERATE processing of this graph to make it look more presentable, and then show you what an even fancier graph can look like.
First, get rid of “Series 1″ label by selecting it and deleting it.
Add a chart title, and axes labels on the “chart tools -> layout” ribbon
Finally, get rid of the Grid lines, and add Data labels:
From this you can get a good sense of the shape of the distribution. You can see very few did well, a lot fell right in the middle, and a disproportionate number made a D or F. If you just look at the mean (77.5%), you would think that the class did about average, and might not think anything of it. The distribution shows you that the class did worse than the mean or median might convey. Finally, as a teacher, you can now return back to the colored conditional formatting to identify the students that might need individual interventions.
And finally, here is a peek at one some more heavily modified charts can look like. Let me know in the comments if you want to know how either of these charts are made:
In education, there is a lot of lip service given to “data driven instruction.” I think it is a great sentiment, but this means that the data has to be accurate and timely. By automating some of the data analysis, Excel (and other similar programs), can help the practitioner to get better, faster assessment results.