The point that using spatial data visualizations can easily yield bias is well-rehearsed. (The first time it was made to me was in a middle school algebra textbook.) The basic problem is that, if you are representing the difference between two quantities, only one dimension of your data visualization should covary with the data. So you can have a vertical bar graph, where the width of rectangles is fixed but the height covaries. Here’s an example where our data are the numbers 1 and 2:
    library(ggplot2)

    # Bar height encodes Value; bar width is fixed
    d <- data.frame(Group = letters[1:2], Value = c(1, 2))
    ggplot(aes(x = Group, y = Value), data = d) +
      geom_bar(stat = "identity")
<object type="image/svg+xml" data="/files/blog/shame-538/vert-bar.svg"></object>
Alternatively, you can have a horizontal bar graph where the width covaries but the height stays the same:
    library(ggplot2)

    # Same plot flipped: bar length encodes Value; bar thickness is fixed
    d <- data.frame(Group = letters[1:2], Value = c(1, 2))
    ggplot(aes(x = Group, y = Value), data = d) +
      geom_bar(stat = "identity") +
      coord_flip()
<object type="image/svg+xml" data="/files/blog/shame-538/horiz-bar.svg"></object>
In both these cases, the area of the “b” bar is twice that of the “a” bar.
On the other hand, if you allow both dimensions of your graph to covary with the data, you are lying:
    library(ggplot2)

    # Misleading: both height (y) and width vary with Value
    d <- data.frame(Group = letters[1:2], Value = c(1, 2))
    ggplot(aes(x = Group, y = Value, width = Value/2), data = d) +
      geom_bar(stat = "identity")
<object type="image/svg+xml" data="/files/blog/shame-538/bad-bar.svg"></object>
Here the underlying data are still 1 and 2, but the “b” bar has four times the area of the “a” bar. This gives a false impression of the relationship between a and b – that is, it’s a visual lie.
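To check the "four times" figure, here is a minimal sketch that computes the drawn area of each bar from the same data frame. It assumes the bar width is `Value/2`, as in the plot code above; the `Height`, `Width`, and `Area` columns are names introduced here for illustration only.

```r
# Same data as in the plots above
d <- data.frame(Group = letters[1:2], Value = c(1, 2))

# In the misleading plot, height = Value and width = Value / 2
d$Height <- d$Value
d$Width  <- d$Value / 2
d$Area   <- d$Height * d$Width

# Ratio of the drawn areas versus ratio of the underlying values
area_ratio  <- d$Area[2] / d$Area[1]    # 4
value_ratio <- d$Value[2] / d$Value[1]  # 2
```

The drawn area grows with the square of the value, so a 2:1 difference in the data shows up as 4:1 on the page.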
FiveThirtyEight is a statistics blog that formerly focused on election prediction and now covers statistics more generally. It just published an article by Eno Sarris that tries to answer the question of whether Sam Adams is “really” a craft beer company.
The article contains the following image comparing the revenue of various beer companies, craft and otherwise (as of 7:15pm EDT on June 21):
<img src="/files/blog/shame-538/samadamsgrx4.jpeg" />
Check out the difference in sizes between the MillerCoors and Sam Adams bottles. Ginormous, right? Now look at the numbers. That visual difference is supposed to represent a 12-fold difference in revenue (9.1B = 9,100M, and 9,100M / 759M ≈ 12.0). Does that seem right to you?
It didn’t seem right to me, so I got out my ruler. The shape of the bottles is irregular, but I’ll focus on the region from the bottom up to where the sides begin to taper. This is a rectangle, so it’s easy to measure. Beer bottles are also a standard size (for most American beer, including Sam Adams and Miller), so we can assume that the rest of the bottle is proportional to this region. On my screen the Miller bottle is 5/8 (0.625) inches wide and 5/4 (1.25) inches tall, whereas the Sam Adams bottle is 1/8 (0.125) inches wide and 1/4 (0.25) inches tall. This is a ratio of 5:1 in both directions. (Why 5 and not 12, the underlying number? I have no idea.) How does that translate into area? Well, the Miller bottle’s bottom portion has an area of 0.78 in² (= 5/8 in × 5/4 in), whereas the Sam Adams bottle’s is 0.031 in² (= 1/8 in × 1/4 in). That’s a ratio of 25:1. So this visualization overstates the difference in revenue by a factor of roughly 2 (25 / 12 ≈ 2.1).
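The arithmetic in that paragraph can be reproduced in a few lines of R. This is only a sketch using the ruler measurements and revenue figures quoted in the text; the variable names are made up for illustration. The last line is an addition not in the measurements: it computes the linear scale factor that would make the drawn area ratio match the revenue ratio, assuming area is what readers compare.

```r
# Ruler measurements in inches, as reported in the text (width, height)
miller <- c(width = 5/8, height = 5/4)
sam    <- c(width = 1/8, height = 1/4)

# Revenue figures from the graphic, in millions
revenue_ratio <- 9100 / 759                              # about 12

linear_ratio <- unname(miller["width"] / sam["width"])   # 5
area_ratio   <- prod(miller) / prod(sam)                 # 25

# How much the drawn area exaggerates the revenue gap
overstatement <- area_ratio / revenue_ratio              # about 2.1

# Linear scale that would make the area ratio equal the revenue ratio
honest_linear <- sqrt(revenue_ratio)                     # about 3.5
```

Under that assumption, the Miller bottle would need to be about 3.5 times as wide and as tall as the Sam Adams bottle, not 5 times.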
Bonus question: beer bottles are containers, which have volume. What happens if viewers interpret the image in terms of volume? Well, going from one dimension to two squared the ratio, from 5:1 to 25:1. Going to a third dimension causes the base ratio to cube, to 125:1. (A little more uncertainty is also introduced by the irregular shape of the neck of the bottle, and how to take account of the small amount of air at the top.) Nonetheless, there’s the potential for viewers to be misled by a factor of roughly 10 (125 / 12 ≈ 10.4) by this interpretation.
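Here is a small sketch of how the exaggeration depends on whether a reader compares lengths, areas, or volumes. It assumes the measured 5:1 linear ratio applies equally in every dimension (ignoring the bottle neck, as the text does); the vector names are introduced for illustration.

```r
linear_ratio  <- 5            # measured Miller : Sam Adams ratio per dimension
revenue_ratio <- 9100 / 759   # about 12, from the figures in the graphic

# Perceived ratio if the reader compares length, area, or volume
perceived <- linear_ratio ^ c(length = 1, area = 2, volume = 3)

# Exaggeration relative to the true revenue ratio (values above 1 overstate)
exaggeration <- perceived / revenue_ratio
round(exaggeration, 1)
```

The perceived ratios come out to 5, 25, and 125. Only a reader comparing heights alone would see less than the true ratio; an area or volume reading overstates it by factors of roughly 2 and 10.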
It’s awful to see one of the most respected mainstream (self-designated) authorities on “data” using such crappy visualizations. I said above that this kind of mistake is a lie. In this case, it’s an inept lie, since the article’s conclusion (insofar as it has one) is that Sam Adams is not a craft brewery, whereas the effect of this mistake is to overemphasize the difference between the non-craft breweries (Miller and AB InBev) and Sam Adams.
In a way we’re lucky that this isn’t an article about electoral politics, where misleading visualizations are stock in trade for sleazy politicians. Maybe Eno Sarris was out sick on the day that they taught this in middle school. But he – and the editorial staff at 538 – ought to do better.