Statistics - Standard Deviation

Most people are familiar with the concepts of the mean, median, and mode. They are measures of the central tendency of a value that has been measured in a given population. They tell us, in different ways, about the value of an attribute at the heart of the population rather than at its positive or negative extremes. Other measures tell us about the variation within a population. One frequently used measure of variation is the standard deviation.

Consider the following list of the top-grossing films as of March 2018 (gross in nominal USD):

Film	Year	Gross (USD)	Run-time (minutes)
Avatar	2009	2787965087	162
Titanic	1997	2187463944	194
Star Wars: The Force Awakens	2015	2068223624	136
Jurassic World	2015	1671713208	124
The Avengers	2012	1518812988	143
Furious 7	2015	1516045911	137
Avengers: Age of Ultron	2015	1405403694	141
Harry Potter and the Deathly Hallows – Part 2	2011	1341511219	130
Star Wars: The Last Jedi	2017	1332261619	152
Frozen	2013	1290000000	102

We can calculate the average run-time of a top-grossing film to be 142.1 minutes, but this tells us nothing about the variation of run-times within the population. One way to quantify the variation is to look at how far each film’s run-time is from the mean.
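The 142.1-minute figure can be reproduced in a few lines of Python. This is only a minimal sketch: the names `run_times` and `mean_run_time` are illustrative and do not come from the original post.

```python
# Run-times in minutes, copied from the table above in table order.
run_times = [162, 194, 136, 124, 143, 137, 141, 130, 152, 102]

# The mean is the sum of the values divided by the number of values.
mean_run_time = sum(run_times) / len(run_times)

print(f"Mean run-time: {mean_run_time:.1f} minutes")  # Mean run-time: 142.1 minutes
```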

Film	Delta from Mean
Avatar	19.9
Titanic	51.9
Star Wars: The Force Awakens	-6.1
Jurassic World	-18.1
The Avengers	0.9
Furious 7	-5.1
Avengers: Age of Ultron	-1.1
Harry Potter and the Deathly Hallows – Part 2	-12.1
Star Wars: The Last Jedi	9.9
Frozen	-40.1
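The deltas in this table are each film’s run-time minus the mean. A small sketch of that subtraction follows; the dictionary name `films` and the variable names are assumptions made for illustration.

```python
# Film titles and run-times (minutes) from the first table.
films = {
    "Avatar": 162,
    "Titanic": 194,
    "Star Wars: The Force Awakens": 136,
    "Jurassic World": 124,
    "The Avengers": 143,
    "Furious 7": 137,
    "Avengers: Age of Ultron": 141,
    "Harry Potter and the Deathly Hallows – Part 2": 130,
    "Star Wars: The Last Jedi": 152,
    "Frozen": 102,
}

mean_run_time = sum(films.values()) / len(films)

# Delta = run-time minus the mean; positive means longer than average.
deltas = {title: run_time - mean_run_time for title, run_time in films.items()}

for title, delta in deltas.items():
    print(f"{title}\t{delta:+.1f}")
```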

Nice! Now we can clearly see the variation in film run-times. While this does make things clearer, it doesn’t give us a single measure of variation. It seems sensible to find the average distance from the mean, no? But look what happens when we try to do that.

Film	Delta from Mean
Avatar	19.9
Titanic	51.9
Star Wars: The Force Awakens	-6.1
Jurassic World	-18.1
The Avengers	0.9
Furious 7	-5.1
Avengers: Age of Ultron	-1.1
Harry Potter and the Deathly Hallows – Part 2	-12.1
Star Wars: The Last Jedi	9.9
Frozen	-40.1
Sum	0

We cannot calculate a useful average this way because when we add up the deltas, we get 0, so the average delta is always 0 too. This is because the negative values cancel out the positive ones, as one would expect, since the mean is by definition the balance point of the data. But this average distance from the mean seemed like such a good way to provide a single value that represents the variation in the data. Well, we can get around this issue by squaring the deltas (because squaring any real number never produces a negative number, the squares cannot cancel each other out). So let’s try that.
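The cancellation is easy to check numerically. The sketch below uses illustrative variable names, and it compares against zero with a tolerance because floating-point subtraction can leave a tiny residue rather than an exact 0.

```python
import math

# Run-times in minutes, in the same order as the tables above.
run_times = [162, 194, 136, 124, 143, 137, 141, 130, 152, 102]
mean_run_time = sum(run_times) / len(run_times)

# Signed distance of each run-time from the mean.
deltas = [t - mean_run_time for t in run_times]

# Positive and negative deltas cancel, so the total is zero
# (up to floating-point rounding).
total = sum(deltas)
print(math.isclose(total, 0.0, abs_tol=1e-9))  # True
```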

Film	Squared Delta from Mean
Avatar	396.01
Titanic	2693.61
Star Wars: The Force Awakens	37.21
Jurassic World	327.61
The Avengers	0.81
Furious 7	26.01
Avengers: Age of Ultron	1.21
Harry Potter and the Deathly Hallows – Part 2	146.41
Star Wars: The Last Jedi	98.01
Frozen	1608.01
Sum	5334.90
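The squared deltas and their sum can be recomputed as follows. This is a sketch with assumed variable names, and the values are rounded to two decimals only for display.

```python
# Run-times in minutes, in the same order as the tables above.
run_times = [162, 194, 136, 124, 143, 137, 141, 130, 152, 102]
mean_run_time = sum(run_times) / len(run_times)

# Squaring each delta removes the sign, so nothing cancels when summing.
squared_deltas = [(t - mean_run_time) ** 2 for t in run_times]
sum_of_squares = sum(squared_deltas)

print([round(s, 2) for s in squared_deltas])
print(f"Sum of squares: {sum_of_squares:.2f}")  # Sum of squares: 5334.90
```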

Here’s a visualization of what we just did.

The red dot in the center of the number line is the mean of the run-time values (i.e. { 162, 194, 136, 124, 143, 137, 141, 130, 152, 102 }). The blue dots are the distances of each film’s run-time from the mean (i.e. { 19.9, 51.9, -6.1, -18.1, 0.9, -5.1, -1.1, -12.1, 9.9, -40.1 }; 2 points omitted due to space constraints). The yellow squares are the distances (deltas) squared. Now that we have a sum that does not cancel to zero, we can divide it by the total number of films to calculate an average, which gives us a single number called the variance.

variance = sum of squares/number of squares = 5334.90/10 = 533.49
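As a cross-check, the same variance can be computed by hand and with Python’s standard library. `statistics.pvariance` divides by the number of values (the population variance), which matches the calculation above; the other names are illustrative.

```python
import statistics

# Run-times in minutes, in the same order as the tables above.
run_times = [162, 194, 136, 124, 143, 137, 141, 130, 152, 102]
mean_run_time = sum(run_times) / len(run_times)

# Variance by hand: the average of the squared deltas.
sum_of_squares = sum((t - mean_run_time) ** 2 for t in run_times)
variance = sum_of_squares / len(run_times)
print(f"{variance:.2f}")  # 533.49

# The standard library's population variance gives the same result.
library_variance = statistics.pvariance(run_times)
print(f"{library_variance:.2f}")  # 533.49
```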

[Figure: Variance shown as the average square on the number line.]

In the graphic above, I tried to illustrate the variance as the average of these squares, shown in orange (not to scale). Because the variance is in squared units (minutes squared), we take its square root to relate it back to our original values. This square root of the variance is called the standard deviation. It is a single value that tells us how dispersed the values are within a data set. It’s shown below as the green line.

standard deviation = √variance = √533.49 ≈ 23.10
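The final step can be sketched in Python as well. Note that this post divides by the number of films, which is the population standard deviation (`statistics.pstdev`); `statistics.stdev` instead divides by one less than the number of values (the sample standard deviation) and so gives a slightly larger number. The variable names are assumptions for illustration.

```python
import math
import statistics

# Run-times in minutes, in the same order as the tables above.
run_times = [162, 194, 136, 124, 143, 137, 141, 130, 152, 102]
mean_run_time = sum(run_times) / len(run_times)

# Population variance: the average of the squared deltas.
variance = sum((t - mean_run_time) ** 2 for t in run_times) / len(run_times)

# The standard deviation is the square root of the variance.
std_dev = math.sqrt(variance)
print(f"{std_dev:.2f}")  # 23.10

# Library equivalents: pstdev divides by n, stdev divides by n - 1.
population_std_dev = statistics.pstdev(run_times)
sample_std_dev = statistics.stdev(run_times)
print(f"{population_std_dev:.2f}")  # 23.10
print(f"{sample_std_dev:.2f}")
```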

[Figure: The standard deviation shown as a green line on the bottom edge of the variance square.]

The standard deviation of the run-times of the top 10 grossing films, as of the time of writing, is 23.1 minutes.

Jason Favrod

Adventures in Information
