Variance and Standard Deviation
Imagine you have a dartboard and a good number of darts. You throw
the darts at the dartboard one at a time, aiming each time for the
bull’s eye at the center. As a loose introduction to the idea of
variability (of which standard deviation and variance are measures),
let’s look at some of the ways the darts might end up distributed
over the dartboard.
If you were a precise dart-thrower, there might be a clustering of
darts around the center. In this case we’d say there was relatively
little variability in the distribution of darts – this is a
way of saying that most of them fell in the same vicinity.
If you weren’t as precise, it may look as if the darts were
haphazardly arranged over the dartboard. If they were spread out all
over the board, and there was little clustering, we’d say that
the distribution of darts had a higher variability than in the first
case.
Variability is a term that describes how spread out a distribution
of scores (or darts) is. Variance and standard
deviation are closely
related ways of measuring, or quantifying, variability. [Standard
deviation is simply the square root of variance; these concepts will
be explained shortly.]
Finishing with the dartboard example, it is not necessary for the
darts to cluster around the center in order to have low variability.
If the darts clustered around any part of the dartboard, those darts
would still have low variability. And the more closely packed they
were, the lower the variability would be. We could quantify the degree
of variability by assigning numerical values to the positions of the
darts in some way, but that’s not our concern just now. For
now, it’s enough to note that the variance and standard deviation
are measures of the average amount of variability in where the darts
land.
The example above was a rather simple way of talking about variability,
but the concept of variability isn’t much more complicated than
that – even when applied to concrete numbers. By remembering
a few easy formulas and a few easy ideas, you’ll be a master
at computing variances and standard deviations.
General problem: You are given a set of n numbers (or scores) and
you are asked to find the sum of squared deviations, variance, and
standard deviation of that set of numbers. Here is a good way to proceed;
examples will follow.
- Find the mean (or arithmetic average) of the scores. To find the
mean, add up the scores and divide by n, where n is the number of
scores.
- Find the sum of squared deviations (abbreviated SSD). To get the
SSD, find the sum of the squares of the differences (or deviations)
between the mean and all the individual scores.
- Find the variance. If you are told that the set of scores
constitutes a population, divide the SSD by n to find the variance.
If instead you are told, or can infer, that the set of scores
constitutes a sample, divide the SSD by (n - 1) to get the variance.
- Find the standard deviation. To get the standard deviation, take
the square root of the variance.
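The four steps above can be sketched in Python as follows. This is only a minimal illustration and not part of the original handout: the function name `describe`, its `population` flag, and the eight scores in the usage note are all made up for the example.

```python
import math

def describe(scores, population=True):
    """Return (mean, SSD, variance, standard deviation) for a list of scores.

    `describe` and `population` are hypothetical names chosen for this sketch.
    """
    n = len(scores)
    mean = sum(scores) / n                        # step 1: the mean
    ssd = sum((mean - x) ** 2 for x in scores)    # step 2: sum of squared deviations
    divisor = n if population else n - 1          # step 3: n for a population, n - 1 for a sample
    variance = ssd / divisor
    std_dev = math.sqrt(variance)                 # step 4: square root of the variance
    return mean, ssd, variance, std_dev
```

For the made-up population of scores 2, 4, 4, 4, 5, 5, 7, 9, calling `describe([2, 4, 4, 4, 5, 5, 7, 9])` gives a mean of 5, an SSD of 32, a variance of 4, and a standard deviation of 2. Note that the sample version needs at least two scores, since it divides by n - 1.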
Notes: It may be convenient to remember that there are two different
formulas that can be used to compute the SSD, both yielding identical
answers. The first formula, the definitional formula, proceeds directly
from the definition of SSD, is easy to remember, and was the one used
in the list of steps above: $SSD = \sum_i (m - x_i)^2$, where $m$ is
the mean of the scores and the $x_i$ are all the individual scores.
The second formula for computing SSD, called the computational formula,
is sometimes handy for use in a calculator but harder to remember:
$SSD = \sum_i x_i^2 - \frac{1}{n}\left(\sum_i x_i\right)^2$. The first
term in this formula directs you to square each score and then add up
the resulting values. The second term in this formula directs you to
sum the scores, and then square the sum, and then divide the result by
n, where n is the number of scores. We will illustrate the use of both
formulas for finding the SSD.
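As a quick check that the two formulas agree, here is a small Python sketch. The function names and the four test scores are invented for this illustration; they do not come from the handout.

```python
import math

def ssd_definitional(scores):
    """SSD as the sum of squared differences between the mean and each score."""
    m = sum(scores) / len(scores)
    return sum((m - x) ** 2 for x in scores)

def ssd_computational(scores):
    """SSD as (sum of squared scores) minus (square of the sum) / n."""
    n = len(scores)
    return sum(x ** 2 for x in scores) - (sum(scores) ** 2) / n

scores = [2, 7, 7, 8]   # made-up scores; the mean is 6
print(ssd_definitional(scores), ssd_computational(scores))
print(math.isclose(ssd_definitional(scores), ssd_computational(scores)))
```

Both functions give 22 for these scores. On a computer, the computational formula can lose some precision when the scores are very large compared with their spread, so the definitional formula is usually the safer choice in code.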
Example 1: Find the SSD, variance, and standard deviation for the
following population of scores: 1, 2, 3, 4, 5 using the list of steps
given above.
- Find the mean. The mean of these five numbers (the population
mean) is (1+2+3+4+5)/5 = 15/5 = 3.
- Let’s use the definitional formula for SSD for its calculation:
SSD is the sum of the squares of the differences (squared deviations)
between the mean and the individual scores. The squared deviations
are $(3-1)^2$, $(3-2)^2$, $(3-3)^2$, $(3-4)^2$, and $(3-5)^2$. That is,
4, 1, 0, 1, and 4. The SSD is then 4 + 1 + 0 + 1 + 4 = 10.
- Divide SSD by n, since this is a population of scores, to get
the variance. So the variance is 10/5 = 2.
- The standard deviation
is the square root of the variance. So the standard deviation
is the square root of 2.
For practice, let’s also compute the SSD using the computational
formula, $\sum_i x_i^2 - \frac{1}{n}\left(\sum_i x_i\right)^2$.
First, $\sum_i x_i^2 = 1^2 + 2^2 + 3^2 + 4^2 + 5^2 = 1 + 4 + 9 + 16 + 25 = 55$.
Next, $\frac{1}{n}\left(\sum_i x_i\right)^2 = \frac{1}{5}(1 + 2 + 3 + 4 + 5)^2 = \frac{1}{5}(15^2) = 45$.
So SSD = 55 - 45 = 10, just like before.
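If you would like to check Example 1 on a computer, one option (an assumption of this sketch, not something the handout prescribes) is Python's standard `statistics` module, whose `pvariance` and `pstdev` functions use the population formulas, that is, they divide by n.

```python
import math
import statistics

scores = [1, 2, 3, 4, 5]                  # the population from Example 1

mean = statistics.mean(scores)            # 3
variance = statistics.pvariance(scores)   # population variance: SSD / n
std_dev = statistics.pstdev(scores)       # square root of the population variance
ssd = variance * len(scores)              # recover the SSD as variance * n

print(mean, ssd, variance, std_dev)
```

The printed values should match the hand calculation: a mean of 3, an SSD of 10, a variance of 2, and a standard deviation equal to the square root of 2 (about 1.414).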
Example 2: Find the SSD, variance, and standard
deviation for the following sample of scores: 1, 3, 3, 5.
- The average of these four numbers (the sample mean) is (1+3+3+5)/4
= 12/4 = 3.
- So, $SSD = (3-1)^2 + (3-3)^2 + (3-3)^2 + (3-5)^2 = 4 + 0 + 0 + 4 = 8$.
- Now, because we were told that these scores constitute a sample,
we’ll divide SSD by n-1 to get the sample variance. We have four
scores, so n = 4 and n-1 = 3. Therefore, our sample variance is 8/3.
- And the sample standard deviation is the square root of 8/3.
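Example 2 can be checked the same way. The sketch below assumes Python's standard `statistics` module, where `variance` divides by n - 1 (the sample formula) and `pvariance` divides by n (the population formula). Using `Fraction` values is a choice made here so that the answer comes out as the exact fraction 8/3 rather than a rounded decimal.

```python
from fractions import Fraction
import math
import statistics

sample = [Fraction(x) for x in (1, 3, 3, 5)]         # exact arithmetic, no rounding

sample_variance = statistics.variance(sample)        # SSD / (n - 1) = 8/3
population_variance = statistics.pvariance(sample)   # SSD / n = 8/4, shown for contrast
sample_std_dev = math.sqrt(sample_variance)          # square root of 8/3, as a float

print(sample_variance, population_variance, sample_std_dev)
```

The same four scores give a variance of 8/3 when treated as a sample and 2 when treated as a population, which is why it matters which of the two you have been given.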
As you can see, calculating these statistical measures is quite easy,
and it’s only a matter of remembering a few formulas and practicing
their application.
Further Facts: If a constant number is added to each score in a sample
or population, the standard deviation will not change. [This makes
sense. Adding a constant will not
change how disparate the scores are. The mean will increase by the
constant, but the SSD will stay the same, and, consequently, so will
the variance and standard deviation.]
Example: Let A be the population of scores {1, 3, 3, 5} and let B
be the population of scores {3, 5, 5, 7}. Notice that B is produced
by adding 2 to each score in A. The mean of A is 3; the mean of B
is 5. The SSD of A is $(3-1)^2 + (3-3)^2 + (3-3)^2 + (3-5)^2 = 4 + 0 +
0 + 4 = 8$; the SSD of B is $(5-3)^2 + (5-5)^2 + (5-5)^2 + (5-7)^2 =
4 + 0 + 0 + 4 = 8$ also. So the SSD of A equals the SSD of B. So the
variance and standard deviation of A and B are both the same; they
are 2 and square root of 2, respectively.
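The same comparison can be run in Python. This is a small sketch under the assumption that the standard `statistics` module is available; the helper name `shift_scores` is made up for the illustration.

```python
import statistics

def shift_scores(scores, constant):
    """Add the same constant to every score (hypothetical helper)."""
    return [x + constant for x in scores]

a = [1, 3, 3, 5]
b = shift_scores(a, 2)                                     # [3, 5, 5, 7]

print(statistics.mean(a), statistics.mean(b))              # the means differ by the constant
print(statistics.pvariance(a), statistics.pvariance(b))    # the variances are equal
print(statistics.pstdev(a) == statistics.pstdev(b))        # and so are the standard deviations
```

Changing the constant from 2 to any other number moves the mean by that amount but leaves the variance and standard deviation where they were.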
Also, multiplying each score in a sample or population by a constant
factor will multiply the standard deviation by the absolute value of
that factor (for a positive factor, by the factor itself).
An algebraic explanation: in the original set of scores, to get the
SSD we are summing terms of the form $(a-b)^2$, where a is the mean
and b is a score. If you multiply each score by a constant, x, the
new mean would be xa, and the score that had a value of b would be
translated to a score with the value xb. Now, for SSD, we would be
summing terms of the form $(xa-xb)^2$, which is $x^2(a-b)^2$. So the
SSD is multiplied by $x^2$, so the variance is also multiplied by
$x^2$ (since variance is just SSD divided by n or by n-1, neither of
which changes), and so the standard deviation is multiplied by
$\sqrt{x^2} = |x|$, which is x itself when x is positive, because
standard deviation is the square root of the variance.
Example: Let A be the population of scores {1, 3, 3, 5} and let B
be the population of scores {2, 6, 6, 10}. Notice that B is produced
by multiplying each score in A by 2. From the previous example,
we know that the mean, SSD, variance, and standard deviation of A
are 3, 8, 2, and square root of two, respectively. Let’s now
compute these for B. The mean of B is 6 (which is twice the mean of
A). The SSD of B is $(6-2)^2 + (6-6)^2 + (6-6)^2 + (6-10)^2 = 16 +
0 + 0 + 16 = 32$, which is four (or two squared) times the SSD of A.
The variance of B is 32/4 = 8 (or two squared times the variance of
A) and the standard deviation of B is square root of eight (which
simplifies to two times the square root of two, which is two times
the standard deviation of A).
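Here is a short Python sketch of the scaling rule, again assuming the standard `statistics` module. The helper name `std_dev_ratio` and the extra factors -2 and 0.5 are made up for the illustration. The negative factor is included to show that the standard deviation is multiplied by the size of the factor, since a standard deviation can never be negative.

```python
import math
import statistics

def std_dev_ratio(scores, factor):
    """Return (std dev of the scaled scores) / (std dev of the original scores)."""
    scaled = [factor * x for x in scores]
    return statistics.pstdev(scaled) / statistics.pstdev(scores)

a = [1, 3, 3, 5]
for factor in (2, -2, 0.5):
    ratio = std_dev_ratio(a, factor)
    # The ratio should equal the absolute value of the factor.
    print(factor, ratio, math.isclose(ratio, abs(factor)))
```

Each line printed should end in `True`: multiplying the scores by 2 or by -2 doubles the standard deviation, and multiplying by 0.5 halves it.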
This handout covered the calculation of SSD, variance, and standard
deviation. For a comprehensive explanation of the use of these concepts,
see for example Chapter 4 of Statistics for the Social Sciences by
Gravetter & Wallnau, a copy of which is owned by the UHV Academic
Center located in the UHV Center Building.
Standard deviation is a measure of how far from the mean you would
expect a typical score to be. The higher the standard deviation,
the farther you would expect a randomly chosen score to be from the
mean, numerically. For example, weight usually varies more from person
to person than height does, so, once the two are put on a comparable
scale (say, as a percentage of the mean), we would expect weight to
show the bigger standard deviation in a population. Also, the
standard deviation is a way of relating a particular
score to the rest of the sample. If a score is within one standard
deviation from
the mean, it can be considered a pretty ordinary score. On the other
hand, if a score is, say, two standard deviations away from the mean
(either below or above), then that score can be considered an extreme
value. Standard deviation, as the name suggests, helps to give a standardized
picture of distributions, in the sense that no matter what the mean
or what is being studied, a score that’s two standard deviations
away from the mean can be considered extreme (in the context of that
set of scores). Standard deviation is important to the use of z-scores.
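To make the idea of "how many standard deviations away from the mean" concrete, here is a minimal Python sketch. It assumes the usual definition of a z-score, z = (score - mean) / standard deviation, which this handout mentions but does not spell out, and it treats the scores as a population. The function name `z_scores` is made up for the illustration.

```python
import math
import statistics

def z_scores(scores):
    """Return each score's distance from the mean, in standard deviations.

    Assumes the scores are a whole population (divides by n, not n - 1).
    """
    mean = statistics.mean(scores)
    std_dev = statistics.pstdev(scores)
    return [(x - mean) / std_dev for x in scores]

scores = [1, 2, 3, 4, 5]           # the population from Example 1
for x, z in zip(scores, z_scores(scores)):
    print(x, round(z, 3))
```

The score 3 sits exactly at the mean, so its z-score is 0. The scores 1 and 5 are each about 1.414 standard deviations from the mean, one below and one above.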
Resources
Gravetter, F. J., & Wallnau, L. B. (2004). Statistics for
the social sciences (6th Ed.). Belmont, CA: Wadsworth.
Copyright 2005 by the Academic Center and the
University of Houston-Victoria.
Created 2005 by Hari Damodaran.