1331.0 - Statistics - A Powerful Edge!, 1996  
ARCHIVED ISSUE Released at 11:30 AM (CANBERRA TIME) 31/07/1998   

MEASURES OF SPREAD

VARIANCE AND STANDARD DEVIATION

Variance (symbolised by s²) and standard deviation (symbolised by s) are calculated in a similar way to the mean deviation. However, instead of taking the absolute value of the difference between each observation and the mean, each difference is squared.

Variance involves squaring deviations, so it does not have the same unit of measurement as the original observations. For example, lengths measured in metres (m) have a variance measured in metres squared (m²). Thus:

VARIANCE, S² = AVERAGE SQUARED DEVIATION OF VALUES FROM MEAN

Taking the square root gives us back the units used in the original scale. This is the standard deviation. Thus:

STANDARD DEVIATION, S = SQUARE ROOT OF THE AVERAGE SQUARED DEVIATION OF VALUES FROM MEAN

Standard deviation is the measure of spread most commonly used in statistical practice when the mean is the measure of centre. Thus it measures spread about the mean. Because of its close links with the mean, standard deviation can be seriously affected if the mean is a poor measure of location. The standard deviation is also influenced by outliers; it is a good indicator of the presence of outliers because it is so sensitive to them. Therefore, the standard deviation is most useful for symmetric distributions with no outliers (normal distributions).

Standard deviation is useful when comparing the spread of two data sets. The data set with the smaller standard deviation has a narrower spread of measurements about the mean and, therefore, usually has comparatively fewer high or low values.

So, an item selected at random from a data set whose standard deviation is low has a better chance of being close to the mean than has an item from a data set whose standard deviation is high.

PROPERTIES OF STANDARD DEVIATION

When using standard deviation keep the following properties in mind.
  • Standard deviation is only used to measure spread about the mean.
  • Standard deviation is never negative.
  • Standard deviation is sensitive to outliers. A single outlier can raise the standard deviation a great deal, distorting the picture of spread.
  • The greater the spread, the greater the standard deviation.
  • If all values of a data set are the same the standard deviation is zero.
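
The properties above can be illustrated with a short Python sketch. The four data sets are hypothetical and chosen only for illustration; `pstdev` from the standard library divides by the number of observations, which matches the definition used on this page.

```python
from statistics import pstdev  # population standard deviation (divides by n)

# Hypothetical data sets, chosen only to illustrate the properties.
same = [5, 5, 5, 5, 5]            # all values equal
narrow = [4, 5, 5, 5, 6]          # small spread about the mean
wide = [1, 3, 5, 7, 9]            # larger spread about the mean
with_outlier = [4, 5, 5, 5, 60]   # same as 'narrow' except for one outlier

data_sets = {
    "same": same,
    "narrow": narrow,
    "wide": wide,
    "with_outlier": with_outlier,
}

for name, data in data_sets.items():
    # The standard deviation is zero for identical values, never negative,
    # grows with the spread, and jumps when a single outlier is present.
    print(f"{name:>12}: s = {pstdev(data):.2f}")
```

Comparing the "narrow" and "with_outlier" results shows how strongly a single extreme value can inflate the standard deviation.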

When analysing normally distributed data, standard deviation can be used with the mean to calculate intervals within which data lie.
  • about 68% of the data lie in the interval: x̄ - s < x < x̄ + s
  • about 95% of the data lie in the interval: x̄ - 2s < x < x̄ + 2s
  • about 99% of the data lie in the interval: x̄ - 3s < x < x̄ + 3s

where:
x̄ = mean; and s = standard deviation
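
As a rough check of these percentages, the sketch below uses Python's standard-library `NormalDist` to compute the proportion of a normal distribution lying within 1, 2 and 3 standard deviations of the mean. The helper name `coverage` is an assumption made for this illustration.

```python
from statistics import NormalDist

# Standard normal distribution: mean 0, standard deviation 1.
standard_normal = NormalDist(mu=0, sigma=1)

def coverage(k):
    """Proportion of a normal distribution within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

for k in (1, 2, 3):
    print(f"within {k}s of the mean: {coverage(k):.1%}")
# within 1s of the mean: 68.3%
# within 2s of the mean: 95.4%
# within 3s of the mean: 99.7%
```

The figure for three standard deviations is closer to 99.7% than to 99%, so the last interval is slightly conservative.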

DISCRETE VARIABLES

The variance for a discrete variable made up of n observations is defined by:

$$s^2 = \frac{\sum (x - \bar{x})^2}{n}$$

The standard deviation for a discrete variable made up of n observations is the positive square root of the variance and is defined by:

$$s = \sqrt{\frac{\sum (x - \bar{x})^2}{n}}$$

A step by step approach to finding the standard deviation for a discrete variable is:

  • Calculate the mean.
  • Subtract the mean from each observation.
  • Square each result.
  • Add these squares.
  • Divide this sum by the number of observations.
  • Take the positive square root.

1. The weights (in grams) of 8 eggs are: 60, 56, 61, 68, 51, 53, 69, 54.

Find the standard deviation.

First the mean must be calculated:
$$\bar{x} = \frac{\sum x}{n} = \frac{472}{8} = 59$$

Weight (x)    (x - x̄)    (x - x̄)²

    60            1           1
    56           -3           9
    61            2           4
    68            9          81
    51           -8          64
    53           -6          36
    69           10         100
    54           -5          25
Total 472                   320
From the above table:
$$\sum (x - \bar{x})^2 = 320$$
Thus, to calculate the standard deviation:

$$s = \sqrt{\frac{\sum (x - \bar{x})^2}{n}} = \sqrt{\frac{320}{8}} = \sqrt{40} = 6.32 \text{ grams}$$
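
The step by step approach can be sketched in Python using the egg weights from this example. The variable names are chosen for this illustration only. Note that the sum of squares is divided by n, as on this page; the standard-library function `statistics.pstdev` does the same, whereas `statistics.stdev` divides by n - 1.

```python
from math import sqrt

weights = [60, 56, 61, 68, 51, 53, 69, 54]  # egg weights in grams

n = len(weights)
mean = sum(weights) / n                     # step 1: 472 / 8 = 59
deviations = [x - mean for x in weights]    # step 2: subtract the mean
squares = [d ** 2 for d in deviations]      # step 3: square each result
total = sum(squares)                        # step 4: add the squares (320)
variance = total / n                        # step 5: divide by n (40)
std_dev = sqrt(variance)                    # step 6: positive square root

print(mean, total, variance, round(std_dev, 2))  # 59.0 320.0 40.0 6.32
```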


FREQUENCY TABLE (DISCRETE VARIABLES)

The formulas for variance and standard deviation change slightly if observations are grouped into a frequency table. Each squared deviation is multiplied by the frequency of its value, and then these results are summed.
The variance for a discrete variable in a frequency table is defined by:

$$s^2 = \frac{\sum (x - \bar{x})^2 f}{\sum f}$$

The standard deviation for a discrete variable in a frequency table is defined by:

$$s = \sqrt{\frac{\sum (x - \bar{x})^2 f}{\sum f}}$$

A step by step approach to finding the standard deviation for a discrete variable in a frequency table is:
  • Tally the values of x.
  • Calculate and sum the frequencies (f).
  • Multiply each value of x by its frequency (xf).
  • Sum these products and divide by the sum of the frequencies to calculate the mean.
  • Subtract the mean from each value of x.
  • Square each result.
  • Multiply each square by its frequency.
  • Sum the results.
  • Divide this sum by the sum of the frequencies.
  • Take the positive square root.
1. Thirty graziers were asked how many shearers they hire during a shearing season. Their responses follow:
4, 5, 6, 5, 3, 2, 8, 0, 4, 6, 7, 8, 4, 5, 7, 9, 8, 6, 7, 5, 5, 4, 2, 1, 9, 3, 3, 4, 6, 4.

Shearers (x)   Tally      f     xf    (x - x̄)   (x - x̄)²   (x - x̄)²f

     0         I          1      0      -5         25          25
     1         I          1      1      -4         16          16
     2         II         2      4      -3          9          18
     3         III        3      9      -2          4          12
     4         IIIII I    6     24      -1          1           6
     5         IIIII      5     25       0          0           0
     6         IIII       4     24       1          1           4
     7         III        3     21       2          4          12
     8         III        3     24       3          9          27
     9         II         2     18       4         16          32
Total                    30    150                            152
To calculate the mean:

$$\bar{x} = \frac{\sum xf}{\sum f} = \frac{150}{30} = 5$$

To calculate the standard deviation:

$$s = \sqrt{\frac{\sum (x - \bar{x})^2 f}{\sum f}} = \sqrt{\frac{152}{30}} = \sqrt{5.07} = 2.25$$
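
The frequency table calculation for the shearers example can be sketched in Python as follows. Here `Counter` from the standard library does the tallying; the variable names are assumptions made for this illustration.

```python
from collections import Counter
from math import sqrt

responses = [4, 5, 6, 5, 3, 2, 8, 0, 4, 6, 7, 8, 4, 5, 7, 9, 8, 6, 7, 5,
             5, 4, 2, 1, 9, 3, 3, 4, 6, 4]

freq = Counter(responses)      # tally: maps each value x to its frequency f
total_f = sum(freq.values())   # sum of the frequencies (30)

# Mean: sum of x * f divided by the sum of f (150 / 30 = 5).
mean = sum(x * f for x, f in freq.items()) / total_f

# Sum of squared deviations, each weighted by its frequency (152).
sum_sq = sum((x - mean) ** 2 * f for x, f in freq.items())

std_dev = sqrt(sum_sq / total_f)

print(total_f, mean, sum_sq, round(std_dev, 2))  # 30 5.0 152.0 2.25
```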

GROUPED VARIABLES (CONTINUOUS OR DISCRETE)

2. A group of 220 Year 10 students were asked how much time they spent watching television per week. The results are given below. Calculate the mean and standard deviation of hours spent watching television by the 220 students.

Hours      No. of students

10 - 14          2
15 - 19         12
20 - 24         23
25 - 29         60
30 - 34         77
35 - 39         38
40 - 44          8

First the mid-point of each time interval must be found. The number of students is the frequency. The mean can now be calculated.

$$\bar{x} = \frac{\sum xf}{\sum f} = \frac{6{,}670}{220} = 30.32$$


Then the calculations xf, (x - x̄), (x - x̄)², and (x - x̄)²f are made:

Hours      Mid-point (x)     f        xf      (x - x̄)   (x - x̄)²   (x - x̄)²f

10 - 14        12.5           2       25.0    -17.82       318         636
15 - 19        17.5          12      210.0    -12.82       164       1,968
20 - 24        22.5          23      517.5     -7.82        61       1,403
25 - 29        27.5          60    1,650.0     -2.82         8         480
30 - 34        32.5          77    2,502.5      2.18         5         385
35 - 39        37.5          38    1,425.0      7.18        52       1,976
40 - 44        42.5           8      340.0     12.18       148       1,184
Total                       220    6,670.0                           8,032

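Standard deviation:

$$s = \sqrt{\frac{\sum (x - \bar{x})^2 f}{\sum f}} = \sqrt{\frac{8{,}032}{220}} = \sqrt{36.51} = 6.04$$
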
NOTE:
When a variable is grouped by class intervals, it is assumed that all observations within each interval are equal to the mid-point of the interval. Thus, the spread of observations within each interval is ignored. Therefore, the calculated standard deviation will generally differ from the true value and should be regarded as an approximation.
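
The grouped calculation in the television example can be sketched in Python as follows. The tuple layout and variable names are assumptions made for this illustration. The mid-points follow the example (12.5, 17.5, and so on), which treats each class as running from its lower bound up to the lower bound of the next class.

```python
from math import sqrt

# (lower bound, upper bound, number of students) for each class interval
classes = [(10, 14, 2), (15, 19, 12), (20, 24, 23), (25, 29, 60),
           (30, 34, 77), (35, 39, 38), (40, 44, 8)]

# Mid-points as used in the example: 12.5, 17.5, ..., 42.5.
midpoints = [(lower + upper + 1) / 2 for lower, upper, _ in classes]
freqs = [f for _, _, f in classes]

total_f = sum(freqs)                                            # 220
mean = sum(x * f for x, f in zip(midpoints, freqs)) / total_f   # 6,670 / 220
variance = sum((x - mean) ** 2 * f
               for x, f in zip(midpoints, freqs)) / total_f
std_dev = sqrt(variance)

print(total_f, round(mean, 2), round(std_dev, 2))  # 220 30.32 6.03
```

The unrounded calculation gives about 6.03 rather than 6.04. The small difference arises because the worked table rounds the mean to 30.32 and each squared deviation to a whole number before summing.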


3. Assuming the frequency distribution is approximately normal, calculate the interval within which 99% of the previous example’s observations would be expected to occur.

x̄ = 30.32

s = 6.04

The interval is given by:

x̄ - 3s < x < x̄ + 3s

That is:
30.32 - (3 × 6.04) < x < 30.32 + (3 × 6.04)
30.32 - 18.12 < x < 30.32 + 18.12
12.20 < x < 48.44

This means that there is about a 99% certainty that an observation will lie between 12 hours and 48 hours. That is, about 99% of students in the sample would be expected to watch between 12 and 48 hours of television each week.
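
The interval calculation can be sketched in Python as follows. The helper name `interval` is an assumption made for this illustration; the mean and standard deviation are the values quoted in the example.

```python
mean = 30.32  # mean hours of television per week, from the previous example
s = 6.04      # standard deviation, from the previous example

def interval(mean, s, k):
    """Return the interval from k standard deviations below to k above the mean."""
    return (mean - k * s, mean + k * s)

low, high = interval(mean, s, 3)
print(f"{low:.2f} < x < {high:.2f}")  # 12.20 < x < 48.44
```

Calling `interval(mean, s, 2)` in the same way gives the interval expected to contain about 95% of observations.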

SUMMARY

There are several ways to describe the centre and spread of a distribution. One is to use a five number summary that uses the median as its centre and gives a brief picture of distribution.
Another method is to use the mean and standard deviation. This technique is best used with symmetric distributions with no outliers.
Despite this restriction, the mean and standard deviation are much more commonly used than the median and five number summary. The reason for this is that many natural phenomena can be approximately described by a normal distribution.
For normal distributions, the mean and standard deviation are the best measures of centre and spread.


EXERCISES

1. For the following sets of data find:
i) the range
ii) the mean deviation
a) 6, 8, 11, 15, 24, 38
b) 11, -6, -2, 16, 9, -8, 17, 19
c) 6.4, 3.8, 5.9, 4.7, 5.3, 7.1, 3.2
2. The number of marriages registered in New South Wales from 1987 to 1996 were as follows:
Year     Number of marriages (x)

1987     40,650
1988     40,812
1989     41,300
1990     41,450
1991     39,594
1992     40,734
1993     39,993
1994     38,814
1995     37,828
1996     35,716
Find the:
a) range
b) median
c) upper and lower quartiles
d) interquartile range
e) five number summary
3. The maximum daily temperatures (in degrees Celsius) in Melbourne from April 21 to May 3 1993 were as follows:
29.3, 29.1, 28.2, 19.1, 18.8, 22.4, 18.4, 17.0, 20.2, 25.0, 25.8, 24.1, 22.1.
a) Find the range.
b) Calculate the interquartile range.
c) What is the five number summary?
d) Draw a box and whisker plot for this data.
4. The number of industrial disputes in Queensland from 1982 to 1991 were as follows:
Year     Number of industrial disputes (x)

1982     266
1983     231
1984     223
1985     262
1986     260
1987     230
1988     191
1989     182
1990     165
1991     153
a) Find the range.
b) Calculate the interquartile range.
c) What is the five number summary?
d) Draw a box and whisker plot for this data.
e) Calculate the mean deviation.
5. The number of basketball matches attended by 50 Perth Wildcat season ticket holders in 1997 were:
15, 10, 17, 11, 15, 12, 13, 16, 12, 14, 14, 16, 15, 18, 11, 16, 13, 17, 12, 16, 18, 15, 17, 15, 19, 13,
14, 17, 16, 15, 12, 11, 17, 16, 15, 10, 14, 15, 13, 16, 18, 15, 17, 11, 14, 17, 15, 14, 13, 16.
a) Tally the data.
b) Draw a column graph.
c) Calculate the mean, median and mode.
d) Calculate the variance and standard deviation.
e) Calculate the interval within which 95% of observations would be expected to occur.
f) Comment on the spread of the data.