1331.0 - Statistics - A Powerful Edge!, 1996
ARCHIVED ISSUE Released at 11:30 AM (CANBERRA TIME) 31/07/1998
MEASURES OF SPREAD
Variance (symbolised by $s^2$) and standard deviation (symbolised by $s$) are similar in calculation to the mean deviation. However, instead of taking the absolute value of the difference between the mean and each observation, the difference is squared. Because variance involves squaring deviations, it does not have the same unit of measurement as the original observations. For example, lengths measured in metres (m) have a variance measured in metres squared (m²). Thus:

VARIANCE, $s^2$ = AVERAGE SQUARED DEVIATION OF VALUES FROM MEAN

Taking the square root gives us back the units used in the original scale. This is the standard deviation. Thus:

STANDARD DEVIATION, $s$ = SQUARE ROOT OF THE AVERAGE SQUARED DEVIATION OF VALUES FROM MEAN

Standard deviation is the measure of spread most commonly used in statistical practice when the mean is the measure of centre. Thus it measures spread about the mean. Because of its close links with the mean, standard deviation can be seriously affected if the mean is a poor measure of location. The standard deviation is also strongly influenced by outliers, since a single extreme value can contribute a large share of the sum of squared deviations; an unexpectedly large standard deviation is therefore a useful warning that outliers may be present. For these reasons, the standard deviation is most useful for symmetric distributions with no outliers (such as normal distributions).

Standard deviation is useful when comparing the spread of two data sets. The data set with the smaller standard deviation has a narrower spread of measurements about the mean and, therefore, usually has comparatively fewer high or low values. So, an item selected at random from a data set whose standard deviation is low has a better chance of being close to the mean than has an item from a data set whose standard deviation is high.

PROPERTIES OF STANDARD DEVIATION

When using standard deviation, keep the following properties in mind.
When analysing normally distributed data, standard deviation can be used with the mean to calculate intervals within which data lie.
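About 68% of the data lie in the interval: $\bar{x} - s < x < \bar{x} + s$
About 95% of the data lie in the interval: $\bar{x} - 2s < x < \bar{x} + 2s$
About 99% of the data lie in the interval: $\bar{x} - 3s < x < \bar{x} + 3s$

where: $\bar{x}$ = mean; and $s$ = standard deviation

DISCRETE VARIABLES

The variance for a discrete variable made up of n observations is defined by:

$s^2 = \frac{\sum (x - \bar{x})^2}{n}$

The standard deviation for a discrete variable made up of n observations is the positive square root of the variance and is defined by:

$s = \sqrt{\frac{\sum (x - \bar{x})^2}{n}}$

A step by step approach to finding the standard deviation for a discrete variable is:

1. Calculate the mean.
2. Subtract the mean from each observation.
3. Square each result.
4. Add these squares.
5. Divide this sum by the number of observations.
6. Take the positive square root.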
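The six steps above can be sketched in Python as follows. This is a minimal illustration rather than part of the original publication: the function name `standard_deviation` and the sample weights are hypothetical, and the sum of squares is divided by n (not n - 1), as in the steps described in the text.

```python
import math


def standard_deviation(observations):
    """Standard deviation of a list of numbers, dividing by n as in the text."""
    n = len(observations)
    mean = sum(observations) / n                    # Step 1: calculate the mean
    deviations = [x - mean for x in observations]   # Step 2: subtract the mean
    squares = [d ** 2 for d in deviations]          # Step 3: square each result
    total = sum(squares)                            # Step 4: add the squares
    variance = total / n                            # Step 5: divide by n
    return math.sqrt(variance)                      # Step 6: positive square root


# Hypothetical weights in grams, for illustration only.
weights = [2, 4, 4, 4, 5, 5, 7, 9]
print(standard_deviation(weights))
```

Python's built-in `statistics.pstdev` uses the same divide-by-n definition, so it can be used to check a hand calculation.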
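$s$ = 6.32 grams

FREQUENCY TABLE (DISCRETE VARIABLES)

The formulas for variance and standard deviation change slightly if observations are grouped into a frequency table. Each squared deviation is multiplied by the frequency of its value, and these products are then summed.

The variance for a discrete variable in a frequency table is defined by:

$s^2 = \frac{\sum (x - \bar{x})^2 f}{\sum f}$

The standard deviation for a discrete variable in a frequency table is defined by:

$s = \sqrt{\frac{\sum (x - \bar{x})^2 f}{\sum f}}$

A step by step approach to finding the standard deviation for a discrete variable in a frequency table is:

1. Calculate the mean.
2. Subtract the mean from each observed value.
3. Square each result.
4. Multiply each of these squares by the frequency of the value.
5. Add these products.
6. Divide this sum by the total number of observations (the sum of the frequencies).
7. Take the positive square root.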
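The frequency-table formulas above can be sketched in Python as follows. The function name `frequency_table_sd` is hypothetical, and the table itself is an assumption: it was chosen only so that its totals agree with the 150 ÷ 30 used in the worked example that follows, and it should not be read as the original data.

```python
import math


def frequency_table_sd(table):
    """Return (mean, standard deviation) for a {value: frequency} table."""
    total_f = sum(table.values())                          # sum of f
    mean = sum(x * f for x, f in table.items()) / total_f  # sum of xf / sum of f
    # Each squared deviation is weighted by the frequency of its value.
    weighted_squares = sum((x - mean) ** 2 * f for x, f in table.items())
    return mean, math.sqrt(weighted_squares / total_f)


# Hypothetical frequency table (value: frequency), for illustration only.
table = {0: 1, 1: 1, 2: 2, 3: 3, 4: 6, 5: 5, 6: 4, 7: 3, 8: 3, 9: 2}
mean, sd = frequency_table_sd(table)
print(mean, round(sd, 2))
```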
$\bar{x} = \frac{\sum xf}{\sum f} = \frac{150}{30} = 5$

To calculate the standard deviation:

$s = \sqrt{\frac{\sum (x - \bar{x})^2 f}{\sum f}} = 2.25$

GROUPED VARIABLES (CONTINUOUS OR DISCRETE)
First the mid-point of each time interval must be found. The number of students is the frequency. The mean can now be calculated.

$\bar{x} = \frac{\sum xf}{\sum f} = 6{,}670 \div 220 = 30.32$

Then the calculations $xf$, $(x - \bar{x})$, $(x - \bar{x})^2$, and $(x - \bar{x})^2 f$ are made:
NOTE: When a variable is grouped by class intervals, it is assumed that all observations within each interval are equal to the mid-point of the interval. Thus, the spread of observations within each interval is ignored. Therefore, the standard deviation calculated from grouped data will generally differ from the true value and should be regarded as an approximation.
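The mid-point approach described above can be sketched in Python as follows. The function names `grouped_sd` and `interval`, the class boundaries, and the frequencies are all hypothetical; in particular, the sketch assumes the mid-point of a class is the average of its lower and upper limits. The `interval` helper returns the range of k standard deviations either side of the mean, with k = 3 as the default.

```python
import math


def grouped_sd(classes):
    """Return (mean, standard deviation) for (lower, upper, frequency) classes.

    Every observation in a class is treated as equal to the class mid-point,
    so the result is an approximation.
    """
    mids = [((lower + upper) / 2, f) for lower, upper, f in classes]
    total_f = sum(f for _, f in mids)
    mean = sum(x * f for x, f in mids) / total_f
    variance = sum((x - mean) ** 2 * f for x, f in mids) / total_f
    return mean, math.sqrt(variance)


def interval(mean, sd, k=3):
    """Interval from k standard deviations below to k above the mean."""
    return mean - k * sd, mean + k * sd


# Hypothetical classes (lower limit, upper limit, frequency), illustration only.
classes = [(10, 20, 2), (20, 30, 5), (30, 40, 3)]
mean, sd = grouped_sd(classes)
print(mean, sd, interval(mean, sd))
```

Because the helper takes the mean and standard deviation as plain arguments, it can also be applied to summary figures that were calculated by hand.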
$\bar{x} = 30.32$
$s = 6.04$

The interval is given by:

$\bar{x} - 3s < x < \bar{x} + 3s$

That is:

$30.32 - (3 \times 6.04) < x < 30.32 + (3 \times 6.04)$
$30.32 - 18.12 < x < 30.32 + 18.12$
$12.20 < x < 48.44$

This means that there is about a 99% certainty that an observation will lie between 12 hours and 48 hours. That is, almost every student in the sample can be expected to watch between 12 and 48 hours of television each week.

SUMMARY

There are several ways to describe the centre and spread of a distribution. One is to use a five number summary, which uses the median as its centre and gives a brief picture of the distribution. Another method is to use the mean and standard deviation. This technique is best used with symmetric distributions with no outliers. Despite this restriction, the mean and standard deviation are much more commonly used than the median and five number summary. The reason for this is that many natural phenomena can be approximately described by a normal distribution. For normal distributions, the mean and standard deviation are the best measures of centre and spread.

EXERCISES