1331.0 - Statistics - A Powerful Edge!, 1996  
ARCHIVED ISSUE Released at 11:30 AM (CANBERRA TIME) 31/07/1998   

ANSWERS TO EXERCISES

INTRODUCTION:
2.DATA - INFORMATION - KNOWLEDGE
4.Fewest = Example 4.
Most = Example 2.
5.Nurse. No, the value 0 can be regarded as information.
6.Age-group = 35-39 (for both males and females)
7.A. Keating, C. Sampson, R. Jameson
8. 11 (Mark Philippoussis’ fastest serve can be regarded as illustrated information)
9.Historians for research, history students for inclusion in assignments.
10.Governments, for planning health and social policies.
11.No. Statistics on the number of possessions and disposals do not necessarily accurately measure a player’s overall contribution.
12.Example 4
13.None show all the individual observations collected. (In Example 4, a number of players’ service speeds would have been collected but only 11 are shown.)
INFORMATION STUDIES:
DATA COLLECTION
2.A sample survey is less expensive and quicker to undertake.
4.The size of population to be surveyed, speed with which you want results, need for small area information, money and personnel you have to conduct data collection, and degree of accuracy you want from the results.
DATA PROCESSING
1.DATA - COLLECTION - PROCESSING - INFORMATION
4.Information without editing of data is almost certainly less accurate.
5.a) Instead of ‘yes’ the number of ewes mated should be shown.
b) You cannot be both ‘never married’ and ‘divorced’!
c) You cannot go to work on a motorbike and claim you also ‘did not go to work’. However, someone could legitimately put a mark in both the ‘Motorbike’ and ‘Worked at home’ fields. Can you say why?
PROBLEMS WITH USING INFORMATION
1.a) Inappropriate estimation based on an unrepresentative sample.
b) Various problems associated with volunteer sampling (see section Non-Random Sampling for details).
c) This will always be the case!
d) Misunderstands definition of unemployment for ABS sample survey.
e) Possible difference in the respective definition of forest cover.
STATS MATHS: ORGANISING DATA
1.a) c
b) d
c) d
d) c
e) c
f) d
g) c
h) c
i) d
j) d
k) c
l) d
2.Various answers

3.a)
(x)     Tally     Frequency (f)
1       |         1
2       ||        2
3       |||||     5
4       |||       3
5       |         1
Total             12
b)3 occurs the most

4.a) Discrete
b)
Number of customers (x)   Tally       Frequency (f)
20                        ||          2
21                        ||||| ||    7
22                        ||||        4
23                        |||         3
24                        |||||       5
25                        ||          2
26                        ||          2
Total                                 25
c) 21
d)
(x)     (f)   Relative frequency   Percentage frequency
20      2     0.08                 8
21      7     0.28                 28
22      4     0.16                 16
23      3     0.12                 12
24      5     0.20                 20
25      2     0.08                 8
26      2     0.08                 8
Total   25    1.00                 100
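
The relative and percentage frequencies in 4.d) can be checked with a few lines of Python. This is only a minimal sketch: the dictionary `freq` is typed in from the frequency table in 4.b), and the function name `relative_table` is made up for this illustration.

```python
# Frequency table from answer 4.b): number of customers (x) -> frequency (f)
freq = {20: 2, 21: 7, 22: 4, 23: 3, 24: 5, 25: 2, 26: 2}

def relative_table(freq):
    """Return {x: (relative frequency, percentage frequency)}."""
    total = sum(freq.values())  # total number of observations
    return {x: (f / total, 100 * f / total) for x, f in freq.items()}

total = sum(freq.values())
table = relative_table(freq)

# Print one row per value of x, in the same column order as 4.d)
for x, (rel, pct) in table.items():
    print(f"{x:>3} {freq[x]:>3} {rel:.2f} {pct:.0f}")
print(f"Total {total} {sum(r for r, _ in table.values()):.2f}")
```

Each relative frequency is the frequency divided by the total of 25, and the percentage frequency is that figure multiplied by 100.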

5.a) Continuous
b)
Windspeed (x)   Tally               Frequency (f)
0 - <5          |||                 3
5 - <10         ||||                4
10 - <15        ||||| ||||| ||||    14
15 - <20        ||||| ||||| |       11
20 - <25        ||||| ||            7
25 - <30                            0
30 - <35        |                   1
Total                               40
c) 10 - <15
d)
Windspeed (x)   (f)   Relative frequency   Percentage frequency
0 - <5          3     0.075                7.5
5 - <10         4     0.100                10.0
10 - <15        14    0.350                35.0
15 - <20        11    0.275                27.5
20 - <25        7     0.175                17.5
25 - <30        0     0.000                0.0
30 - <35        1     0.025                2.5
Total           40    1.000                100.0
e)The most common occurring windspeed is from 10 to less than 15 knots, and has a 35% chance of occurring on any one day based on this sample of 40.
6.a)
Stem   Leaf
0      2 4 9 1 0
1      8 0 4 5 9 2 6
2      1 7 5 9 4 6 8 2 6 3
3      5 1 8 7 3
4      3 1 0
b)
Stem   Leaf
0      0 1 2 4 9
1      0 2 4 5 6 8 9
2      1 2 3 4 5 6 6 7 8 9
3      1 3 5 7 8
4      0 1 3
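
Going from the unordered plot in 6.a) to the ordered plot in 6.b) is just a sort within each stem. The sketch below shows one way to do it in Python. The list `data` is read off the plot in 6.a), taking the stem as the tens digit and the leaf as the units digit (an assumption about how the plot was built), and the function name `stem_and_leaf` is made up for this illustration.

```python
from collections import defaultdict

# Observations read off the unordered plot in 6.a) (stem = tens, leaf = units)
data = [2, 4, 9, 1, 0,
        18, 10, 14, 15, 19, 12, 16,
        21, 27, 25, 29, 24, 26, 28, 22, 26, 23,
        35, 31, 38, 37, 33,
        43, 41, 40]

def stem_and_leaf(values):
    """Return {stem: sorted leaves}, with stem = tens digit and leaf = units digit."""
    plot = defaultdict(list)
    for v in sorted(values):        # sorting first keeps each row's leaves in order
        plot[v // 10].append(v % 10)
    return dict(plot)

plot = stem_and_leaf(data)
for stem in sorted(plot):
    print(stem, " ".join(str(leaf) for leaf in plot[stem]))
```

Note that this sketch leaves out any stem with no observations. A hand-drawn plot would normally still show the empty stem.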
7.a)
Stem   Leaf
0(0)   0 3 4
0(5)   5 7 8 9
1(0)   0 0 1 1 2 2 2 2 3 3 4 4 4 4
1(5)   5 5 6 6 7 7 7 8 9 9 9
2(0)   0 1 2 3 3 4 4
2(5)
3(0)   4
b)34 is an outlier. This was either because of a particularly windy or stormy day during 40 days of recording wind speed, or it might have been a measurement error.
c)i) The distribution has only one main peak.
ii) The distribution is very roughly symmetrical or could even be roughly skewed to the left if the outlier is removed. It is probably best to call it an irregular shape.
iii) The centre is 14 knots.
8.a)Discrete
b)
Stem   Leaf
0      7 8 8
1      0 4 5 7 7 7
2      0 0 3 4 6 6 6 8 9 9 9
3      0 0 1 1 2 2 2 2 2 3 6 7 8
c)
Stem   Leaf
0(5)   7 8 8
1(0)   0 4
1(5)   5 7 7 7
2(0)   0 0 3 4
2(5)   6 6 6 8 9 9 9
3(0)   0 0 1 1 2 2 2 2 2 3
3(5)   6 7 8
d)No
e)i)One main peak
ii)Skewed to the left
iii)28 road fatalities
9.a)Continuous
b)
Stem   Leaf
5      7
6      1 2 2 4 4 8 8 8 9
7      0 2 3 6 8 8 9
8      1 1 8 9
c)No, the stems are not overcrowded.
d)8.8 and 8.9 are possible outliers. They are due to particularly warm years where high minimum daily temperatures gave a high mean minimum temperature for the year.
e)The distribution has one peak, and its general shape is roughly symmetric (although this is difficult to observe with a small amount of data). The distribution’s centre is 7.0°C.
10.a)Discrete
b)
Weekly salary (x)   Tally               Frequency (f)
420 - <440          |                   1
440 - <460          ||||                4
460 - <480          ||||| ||||| ||||    14
480 - <500          ||||| ||||          9
500 - <520          ||||| |||           8
520 - <540          ||||| |||           8
540 - <560          |||||               5
560 - <580          |                   1
Total                                   50
c) $460 - <$480
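d)
Weekly salary (x)   Relative frequency   Percentage frequency
420 - <440          0.02                 2
440 - <460          0.08                 8
460 - <480          0.28                 28
480 - <500          0.18                 18
500 - <520          0.16                 16
520 - <540          0.16                 16
540 - <560          0.10                 10
560 - <580          0.02                 2
Total               1.00                 100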
e)Most people in the company earn between $460 and $480 a week based on this sample of 50 people. Only 1 staff member earned over $560 a week.
f)
Stem   Leaf
43     7
44     0 1 3
45     9
46     1 1 3 3 6 6
47     0 0 0 1 3 3 6 8
48     1 4 4 6 6 7
49     0 7 9
50     2
51     1 3 4 4 7 9 9
52     1 2 3 3 5 7 8
53     9
54     2 3 6 8
55     5
56     4
g)$555 and $564 could be outliers. They exist because 2 of the 50 people surveyed may have been managers or directors and thus on higher salaries, or 2 people may have deliberately provided misleading responses.
h)i) The distribution has a number of peaks, possibly bimodal.
ii) The distribution has no symmetry nor is it skewed.
iii) The centre is between $487 and $490.

CUMULATIVE FREQUENCY AND PERCENTAGE
1.a & d)
Stem   Leaf                  Frequency (f)   Actual upper value   Cumulative frequency   Cumulative percentage
0      0 1 2 3 5 6 6 7 9 9   10              9                    10                     25.0
1      0 0 2 3 3 3 3 5 7 9   10              19                   20                     50.0
2      0 1 2 2 2 4 4 5 5     9               25                   29                     72.5
3      3 5 5 5 8 9           6               39                   35                     87.5
4      4                     1               44                   36                     90.0
5      0 6 9                 3               59                   39                     97.5
6      3                     1               63                   40                     100.0
b)Possible outliers are 56, 59 and 63. (Find out which monarch reigned for 63 years.) However, as this is factual data, they exist simply because the monarchs who reigned for this time lived longest after coming to the throne early in their lives.
c)i) Two peaks appear at the beginning of the distribution.
ii) The distribution could be said to be skewed to the right.
iii) The centre is approximately 19 years.
e)
      Graph: An ogive with two different vertical axes - one for the cumulative frequency and cumulative percentage.
f)10
g)4
h)At the time of writing, Queen Elizabeth II has reigned for 44 years. This was well above the centre of distribution and only 4 other monarchs have reigned longer.
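
The cumulative columns in the table for 1.a & d) are running totals of the frequency column. As a rough sketch (the variable names are chosen for this illustration only), they can be reproduced in Python like this:

```python
from itertools import accumulate

# Frequencies for stems 0 to 6, taken from the table in answer 1.a & d)
freqs = [10, 10, 9, 6, 1, 3, 1]

cum_freq = list(accumulate(freqs))                   # running total of the frequencies
total = cum_freq[-1]                                 # last running total = number of observations
cum_pct = [100 * c / total for c in cum_freq]        # cumulative percentage

for stem, (f, cf, cp) in enumerate(zip(freqs, cum_freq, cum_pct)):
    print(f"{stem}  {f:>2}  {cf:>2}  {cp:5.1f}")
```

Plotting `cum_freq` (or `cum_pct`) against the actual upper value of each stem gives the points of the ogive described in e).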
2.a)
Stem   Leaf
2(0)   1
2(5)
3(0)   1 2 4
3(5)   5 7 7 8 8
4(0)   0 0 2 3 3 3 4
4(5)   5 5 6 6 7 7 8 9
5(0)   0 1 4 4
5(5)   5 9
b)21 is an outlier. Perhaps only 21 fries were left in a batch when the student ordered fries that particular day.
c)i) Unimodal
ii) Roughly symmetric if the outlier is removed.
iii) The centre is approximately 43 fries.
d)
Frequency (f)   Actual upper value   Cumulative frequency   Cumulative percentage
1               21                   1                      3.3
3               34                   4                      13.3
5               38                   9                      30.0
7               44                   16                     53.3
8               49                   24                     80.0
4               54                   28                     93.3
2               59                   30                     100.0
Total 30
e)
      Graph: An ogive with two different vertical axes - one for the cumulative frequency and cumulative percentage.
f)9
g)46.7%
h)44
3.a)Continuous
b)
Age group   Number of females   End-point   Cumulative frequency   Cumulative percentage
                                15          0                      0.0
15-24       339                 25          339                    37.6
25-34       273                 35          612                    67.8
35-44       147                 45          759                    84.1
45-54       121                 55          880                    97.6
55-64       22                  65          902                    100.0
c)
      Graph: An ogive with two different vertical axes - one for the cumulative frequency and cumulative percentage.
d)No-one under 15 years of age can be classified as unemployed.
e)25-34, (approximately 29 years old).
f)37.6%.
g)2.4%.
h)Governments can establish job creation schemes directed at particular age groups (in this case, the most likely would be for those under 25 years of age).
4.a)Continuous
b)
Time (x)    Tally                Frequency   Relative frequency   Percentage frequency
0 - <10                          0           0.00                 0
10 - <20    |                    1           0.02                 2
20 - <30    |||                  3           0.06                 6
30 - <40    ||||                 4           0.08                 8
40 - <50    ||||| ||             7           0.14                 14
50 - <60    ||||| |||||          10          0.20                 20
60 - <70    ||||| ||||| |||||    15          0.30                 30
70 - <80    |||||                5           0.10                 10
80 - <90    ||||                 4           0.08                 8
90 - <100   |                    1           0.02                 2
Total                            50          1.00                 100
c)
      Graph: A histogram representing the data and mark in the frequency polygon.
d)
Stem   Leaf
0
1      2
2      2 5 9
3      1 3 7 8
4      0 1 3 4 5 5 9
5      0 1 2 2 5 6 6 8 8 9
6      0 0 1 2 3 3 4 4 5 5 6 7 8 9 9
7      1 3 5 6 7
8      0 3 7 9
9      8
98 is a possible outlier. This person may have had difficulty in getting to work, or simply lives quite a distance from work.
e)i) Unimodal
ii) The distribution is quite symmetric.
iii) The approximate centre is 59 minutes.
f)
Frequency (f)   End-point   Cumulative frequency   Cumulative percentage
0               10          0                      0
1               20          1                      2
3               30          4                      8
4               40          8                      16
7               50          15                     30
10              60          25                     50
15              70          40                     80
5               80          45                     90
4               90          49                     98
1               100         50                     100
g)
      Graph: An ogive with two different vertical axes - one for the cumulative frequency and cumulative percentage.
h)60 - <70 minutes
i)2%
j)8
MEASURES OF LOCATION
1.a)i) 0.1
ii) 0
iii) 0
b)i) 2
ii) 2
iii) 2
c)i) 2.78
ii) 2.5
iii) 3.9
d)i) 154.3
ii) 154.3
iii) 152.3
2.a)i) 0
ii) 0
iii) 0
iv) The mean, median and mode are equal. This distribution is almost symmetrical.
b)i) 6.6
ii) 6.7
iii) 6.7
iv) Distribution is skewed left, so the mean is less than the median and therefore closer to centre. The mode and median are the same.
c)i) 1.85
ii) 1
iii) 1
iv) The median and mode are the same. The distribution is skewed right, so the mean is more than the median and therefore closer to centre. In b) and c) the mean has been influenced by a few low and high values respectively.
3.a)i) 48
ii) 40-49
b)i) 23
ii) 20-24
4.a)72,186.5
b)68,953.5
c)The measures are quite close together, given the size of each observation, hence the difference is not significant. The median probably gives the best indication of the data’s centre, as there is a large diversity of observation values. The median would not be affected by the very large or very small values.
d)A government could use these measures to plan for building schools, hospitals, roads etc. It could also use them to help predict revenue intake from taxation.

5.a)
Score (X)   Tally          Frequency
0                          0
1           ||             2
2           |||            3
3           ||||           4
4           ||||           4
5           ||||           4
6           ||             2
7           ||||| |||||    10
8           |||            3
9           ||||| |        6
10          ||             2
Total                      40
b)mean = 5.9, median = 7, mode = 7
c)The median is higher than the mean because most of the observations have high values. The mean is influenced by the lower scores. The mode is equal to the median.
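
The three measures in b) can be checked by expanding the frequency table in a) back into a list of 40 scores. The sketch below uses Python's standard `statistics` module (`multimode` needs Python 3.8 or later). The frequency of 0 for a score of 0 is an assumption, based on that row having no tally marks.

```python
from statistics import mean, median, multimode

# Score -> frequency, from the table in 5.a); score 0 is assumed to have frequency 0
freq = {0: 0, 1: 2, 2: 3, 3: 4, 4: 4, 5: 4, 6: 2, 7: 10, 8: 3, 9: 6, 10: 2}

# Repeat each score as many times as its frequency to rebuild the raw data
scores = [x for x, f in freq.items() for _ in range(f)]

avg = mean(scores)          # sum of the scores divided by how many there are
mid = median(scores)        # average of the 20th and 21st ordered scores
modes = multimode(scores)   # the most frequent score(s)

print(len(scores), avg, mid, modes)
```

Because 10 of the 40 scores are 7s, both middle positions (20th and 21st) fall on 7, which is why the median and the mode coincide here.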
6.a)33.6
b)25-34 (Note: interval sizes are not the same. If they were, the 15-24 interval would be the modal-class interval.)
c)25-34
d)All three results lie within the same interval, but distribution is skewed to the right.
e)The younger age groups, 15-19 and 20-24, are filled with school leavers who have not yet been able to get a job, and are too young to have acquired the experience necessary to qualify for many jobs. The age groups after 25-34 contain a larger proportion of people who have left the workforce temporarily or simply retired.
f)To plan employment schemes that cater for younger people; to try to create work for a younger workforce.
7.a)
Hours       Number of men (x)   End-point   Cumulative frequency   Cumulative percentage
            0                   0           0                      0
0 - <5      1                   5           1                      1
5 - <10     18                  10          19                     19
10 - <15    24                  15          43                     43
15 - <20    25                  20          68                     68
20 - <25    18                  25          86                     86
25 - <30    12                  30          98                     98
30 - <35    1                   35          99                     99
35 - <40    1                   40          100                    100
b)
      Graph: An ogive with cumulative frequency as the y-axis showing a random analysis of 100 married men and their distribution of hours spent per week doing unpaid household work.
c)Median = 17 hours. The middle of the distribution is 17 hours.
d)15 - <20 hours
e)16.8 hours. The mean number of hours that a married man spends doing unpaid household work is 16.8 hours.
f)The mean and median are very similar, and all measures lie in the modal-class interval. The distribution is close to symmetrical.
g)A similar survey could be done (possibly even surveying the wives of men who participated in this survey!), analysing the results in a similar fashion, and comparing the results.
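
The mean in 7.e) can be reproduced from the grouped table in 7.a) by treating every man in an interval as if he sat at the interval's midpoint. That is the usual approach for grouped data, and it is assumed here to be the one the answer used. The sketch below also picks out the modal class, which is only a fair comparison because all the intervals have the same width.

```python
# (lower bound, upper bound, number of men) for each interval in 7.a)
intervals = [(0, 5, 1), (5, 10, 18), (10, 15, 24), (15, 20, 25),
             (20, 25, 18), (25, 30, 12), (30, 35, 1), (35, 40, 1)]

n = sum(f for _, _, f in intervals)   # total number of men surveyed

# Grouped mean: midpoint of each interval weighted by its frequency
grouped_mean = sum((lo + hi) / 2 * f for lo, hi, f in intervals) / n

# Modal class: the interval with the largest frequency
modal_lo, modal_hi, _ = max(intervals, key=lambda row: row[2])

print(n, grouped_mean, f"{modal_lo} - <{modal_hi}")
```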
8.a)$10,400 - $15,599. (Note that interval sizes are not the same.)
b)
Income ($)        Persons   End-point   Cumulative frequency   Cumulative percentage
                            0           0                      0.0
0 - 2,079         114,195   2,079       114,195                9.4
2,080 - 4,159     44,817    4,159       159,012                13.1
4,160 - 6,239     45,862    6,239       204,874                16.9
6,240 - 8,319     139,611   8,319       344,485                28.4
8,320 - 10,399    114,192   10,399      458,677                37.8
10,400 - 15,599   148,276   15,599      606,953                50.0
15,600 - 20,799   123,638   20,799      730,591                60.2
20,800 - 25,999   121,623   25,999      852,214                70.2
26,000 - 31,199   103,402   31,199      955,616                78.7
31,200 - 36,399   73,463    36,399      1,029,079              84.8
36,400 - 41,599   59,126    41,599      1,088,205              89.7
41,600 - 51,999   68,747    51,999      1,156,952              95.3
52,000 - 77,999   56,710    77,999      1,213,662              100.0
c)
      Graph: An ogive showing the 1996 Census annual income of people aged 15 years or more in Western Australia, with cumulative percentage on the vertical axis.

d)The median is approximately $15,500.
e)The mean is $20,691.
f)It is difficult to compare the mode with the mean and median because of the difference between the sizes of the intervals. The mean is higher than the median because it is affected by the higher incomes. This means that the distribution is skewed to the right.
g)The median, as it is not influenced by extreme values.
h)Some possible answers include: social welfare organisations interested in the number of low income earners; businesses interested in the number of high income earners; and governments and other service providers would use such data, especially when broken down by such characteristics as age, sex and geographic area, to locate services appropriately.
MEASURES OF SPREAD
1.a)i) 32
ii) 9.3
b)i) 27
ii) 9.25
c)i) 3.9
ii) 1.11
2.a)5,734
b)40,321.5
c)Q1 = 38,814 Q3 = 40,812
d)1,998
e)35,716 - 38,814 - 40,321.5 - 40,812 - 41,450
3.a)12.3
b)8.05
c)17.0, 18.95, 22.4, 27.0, 29.3
d)
Graph: A box and whisker plot showing the maximum daily temperatures (in degrees Celsius) in Melbourne from April 21 to May 3 1993.
4.a)113
b)78
c)153, 182, 226.5, 260, 266
d)
Graph: A box and whisker plot showing the number of industrial disputes in Queensland from 1982 to 1991.
e)34.84
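
The range and interquartile range answers in questions 2, 3 and 4 all follow from the five-number summaries given in those answers: range = maximum - minimum, and interquartile range = Q3 - Q1. A small sketch for checking them (the function name `spread` and the dictionary layout are made up for this illustration):

```python
def spread(summary):
    """Return (range, interquartile range) for a five-number summary."""
    minimum, q1, _median, q3, maximum = summary
    return maximum - minimum, q3 - q1

# Five-number summaries (min, Q1, median, Q3, max) as given in answers 2.e), 3.c) and 4.c)
summaries = {
    "2": (35716, 38814, 40321.5, 40812, 41450),
    "3": (17.0, 18.95, 22.4, 27.0, 29.3),
    "4": (153, 182, 226.5, 260, 266),
}

results = {q: spread(s) for q, s in summaries.items()}
for q, (rng, iqr) in results.items():
    print(f"Question {q}: range = {rng:.2f}, interquartile range = {iqr:.2f}")
```

The same five numbers are what a box and whisker plot draws: the box runs from Q1 to Q3 with a line at the median, and the whiskers reach out to the minimum and maximum.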
5.a)
Number of matches (x)   Tally          Frequency (f)
10                      ||             2
11                      ||||           4
12                      ||||           4
13                      |||||          5
14                      ||||| |        6
15                      ||||| |||||    10
16                      ||||| |||      8
17                      ||||| ||       7
18                      |||            3
19                      |              1
Total                                  50
b)
      Graph: Column graph representing the number of basketball matches attended by 50 Perth Wildcat season ticket holders in 1997.
c)mean = 14.62, median = 15, mode = 15
d)S² = 4.96, S = 2.23
e)10.17 < x < 19.07
f)The standard deviation is quite low, which indicates that the data is not widely spread about the mean. The mean and median are very close together, which indicates that the data is roughly symmetrical.
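
The figures in c), d) and e) can be reproduced from the frequency table in a). In the sketch below the variance is divided by n (all 50 observations) rather than n - 1, because that is the version that matches the stated 4.96. The interval in e) is taken to be the mean plus or minus two standard deviations, which is an assumption that happens to reproduce the stated bounds.

```python
from math import sqrt

# Number of matches (x) -> frequency (f), from the table in 5.a)
freq = {10: 2, 11: 4, 12: 4, 13: 5, 14: 6, 15: 10, 16: 8, 17: 7, 18: 3, 19: 1}

n = sum(freq.values())                                  # 50 season ticket holders
avg = sum(x * f for x, f in freq.items()) / n           # mean
variance = sum(f * (x - avg) ** 2 for x, f in freq.items()) / n   # divisor n, not n - 1
sd = sqrt(variance)                                     # standard deviation

# Assumed reading of e): mean minus / plus two standard deviations
low, high = avg - 2 * sd, avg + 2 * sd

print(f"mean = {avg:.2f}, variance = {variance:.2f}, sd = {sd:.2f}")
print(f"{low:.2f} < x < {high:.2f}")
```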
SAMPLING METHODS
1.Various answers
2.a)9
b)Various answers
3.a)i) Canberra
ii) Sydney
iii) Darwin
iv) Melbourne
b)i) Darwin
ii) Canberra
iii) Perth
iv) Sydney
c)No, the table does not give total population figures.
4.a)Stratified sampling
b)
          K    P    1    2    3    4    5    6    7    8    9    10   11   12
Males     2    2    2    2    2    4    4    5    15   14   13   13   11   9
Females   1    2    2    2    2    3    7    6    12   12   12   17   13   11
