Australian Bureau of Statistics


Summarising Data

Mean

The mean of a numeric variable is calculated by adding together the values of all observations in a data set and then dividing by the number of observations in the set. It is often referred to as the average.

Thus:

Mean = sum of all the observations ÷ number of observations

For example, find the mean of these numbers: 5, 3, 4, 5, 7, 6.

Mean = (5 + 3 + 4 + 5 + 7 + 6) ÷ 6
= 30 ÷ 6
= 5

Notice that the value of every member of the data set is used in calculating the mean.
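The calculation above can be sketched in a few lines of Python. This is only an illustration: the function name mean and the variable observations are chosen here and are not part of the original material.

```python
def mean(values):
    """Return the sum of all observations divided by the number of observations."""
    if not values:
        raise ValueError("mean requires at least one observation")
    return sum(values) / len(values)


observations = [5, 3, 4, 5, 7, 6]
result = mean(observations)
print(result)  # 5.0
```

Every value in the list contributes to the sum, which is why changing any single observation changes the mean.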



Median

The median value corresponds to the middle observation when a data set is rearranged in increasing (ascending) order, by value.

Median = the middle value of a set of data

For example, data set A contains: 3, 7, 1, 9, 2, 5, 9.

Rearranged in ascending order it becomes: 1, 2, 3, 5, 7, 9, 9.

The middle number is 5, so the median is 5.

Data set B contains: 1, 3, 4, 5, 10, 12, 13. It also has a median of 5, although its values are spread quite differently from those in data set A.

The position of the median can also be found by using the formula (n + 1) ÷ 2, where ‘n’ is the number of values in a set of ordered data.

After the data have been placed in ascending order:

For data set A: n = 7

So the position of the median = (7 + 1) ÷ 2
= 8 ÷ 2
= 4

The median is the fourth number, which has a value of 5.

The above example is for an odd number of observations, i.e. n = 7. The case where the number of observations is even, e.g. n = 8, requires an extra step. For example, if n = 8 then:

the position of the median = (8 + 1) ÷ 2
= 9 ÷ 2
= 4.5

This means that the position of the median lies between the fourth and fifth observations. To find the value of the median, add together the fourth and fifth observations and divide by two, e.g.

If the data set is:

1, 1, 4, 4, 8, 9, 9, 10

then the median is:

(4 + 8) ÷ 2 = 6

The median is determined by its position in the ordered data set, not by how large or small the other values are. Notice that only the positions of the other members of the data set are taken into consideration, not their values.

There are as many values above the median as there are below.

The median is usually calculated for numeric variables, but it may also be calculated for an ordinal variable (a categorical variable whose categories have a natural order).
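The position rule (n + 1) ÷ 2 can be sketched in Python as follows. The function name median and the variable names are illustrative choices, not part of the original material. The two data sets are data set A and the eight-value set used above.

```python
def median(values):
    """Return the middle value of the data after sorting it in ascending order."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("median requires at least one observation")
    if n % 2 == 1:
        # Odd n: (n + 1) / 2 is a whole number, the 1-based position of the median.
        position = (n + 1) // 2
        return ordered[position - 1]
    # Even n: (n + 1) / 2 ends in .5, so average the two observations either side.
    lower = ordered[n // 2 - 1]
    upper = ordered[n // 2]
    return (lower + upper) / 2


data_set_a = [3, 7, 1, 9, 2, 5, 9]
even_data_set = [1, 1, 4, 4, 8, 9, 9, 10]

print(median(data_set_a))     # 5
print(median(even_data_set))  # 6.0
```

Python's standard library offers the same calculation as statistics.median, which could be used instead of writing the function by hand.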


Mode

The mode is the most frequently occurring value in a data set. It is the only one of these three measures you can use when the data is categorical and has no order (for example: place of birth, favourite colour and hair colour). As the data set is not made up of numbers, you cannot add and divide, so you cannot find a mean. The data set cannot be sorted from smallest to largest, so you cannot find the middle value, and therefore you cannot find a median.

For example, a group of friends in Year 10 have the following hair colours:

red, brown, blonde, black, blonde, black, brown, brown, black, blonde, brown, brown, black.

HAIR COLOUR    FREQUENCY

Red            1
Brown          5
Black          4
Blonde         3

The most common hair colour is brown, so the mode is brown.
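Counting the frequency of each category can be sketched in Python with collections.Counter from the standard library. The variable names are illustrative only.

```python
from collections import Counter

hair_colours = [
    "red", "brown", "blonde", "black", "blonde", "black", "brown",
    "brown", "black", "blonde", "brown", "brown", "black",
]

# Count how many times each hair colour appears.
frequencies = Counter(hair_colours)

# most_common(1) returns a list holding the single most frequent (value, count) pair.
mode_colour, mode_count = frequencies.most_common(1)[0]

print(frequencies["red"], frequencies["brown"], frequencies["black"], frequencies["blonde"])  # 1 5 4 3
print(mode_colour, mode_count)  # brown 5
```

If two categories were tied for the highest count, most_common(1) would report only one of them, so a data set with several modes would need the full frequency table to be checked.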


Range

The range is a number that informs you of the spread of the data.

Range = maximum value – minimum value

For the following data set of students’ ages: 17, 15, 14, 16, 14, 15, 16, 12, 17, 13, 12, 17, 13, 16, 15

Maximum value = 17
Minimum value = 12

Range = maximum value – minimum value
= 17 – 12
= 5

The range of the students’ ages is 5 years.
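The same calculation can be sketched in Python using the built-in max and min functions. The variable names are illustrative only.

```python
ages = [17, 15, 14, 16, 14, 15, 16, 12, 17, 13, 12, 17, 13, 16, 15]

maximum_value = max(ages)
minimum_value = min(ages)

# Range = maximum value - minimum value
data_range = maximum_value - minimum_value

print(maximum_value, minimum_value, data_range)  # 17 12 5
```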


Quartiles

Quartiles divide data into four equal groups. Using the example of 15 students above, we have the following ordered data set:

12, 12, 13, 13, 14, 14, 15, 15, 15, 16, 16, 16, 17, 17, 17

We can divide this set into four equal-sized groups, with each group containing one quarter of the data:

  • The first quartile (Q1) is the value that 25% of the data is below.
  • The second quartile (Q2) is the value that 50% of the data is below. This is the same as the median.
  • The third quartile (Q3) is the value that 75% of the data is below.

[Figure: Five number summary (quartiles)]

In the example:
Q1 = 13
Q2 = 15
Q3 = 16
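The text does not say which rule is used to locate Q1 and Q3, and several conventions exist that can give slightly different answers. The sketch below assumes one common classroom method: find the median, then take the median of the values below it (Q1) and the median of the values above it (Q3), leaving the median itself out of both halves when the number of observations is odd. The function name quartiles is illustrative only.

```python
from statistics import median


def quartiles(values):
    """Return (Q1, Q2, Q3) using the median-of-halves method (an assumed convention)."""
    ordered = sorted(values)
    n = len(ordered)
    if n < 2:
        raise ValueError("quartiles require at least two observations")
    half = n // 2
    lower_half = ordered[:half]
    # For odd n the middle observation is the median and belongs to neither half.
    upper_half = ordered[half + 1:] if n % 2 == 1 else ordered[half:]
    return median(lower_half), median(ordered), median(upper_half)


ages = [17, 15, 14, 16, 14, 15, 16, 12, 17, 13, 12, 17, 13, 16, 15]
q1, q2, q3 = quartiles(ages)
print(q1, q2, q3)  # 13 15 16
```

For this data set the result agrees with the values Q1 = 13, Q2 = 15 and Q3 = 16 given in the example.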


Interquartile range

The interquartile range refers to the middle 50% of data. It is found by subtracting Q1 from Q3.

The interquartile range is an indicator of the spread of the data. It reduces the influence of outliers because the highest and lowest quarters of the data are left out.
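To illustrate how the interquartile range resists outliers, the sketch below compares the student ages with a hypothetical copy in which one age of 17 has been replaced by 40. The altered data set is invented purely for illustration. The quartiles are taken from statistics.quantiles (Python 3.8 or later) with its default method, which is one of several quartile conventions.

```python
from statistics import quantiles


def interquartile_range(values):
    """Return Q3 - Q1, with quartiles from statistics.quantiles (default method)."""
    q1, _q2, q3 = quantiles(values, n=4)
    return q3 - q1


ages = [17, 15, 14, 16, 14, 15, 16, 12, 17, 13, 12, 17, 13, 16, 15]

# Hypothetical data: the same ages with one 17 replaced by an outlier of 40.
ages_with_outlier = [40, 15, 14, 16, 14, 15, 16, 12, 17, 13, 12, 17, 13, 16, 15]

print(max(ages) - min(ages), interquartile_range(ages))
# 5 3.0
print(max(ages_with_outlier) - min(ages_with_outlier), interquartile_range(ages_with_outlier))
# 28 3.0
```

The outlier changes the range from 5 to 28, while the interquartile range stays at 3.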


Five Number Summary (quartiles)

This is a useful way to summarise data. It consists of:
  • the lowest value
  • the first quartile (Q1)
  • the second quartile (Q2), which is the median
  • the third quartile (Q3)
  • the highest value.

The range can be found from the difference between the highest and lowest value. The median is the second quartile (Q2) and the interquartile range is the difference between the third and first quartiles (Q3 – Q1).
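A five number summary for the student ages can be sketched in Python as shown below. The names FiveNumberSummary and five_number_summary are hypothetical, and the quartiles come from statistics.quantiles with its default method, which is an assumption about the quartile convention rather than something stated in the text.

```python
from statistics import quantiles
from typing import NamedTuple


class FiveNumberSummary(NamedTuple):
    lowest: float
    q1: float
    median: float
    q3: float
    highest: float


def five_number_summary(values):
    """Collect the lowest value, Q1, Q2 (median), Q3 and the highest value."""
    if len(values) < 2:
        raise ValueError("at least two observations are needed")
    q1, q2, q3 = quantiles(values, n=4)
    return FiveNumberSummary(min(values), q1, q2, q3, max(values))


ages = [17, 15, 14, 16, 14, 15, 16, 12, 17, 13, 12, 17, 13, 16, 15]
summary = five_number_summary(ages)

# The range and interquartile range follow directly from the summary.
data_range = summary.highest - summary.lowest
iqr = summary.q3 - summary.q1

print(summary)
print(data_range, iqr)  # 5 3.0
```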


Box and Whisker Plot

A box and whisker plot (often called a ‘box plot’) can be used to show the interquartile range. The figure below shows a box and whisker plot for the example of student ages.

[Figure: Box and whisker plot]

Notice that a scale is drawn underneath. Box plots can be drawn horizontally or vertically.


Standard Deviation

Standard deviation (s) is a mathematical way of expressing the spread of a data set.

The standard deviation for a discrete variable made up of n observations is the positive square root of the variance and is defined by:

[Equation: Standard deviation]
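The formula itself is not reproduced in this text, so the sketch below rests on an assumption: it uses the sample form, in which the variance is the sum of squared deviations from the mean divided by n – 1. Some textbooks divide by n instead (the population form), and the original formula may have used either. The function name sample_standard_deviation is illustrative only.

```python
import math
import statistics


def sample_standard_deviation(values):
    """Positive square root of the variance, assuming the n - 1 (sample) divisor."""
    n = len(values)
    if n < 2:
        raise ValueError("at least two observations are needed")
    mean = sum(values) / n
    sum_of_squared_deviations = sum((x - mean) ** 2 for x in values)
    variance = sum_of_squared_deviations / (n - 1)  # assumption: sample convention
    return math.sqrt(variance)


ages = [17, 15, 14, 16, 14, 15, 16, 12, 17, 13, 12, 17, 13, 16, 15]

s = sample_standard_deviation(ages)
print(round(s, 2))  # 1.74

# The standard library provides both conventions for comparison.
print(math.isclose(s, statistics.stdev(ages)))  # stdev divides by n - 1
print(round(statistics.pstdev(ages), 2))        # pstdev divides by n
```

For the student ages the two conventions give similar results, and the gap between them narrows as the number of observations grows.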




© Commonwealth of Australia 2008

Unless otherwise noted, content on this website is licensed under a Creative Commons Attribution 2.5 Australia Licence together with any terms, conditions and exclusions as set out in the website Copyright notice. For permission to do anything beyond the scope of this licence and copyright terms contact us.