Australian Bureau of Statistics

Understanding statistics
 





Module 3: Interpreting Data

5.3 Scatter plots

5.3.4 The correlation coefficient, r

The strength of a linear association can be measured using a single number, the correlation coefficient, r. Correlation measures the strength and direction of a linear relationship. Rather than relying on subjective word descriptors such as "strong positive correlation", r gives a numerical measure.

The correlation coefficient, r, has a specific range of values:

-1 ≤ r ≤ +1

Note that:
  • r never lies outside this range, so a value such as r = 2 is impossible; the only explanation for it is an arithmetic error.
  • r = 1 is perfect positive correlation: all the data points lie exactly on a straight line with positive gradient.
  • r = -1 is perfect negative correlation: all the data points lie exactly on a straight line with negative gradient.
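These properties follow from how r is calculated. The steps below estimate r by eye; as a minimal sketch of the underlying arithmetic, assuming the usual Pearson definition (the sum of cross-products of deviations from the means, divided by the square root of the product of the two sums of squared deviations), the hypothetical helper `pearson_r` below computes r directly. It is an illustration, not an ABS tool. By the Cauchy-Schwarz inequality the numerator can never exceed the denominator in size, which is why r always lies between -1 and +1.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of paired data (hypothetical helper)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products of deviations from the means
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations for each variable
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when either variable is constant")
    return sxy / math.sqrt(sxx * syy)

# Points exactly on a line with positive gradient give r = 1
print(round(pearson_r([1, 2, 3, 4], [3, 5, 7, 9]), 6))  # 1.0
# Points exactly on a line with negative gradient give r = -1
print(round(pearson_r([1, 2, 3, 4], [8, 6, 4, 2]), 6))  # -1.0
```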
How do I estimate the r value for a data set?

Steps you need to follow:
1. draw the scatterplot;
2. draw the trend line which describes the direction of the data;
3. evaluate how closely the cloud of data points clusters around the line;
4. determine what r value and what word descriptor best suits the data cloud.
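Steps 2 to 4 are done by eye, but they mirror a calculation. As a sketch, assuming the trend line is the least-squares line (the steps above do not say how the line is drawn), the hypothetical helper `trend_line_and_r` fits the line, measures the spread of the points about it, and recovers r. For a least-squares line with an intercept, r² = 1 - SS_res / SS_tot, where SS_res is the sum of squared vertical distances from the points to the line and SS_tot is the sum of squared distances from the points to the mean of y; r takes the sign of the gradient. The more tightly the cloud clusters around the line, the smaller SS_res and the closer r is to ±1.

```python
import math

def trend_line_and_r(xs, ys):
    """Fit a least-squares trend line y = a + b*x and derive r from how
    closely the points cluster around it (hypothetical helper)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or ss_tot == 0:
        raise ValueError("r is undefined when either variable is constant")
    # Step 2: the trend line (least-squares gradient b and intercept a)
    b = sxy / sxx
    a = mean_y - b * mean_x
    # Step 3: spread of the points about the line (sum of squared residuals)
    ss_res = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    # Step 4: r^2 is the share of the variation in y explained by the line;
    # r takes the sign of the gradient. max() guards against tiny negative
    # values caused by floating-point rounding.
    r_squared = 1 - ss_res / ss_tot
    r = math.copysign(math.sqrt(max(r_squared, 0.0)), b)
    return a, b, r

a, b, r = trend_line_and_r([1, 2, 3, 4, 5], [2, 1, 4, 3, 5])
print(round(a, 3), round(b, 3), round(r, 3))  # 0.6 0.8 0.8
```

When the points lie exactly on a line, SS_res is 0 and r is ±1; when the cloud has no linear shape, the line explains almost none of the variation and r is near 0.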
The following diagram shows a number line of r values to help you assign the numbers and the word descriptors.


[Diagram: number line of r values with strength descriptors]


Consider the following examples of scatterplots.

Each shows the cloud of data points with a trend line fitted to show the direction of the data.

It would be helpful to memorise these to help you describe your own data sets:



  • zero correlation
  • weak negative correlation (r = -0.3)
  • moderate positive correlation (r = 0.5)
  • moderate negative correlation (r = -0.6)
  • strong positive correlation (r = 0.8)
  • strong positive correlation (r = 0.95)

 

And now consider some scatterplots with negative gradients:

 

  • weak negative correlation (r = -0.40)
  • moderate negative correlation (r = -0.65)
  • moderate negative correlation (r = -0.75)
  • strong negative correlation (r = -0.85)
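Step 4 pairs a number with a word descriptor. The number-line diagram is not reproduced in this text, so the cut-offs in the sketch below are an assumption inferred from the example captions above (weak at |r| of 0.3 to 0.40, moderate from 0.5 to 0.75, strong from 0.8) and from the test examples at the end of this page, which treat r = -0.08 as effectively zero. The boundary between weak and moderate (somewhere between 0.40 and 0.5) is a guess. The function `describe_r` is hypothetical.

```python
def describe_r(r):
    """Map r to a word descriptor. The cut-offs are ASSUMPTIONS inferred
    from the example captions, not values read from the ABS diagram."""
    if not -1 <= r <= 1:
        # r outside [-1, 1] can only come from an arithmetic error
        raise ValueError(f"r = {r} is impossible; check the arithmetic")
    size = abs(r)
    if size < 0.1:       # assumed: treated as no linear association
        return "zero correlation"
    direction = "positive" if r > 0 else "negative"
    if size == 1:
        strength = "perfect"
    elif size >= 0.8:    # assumed cut-off
        strength = "strong"
    elif size >= 0.5:    # assumed cut-off
        strength = "moderate"
    else:
        strength = "weak"
    return f"{strength} {direction} correlation"

for r in (-0.3, 0.5, -0.65, 0.95, -0.08):
    print(r, describe_r(r))
```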


Scenario (Moore, 1995)

Archaeopteryx is an extinct animal that possessed both scales and feathers and at one stage was thought to be the 'missing link' between lizards and birds. Only six fossil specimens exist and they vary greatly in size. As a result, there has been a lot of discussion about whether the fossils all belong to one species or to different species. To help answer this question, the length (cm) of the femur (a leg bone) was plotted against the length (cm) of the humerus (a bone in the arm) on a scatter plot. Data were available for five of the specimens.

Comment: If the specimens belong to the same species and the differences between them are due to differences in size because of age, then the points should show a positive (but not necessarily linear) relationship. If any plotted point were an outlier from the bivariate pattern shown by the other points, this might suggest (but not prove) that the point represented a specimen from a different species.
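The Comment's reasoning, that a point lying off the pattern formed by the others may belong to a different species, can be made concrete. One simple approach, assumed here rather than taken from the source, is to fit a least-squares line to all the other points and measure how far each held-out point lies from that line. The helpers `fit_line` and `leave_one_out_residuals` are hypothetical, and the bone lengths below are invented for illustration; they are not the Archaeopteryx measurements.

```python
def fit_line(xs, ys):
    """Least-squares line y = a + b*x; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("need at least two different x values")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    return mean_y - b * mean_x, b

def leave_one_out_residuals(xs, ys):
    """For each point, its vertical distance from the line fitted to all
    the OTHER points. A large value means the point does not follow the
    pattern shown by the rest of the data."""
    residuals = []
    for i in range(len(xs)):
        other_x = xs[:i] + xs[i + 1:]
        other_y = ys[:i] + ys[i + 1:]
        a, b = fit_line(other_x, other_y)
        residuals.append(ys[i] - (a + b * xs[i]))
    return residuals

# HYPOTHETICAL femur (x) and humerus (y) lengths in cm for five specimens.
# Invented for illustration; these are NOT the Archaeopteryx data.
femur = [40, 50, 60, 70, 80]
humerus = [44, 54, 40, 74, 84]  # third point deliberately placed off-pattern

res = leave_one_out_residuals(femur, humerus)
worst = max(range(len(res)), key=lambda i: abs(res[i]))
print("largest departure at point", worst, "residual", round(res[worst], 1))
# largest departure at point 2 residual -24.0
```

Each point is compared with a line fitted without it because, with only five points, an odd point would otherwise pull the line towards itself and hide its own departure. As the Comment says, a point that stands out in this way suggests, but does not prove, a different species.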





Test your knowledge


Question

What does the scatterplot indicate?

Answer
  1. No association indicates that five separate species are present.
  2. An outlier can be observed indicating the presence of two separate species.
  3. There is a strong positive association indicating the specimens belong to one species.
  4. It is not possible to make a statement about the number of species present from the scatterplot.

Assets and Incomes for 20 US Banks (1973)


Scenario

1969-1979 Assets and Liabilities of all Commercial Banks in the United States (H.8)


[Scatterplot of income and assets for the 20 banks]




Test your knowledge


Question

Describe the relationship between the income and assets of the 20 largest banks in the US.
Answer
  1. Positive
  2. Negative



Test your knowledge


Question

Describe the form of the relationship.
Answer
  1. linear
  2. curved
  3. seasonal
  4. no relationship



Test your knowledge


Question

How strong is the relationship?
Answer
  1. No relationship
  2. A bank's income can be predicted accurately from its assets.
  3. There is some relationship between assets and income for a bank.



Test your knowledge


Question

The data point (150, 12) could be described as:
Answer
  1. influential
  2. outlier
  3. mistake
  4. cluster



Test your knowledge


Question

The data points (175, 36), (225, 50) and (270, 42) can be described as:
Answer
  1. outliers
  2. mistakes
  3. irrelevant
  4. influential

TEST EXAMPLES
Estimate the strength of association (correlation coefficient) for the following scatterplots:
If you said r = 0, that is a good estimate; the exact value is r = -0.08.
If you said somewhere from r = 0.1 to r = 0.3, that is a good estimate; the exact value is r = 0.22.
If you said somewhere from r = -0.3 to r = -0.5, that is a good estimate; the exact value is r = -0.45.
If you said somewhere from r = 0.3 to r = 0.5, that is a good estimate; the exact value is r = 0.38.
If you said somewhere from r = 0.8 to r = 0.9, that is a good estimate; the exact value is r = 0.87.
If you said somewhere around r = -0.95, that is a good estimate; the exact value is r = -1.00 (perfect negative correlation).
If you said somewhere from r = 0.5 to r = 0.7, that is a good estimate; the exact value is r = 0.63.
If you said somewhere from r = -0.65 to r = -0.8, that is a good estimate; the exact value is r = -0.75.





© Commonwealth of Australia 2008

Unless otherwise noted, content on this website is licensed under a Creative Commons Attribution 2.5 Australia Licence together with any terms, conditions and exclusions as set out in the website Copyright notice. For permission to do anything beyond the scope of this licence and copyright terms contact us.