Module 3: Interpreting Data
5.3 Scatter plots
5.3.4 The correlation coefficient, r
The strength of a linear association can be measured with a single number: the correlation coefficient, r. Correlation measures both the strength and the direction of a linear relationship. Rather than relying on subjective word descriptors such as "strong positive correlation", r gives a numerical measure.
The correlation coefficient, r, always lies in the range -1 ≤ r ≤ 1.
Note that:
- r never lies outside this range, so a value such as r = 2 is impossible; the only explanation for it is an arithmetic error.
- r = 1 is perfect positive correlation: all the data points lie exactly on a straight line with positive gradient.
- r = -1 is perfect negative correlation: all the data points lie exactly on a straight line with negative gradient.
- r = 0 indicates no linear association.
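The module describes r without stating how it is calculated. As a minimal sketch, the usual Pearson product-moment formula, r = Sxy / √(Sxx × Syy), can be computed in Python as below. The function name pearson_r and the example data are illustrative assumptions, not part of the course material.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient r for paired data xs, ys."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length with at least two points")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sxx, Syy: sums of squared deviations from each mean
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    # Sxy: sum of products of paired deviations (its sign gives the direction)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when one variable does not vary")
    return sxy / sqrt(sxx * syy)


print(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]))        # 1.0: on a line with positive gradient
print(pearson_r([1, 2, 3, 4], [8, 6, 4, 2]))        # -1.0: on a line with negative gradient
print(pearson_r([1, 2, 3, 4, 5], [2, 1, 4, 3, 5]))  # 0.8: a strong positive association
```

Because the size of Sxy can never exceed √(Sxx × Syy) (the Cauchy-Schwarz inequality), the result always lies between -1 and 1, which is why a value such as r = 2 can only come from an arithmetic error.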
How do I estimate the r value for a data set?
Steps you need to follow:
1. draw the scatterplot;
2. draw the trend line which describes the direction of the data;
3. evaluate how closely the cloud of data points clusters around the line;
4. decide which r value and which word descriptor best suit the data cloud.
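The steps above fit the trend line by eye. If you want to check a hand-drawn line, one standard option is the least-squares line, sketched below; note that least squares is a swapped-in technique that this module does not itself prescribe, and the function name least_squares_line is hypothetical. Its gradient always has the same sign as r, since both are Sxy divided by a positive quantity.

```python
def least_squares_line(xs, ys):
    """Return (gradient, intercept) of the least-squares line y = gradient * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are equal, so no line y = mx + c fits")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    gradient = sxy / sxx                     # same sign as r: both have Sxy on top
    intercept = mean_y - gradient * mean_x   # the line passes through (mean_x, mean_y)
    return gradient, intercept


gradient, intercept = least_squares_line([1, 2, 3, 4, 5], [2, 1, 4, 3, 5])
print(round(gradient, 2), round(intercept, 2))  # 0.8 0.6
```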
The following diagram shows a number line of r values to help you assign the numbers and the word descriptors.
Consider the following examples of scatterplots. Each shows a cloud of data points with a trend line fitted to show the direction of the data. It will help to memorise these so that you can describe your own data sets:
- zero correlation
- weak negative correlation, r = -0.3
- moderate positive correlation, r = 0.5
- moderate negative correlation, r = -0.6
- strong positive correlation, r = 0.8
- strong positive correlation, r = 0.95
And now consider some negative gradients:
- weak negative correlation, r = -0.40
- moderate negative correlation, r = -0.65
- moderate negative correlation, r = -0.75
- strong negative correlation, r = -0.85
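The word descriptors in the examples above can be captured in a rough lookup function. The number line diagram is not reproduced here, so the boundaries in this sketch (0.2, 0.5 and 0.8) are assumptions chosen only to agree with the example captions; the function name describe_r is hypothetical.

```python
def describe_r(r):
    """Return a word descriptor for a correlation coefficient r.

    ASSUMPTION: the boundaries 0.2, 0.5 and 0.8 are illustrative, chosen only to
    agree with the example captions; the module's number line diagram may differ.
    """
    if not -1 <= r <= 1:
        raise ValueError("r must lie between -1 and 1; check your arithmetic")
    direction = "positive" if r > 0 else "negative"
    size = abs(r)
    if size == 1:
        return f"perfect {direction} correlation"
    if size < 0.2:
        return "zero correlation"
    if size < 0.5:
        strength = "weak"
    elif size < 0.8:
        strength = "moderate"
    else:
        strength = "strong"
    return f"{strength} {direction} correlation"


for r in (-0.3, 0.5, -0.75, 0.95):
    print(r, describe_r(r))
```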
Test your knowledge
Question: What does the scatterplot indicate?
- No association indicates that five separate species are present.
- An outlier can be observed, indicating the presence of two separate species.
- There is a strong positive association, indicating that the specimens belong to one species.
- It is not possible to make a statement about the number of species present from the scatterplot.
Assets and Incomes for 20 US Banks (1973)
Test your knowledge
Question: Describe the relationship between the income and assets of the 20 largest banks in the US.
- Positive
- Negative
Test your knowledge
Question: Describe the form of the relationship.
- linear
- curved
- seasonal
- no relationship
Test your knowledge
Question: How strong is the relationship?
- No relationship
- A bank's income can be predicted accurately from its assets.
- There is some relationship between assets and income for a bank.
Test your knowledge
Question: The data point (150, 12) could be described as:
- influential
- outlier
- mistake
- cluster
Test your knowledge
Question: The data points (175, 36), (225, 50) and (270, 42) can be described as:
- outliers
- mistakes
- irrelevant
- influential
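The questions about influential points and outliers can be explored numerically. One common way to think of an influential point is one whose inclusion noticeably changes r or the trend line. The sketch below uses HYPOTHETICAL data (not the bank data above) to show a single distant point reversing the sign of r; pearson_r is an illustrative helper.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient: Sxy / sqrt(Sxx * Syy)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sqrt(sxx * syy)


# HYPOTHETICAL data: five points with a strong positive association
xs = [1, 2, 3, 4, 5]
ys = [2, 1, 4, 3, 5]
r_before = pearson_r(xs, ys)

# Add one point (20, 0) that lies far from the rest of the cloud
r_after = pearson_r(xs + [20], ys + [0])

print(round(r_before, 2), round(r_after, 2))  # 0.8 -0.52
```

A single point changing r this much is a sign that it is influential, so it is worth checking whether it is a genuine observation or a mistake before drawing conclusions.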
TEST EXAMPLES
Estimate the strength of association (correlation coefficient) for the following scatterplots:
- Scatterplot 1: if you said r = 0, that is a good estimate; the exact value is r = -0.08.
- Scatterplot 2: if you said somewhere from r = 0.1 to r = 0.3, that is a good estimate; the exact value is r = 0.22.
- Scatterplot 3: if you said somewhere from r = -0.3 to r = -0.5, that is a good estimate; the exact value is r = -0.45.
- Scatterplot 4: if you said somewhere from r = 0.3 to r = 0.5, that is a good estimate; the exact value is r = 0.38.
- Scatterplot 5: if you said somewhere from r = 0.8 to r = 0.9, that is a good estimate; the exact value is r = 0.87.
- Scatterplot 6: if you said somewhere around r = -0.95, that is a good estimate; the exact value is exactly r = -1.00.
- Scatterplot 7: if you said somewhere from r = 0.5 to r = 0.7, that is a good estimate; the exact value is r = 0.63.
- Scatterplot 8: if you said somewhere from r = -0.65 to r = -0.8, that is a good estimate; the exact value is r = -0.75.
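To practise estimates like these on fresh data, you could generate pairs with a chosen correlation, plot them with any graphing tool, estimate r by eye and then compute it. This sketch assumes the standard construction y = r·x + √(1 - r²)·e, with x and e independent standard normal values; the helper names simulate_pairs and pearson_r are hypothetical.

```python
import random
from math import sqrt


def simulate_pairs(target_r, n, seed=None):
    """Generate n (x, y) pairs whose population correlation is target_r.

    y = target_r * x + sqrt(1 - target_r**2) * e, where x and e are independent
    standard normal values; the sample r scatters around target_r.
    """
    if not -1 <= target_r <= 1:
        raise ValueError("target_r must lie between -1 and 1")
    rng = random.Random(seed)
    xs, ys = [], []
    for _ in range(n):
        x = rng.gauss(0, 1)
        e = rng.gauss(0, 1)
        xs.append(x)
        ys.append(target_r * x + sqrt(1 - target_r ** 2) * e)
    return xs, ys


def pearson_r(xs, ys):
    """Pearson correlation coefficient: Sxy / sqrt(Sxx * Syy)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sqrt(sxx * syy)


# Practice: plot these 30 points, estimate r by eye, then check your estimate
xs, ys = simulate_pairs(0.63, 30, seed=1)
print(round(pearson_r(xs, ys), 2))
```

With small samples the computed r can differ noticeably from the target value; more points give a closer match.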