Module 3: Interpreting Data
5.3 Scatter plots
5.3.3 How can you interpret a scatter plot?
To read a scatter plot you need to look for the overall pattern. This tells you something about the direction, form and strength of the relationship.
(a) Direction
Positive gradient: larger values of the horizontal (explanatory) variable are associated with larger values of the vertical (response) variable. As the explanatory variable increases, so does the response variable. Can you see how the data gradually rise as we move from left to right?
Negative gradient: larger values of the explanatory variable are associated with smaller values of the response variable. As the explanatory variable increases, the response variable decreases. Can you see how the data gradually fall as we move from left to right?
[In both cases we use the same convention: "the explanatory variable increases" means that we move from left to right, which mathematicians call 'moving in the positive direction'.]
Notice that in each of the following two diagrams a trend line has been superimposed on the scatter plot; it helps give an overall view of the direction in which the data points lie.
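The sign of the trend line's gradient captures the direction described above. As a minimal sketch, assuming the trend line is the least-squares line (a common choice, though the diagrams may have been drawn another way), the Python below fits that line and reads off the direction from the sign of its slope. The function names `least_squares_line` and `direction` and the study-hours data are hypothetical, made up for illustration.

```python
from statistics import mean


def least_squares_line(xs, ys):
    """Return (slope, intercept) of the least-squares line y = intercept + slope * x."""
    x_bar, y_bar = mean(xs), mean(ys)
    # Sum of cross-products and sum of squares about the means.
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept


def direction(xs, ys):
    """Classify the direction of the trend from the sign of the fitted gradient."""
    slope, _ = least_squares_line(xs, ys)
    if slope > 0:
        return "positive"
    if slope < 0:
        return "negative"
    return "no clear direction"


# Hypothetical data: hours of study (explanatory) and test score (response).
hours = [1, 2, 3, 4, 5, 6]
score = [52, 55, 61, 60, 68, 71]

slope, intercept = least_squares_line(hours, score)
print(f"trend line: y = {intercept:.2f} + {slope:.2f}x")  # trend line: y = 47.87 + 3.80x
print(direction(hours, score))                            # positive
print(direction(hours, list(reversed(score))))            # negative
```

Reversing the order of the scores keeps the same values but makes later (right-hand) points smaller, so the fitted gradient changes sign and the direction becomes negative.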
(b) Form
We need to know whether there is an association and, if so, whether it is linear. The relationship might be linear or curved, or there might be no underlying form at all. In this course we will concentrate mainly on linear relationships, but we must be aware that non-linear ones exist.
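Form matters because the usual correlation coefficient (Pearson's r) measures only linear association. The sketch below uses a hypothetical helper `pearson_r` and made-up data to show a perfectly curved relationship that nonetheless gives r = 0, which is why the plot should be inspected before any number is computed.

```python
from math import sqrt
from statistics import mean


def pearson_r(xs, ys):
    """Pearson's correlation coefficient: it measures LINEAR association only."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


xs = [-3, -2, -1, 0, 1, 2, 3]
linear = [2 * x + 1 for x in xs]  # points on a straight line
curved = [x ** 2 for x in xs]     # points on a U-shaped curve

# Each response is an exact function of x, yet r tells two different stories.
print(pearson_r(xs, linear))  # 1.0
print(pearson_r(xs, curved))  # 0.0
```

In the curved case the left half of the U falls and the right half rises, so the two halves cancel and r is zero even though y is completely determined by x.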
(c) Strength
The strength of the pattern depends on how tightly the points cluster around the underlying form. We often use phrases like the following to describe the strength of a relationship, whether positive or negative. These phrases are, of course, subjective.
(near) zero correlation
"moderate" positive correlation
"strong" positive correlation
"moderate" negative correlation
"strong" negative correlation
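To attach numbers to these phrases, one common approach is to compute Pearson's r, which runs from -1 to +1. The cut-offs in the hypothetical `describe_strength` function below (0.3 and 0.7) are assumptions chosen only for illustration; as noted above, the phrases are subjective and other texts draw the lines elsewhere. The data sets are also invented.

```python
from math import sqrt
from statistics import mean


def pearson_r(xs, ys):
    """Pearson's correlation coefficient for paired data."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def describe_strength(r, weak=0.3, strong=0.7):
    """Translate r into one of the verbal labels used above.

    The cut-offs `weak` and `strong` are illustrative assumptions,
    not fixed rules; the labels themselves are subjective.
    """
    size = abs(r)
    if size < weak:
        return "(near) zero correlation"
    label = "strong" if size >= strong else "moderate"
    sign = "positive" if r > 0 else "negative"
    return f"{label} {sign} correlation"


# Hypothetical data sets, invented for illustration.
hours = [1, 2, 3, 4, 5, 6]
score = [52, 55, 61, 60, 68, 71]
print(describe_strength(pearson_r(hours, score)))                      # strong positive correlation
print(describe_strength(pearson_r(hours, list(reversed(score)))))      # strong negative correlation
print(describe_strength(pearson_r([1, 2, 3, 4, 5], [3, 1, 4, 1, 5])))  # moderate positive correlation
print(describe_strength(pearson_r([1, 2, 3, 4, 5], [2, 5, 1, 4, 3])))  # (near) zero correlation
```

Python 3.10 and later also provide `statistics.correlation`, which computes the same coefficient; the explicit version is shown here so the formula is visible.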
(d) Outliers and influential points
You can also look for individual points that fall outside the overall pattern of the scatter plot. Outliers can have a big influence on correlation. They should be examined (as far as possible) to determine whether they are genuine or the result of some kind of data error. It is quite common for a researcher to perform two analyses: the first with the outlier kept in the data set, the second with it removed.
The implications of removing or retaining the outlier must be clearly stated (it is unethical simply to erase a data point because it does not fit the main pattern!). The reasons and justification for any action must be set out clearly.
If the blue outlier were removed, we would have a data set with a high level of association. As it is, the outlier substantially reduces the level of association.
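The two-analysis approach described above can be sketched as follows. The data are hypothetical, and `r_with_and_without` is an illustrative helper rather than a standard routine. It leaves the original data untouched and returns both results, so that both can be reported rather than the outlier being silently dropped.

```python
from math import sqrt
from statistics import mean


def pearson_r(xs, ys):
    """Pearson's correlation coefficient for paired data."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def r_with_and_without(xs, ys, suspect):
    """Return (r using all points, r with the point at index `suspect` set aside).

    The input lists are not modified, so both analyses can be reported.
    """
    kept_x = [x for i, x in enumerate(xs) if i != suspect]
    kept_y = [y for i, y in enumerate(ys) if i != suspect]
    return pearson_r(xs, ys), pearson_r(kept_x, kept_y)


# Hypothetical data: five points close to a straight line plus one outlier at (3, 20).
xs = [1, 2, 3, 4, 5, 3]
ys = [2.1, 3.9, 6.2, 7.8, 10.1, 20.0]

with_outlier, without_outlier = r_with_and_without(xs, ys, suspect=5)
print(f"r with the outlier:    {with_outlier:.2f}")
print(f"r without the outlier: {without_outlier:.2f}")
```

With these numbers r is about 0.44 with the outlier and about 1.00 without it, mirroring the situation described for the blue outlier.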
Influential points follow the same direction as the main body of the data set but lie a long way from it.
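An influential point can strengthen an association rather than weaken it. In the hypothetical sketch below, five points show only a moderate association; adding a single point far out along the same direction pushes r much closer to 1, so that one point largely drives the result. The helper `pearson_r` and the data are illustrative assumptions.

```python
from math import sqrt
from statistics import mean


def pearson_r(xs, ys):
    """Pearson's correlation coefficient for paired data."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Hypothetical data: a loose cluster with only moderate association...
cluster_x = [1, 2, 3, 4, 5]
cluster_y = [3, 1, 4, 1, 5]
# ...plus one point far away but following the same (positive) direction.
far_x, far_y = 20, 20

r_cluster = pearson_r(cluster_x, cluster_y)
r_with_far_point = pearson_r(cluster_x + [far_x], cluster_y + [far_y])
print(f"cluster only:       r = {r_cluster:.2f}")
print(f"with the far point: r = {r_with_far_point:.2f}")
```

Here r moves from about 0.35 to about 0.97, even though nothing about the original five points has changed.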
For the graph on the left, we would have to ask why there is a gap and whether some special characteristics cause the two clusters to form.
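One reason the gap matters is that two separated clusters can produce a strong overall correlation even when there is no association within either cluster. The sketch below uses hypothetical data constructed to show this; `pearson_r` is an illustrative helper.

```python
from math import sqrt
from statistics import mean


def pearson_r(xs, ys):
    """Pearson's correlation coefficient for paired data."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Hypothetical data: two clusters separated by a gap.
a_x, a_y = [1, 2, 3], [5, 6, 5]
b_x, b_y = [11, 12, 13], [15, 16, 15]

r_a = pearson_r(a_x, a_y)                    # within cluster A only
r_b = pearson_r(b_x, b_y)                    # within cluster B only
r_all = pearson_r(a_x + b_x, a_y + b_y)      # both clusters pooled
print(f"within cluster A: r = {r_a:.2f}")
print(f"within cluster B: r = {r_b:.2f}")
print(f"both clusters:    r = {r_all:.2f}")
```

Each cluster on its own shows no association (r = 0), yet pooled together they give r of about 0.98; the apparent strength comes almost entirely from the gap between the clusters.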
Note that changing the scale of a graph does not change the strength of a pattern. Below are two scatter plots of the same data, each drawn with a different scale. A change in the relative scale might appear to change the strength of the pattern, but note that the line of best fit for the trend is similar in both cases.
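The axis scale of a plot does not alter the data at all, so it cannot alter the correlation. A slightly stronger check is that Pearson's r is also unchanged when the data themselves are re-expressed in different units by a positive scale factor. The sketch below, with hypothetical data and an illustrative `pearson_r` helper, converts hours to minutes and scores to proportions; a negative factor (reversing an axis) would flip the sign of r but not its size.

```python
from math import sqrt
from statistics import mean


def pearson_r(xs, ys):
    """Pearson's correlation coefficient for paired data."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Hypothetical data: hours of study and test score out of 100.
hours = [1, 2, 3, 4, 5, 6]
score = [52, 55, 61, 60, 68, 71]

# The same observations, re-expressed in different units.
minutes = [60 * h for h in hours]
proportion = [s / 100 for s in score]

r_original = pearson_r(hours, score)
r_rescaled = pearson_r(minutes, proportion)
r_reversed = pearson_r([-h for h in hours], score)  # horizontal axis reversed

print(f"original: {r_original:.4f}")
print(f"rescaled: {r_rescaled:.4f}")
print(f"reversed: {r_reversed:.4f}")
```

The fitted slope would change with the units (points per hour versus proportion per minute), but r, which describes the strength of the pattern, stays the same.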