Let’s look at four datasets that have nearly identical summary statistics:
Here’s the data:
Here are their statistical properties:
Property | Value |
---|---|
Mean of x in each case | 9 (exact) |
Sample variance of x in each case | 11 (exact) |
Mean of y in each case | 7.50 (to 2 decimal places) |
Sample variance of y in each case | 4.125 ± 0.003 |
Correlation between x and y in each case | 0.816 (to 3 decimal places) |
Linear regression line in each case | y = 3.00 + 0.500x (to 2 and 3 decimal places, respectively) |
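
The numbers in the table can be checked with a few lines of plain Python. This is only a minimal sketch: the data values are the ones commonly published for Anscombe’s quartet, and the names `QUARTET` and `summarize` are made up for this example.

```python
from math import sqrt

# Anscombe's quartet as commonly published; sets I-III share the same x values.
X123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
QUARTET = {
    "I": (X123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (X123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (X123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def summarize(xs, ys):
    """Return the summary statistics from the table for one dataset."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of squared deviations and cross-deviations from the means.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx  # least-squares slope of y on x
    return {
        "mean_x": mean_x,
        "var_x": sxx / (n - 1),  # sample variance (divides by n - 1)
        "mean_y": mean_y,
        "var_y": syy / (n - 1),
        "corr": sxy / sqrt(sxx * syy),  # Pearson correlation
        "intercept": mean_y - slope * mean_x,
        "slope": slope,
    }


for name, (xs, ys) in QUARTET.items():
    stats = summarize(xs, ys)
    print(name, {key: round(value, 3) for key, value in stats.items()})
```

Each of the four printed rows should agree with the table to the stated precision.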
They look identical, don’t they? But let’s visualize the data:
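
For a quick look without any plotting library, a rough text-mode scatter plot is enough to show the different shapes. This is only a sketch: `ascii_scatter` is a hypothetical helper written for this example, the axis ranges are arbitrary choices, and the data values are the commonly published ones for the quartet.

```python
# Anscombe's quartet as commonly published; sets I-III share the same x values.
X123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
QUARTET = {
    "I": (X123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (X123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (X123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def ascii_scatter(xs, ys, width=40, height=12, x_range=(0, 20), y_range=(0, 14)):
    """Draw the points as '*' on a character grid (hypothetical helper)."""
    grid = [[" "] * width for _ in range(height)]
    for x, y in zip(xs, ys):
        # Scale each coordinate to a grid cell; points may overlap in one cell.
        col = round((x - x_range[0]) / (x_range[1] - x_range[0]) * (width - 1))
        row = round((y - y_range[0]) / (y_range[1] - y_range[0]) * (height - 1))
        grid[height - 1 - row][col] = "*"  # row 0 is the top of the plot
    body = "\n".join("|" + "".join(line) for line in grid)
    return body + "\n+" + "-" * width


for name, (xs, ys) in QUARTET.items():
    print(f"Dataset {name}")
    print(ascii_scatter(xs, ys))
    print()
```

Even at this coarse resolution, dataset IV shows up as a vertical stack of points at x = 8 plus a single point far to the right, which none of the summary statistics reveal.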
Only by visualizing the data could we see how different the datasets really are. Judged by their summary statistics alone, they appear nearly identical. Moral of the story: visualize your data! Graph it alongside investigating its statistical properties.
Source: Anscombe’s quartet