Let’s look at four datasets that have identical statistical properties. Here’s the data:
Here are their statistical properties:
Mean of x in each case: 9 (exact)
Variance of x in each case: 11 (exact)
Mean of y in each case: 7.50 (to 2 decimal places)
Variance of y in each case: 4.122 or 4.127 (to 3 decimal places)
Correlation between x and y in each case: 0.816 (to 3 decimal places)
Linear regression line in each case: y = 3.00 + 0.500x
They look identical – don’t they? BUT let’s visualize the data:
Only by visualizing the data could we understand and appreciate the “difference” between the datasets. Looking at the statistical properties alone made them appear “similar”. Moral of the story: visualize data! Graph the data alongside investigating its statistical properties.
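If you’d like to verify the numbers yourself, here is a minimal sketch in Python. It assumes the four datasets are Anscombe’s quartet, which is what the statistics listed above match; the values below are typed in from the commonly published version of that dataset rather than taken from this post, and the names `QUARTET` and `summarize` are just illustrative.

```python
from statistics import mean, variance

# Assumption: the four datasets are Anscombe's quartet (commonly published values).
X123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]  # shared x for datasets I-III
QUARTET = {
    "I":   (X123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II":  (X123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (X123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV":  ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
            [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}

def summarize(x, y):
    """Return the summary statistics listed in the post for one dataset."""
    mx, my = mean(x), mean(y)
    sxx = sum((a - mx) ** 2 for a in x)                    # sum of squares of x
    syy = sum((b - my) ** 2 for b in y)                    # sum of squares of y
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))   # cross products
    slope = sxy / sxx                                      # least-squares slope
    return {
        "mean_x": mx,
        "var_x": variance(x),          # sample variance (n - 1 denominator)
        "mean_y": my,
        "var_y": variance(y),
        "corr": sxy / (sxx * syy) ** 0.5,
        "slope": slope,
        "intercept": my - slope * mx,
    }

for name, (x, y) in QUARTET.items():
    stats = summarize(x, y)
    print(name, {k: round(v, 3) for k, v in stats.items()})
```

The four printed rows agree to the precision quoted above, even though the underlying points are very different. That is exactly why the plot is needed.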
One of the key things I’ve learned is the importance of differentiating the concepts of “Data Reporting” and “Data Analysis”. So, let’s first see them visually:
Here’s the logic for putting Data Reporting INSIDE Data Analysis: if you need to do “analysis”, then you need reports. But you do not necessarily have to do data analysis if all you want is data reporting.
From a process standpoint, here’s how you can visualize Data Reporting and Data Analysis:
Let’s think about this for a moment: Why do we need “analysis”?
We need it because TOOLS are really great at generating data reports, but it takes a HUMAN BRAIN to translate those “data points/reports” into “business insights”. This process of looking at the data points and translating them into business insights is the core of Data Analysis. Here’s how it looks visually:
Note that after performing data analysis, we have information such as trends and insights, action items or recommendations, and the estimated impact on the business, all of which create business value.