A great example to show the power of visualizing data: Anscombe’s Quartet Table

Let’s look at four datasets that have (nearly) identical statistical properties. Here’s the data:

[Figure: Anscombe’s quartet data]

Here are their statistical properties:

Property | Value
--- | ---
Mean of x in each case | 9 (exact)
Sample variance of x in each case | 11 (exact)
Mean of y in each case | 7.50 (to 2 decimal places)
Sample variance of y in each case | 4.125 ± 0.003 (to 3 decimal places)
Correlation between x and y in each case | 0.816 (to 3 decimal places)
Linear regression line in each case | y = 3.00 + 0.500x (to 2 and 3 decimal places, respectively)

They look identical – don’t they? BUT let’s visualize the data:

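[Figure: scatter plots of the four Anscombe’s quartet datasets]

Only visualizing the data made it possible for us to see and appreciate the differences between the datasets: one looks like a noisy straight line, one is a clear curve, one is a tight line pulled off course by a single outlier, and in the last one x is constant except for one extreme point. Looking at the statistical properties alone made them appear similar. Moral of the story: visualize your data! Graph it alongside investigating its statistical properties.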

Source: Anscombe’s quartet

Data Reporting ≠ Data Analysis

One of the key things I’ve learned is the importance of differentiating between “Data Reporting” and “Data Analysis”. So, let’s first look at them visually:

[Figure: Data Reporting shown inside Data Analysis]

Here’s the logic for putting Data Reporting INSIDE Data Analysis: if you need to do “analysis”, then you need reports. But you don’t necessarily have to do data analysis in order to do data reporting.

From a process standpoint, here’s how you can visualize Data Reporting and Data Analysis:

[Figure: the data reporting and data analysis process]

Let’s think about this for a moment: why do we need “analysis”?

We need it because TOOLS are really great at generating data reports, but it takes a HUMAN BRAIN to translate those data points and reports into business insights. This process of looking at data points and translating them into business insights is the core of Data Analysis. Here’s how it looks visually:

[Figure: Data Analysis vs. Data Reporting]

Note that after performing data analysis, we have information that creates business value: trends and insights, action items or recommendations, and the estimated impact on the business.

Conclusion:

Data Reporting ≠ Data Analysis

Adding a Trendline to a Time Series Line Chart in Excel 2010

I was playing with a time series data set in Excel 2010 and learned how to add a trendline. In this blog post, I’ll share how I added it.

First up, how is a trendline useful? Here are a few answers:
– It helps us see how data is changing over time; in other words, it helps us find “trends”.
– It helps us forecast the future.

With that, here is the chart without a trendline:

[Figure: on-time flight arrivals chart in Excel, without trendline]

Now let’s add the trendline, and you’ll be able to compare for yourself how a trendline makes it easier to spot “trends”. Here are the steps:

1. Select the line > right-click > Add Trendline

[Figure: adding a trendline to the time series]

2. Configure the trendline options

[Figure: trendline configuration options in Excel]

3. I also changed the line style.

4. And here’s the chart with the trendline:

[Figure: American Airlines on-time flight arrivals chart in Excel, with trendline]
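Linear is not the only choice in step 2: Excel’s trendline options also include a Moving Average type with a configurable period. The post doesn’t say which options were used for the chart above, so treat this Python sketch as an illustration of that alternative only. It assumes a trailing window, where each smoothed point is the mean of the previous `period` values; `moving_average` is a hypothetical helper name and the input values are made up.

```python
def moving_average(values, period):
    """Trailing moving average: each output is the mean of the last `period` values."""
    if not 1 <= period <= len(values):
        raise ValueError("period must be between 1 and len(values)")
    window_sum = sum(values[:period])
    result = [window_sum / period]
    for i in range(period, len(values)):
        # Slide the window one step: add the newest value, drop the oldest.
        window_sum += values[i] - values[i - period]
        result.append(window_sum / period)
    return result

# Hypothetical values, for illustration only.
print(moving_average([78.0, 80.5, 79.0, 82.0, 83.5, 82.5], 3))
```

The output has `len(values) - period + 1` points, because the first smoothed point needs a full window of data; a longer period smooths more but reacts more slowly to changes.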

Conclusion:

In this post, we saw how to add a trendline to a time series chart in Excel 2010.

Statistics 101: Nominal, Ordinal, Interval, Ratio Data

If you work with any statistical analysis tool, you may have had to assign your data to one of the following categories: Nominal, Ordinal, Interval, Ratio.

Here is what each term means:

Nominal: values are simply names or labels (sets of characters). Example: full names, fruits, cars, etc.
Ordinal: Nominal + the values have an order. Example: small, medium, big.
Interval: Ordinal + the intervals between consecutive values are equal. Example: temperature on the Fahrenheit scale: 10°F, 20°F, 30°F, etc.

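Note that 40°F is not twice as warm as 20°F, because 0°F is an arbitrary point on the scale rather than an absence of temperature. So multiplication and division do not make sense on interval data, but addition and subtraction do. Which brings us to the next category: Ratio.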

Ratio: Interval + multiplication makes sense (zero really means “none”). Example: weight: 60 kg, 120 kg, where 120 kg = 2 × 60 kg.
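A quick way to see why ratios fail on interval data but work on ratio data is to change units and check whether the ratio survives. This small Python sketch uses Fahrenheit and Celsius (interval scales with different, arbitrary zero points) and kilograms and pounds (ratio scales that share a true zero). The pound conversion factor is an approximation, and `f_to_c` and `kg_to_lb` are illustrative helper names.

```python
def f_to_c(f):
    """Fahrenheit -> Celsius: both are interval scales whose zero points are arbitrary."""
    return (f - 32) * 5 / 9

def kg_to_lb(kg):
    """Kilograms -> pounds: both are ratio scales where 0 means 'no weight'."""
    return kg * 2.20462  # approximate conversion factor

# Interval data: a ratio of two values depends on where the scale puts zero.
print(40 / 20)                    # 2.0 in Fahrenheit...
print(f_to_c(40) / f_to_c(20))    # ...but about -0.67 in Celsius, so it means nothing

# Differences carry over: a 20-degree gap is twice a 10-degree gap on either scale.
print((40 - 20) / (30 - 20))                                   # 2.0
print((f_to_c(40) - f_to_c(20)) / (f_to_c(30) - f_to_c(20)))   # 2.0, up to rounding

# Ratio data: 120 kg = 2 * 60 kg, and the factor of 2 survives a change of units.
print(120 / 60, kg_to_lb(120) / kg_to_lb(60))
```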

I hope these examples help when you are working with statistical analysis tools and need to categorize your data.
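Since each category is defined as the previous one plus one more property, the levels can be modeled as cumulative. This Python sketch is a hypothetical helper, not any particular tool’s API: it encodes which operations make sense at each level, following the definitions in this post, and the column names in `columns` are made-up examples.

```python
from enum import IntEnum

class Level(IntEnum):
    """Levels of measurement, ordered so that each level includes the ones below it."""
    NOMINAL = 1
    ORDINAL = 2
    INTERVAL = 3
    RATIO = 4

# What each level adds on top of the previous one.
_ADDS = {
    Level.NOMINAL: {"==", "!="},   # names only: same or different
    Level.ORDINAL: {"<", ">"},     # + order
    Level.INTERVAL: {"+", "-"},    # + equal intervals: addition and subtraction
    Level.RATIO: {"*", "/"},       # + a true zero: multiplication and division
}

def allowed_operations(level):
    """Collect the operations of this level and every level below it."""
    ops = set()
    for lower in Level:
        if lower <= level:
            ops |= _ADDS[lower]
    return ops

def can(level, op):
    return op in allowed_operations(level)

# Hypothetical columns, categorized using the examples from this post.
columns = {
    "full_name": Level.NOMINAL,
    "size": Level.ORDINAL,
    "temp_f": Level.INTERVAL,
    "weight_kg": Level.RATIO,
}
for name, level in columns.items():
    print(name, level.name, "division allowed:", can(level, "/"))
```

Only `weight_kg` allows division here, which matches the Fahrenheit and weight examples above.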