In this post, I’ll list few examples from various industries to help you differentiate between business intelligence and data science problems.
Sometime back, I blogged about “Business Analytics Continuum” and in the post we saw that Every Organization has DATA but they use their business data at different levels because of their maturity level. Excel (or other transactional reporting tools) is usually the starting point for any organization – it helps them see WHAT happened. They advance to the next stage, where they get capabilities to slice and dice their data – To find out WHY – and usually this capability is delivered using Business Intelligence tools & techniques. Once the data culture spreads – Thanks to a successful Business Intelligence project – then they soon start to outgrow their business intelligence capabilities by asking problems that need predictive capabilities. This is advanced analytics and Data Sciencestage. To that end, here are 5 examples to help you differentiate between business intelligence and data science problems:
Business Intelligence.(WHAT & WHY)
Data Science & advanced analytics.
How many bikes did we rent in Q3 2014? How does that compare to Q3 2013?
What is the trend of total bike rentals at week level? Can you break it down by geography?
Can you predict bike rentals on an hourly basis?
How many customers have a credit risk of ‘C’?
Can you rank customers by their payments due amount that have a credit risk ‘C’?
Can you predict the credit risk of the customer during contract negotiations stage?
Customer relationship management
How many account cancellations occurred this year (broken down by month and customer segmentation)?
How does percentage of account cancellations this year compare to that previous year?
Can you predict customer churn?
What is the trend of % of flight delayed this year?
Can you break down flight delays this year by their reasons?
Can you predict whether a scheduled flight will be delayed by more than 15 minutes?
What is the customer satisfaction % trend this year?
What is the customer satisfaction % broken down by customer segments and product segments?
Can you classify a customer feedback comment into “positive”, “negative” or “neutral”?
Session Abstract: Microsoft has introduced a new technology for developing analytics applications in the cloud. The presenter has an insider’s perspective, having actively provided feedback to the Microsoft team which has been developing this technology over the past 2 years. This session will 1) provide an introduction to the Azure technology including licensing, 2) provide demos of using R version 3 with AzureML, and 3) provide best practices for developing applications with Azure Machine Learning.
Speaker BIO: Mark is a consultant who provides enterprise data science analytics advice and solutions. He uses Microsoft Azure Machine Learning, Microsoft SQL Server Data Mining, SAS, SPSS, R, and Hadoop (among other tools). He works with Microsoft Business Intelligence (SSAS, SSIS, SSRS, SharePoint, Power BI, .NET). He is a SQL Server MVP and has a research doctorate (PhD) from Georgia Tech.
Classification algorithms are commonly used to build predictive models. Here’s what they do (simplified!):
Now, here’s the difference between Multi Class and Two Class:
if your Test Data needs to be classified into two classes then you use a two-class classification model.
1. Is it going to Rain today? YES or NO
2. Will the buyer renew his soon-to-expire subscription? YES or NO
3. What is the sentiment of this text? Positive OR Negative
As you can see from above examples the test data needs to be classified in two classes.
Now, look at example #3 – What is the sentiment of the text? What if you also want an additional class called “neutral” – so now there are three classes and we’ll need to use a multi-class classification model. So, If your test data needs to be classified into more than two classes then you use a multi-class classification model.
1. Sentiment analysis of customer reviews? Positive, Negative, Neutral
2. What is the weather prediction for today? Sunny, Cloudy, Rainy, Snow
I hope the examples helped, so next time you have to choose between multi class and two class classification models, ask yourself – does the problem ask you to predict two classes or more? based on that, you’ll need to pick your model.
Data Explorer let’s you “Explore” (search) for web-based public data. This is a great way to combine data that you may have in your data-sources with public data sources for data analysis purposes. Sometimes your data might not tell you the reason behind the observed trends, so when that happens – you can try to see if a public data-set might give you the much-needed context. Let me give you an Example before we start hands-on w/ data explorer so that you have better understanding of importance of public datasets. Here’s a sample that I found here. So, Here’s a demo:
An auto company is seeing sales trends of Hybrid cars and SUV’s from the sales data-sources. But what is the reason behind that? company data does not show that. Someone hypothesizes that it might be because of gas prices. So they test out the hypothesis by combining gas prices information available via public data. And turns out gas prices might be the driving force of sales trends! SEE:
if the gas prices increase, then the sale of SUV go down and the sale of Hybrids go up:
You know that public data can be helpful! So how can you search for public data-sets? Well, You can manually search online, ask someone, browse through public data repositories like azure data market (and other data markets), there’s also a public data search engine! OR you can directly search for them via Data Explorer.
Here are the steps:
1) Excel 2010/2013 > Data Explorer Tab > Online Search > type “Tallest Buildings”
2) I selected one of the data-sets that said “Tallest completed building…. ”
3) Now let’s do some filtering and shaping. Here are the requirements:
– Hide columns: Image, notes & key
– clean columns that has heights data
– Show only city name in location
OK, let’s get to this one by one!
4) Hiding Columns:
Click on Filter & Shape button from the Query Settings:
Select Image Column > Right Click > Hide:
Repeat the steps for notes & key column.
Click on DONE
5) clean column that has heights data.
Click on Filter & Shape to open the query editor
A) let’s rename it. Select column > Right Click > Rename to Height > press ENTER
B) let’s remove the values in brackets. Select Column > right click > split column > By delimiter > At each occurrence of the delimiter > Custom and enter “(” > OK
This should transform the data like this:
Hide height.2 and rename the height.1 to height
Click on DONE
6) Let’s just have city names in the location column
click on Filter & shape to load query editor:
A) select location > right click > split column > by delimiter > Custom – Enter: ° in the text box like this:
One of the key thing I’ve learned is importance of differentiating the concepts of “Data Reporting” and “Data Analysis”. So, let’s first see them visually:
Here’s the logic for putting Data Reporting INSIDE Data Analysis: if you need to do “analysis” then you need reports. But you do not have to necessarily do data analysis if you want to do data reporting.
From a process standpoint, Here’s how you can visualize Data Reporting and Data Analysis:
Let’s thing about this for a moment: Why do we need “analysis”?
We need it because TOOLS are really great at generating data reports. But it requires a HUMAN BRAIN to translate those “data points/reports” into “business insights”. This process of seeing the data points and translating them into business insights is core of what is Data Analysis. Here’s how it looks visually:
Note after performing data analysis, we have information like Trends and Insights, Action items or Recommendations, Estimated impact on business that creates business value.
And now, the 12 one hour sessions ranging from data visualization, predictive analytics to Big Data are online for you to watch! They also serve as “Trailer” for what you can expect at the PASS Business Analytics conference!