I watched this video over the weekend and wanted to share it with you: a very well done presentation by a venture capital (VC) firm. This is why I love following VCs (especially ones who invest in the data/analytics theme): they tend to share some amazing insights on where the industry is going.
Abstract: “One person, in a literal garage, building a self-driving car.” That happened in 2015. Now to put that fact in context, compare this to 2004, when DARPA sponsored the very first driverless car Grand Challenge. Of the 20 entries they received then, the winning entry went 7.2 miles; in 2007, in the Urban Challenge, the winning entries went 60 miles under city-like constraints.
Things are clearly progressing rapidly when it comes to machine intelligence. But how did we get here, after not one but multiple “A.I. winters”? What’s the breakthrough? And why is Silicon Valley buzzing about artificial intelligence again?
From types of machine intelligence to a tour of algorithms, a16z Deal and Research team head Frank Chen walks us through the basics (and beyond) of AI and deep learning in this slide presentation.
“Machine Learning” is a subset of “Data Analysis”: it is just one of the activities that you could apply to solve a data analysis problem. You just need to find a problem that can use machine learning wizardry! What kind of activities, you ask? To answer that, we need to step back and categorize the problems that data analysis can solve. There are broadly three kinds of problems:
“What” problems. A few examples: What are my sales numbers for last quarter? How do they compare to the same quarter last year? Can we break them down by region and product category? All of these questions can be answered by querying your data stores or by your Business Intelligence platform. You do NOT need machine learning for this. Moving on…
“Why” problems. A few examples: Why did the customer cancel their contract? Why is the profit in region A declining quarter over quarter? These are a little more challenging than “what” questions: you need to structure the problem and pull data from multiple sources. Why did the customer cancel? You may want to look at internal (e.g. customer complaints) and external (e.g. bankruptcy) data. Usually you won’t need to apply machine learning here. You might benefit in some cases, for example by “clustering” all churned customers to see if you can find some patterns, but machine learning is not your primary tool here. Moving on…
“What’s next” problems. This is what you have been waiting for: this is where machine learning can be applied. Example: Which customers will cancel their account this fiscal year? Here you train a machine learning algorithm to predict which customers will churn this year. Note that the work you did for the “why” problems, where you identified some characteristics of churned customers, still applies here.
That brings me to an important point: most organizations don’t jump straight from the “What” stage to the “What’s next” stage. Every organization is at a different stage depending on its maturity, and you can’t apply machine learning to every data analysis problem. Also, with more and more companies using data to gain a competitive edge, if you are not using machine learning then chances are high that your competitor is, and they may out-compete you. That’s why it’s important to keep investing and work toward the highest level. More and more companies and executives are realizing this, and it’s a great thing for the data community!
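To make the difference between a “what” question and a “what’s next” question concrete, here is a minimal Python sketch. Everything in it is made up for illustration: the customer records, the single “complaints” feature, and the frequency-based estimate, which is a deliberately naive stand-in for a trained machine learning model, not a recommended method.

```python
from collections import defaultdict

# Made-up historical records: (customer_id, complaints_last_year, churned)
history = [
    ("a", 0, False), ("b", 0, False), ("c", 1, False), ("d", 1, True),
    ("e", 3, True), ("f", 3, True), ("g", 0, False), ("h", 3, False),
]

# "What" question: how many customers churned? A plain aggregation answers it.
churned_count = sum(1 for _, _, churned in history if churned)

# "What's next" question: first learn something from history. Here we only
# estimate the observed churn rate for each complaint count.
totals = defaultdict(int)
churns = defaultdict(int)
for _, complaints, churned in history:
    totals[complaints] += 1
    churns[complaints] += churned  # True counts as 1, False as 0

churn_rate = {k: churns[k] / totals[k] for k in totals}

# Then score customers whose outcome is not known yet (hypothetical IDs).
current = {"x": 0, "y": 3}
predicted = {cid: churn_rate.get(n, 0.0) for cid, n in current.items()}

print("Churned so far:", churned_count)
print("Estimated churn risk:", predicted)
```

The first answer only summarizes what already happened; the second uses patterns in past data to say something about customers whose outcome is still unknown, which is the part a real model would do far better.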
To conclude: Depending on the analytics maturity of your organization and the business problem at hand, you might have to use machine learning to solve a data analysis problem. And it never hurts to pick up machine learning basics along with the other data analysis skills that you might have.
A machine learning algorithm in Azure ML has a few parameter settings that you can set. In this post, we will talk about 1) why you should NOT stick with the default settings and 2) how the “Tune Model Hyperparameters” module can help you tune them.
So first up, why should you not use the default settings? The parameter settings applied to a model impact the accuracy (or call it predictive power) of the algorithm. Sometimes the impact is significant and sometimes it is not, but in either case you won’t know until you try changing the default values. In other words, by tuning the hyperparameters you could significantly boost your model’s performance!
Now, how do you go about setting the parameters so that the model gives optimal performance? Let’s say there are 3 parameters with 3 candidate values each: that is 3 × 3 × 3 = 27 different combinations! How do you know which one is the best? You could dig a little deeper into how the algorithm works and narrow down your list, but that would still take some time. So there should be a better way, right? Luckily there is: this is where “Tune Model Hyperparameters” comes in! You can use it with any algorithm in Azure ML. This module helps you tune the hyperparameters. There are still some things that you need to decide: Do you want the module to try just n random combinations, OR do you want it to try all combinations (FYI: this is a compute-intensive operation!)? AND what are you optimizing for? Depending on the algorithm, it lets you pick the evaluation metric that you want to optimize.
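The difference between trying every combination and trying a random handful can be sketched in a few lines of plain Python. This is not the Azure ML module itself: the parameter names, the candidate values, and the score function (a stand-in for “train the model and return the metric you chose”) are all assumptions made up for illustration.

```python
import itertools
import random

# Hypothetical search space: 3 hyperparameters with 3 candidate values each.
search_space = {
    "learning_rate": [0.01, 0.1, 1.0],
    "num_trees": [10, 50, 100],
    "max_depth": [2, 4, 8],
}

def score(params):
    """Stand-in for training a model and returning the metric to optimize.
    The formula is invented; higher is better."""
    return (
        -abs(params["learning_rate"] - 0.1)
        - abs(params["num_trees"] - 50) / 100
        - abs(params["max_depth"] - 4) / 10
    )

# Build every combination of candidate values.
names = list(search_space)
grid = [
    dict(zip(names, values))
    for values in itertools.product(*search_space.values())
]
print(len(grid))  # 3 * 3 * 3 = 27 combinations

# Entire grid: evaluate all 27 combinations (thorough but compute-intensive).
best_grid = max(grid, key=score)

# Random sweep: evaluate only n randomly chosen combinations (cheaper,
# but it may miss the best one).
rng = random.Random(0)
sampled = rng.sample(grid, 5)
best_random = max(sampled, key=score)

print("Best of entire grid:", best_grid)
print("Best of 5 random picks:", best_random)
```

The random sweep can never beat the entire grid on this toy score, but it runs a fraction of the evaluations, which is exactly the trade-off you are deciding on.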
Now, there are some good articles already written that walk you through how to go about doing this, so I am going to share these links with you:
Running the “Entire Grid” mode will lengthen the training time for the model. Make sure that this is acceptable and that the cost (longer training time) is worth the benefit (better accuracy) in your case.
When you are comparing algorithms to decide which one best fits your problem, instead of comparing “models with default parameter settings” with each other, try comparing the “models with tuned hyperparameters”.
Next week, on Mar 15th, 2016, we at Business Analytics VC are hosting a webinar to help you dive a little deeper into Azure Machine Learning and learn about building a model to predict customer churn. Even if you don’t use Azure, I think you can still benefit from learning about the use cases and the framework for solving this problem. You can register using this URL:
If you’re getting started with Data Science & Machine Learning then I think this would be a great resource for you. This “cheat sheet” helps you select the “algorithm” to test depending on the problem you are trying to solve and the data-set that you have.
Why is “Doing Data Science at Twitter” a great read?
This is an insider’s perspective from someone who works at a company that I classify as having the highest level of analytics maturity. In other words, Twitter is known to apply knowledge gained from data science to its products and business processes.
It’s also important to recognize that every company is different, and the analytics/data-science tools, techniques and processes that get implemented will vary with its analytics maturity. I love that this was one of the key insights shared in this article.
Also, the article talks about two types of data scientists. I thought it was a great way to classify them because there’s a lot of confusion in the industry around what a data scientist does. With that, here’s the URL:
Description: The world is becoming more efficient. Today, seventy percent of the companies that graced the Fortune 1000 list a mere decade ago have vanished. Agility and survival are a function of innovation, culture, and the ability to predict the future. To that end, data analytics offers a lifeline, a means of survival that will drive productivity and continue to disrupt and redefine business. However, the resources available to today’s business leaders sit on two vastly different ends of the spectrum. On the one hand, highly technical academic resources and on the other largely fluffy overviews of value propositions and potentials. The state of the industry shouldn’t be surprising. The same dynamics played out in early years of the internet. Software providers, technical leaders, and consulting firms greatly benefit from mystifying the world of data analytics into something that is incomprehensible. That lack of conceptual understanding is incredibly risky and propels the cost of analytics initiatives upwards. This webcast aims to bridge that gap between the technical data scientists and business leaders. Ultimately, this understanding will help to:
– Connect the strategic goals of business leaders with the capabilities of technical advisers
– Focus investments and initiatives within analytics and technology
– Distill immensely complex subject matter into comprehensible examples
– Accelerate the path to value and increase the ROI of analytics initiatives
Alex is a Predictive Analytics Architect in the Oil and Gas industry with a passion for distilling complexity into insights and evangelizing data science. His work has been featured on KDNuggets and he was recognized by DataScienceCentral as a top 180 blogger in 2014.
In this post, I’ll list a few examples from various industries to help you differentiate between business intelligence and data science problems.
Some time back, I blogged about the “Business Analytics Continuum”, and in that post we saw that every organization has DATA, but organizations use their business data at different levels depending on their maturity. Excel (or another transactional reporting tool) is usually the starting point for any organization: it helps them see WHAT happened. They then advance to the next stage, where they get capabilities to slice and dice their data to find out WHY, and this capability is usually delivered using Business Intelligence tools & techniques. Once the data culture spreads, thanks to a successful Business Intelligence project, they soon start to outgrow their business intelligence capabilities by asking questions that need predictive capabilities. This is the advanced analytics and Data Science stage. To that end, here are 5 examples to help you differentiate between business intelligence and data science problems:
Example 1
Business Intelligence (WHAT & WHY): How many bikes did we rent in Q3 2014? How does that compare to Q3 2013? What is the trend of total bike rentals at week level? Can you break it down by geography?
Data Science & advanced analytics: Can you predict bike rentals on an hourly basis?
Example 2
Business Intelligence (WHAT & WHY): How many customers have a credit risk of ‘C’? Can you rank customers by their payments due amount that have a credit risk ‘C’?
Data Science & advanced analytics: Can you predict the credit risk of the customer during contract negotiations stage?
Example 3: Customer relationship management
Business Intelligence (WHAT & WHY): How many account cancellations occurred this year (broken down by month and customer segmentation)? How does percentage of account cancellations this year compare to that previous year?
Data Science & advanced analytics: Can you predict customer churn?
Example 4
Business Intelligence (WHAT & WHY): What is the trend of % of flight delayed this year? Can you break down flight delays this year by their reasons?
Data Science & advanced analytics: Can you predict whether a scheduled flight will be delayed by more than 15 minutes?
Example 5
Business Intelligence (WHAT & WHY): What is the customer satisfaction % trend this year? What is the customer satisfaction % broken down by customer segments and product segments?
Data Science & advanced analytics: Can you classify a customer feedback comment into “positive”, “negative” or “neutral”?