As a data professional, you would invariably end up spending a lot of time on data cleaning & transformation and a lot of times, you might be doing your work in Excel — if so, then check out Power Query if you haven’t already! It will save you a LOT of time and unlock Jedi powers that you didn’t know you had!
if you are using a Mac — and there’s a lot of data scientist and data analyst who are on this platform then you are unfortunately out of luck! So for Mac users out there, I had shared this feedback which has 50 comments & 337 votes (as of 6/16/17) on the official Power BI ideas site; If you are one of the Mac users, then I encourage you to check it out and vote! Microsoft does take it seriously and their roadmap is heavily influenced by ideas site.
If you are a data science professional and haven’t heard about bots, you will soon! Most of the big vendors (Microsoft, Qlik, etc) have started adding capabilities and have shown some signs of serious product investments for this category. So, let’s step back and reflect how will bot impact the adoption of data platforms? and why you should care?
So, let’s start with this question: What do you need to drive a data-driven culture in an organization? You need to focus on three areas to be successful:
Data (you need to access from multiple sources, merge/join it, clean it and store it in cental location)
Modeling Layer/Algorithm layer (you need to add business logic, transform data and/or add machine learning algorithm to add business value to your data)
Workflow (you need to embed data & insights in business user’s workflow OR help provide data/insights when they in their decision-making process)
Over the past few years, there was a really strong push for “self-service” which was good for the data professionals. A data team builds a platform for analysts and business users to self-serve whenever they needed data and so instead of focusing on one-off requests, the team could focus on continuously growing the central data platform and help satisfy a lot of requests. This is all great. Any business with more than 50-ish employees should have a self-service platform and if they don’t then consider building something like that. All the jazz comes after this! Data Science, Machine learning, Predictive modeling etc would be much easier if you have a solid data platform (aka data warehouse, operational data store) in place! Of course, I am talking at a pretty high-level and there are nuances and details that we could go into but self-service were meant for business users and power users to “self-serve” their data needs which is great!
Now, there is one problem with that! Self-service platforms don’t do a great job at the third piece which is “workflow” — they are not embedded in every business user’s workflow and management team doesn’t always get the insights when they need to make the decision. Think of it this way, since it’s self-serving platform, users will think of it to react to business problems and might not have the chance to be pro-active.Ok, That may seem vague but let me give you an example.
Let’s a take a simple business workflow of a sales professional.
She has a call coming up with one of her key customers since their account is about to expire. So she logs into the CRM (customer relationship management) software to learn about the customer. She looks at some information in the CRM system and then wants to learn about the product usage by that customer over last 12 months.
She opens a new browser tab and logs into the data platform. Takes about 10 minutes to navigate to data model/app that has that information. Filters the data to the customer of interest and a chart comes up.
Goes back to the CRM system. Needs something else so goes back to the data platform. That searching takes another 10 minutes!
Wasn’t that painful? Having to switch between multiple applications and wasting 10 minutes each time just to answer a simple question. So business users do this if this is critical but they will ignore your platform if it’s not business-critical.
So to improve data-driven culture you need to think about your business users workflow and think of ways to integrate data/insights. This is probably one of the most under-rated things that has exponential pay-off’s!
So how do bots fit into all of this? So we talked about how workflows are important, right? To address this, tools had data alerts and embedded reports feature which works too but now we have a new thing called “bots” which enables deeper integration and helps you embed data/insights to a business user’s workflow.
Imagine this: In the previous example, instead of logging into data platform, the business user could just ask a question on one of the chat applications: show me the product usage of customer x. And a chart shows up. Boom! Saved 10 minutes but more importantly, by removing friction and adding delight, we gained a loyal user who is going to be more data-driven than ever before!
This is not fiction! Here’s a slack bot that a vendor built that does what I just talked about:
So to wrap up, I think bots could have a tremendous impact on the adoption of the data platforms as it enables data professionals to work on the third pillar called “workflow” to further empower the business users.
And the increase in data consumption is great for both data engineers and data scientists. it’s great for data engineers because people might ask more questions and you might have to integrate more data sources. It’s great for data scientists because if more people ask questions then over time, they will get to asking bigger and bolder questions and you will be looped into those projects to help solve those.
What do you think? Do you think bot will impact the adoption of data platforms? If so, how? if not, why not? I am looking forward to hearing about what you have to say! please add your comments below.
As a data scientist, I am not dissatisfied. I love what I do!
But I might have gotten lucky since I got into this for the right reasons. I was looking for a role that had a little bit of both tech & business and so few years back, Business Intelligence and Data Analysis seemed like a great place to start. So I did that for a while. Then industry evolved and the analytics maturity of the companies that I worked also evolved and so worked on building predictive models and became what they now call “Data scientist”.
It doesn’t mean that data science is the right role for everyone.
One of my friends feels that it’s not that “technical” and doesn’t like this role. He is more than happy with data engineer role where he gets to build stuff and dive deeper into technologies.
One of my other friends doesn’t like that you don’t own business/product outcomes and prefers a product manager role (even though he has worked as a data analyst for a while now and is working on transitioning away).
So, just based on the empirical data that I have, data science might not be an ideal path for everyone.
If you create bunch of reports and help answer what happened— then try to help business users with why it happened. [Example: Instead of just sending website traffic info, add why the traffic spikes (up/downs) are happening]
If you are working on building bunch of models that answer why questions then try to help build predictive models next [Example: You have been working on a model that helped you answer why customers churned. Now built upon that and predict which customers will churn next]
If you do analytics and data science well and are already answering what, why, what’s next questions and you’re killing it! Then figure out how can you help business owners take action. Or make it easier than ever before to take actions on your data/recommendations.
Other answers for questions are directly/indirectly covered if you do this:
You will have to pick the right tool for the job
You will have to continuously keep learning (by taking online courses and/or you-tube)
Don’t just be a data analyst, be a thought partner to business owners and if possible, transition into role that help you own business outcomes.
There are lot of ways to apply a CLV (customer lifetime value) model. But I hadn’t seen a single document that would summarize all of them — Until I saw this: http://srepho.github.io/CLV/CLV
If you are building a CLV model, one of first things that you might want to figure out is whether you have a contractual model or non-contractual model. And then figure out which methodology would work best for you. Here are 8 methods that were summarized in the link that I shared with you:
It does affect SSRS adoption but SSRS (sql server reporting service) still has a place as long as there’s need for printer-friendly reporting and self-service vendors don’t have a good solution to meet this need.
Also, SSRS is great for automating operational reports that sends out emails with raw data (list of customers, products, sales transaction etc).
I advocate an analytics strategy where we think about satisfying data needs using “self-service”-first (Power BI, tableau, qlik) but if thats not the optimal solution (for cases like need to print it, I just need you to send me raw data in excel, etc) then I’ll mark it as SSRS project. And this architecture is supported by a central data model (aka operational data store, data mart, data warehouse) which makes it much easier to swap in/out any reporting tools that we need and we are not locked in by one vendor.
About 10–20% data requests that I see are SSRS projects and if the self-service platforms start adding features that compete with SSRS, I know I would start using those capabilities and phase out SSRS. But if that doesn’t happen, I will continue using SSRS 🙂
Data cleaning takes up a lot of time during a data science process; it’s not necessarily a bad thing and time spent on cleaning data is worthwhile in most cases; To that end, I was researching some framework that might help me make this process a little bit faster. As a part of my research, I found the Journal of statistical software paper written by Hadley Wickham which had a really good framework to “tidy” data — which is part of data cleaning process.
Author does a great job of defining tidy data:
1. Each variable forms a column.
2. Each observation forms a row.
3. Each type of observational unit forms a table.
And then applying it to 5 examples:
1. Column headers are values, not variable names.
2. Multiple variables are stored in one column.
3. Variables are stored in both rows and columns.
4. Multiple types of observational units are stored in the same table.
5. A single observational unit is stored in multiple tables
It depends on your target industry & where they are in their life-cycle.
It has four stages: Startup, Growth, Maturity, Decline.
Generalization is great in earlier stages. If you are targeting jobs at startups; generalize. You should know enough about lot of things.
T-shaped professionals are great for Growth stage. They specialize in something but still know enough about lot of things. E.g. Sr Growth/Marketing Analyst. Know enough about analytics & data science to be dangerous but specializes in marketing.
Specialization is great for mature industries. They know a lot about few things. E.g. Statisticians in an Insurance industry. They have made careers out of building risk models.
This was asked on Reddit: Any advice for moving into data science from business intelligence?
Here’s my answer:
I come from “Business Intelligence” background and currently work as Sr. Data Scientist. I found that you need two things to transition into data science:
Data Culture: A company where the data culture is such that managers/executives ask big questions that need a data science approach to solve it. If your end-consumers are still asking bunch of “what” questions then your company might NOT be ready for data science. But if your CEO comes to you and says “hey, I got the customer list with the info I asked for but can you help me understand which of these customers might churn next quarter?” — then you have a data science problem at hand. So, try to find companies that have this culture.
Skills: And you need to upgrade your skills to be able to solve data science problems. BI is focused too much on technology and automation and so may need to unlearn few things. For example: Automation is not always important since you might work on problems where a model is needed to predict just a couple of times. Trying to automate wouldn’t be optimal in that case. Also, BI relies heavily on tools but in Data science, you’ll need deeper domain knowledge & problem-solving approach along with technical skills.
Also, I personally moved from BI (as a consultant) -> Analytics (as Analytics Manager) -> Data science (Sr Data Scientist) and this has been super helpful for me. I recommend to transition into Analytics first and then eventually breaking into data science.