What are the must-know software skills for a career in data analytics after an MBA?


SQL, Excel, and Tableau-like tools are good enough to start. Add something like R eventually. Then there are industry-specific tools, for example Google Analytics for the tech industry.

Other than that, you should know what to do with these tools. You need to know the following concepts and continuously build upon them as the industry use-cases and needs evolve:

  1. Spreadsheet modeling
  2. Forecasting
  3. Customer Segmentation
  4. Root Cause Analysis
  5. Data Visualization and Dashboarding
  6. Customer Lifetime Value
  7. A/B testing
  8. Web Analytics


Is it too late to become a good Data Scientist?


If you’re looking for a career change, that’s never too late!

If you’re looking to learn something new, that’s never too late!

If you’re looking to continue learning and go deeper in data science, that’s never too late!

If you don’t like software engineering and want to switch to something else, that’s never too late!

But if you are after the “Data Science” gold rush, then you did miss the first wave! You are late.

But seriously, you should apply first-principles thinking to your career strategy and ideally not jump to whatever’s “hot” because by the time you get on that train, it’s usually too late.


As a data scientist, are you dissatisfied with your career? Why?


As a data scientist, I am not dissatisfied. I love what I do!

But I might have gotten lucky since I got into this for the right reasons. I was looking for a role that had a little bit of both tech and business, so a few years back, Business Intelligence and Data Analysis seemed like a great place to start. I did that for a while. Then the industry evolved, and so did the analytics maturity of the companies that I worked for, so I started building predictive models and became what they now call a “Data Scientist”.

It doesn’t mean that data science is the right role for everyone.

One of my friends feels that it’s not “technical” enough and doesn’t like this role. He is more than happy in a data engineer role where he gets to build stuff and dive deeper into technologies.

Another friend doesn’t like that a data analyst doesn’t own business/product outcomes. He prefers a product manager role and, after working as a data analyst for a while, is now transitioning away.

So, just based on the empirical data that I have, data science might not be an ideal path for everyone.

Hope that helps!


How do you become a good data analyst?


You can become a great data analyst by continuously improving the analytics maturity of the company/start-up that you work for:

If you create a bunch of reports and help answer what happened, then try to help business users with why it happened. [Example: Instead of just sending website traffic info, add why the traffic spikes (up or down) are happening]

If you are building a bunch of models that answer why questions, then try to help build predictive models next. [Example: You have been working on a model that helped you answer why customers churned. Now build upon that and predict which customers will churn next]

If you do analytics and data science well and are already answering the what, why, and what’s-next questions, you’re killing it! Next, figure out how you can help business owners take action, or make it easier than ever to act on your data and recommendations.

If you do this, the other common answers to this question are covered directly or indirectly:

  1. You will have to pick the right tool for the job
  2. You will have to keep learning continuously (by taking online courses and/or watching YouTube)
  3. Don’t just be a data analyst; be a thought partner to business owners and, if possible, transition into a role that helps you own business outcomes.

[Resource] 8 Methods to calculate CLV:


There are a lot of ways to apply a CLV (customer lifetime value) model. But I hadn’t seen a single document that summarized all of them until I saw this: http://srepho.github.io/CLV/CLV

If you are building a CLV model, one of the first things that you might want to figure out is whether you have a contractual or a non-contractual model. Then figure out which methodology would work best for you. Here are the 8 methods summarized in the link above:

  • Naive
  • Recency Frequency Monetary (RFM) Summaries
  • Markov Chains
  • Hazard Functions
  • Survival Regression
  • Supervised Machine Learning using Random Forest
  • Management Heuristics
  • Distribution Based Approaches
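
To make one of these methods concrete, here is a minimal sketch of an RFM summary in Python. It is not taken from the linked document: the transaction data, the function name `rfm_summary`, and the exact definitions (recency as days since the last purchase, frequency as the number of purchases, monetary as the average spend) are assumptions made for illustration, and other conventions exist.

```python
from collections import defaultdict
from datetime import date

# Hypothetical transaction log: (customer_id, purchase_date, amount).
transactions = [
    ("c1", date(2023, 1, 10), 50.0),
    ("c1", date(2023, 3, 5), 30.0),
    ("c2", date(2022, 11, 20), 200.0),
    ("c1", date(2023, 6, 1), 40.0),
    ("c2", date(2023, 2, 14), 100.0),
]


def rfm_summary(transactions, as_of):
    """Summarize each customer by recency, frequency, and monetary value.

    Assumed definitions (one convention among several):
      recency_days: days between the last purchase and `as_of`
      frequency:    number of purchases
      monetary:     average amount per purchase
    """
    by_customer = defaultdict(list)
    for customer_id, purchase_date, amount in transactions:
        by_customer[customer_id].append((purchase_date, amount))

    summary = {}
    for customer_id, purchases in by_customer.items():
        last_purchase = max(d for d, _ in purchases)
        amounts = [a for _, a in purchases]
        summary[customer_id] = {
            "recency_days": (as_of - last_purchase).days,
            "frequency": len(purchases),
            "monetary": sum(amounts) / len(amounts),
        }
    return summary


summary = rfm_summary(transactions, as_of=date(2023, 7, 1))
for customer_id, stats in sorted(summary.items()):
    print(customer_id, stats)
```

A summary like this is usually only the input to a CLV estimate. How the three numbers are turned into a value depends on the method you pick and on whether your business is contractual or non-contractual.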

How does the rise of Power BI & Tableau affect SSRS?


It does affect SSRS adoption, but SSRS (SQL Server Reporting Services) still has a place as long as there’s a need for printer-friendly reporting and self-service vendors don’t have a good solution to meet this need.

Also, SSRS is great for automating operational reports that send out emails with raw data (lists of customers, products, sales transactions, etc.).

I advocate an analytics strategy that tries to satisfy data needs “self-service” first (Power BI, Tableau, Qlik). If that’s not the optimal solution (for example, the report needs to be printed, or someone just needs raw data sent to them in Excel), then I’ll mark it as an SSRS project. This architecture is supported by a central data model (aka operational data store, data mart, data warehouse), which makes it much easier to swap reporting tools in and out as needed, so we are not locked in to one vendor.

About 10–20% of the data requests that I see are SSRS projects. If the self-service platforms start adding features that compete with SSRS, I would start using those capabilities and phase out SSRS. But if that doesn’t happen, I will continue using SSRS 🙂


5 tests to validate the quality of your data:


Missing Data:

  • Descriptive statistics could be used to find missing data
  • Tools like SQL/Excel/R can also be used to look for missing data
  • Some of the attributes of a field may be missing, like the postal code in an address field


Non-standardized Data:

  • Check if all the values are standardized: Google, Google Inc & Alphabet might need to be standardized and categorized as Alphabet
  • Check for different date formats used in the same field (MM/DD/YYYY and DD/MM/YYYY)


Incomplete Data:

  • Total size of data (# of rows/columns): Sometimes you may not have all the rows that you were expecting (e.g. 100k rows, one for each of your 100k customers). If that’s the case, it tells us that we don’t have the complete dataset at hand


Erroneous Data:

  • Outlier: If someone’s age is 250, then that’s an outlier, but it’s also an error somewhere in the data pipeline that needs to be fixed; outliers can be detected by creating a quick data visualization
  • Data type mismatch: If a text value appears in a field where the other entries are integers, that’s also an error


Duplicates:

  • Duplicates can be introduced in the data (e.g. the same rows repeated in the dataset), so the data needs to be de-duplicated
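
The checks above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the sample records, the field names, the expected row count, the alias table, and the 0 to 120 age range are all hypothetical, and in practice you might run the same checks in SQL, Excel, or R instead.

```python
from datetime import datetime

# Hypothetical sample records; every field name and value is made up.
rows = [
    {"customer_id": 1, "company": "Google", "postal_code": "94043",
     "age": 34, "signup": "03/15/2021"},
    {"customer_id": 2, "company": "Google Inc", "postal_code": "",
     "age": 250, "signup": "25/12/2021"},
    {"customer_id": 3, "company": "Alphabet", "postal_code": "10011",
     "age": "n/a", "signup": "07/04/2020"},
    {"customer_id": 3, "company": "Alphabet", "postal_code": "10011",
     "age": "n/a", "signup": "07/04/2020"},
]
EXPECTED_ROWS = 5  # assumed: how many rows the source system should have sent
COMPANY_ALIASES = {"Google": "Alphabet", "Google Inc": "Alphabet"}  # assumed


def missing_values(rows, field):
    """Indexes of rows where `field` is absent, None, or blank."""
    return [i for i, r in enumerate(rows) if r.get(field) in (None, "")]


def non_standard_values(rows, field, aliases):
    """Distinct values that should be mapped to a standard name."""
    return sorted({r[field] for r in rows if r[field] in aliases})


def unparseable_dates(rows, field, fmt="%m/%d/%Y"):
    """Indexes of rows whose date does not parse as MM/DD/YYYY.

    Ambiguous dates such as 07/04/2020 parse under both MM/DD and DD/MM,
    so this only catches the unambiguous mismatches.
    """
    bad = []
    for i, r in enumerate(rows):
        try:
            datetime.strptime(r[field], fmt)
        except ValueError:
            bad.append(i)
    return bad


def is_incomplete(rows, expected):
    """True when fewer rows arrived than expected."""
    return len(rows) < expected


def erroneous_values(rows, field, low=0, high=120):
    """(index, reason) pairs for type mismatches and out-of-range values."""
    issues = []
    for i, r in enumerate(rows):
        value = r[field]
        if not isinstance(value, int):
            issues.append((i, "type mismatch"))
        elif not low <= value <= high:
            issues.append((i, "out of range"))
    return issues


def duplicate_rows(rows):
    """Indexes of rows that exactly repeat an earlier row."""
    seen, dupes = set(), []
    for i, r in enumerate(rows):
        key = tuple(sorted(r.items()))
        if key in seen:
            dupes.append(i)
        seen.add(key)
    return dupes


print("missing postal codes:", missing_values(rows, "postal_code"))
print("non-standard companies:", non_standard_values(rows, "company", COMPANY_ALIASES))
print("unparseable dates:", unparseable_dates(rows, "signup"))
print("incomplete:", is_incomplete(rows, EXPECTED_ROWS))
print("erroneous ages:", erroneous_values(rows, "age"))
print("duplicates:", duplicate_rows(rows))
```

Each function returns row indexes rather than fixing anything, which keeps the decision about how to repair the data (or the pipeline that produced it) separate from detecting the problem.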

