Insight Extractor’s Data Engineering and Science Newsletter #2


The purpose of this newsletter is to promote continuous learning for data science and engineering professionals. To achieve this goal, I’ll be sharing articles from various sources that I found interesting. The following articles made the cut for today’s newsletter:

  1. Amazing data storytelling example from Ben Evans. Ben starts from a basic premise, “Amazon is not profitable”, that a lot of people argue about. He then goes on a data storytelling journey around that premise using publicly available datasets. Must read! Read here
  2. What kind of data scientist am I? Elena Grewal from Airbnb wrote this excellent article in 2018, but it’s still relevant for understanding the 3 different flavors of data scientist. Read here
  3. What does it mean to be a data science leader or manager? Eric Weber’s short LinkedIn post covers what it means to be a leader. ICs should exhibit these traits for faster career growth, especially if you are the sole data person in a decentralized structure. Read here
  4. Functional data engineering: In the blog post here, Maxime Beauchemin explains how to apply functional programming concepts to data engineering.
  5. Interested in growth analytics? Think about this interview question from Andrew Chen: How would you 10x the growth of Product X? LinkedIn post here

Thanks for reading! Now it’s your turn: Which article did you love the most and why?

3 Types of Data Scientist (Source)

Business Analytics Continuum: Descriptive, Diagnostic, Predictive, Prescriptive


Think of a “continuum” as something you start and never stop improving upon. In my mind, the Business Analytics Continuum is a continuous investment of resources to take business analytics capabilities to the next level. So what are these levels?

Here is a visual representation of the concept:

business analytics continuum

Insight Extractor Newsletter #1


I am kicking off a weekly newsletter to share a curated list of things you should read to continue to get better at data. Links for this week are below:

  1. AI Hierarchy of needs: Think of Artificial Intelligence as the top of a pyramid of needs. Yes, self-actualization (AI) is great, but you first need food, water, and shelter (data literacy, collection, and infrastructure). Read here
  2. A Beginner’s Guide to Data Engineering: A very good introduction to data engineering. If you work as a data analyst or data scientist, this is a must-read for a full understanding of an important discipline within the data family. Also super useful for junior data engineers who need to explain what they do. Read here
  3. The Rise of the Data Engineer: A must-read to restructure your thinking on data engineering. Read here
  4. Data engineer in 2020: This is a really good list of tools and skills that you should acquire if you want to become a data engineer. Read here
  5. Why did metric X go down last week?: A really good 2-minute read on LinkedIn from Andrew Chen. Read here

Source: Monica Rogati’s fantastic Medium post “The AI Hierarchy of Needs”

It’s your turn: Which article did you like the most? Comment below!

Four Tenets for effective Metrics Design


The goal of this blog post is to provide four tenets for effective metrics design.


What is a tenet?

A tenet is a principle honored by a group of people.

Why is effective metrics design important?

Metrics help with business decision-making. Picking the right metrics increases the odds that decisions are made through data rather than gut or intuition, which can be the difference between success and failure.

Four Tenets for effective metrics design:

  1. We will prioritize quality over quantity of metrics: Prioritizing quality over quantity is important because if teams track too many metrics, it’s hard for decision-makers to swarm on the areas that are most important. Having many metrics also decreases the odds of each metric meeting the bar for quality. If you have a few metrics that are well thought out and meet the other tenets listed in this post, you increase the odds of having a solid data-driven culture. I am not being prescriptive about what a good number of metrics is, and you should definitely experiment to figure that out. However, I can give you a range: fewer than 3 key metrics might be too few, and more than 15 is a sign that you need to trim the list.
  2. We will design metrics that are behavior changing (aka actionable): A litmus test for this is to ask your business decision-makers to articulate what they will do if the metric 1) goes up N% (let’s say 5%), 2) stays flat, or 3) goes down N%. They should have a clear answer for at least two of the three scenarios. If they can’t map a behavior change or action to the metric, then it is not as important as you think, and that is a sign that you can cut it from your “must-have” metrics list. This doesn’t mean that you don’t track it, but it gives you a framework to prioritize other metrics over it, or to iterate on the metric’s design until it is behavior changing.
  3. We will design metrics that are easy to understand: If your metrics are hard to understand, then it’s harder to take action on them, so this is a prerequisite for making your metrics behavior changing. Besides increasing the odds of your metrics being actionable, you are also making them appeal to a wider audience in your teams instead of just the key business decision-makers. Having a wide group of people understand your metrics is key to having a solid data-driven culture.
  4. We will design metrics that are easy to compare: Metrics that are easy to compare across time periods, customer segments, and other business constructs are easier to understand and act on. For example, if I tell you that we had 1,000 paying customers both last week and this week, that doesn’t give you enough signal on whether it’s good or bad. But if I share that last week our conversion rate was 2.3% and this week it is 2.1%, then you know that something needs to be fixed in your conversion funnel given the 20 bps drop. Ratios and rates are easy to compare, so one tactical tip for making your metrics easy to compare is to see if a ratio or rate makes sense in your case. Also, if your metrics are easy to compare, that increases the odds of them being behavior changing, just like in the example above.
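
To make the rate comparison in the fourth tenet concrete, here is a minimal Python sketch. The visitor counts are hypothetical numbers I picked so that the rates come out to roughly 2.3% and 2.1%; the function names are illustrative and not from any particular library.

```python
def conversion_rate(conversions: int, visitors: int) -> float:
    """Return conversions / visitors as a fraction (0.021 means 2.1%)."""
    if visitors <= 0:
        raise ValueError("visitors must be positive")
    return conversions / visitors


def change_in_bps(previous_rate: float, current_rate: float) -> float:
    """Absolute change between two rates in basis points (1 bps = 0.01 percentage points)."""
    return (current_rate - previous_rate) * 10_000


# Hypothetical traffic: the same 1,000 paying customers in both weeks,
# but more visitors this week, so the rate tells a different story than the count.
last_week = conversion_rate(1_000, 43_478)
this_week = conversion_rate(1_000, 47_619)

print(f"last week: {last_week:.1%}")
print(f"this week: {this_week:.1%}")
print(f"change: {change_in_bps(last_week, this_week):+.0f} bps")
```

The raw count is flat at 1,000 in both weeks, while the rate exposes the drop. That is the reason to reach for a ratio when you want a metric to be comparable.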


In this blog post, you learned about effective metric design.

What are your tips for picking good metrics? Would love to hear your thoughts!

Five Tenets for effective data visualization


A tenet is a principle honored by a group of people. As a reader of this blog, you work with data, and data visualization is an important element of your day-to-day work. So, to help you build effective data visualizations, I created the tenets below, which are simple to follow. This work is based on multiple sources, and I’ll reference them below.

Five Tenets for effective data visualization:

  1. We will strive to understand customer needs
  2. We will tell the truth
  3. We will bias for simplicity
  4. We will pick the right chart
  5. We will select colors strategically

Examples for each tenet are listed below:

  • We will strive to understand customer needs

Defining and knowing your audience is very important before diving into the other tenets. Doing this will increase your probability of delivering an effective data visualization.

h/t to Mike Rubin for suggesting this over on LinkedIn here

  • We will tell the Truth

We won’t be dishonest with data. See the example below, where Fox News deliberately started the bar chart’s y-axis at a non-zero number to make the delta look much larger than it actually is.

Source: Link
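
To see why a truncated y-axis misleads, here is a small Python sketch that compares how tall two bars are drawn relative to each other, with and without a zero baseline. The values (100 vs 110, axis starting at 90) are hypothetical and are not the numbers from the chart referenced above.

```python
def drawn_height_ratio(smaller: float, larger: float, axis_start: float = 0.0) -> float:
    """Ratio of the drawn bar heights when the y-axis starts at axis_start."""
    if axis_start >= smaller:
        raise ValueError("axis_start must be below the smaller value")
    return (larger - axis_start) / (smaller - axis_start)


# Hypothetical values: the larger bar is really only 10% bigger.
smaller, larger = 100.0, 110.0

honest = drawn_height_ratio(smaller, larger)                      # y-axis starts at 0
truncated = drawn_height_ratio(smaller, larger, axis_start=90.0)  # y-axis starts at 90

print(f"zero baseline: larger bar drawn {honest:.2f}x as tall")
print(f"axis from 90:  larger bar drawn {truncated:.2f}x as tall")
```

With a zero baseline, the drawn heights match the real 10% difference. Starting the axis at 90 makes the larger bar look twice as tall, even though the underlying data did not change.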

  • We will bias for Simplicity

3-D charts increase complexity for the end-users. So we won’t use something like this and instead opt for simplicity.

  • We will pick the right chart

I have linked some resources here

  • We will select colors strategically

Source here


In this post, I shared five tenets that will help you build effective data visualization.

Data Culture Mental Model.


What is Data Culture?

First, let’s define what is culture: “The set of shared values, goals, and practices that characterizes a group of people” Source

Now, building on that to define data culture: what is the set of shared values? It is that decisions will be made based on insights generated through data. And the group of people represents all decision makers in the organization. So in other words:

An org that has a great data culture will have a group of decision makers that uses data & insights to make decisions.

Why is building data culture important?

There are two ways to make decisions: one that uses data and one that doesn’t. My hypothesis is that decisions made through data are less wrong. To make this happen in your org, you need to have a plan. In the sections below, I’ll share the key ingredients and a mental model to build a data culture.

What are the ingredients for a successful data culture?

It’s the 3 P’s: Platform, Process, and People, along with continuously iterating on and improving each of the P’s to improve data culture.

How to build data culture?

Here’s a mental model for a leader within an org:

  1. Understand data needs and prioritize
  2. Hire the right people
  3. Set team goals and define success
  4. Build something that people use
  5. Iterate on the data product and make it better
  6. Launch and communicate broadly
  7. Provide Training & Support
  8. Celebrate wins and communicate progress against goals
  9. Continue to build and identify next set of data needs

Disclaimer: The opinions are my own and don’t represent my employer’s view.

[Resource] Research on Data Use in Business


There aren’t many in-depth articles or research pieces that talk about how senior business leaders use data in business, so when I stumbled on a three-part series by Chris Dowsett (Head of Marketing Analytics at Instagram), I had to share it with you.

Read here:

5 stages of Analytical Competition


I love mental models and frameworks. I have already shared some frameworks on this blog, like the 3 W’s (What, Why, What’s Next) and the 3 P’s (Platform, People, Process), focused on helping analytics leaders figure out what their analytics roadmap should be. I was reading the book ‘Competing on Analytics’ and came across the 5 stages of analytical competition, which seemed like another framework worth sharing.

The two ends of the spectrum are an org that is flying blind and an org that is competing through analytics. The stages are:

  1. Analytically impaired
  2. Localized Analytics
  3. Analytical aspirations
  4. Analytical companies
  5. Analytical competitor

You can read about each one of these here: Five Stages of Analytic Competition, and you can read a synopsis of the book here.

How Analytics changed Scouting in Soccer


An interesting video that’s a great reminder of how analytics is a game-changer when applied correctly. The video shows how small clubs use analytics to compete with big clubs and not only stay relevant but grow in the process.

A similar analogy can be drawn for startups (or early-to-mid-stage products inside big companies), which can use analytics to compete with incumbents in the market.

Let me know what you think. What’s your favorite analogy to help explain why analytics is useful to your org?

[Career Advice] What are the downsides of working as a data scientist in Silicon Valley?


There are unique challenges to tech roles in Silicon Valley, like housing costs & commute times, but there is enough opportunity to make up for them if you prepare well. These aren’t unique to data scientists, though, so parking common challenges aside, here’s what I think is the downside of working as a data scientist in Silicon Valley.

You see, every company follows a curve to reach the analytics maturity after which a data scientist can start adding enormous value. I call it the 3W curve.

What -> Why -> What’s Next.

The What stage is the early analytics maturity stage, where a company is answering what questions. E.g. what are my sales for 2018? Here a business intelligence engineer or a data engineer could help.

The Why stage is the mid analytics maturity stage, where a company is asking why questions. E.g. why did our sales go up in Q3 of 2018 compared to Q2 of 2018? Here a business analyst or product analyst can help.

After these two stages, a company reaches the third stage, where it asks what’s next questions. E.g. what is going to be my top product growth area for next quarter? This is something that a data scientist could help with.

Now, having said that, Silicon Valley has a lot of companies that are in the early to mid stages and are better suited for data engineers, business intelligence engineers, and business/product analysts, but they end up recruiting for “Data Scientists” (since it’s the sexiest term for all things data these days!). This creates a mismatch between reality and expectation. The data scientist is expected to work on “advanced” analytics topics for a company where the culture and tooling are “basic”. This is a recipe for failure.

This delta between expectation and reality is the biggest downside of working as a data scientist in Silicon Valley. To bridge this gap, hiring managers need to think through what their needs are and hire according to those needs (instead of hype), and candidates should ask probing questions during the interview process to judge the analytics maturity of the company and make sure it’s a great fit for them.

Also, I am not saying this delta doesn’t exist in other cities; it’s just that during my time in Silicon Valley, I noticed it more than I did in other cities. Silicon Valley is a leader in tech, so if this is fixed here, then I expect other cities to follow.

Originally answered on Quora: