Data Engineering and Data Science Newsletter #5


The goal of this Insight Extractor’s newsletter is to promote continuous learning for data science and engineering professionals. To achieve this goal, I’ll be sharing articles across various sources that I found interesting. The following 5 articles made the cut for today’s newsletter.

1. Why do most analytics efforts fail?

Fantastic post from Crystal Widjaja (ex Go-Jek SVP of BI) on why most analytics efforts fail. It walks you through a step-by-step process that you should follow to ensure that analytics efforts in your org are successful. Must read! If there’s one post that you read from this newsletter, pick this one here

2. Data Engineer vs Data Scientist vs Machine Learning engineer

A good discussion of how data scientists, data engineers, and machine learning engineers differ and where they overlap. YouTube video here

3. Three steps in Data Modeling

Learn about the 3 steps in data modeling (conceptual, logical, and physical) on YouTube here

4. Improving Product Recommendations

Learn about advances in product recommendation algorithms through Amazon’s science blog here

5. Top 5 SQL problems to solve

A good list of problems that you should know how to solve when learning SQL. List here

Thanks for reading! Now it’s your turn: Which article did you love the most and why?


Data Engineering and Data Science Newsletter #4


The purpose of this Insight Extractor’s newsletter is to promote continuous learning for data science and engineering professionals. To achieve this goal, I’ll be sharing articles across various sources that I found interesting. The following articles made the cut for today’s newsletter.

1. What does a Business Intelligence Engineer (BIE) do in Amazon?

Have you wondered what analytics professionals at top tech companies work on? Are you job hunting and wondering which data roles (data engineering, data science, or BI engineering) at Amazon are a great fit for your profile? If so, read Jamie Zhang’s (Sr. Business Intelligence Engineer at Amazon) post here

2. What are the 2 Data & Analytics Maturity models that you should absolutely know about?

If you have read my blog, you know that I am a fan of mental models. So, here are 2 mental models (frameworks) shared by Greg Coquillo that are worth reading/digesting here

3. Using Machine Learning to Predict Value of Homes On Airbnb

Really good case study by Airbnb data scientist Robert Chang here

4. How Netflix measures product success

Really good post on how to define metrics to prove or disprove your hypotheses and measure progress in a quick and simple manner. To do this, the author, Gibson Biddle, shares a mechanism of proxy metrics, which is a really good approach. You can read the post here

Once you read the post above, I also suggest learning about leading vs. lagging indicators. It’s a similar approach and something that all data teams should strive to build for their customers.

5. Leading vs lagging indicators

Kieran Flanagan and Brian Balfour talk about why your north star metric should be a leading indicator, and how to think about it if it’s not. Read about it here

Thanks for reading! Now it’s your turn: Which article did you love the most and why?

Data Maturity Mental Model Screenshot:




Data Engineering and Data Science Newsletter #3


The purpose of this newsletter is to promote continuous learning for data science and engineering professionals. To achieve this goal, I’ll be sharing articles across various sources that I found interesting. The following articles made the cut for today’s newsletter:

1. What I love about Scrum for Data Science

I love the Scrum mechanism for all data roles: data engineering, data analytics, and data science. The author (Eugene) shares his perspective based on his experiences. I love the quote below from the blog, and you can read the full post here

Better to move in the right direction, albeit slower, than fast on the wrong path.


2. Building Analytics at 500px:

One of the best articles on the end-to-end analytics journey at a startup, by Samson Hu. Must read! Go here. (Note that analytics architectures have changed since this post was published in 2015, so read it for the mental model rather than the exact tools mentioned.)

3. GO-FAST: The Data Behind Ramadan:

A great example of data storytelling from Go-Jek BI team lead Crystal Widjaja. Read here

4. Why Robinhood uses Airflow:

Airflow is a popular data engineering tool, and this post provides really good context on its benefits and how it stacks up against other tools. Read here

5. Are dashboards dead?

Every new presentation-layer format in the data field leads some experts to question the value of dashboards. With the rise of Jupyter notebooks, most vendors have now added “notebooks” functionality, and with that comes the follow-up question of whether dashboards are dead. Here’s one such article. Read here

I am still not personally convinced that dashboards are “dead,” but they should complement the other presentation formats out there. The post does make good points against dashboards (e.g., data is going portrait mode), and you should be aware of those to ensure that you are picking the right format for your customers. The author is also biased, since they work for a data vendor that is betting big on notebooks, so you might want to account for that while reading. Also, I wrote about “Are dashboards dead?” in the context of chat-bots in 2016, and that hypothesis turned out to be true; you can read that here

Data is going portrait mode! Source

Thanks for reading! Now it’s your turn: Which article did you love the most and why?

Insight Extractor’s Data Engineering and Science Newsletter #2


The purpose of this newsletter is to promote continuous learning for data science and engineering professionals. To achieve this goal, I’ll be sharing articles across various sources that I found interesting. Following articles made the cut for today’s newsletter:

  1. Amazing data storytelling example from Ben Evans. Ben starts from a basic premise that a lot of people argue about: “Amazon is not profitable.” He then goes on a data storytelling journey with publicly available datasets around his chosen premise. Must read! here
  2. What kind of data scientist am I? Elena Grewal from Airbnb wrote this excellent article in 2018, but it’s still relevant for understanding 3 different flavors of data scientists. Read here
  3. What does it mean to be a data science leader or manager? Eric Weber’s short post on LinkedIn on what it means to be a leader. ICs should exhibit these traits for faster career growth, especially if you are the sole data person in a decentralized structure. Read here
  4. Functional data engineering: In the blog post here, Maxime Beauchemin explains how to apply functional programming concepts to data engineering.
  5. Interested in growth analytics? Think about this interview question from Andrew Chen: How would you 10x the growth of Product X? LinkedIn post here
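On item 4 above (functional data engineering), one core idea from Beauchemin’s post is idempotency: a task should fully rebuild its output partition, so re-running it never duplicates data. Here is a minimal Python sketch of that pattern, with hypothetical names (`warehouse`, `load_partition`, `daily_orders`) standing in for real storage and orchestration:

```python
warehouse = {}  # partition key (a date string) -> list of rows

def load_partition(ds, extract):
    """Rebuild the whole partition for date `ds`: overwrite, never append."""
    warehouse[ds] = list(extract(ds))  # idempotent: same input -> same state
    return warehouse[ds]

def daily_orders(ds):
    """Stand-in for an extract step (e.g. a query scoped to one day)."""
    return [{"ds": ds, "orders": 42}]

load_partition("2020-01-01", daily_orders)
load_partition("2020-01-01", daily_orders)  # safe to re-run: no duplicates
print(len(warehouse["2020-01-01"]))  # 1
```

Because the task overwrites its partition instead of appending, a failed-and-retried backfill leaves the warehouse in the same state as a clean run, which is the property the post argues for.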

Thanks for reading! Now it’s your turn: Which article did you love the most and why?

3 Types of Data Scientist (Source)

Business Analytics Continuum: Descriptive, Diagnostic, Predictive, Prescriptive


Think of a “continuum” as something you start and never stop improving upon. In my mind, the Business Analytics Continuum is a continuous investment of resources to take business analytics capabilities to the next level. So what are these levels?

Here is a visual representation of the concept:

business analytics continuum

Insight Extractor Newsletter #1


I am kicking off a weekly newsletter to share a curated list of things you should read to keep getting better at data. Links for this week below:

  1. AI Hierarchy of needs: Think of Artificial Intelligence as the top of a pyramid of needs. Yes, self-actualization (AI) is great, but you first need food, water, and shelter (data literacy, collection, and infrastructure). Read here
  2. A Beginner’s Guide to Data Engineering: A very good introduction to data engineering. If you work as a data analyst or data scientist, this is a must-read to gain a full understanding of an important discipline within the data family. Also super useful for Jr. data engineers to explain what they do. Read here
  3. The Rise of the Data Engineer: A must-read document to restructure your thinking on data engineering. Read here
  4. Data engineer in 2020: This is a really good list of tools and skills that you should acquire if you want to become a data engineer. Read here
  5. Why did metric X go down last week?: A really good 2-minute read on LinkedIn from Andrew Chen. Read here

Source: Monica Rogati’s fantastic Medium post “The AI Hierarchy of Needs”

It’s your turn: Which article did you like the most? Comment below!

Four Tenets for effective Metrics Design


The goal of this blog post is to provide four tenets for effective metrics design.


What is a tenet?

A tenet is a principle honored by a group of people.

Why is effective metrics design important?

Metrics help with business decision-making. Picking the right metrics increases the odds of making decisions through data rather than gut/intuition, which can be the difference between success and failure.

Four Tenets for effective metrics design:

  1. We will prioritize quality over quantity of metrics: Prioritizing quality over quantity is important because if teams are tracking many metrics, it’s hard for decision-makers to swarm on the areas that matter most. Having many metrics also decreases the odds of each metric meeting the bar for quality. If instead you have a few well-thought-out metrics that meet the other tenets listed in this post, you increase the odds of building a solid data-driven culture. I am not being prescriptive about how many metrics you should have; you should experiment and figure that out. As a range, though: fewer than 3 key metrics might be too few, and more than 15 is a sign that you need to trim the list.
  2. We will design metrics that are behavior changing (aka actionable): A litmus test for this: ask your business decision-makers to articulate what they will do if the metric 1) goes up N% (let’s say 5%), 2) stays flat, or 3) goes down N%. They should have a clear answer for at least two of the three scenarios above; if they can’t map a behavior change or action to the metric, then it is not as important as you think. That is a sign you can cut it from your “must-have” metrics list. This doesn’t mean you stop tracking it, but it gives you a framework to prioritize other metrics over it, or to iterate on the metric’s design until it is behavior changing.
  3. We will design metrics that are easy to understand: If your metrics are hard to understand, it’s harder to act on them, so ease of understanding is a prerequisite for making your metrics behavior changing. Beyond increasing the odds that a metric is actionable, you are also making it appeal to a wider audience across your teams, instead of just the key business decision makers. Having a wide group of people understand your metrics is key to a solid data-driven culture.
  4. We will design metrics that are easy to compare: Metrics that are easy to compare across time periods, customer segments, and other business constructs are easier to understand and act on. For example, if I tell you that we had 1,000 paying customers last week and this week, that doesn’t give you enough signal about whether that’s good or bad. But if I share that last week our conversion rate was 2.3% and this week it is 2.1%, then you know something needs to be fixed in your conversion funnel, given the 20 bps drop. Ratios/rates are easy to compare, so one tactical tip: see if a ratio/rate makes sense for your metric. And if your metrics are easy to compare, that increases the odds of them being behavior changing, as the example above shows.
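The conversion-rate example above can be sketched in a few lines of Python; the visitor counts are hypothetical, chosen to reproduce the 2.3% and 2.1% rates:

```python
def conversion_rate(conversions, visitors):
    """Conversion rate as a fraction (e.g. 0.023 for 2.3%)."""
    return conversions / visitors

def bps_change(old_rate, new_rate):
    """Week-over-week change in basis points (1 bp = 0.01 percentage points)."""
    return (new_rate - old_rate) * 10_000

last_week = conversion_rate(23, 1000)   # 2.3%
this_week = conversion_rate(21, 1000)   # 2.1%
print(f"{bps_change(last_week, this_week):+.0f} bps")  # prints "-20 bps"
```

Note that the raw counts (23 vs. 21 conversions) carry little signal on their own; it’s the rate and its change in basis points that make the two weeks directly comparable.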


In this blog post, you learned about effective metric design.

What are your tips for picking good metrics? Would love to hear your thoughts!

Five Tenets for effective data visualization


A tenet is a principle honored by a group of people. As a reader of this blog, you work with data, and data visualization is an important element of your day-to-day work. So, to help you build effective data visualizations, I created the tenets below, which are simple to follow. This work is based on multiple sources, which I’ll reference below.

Five Tenets for effective data visualization:

  1. We will strive to understand customer needs
  2. We will tell the truth
  3. We will bias for simplicity
  4. We will pick the right chart
  5. We will select colors strategically

Examples for each tenet are listed below:

  • We will strive to understand customer needs

Defining and knowing your audience is very important before diving into the other tenets. Doing this will increase your probability of delivering an effective data visualization.

h/t to Mike Rubin for suggesting this over on LinkedIn here

  • We will tell the Truth

We won’t be dishonest with data. See the example below, where Fox News deliberately started a bar chart’s y-axis at a non-zero number to make the delta look far larger than it actually is.

Source: Link
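To see why a truncated axis misleads, compare the ratio of the drawn bar heights with and without a zero baseline. A small sketch with hypothetical numbers (not the actual figures from the chart):

```python
def visual_ratio(a, b, baseline=0.0):
    """Ratio of the two bars' drawn heights when the y-axis starts at `baseline`."""
    return (b - baseline) / (a - baseline)

# With a zero baseline the bars honestly reflect a 10% difference;
# starting the axis at 95 makes the second bar look 3x as tall.
honest = visual_ratio(100, 110)         # 1.1
truncated = visual_ratio(100, 110, 95)  # 3.0
```

This is why a zero baseline is the default recommendation for bar charts: the viewer reads the bar’s full length, so the drawn ratio should match the ratio of the underlying values.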

  • We will bias for Simplicity

3-D charts increase complexity for end-users, so we won’t use something like this and will instead opt for simplicity.

  • We will pick the right chart

I have linked some resources here

  • We will select colors strategically

Source here


In this post, I shared five tenets that will help you build effective data visualization.

Data Culture Mental Model


What is Data Culture?

First, let’s define culture: “The set of shared values, goals, and practices that characterizes a group of people” Source

Now, building on top of that to define data culture: what are the shared values? Decisions will be made based on insights generated through data. And the group of people represents all decision makers in the organization. In other words:

An org that has a great data culture will have a group of decision makers that uses data & insights to make decisions.

Why is building data culture important?

There are two ways to make decisions: one that uses data and one that doesn’t. My hypothesis is that decisions made through data are less wrong. To make this happen in your org, you need a plan. In the sections below, I’ll share the key ingredients and a mental model for building a data culture.

What are the ingredients for a successful data culture?

It’s the 3 P’s: Platform, Process, and People, and continuously iterating on and improving each P to improve data culture.

How to build data culture?

Here’s a mental model for a leader within an org:

  1. Understand data needs and prioritize
  2. Hire the right people
  3. Set team goals and define success
  4. Build something that people use
  5. Iterate on the data product and make it better
  6. Launch and communicate broadly
  7. Provide Training & Support
  8. Celebrate wins and communicate progress against goals
  9. Continue to build and identify next set of data needs

Disclaimer: The opinions are my own and don’t represent my employer’s view.

[Resource] Research on Data Use in Business


There aren’t many in-depth articles or research pieces that talk about how senior business leaders use data, so when I stumbled on a three-part series by Chris Dowsett (Head of Marketing Analytics at Instagram), I had to share it with you.

Read here: