All things Data Newsletter #14 (#dataengineering #datascience #data #analytics)


(if this newsletter was forwarded to you then you can subscribe here: https://insightextractor.com/)

The goal of this newsletter is to promote continuous learning for data science and engineering professionals. To achieve this goal, I’ll be sharing articles across various sources that I found interesting. The following 5 articles made the cut for today’s newsletter.

(1) Analytics is a mess

Fantastic article highlighting the importance of the “creative” side of analytics. It’s not always structured, and that is also what makes it fun. Read here

(2) Achieving metric consistency at Scale — Airbnb Data

This is a great case study shared by Airbnb’s data team on how they achieved metrics consistency at scale. Read here

Image Source

(3) Achieving metric consistency & standardization — Uber Data

Another great read on metrics standardization — this time at Uber. As you can see, it’s a recurring problem at different organizations once they hit a certain growth threshold. The problem occurs because, in the initial growth stage, there’s a lot of focus on enabling folks to look at metrics in a manner that’s optimized for speed. After a certain stage, this needs to be balanced with consistency: teams may have gone in different directions and defined the same thing in different ways, and that no longer scales, since you need consistency and standardization. This is where the topic of metric consistency and standardization comes in. It’s a problem worth solving — and if you are interested, please read this article here
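To make the idea concrete, here is a hedged sketch (not Uber’s or Airbnb’s actual system) of the core pattern behind metric standardization: the metric is defined once, in a shared place, and every team computes it from that shared definition instead of re-deriving it in ad-hoc queries. The metric name, event types, and column names below are all made up for illustration.

```python
# Illustrative sketch of a "single source of truth" for a metric definition.
import pandas as pd

# The metric is defined once, in one shared place.
METRIC_DEFINITIONS = {
    "weekly_active_users": {
        "description": "Distinct users with at least one qualifying event in the week",
        "qualifying_events": {"search", "booking", "message"},
    }
}

def weekly_active_users(events: pd.DataFrame, week: str) -> int:
    """Compute WAU from the shared definition rather than an ad-hoc query."""
    spec = METRIC_DEFINITIONS["weekly_active_users"]
    in_week = events[(events["week"] == week) &
                     (events["event_type"].isin(spec["qualifying_events"]))]
    return in_week["user_id"].nunique()

events = pd.DataFrame({
    "user_id": [1, 2, 2, 3],
    "event_type": ["search", "booking", "login", "message"],
    "week": ["2021-W01"] * 4,
})
print(weekly_active_users(events, "2021-W01"))  # 3
```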

(4) Where is data engineering going in 5 years?

A good short post by Zach Wilson on LinkedIn talking about where data engineering is going over the next few years. Not surprised to see data privacy in there! Read the rest here

(5) 3 foundations of a successful data science team

An Amazon leader (Elvis Dieguez) talks about the 3 foundational pillars of a successful data team: 1) a data warehouse, 2) automated data pipelines, and 3) a self-service analytics tool. Read here

Thanks for reading! Now it’s your turn: Which article did you love the most and why?

All things data newsletter #11 (#dataengineer, #datascience)


(if this newsletter was forwarded to you then you can subscribe here: https://insightextractor.com/)

The goal of this newsletter is to promote continuous learning for data science and engineering professionals. To achieve this goal, I’ll be sharing articles across various sources that I found interesting. The following 5 articles made the cut for today’s newsletter.

1. AWS re:Invent ML, Data and Analytics announcements

Really good recap of all the ML, data, and analytics announcements at AWS re:Invent 2020 here

2. How to build production workflow with SQL modeling

A really good example of how a data engineering team at Shopify applied software engineering best practices to analytics code. Read here

Image Source

3. Back to basics: What are different data pipeline components and types?

Must-know basic concepts for every data engineer here

4. Back to basics: SQL window functions

I was interviewing a senior candidate earlier this week, and it was unfortunate to see basic mistakes while writing SQL window functions. Don’t let that happen to you. Good tutorial here
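If you want a quick refresher, below is a minimal, self-contained sketch of two common window-function patterns: ranking within a partition and a running total. It uses Python’s built-in sqlite3 module (SQLite 3.25+ is needed for window functions); the table and column names are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (customer_id TEXT, order_date TEXT, amount REAL);
    INSERT INTO orders VALUES
        ('c1', '2021-01-01', 50.0),
        ('c1', '2021-01-05', 75.0),
        ('c2', '2021-01-03', 20.0);
""")

# Rank each customer's orders by recency and keep a running total per customer.
rows = conn.execute("""
    SELECT customer_id,
           order_date,
           amount,
           ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY order_date DESC) AS order_rank,
           SUM(amount)  OVER (PARTITION BY customer_id ORDER BY order_date)      AS running_total
    FROM orders
""").fetchall()

for row in rows:
    print(row)
```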

5. 300+ data science interview questions

Good library of data science interview questions and answers

Thanks for reading! Now it’s your turn: Which article did you love the most and why?

All things data newsletter #10 (#dataengineer #datascience)


(if this newsletter was forwarded to you then you can subscribe here: https://insightextractor.com/)

The goal of this newsletter is to promote continuous learning for data science and engineering professionals. To achieve this goal, I’ll be sharing articles across various sources that I found interesting. The following 5 articles made the cut for today’s newsletter.

1. Architecture for Telemetry data

A good reminder that the software architecture for capturing telemetry data can be significantly simplified. Read here

2. 5 popular job titles for data engineers

This post here lists 5 popular job titles, including data engineer, data architect, and data warehouse engineer — I think analytics engineer is missing from that list, but it’s a good post nonetheless. I hope that we get some consolidation and standardization of these job titles over the next few cycles.

3. [Podcast] startup growth strategy and building Gojek data team – Crystal Widjaja

Really good podcast, highly recommended! here

4. Tenets for data cleaning

A must-read technical whitepaper from the legendary Hadley Wickham. These tidy-data principles form part of the foundation on which the R ecosystem gained a lot of adoption momentum, and the Python community uses similar tenets. Must read! here and here
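For a flavor of the tidy-data idea, here is a tiny illustrative sketch in pandas (the column names are made up): each variable becomes a column and each observation a row.

```python
import pandas as pd

# "Messy" layout: one column per year.
messy = pd.DataFrame({
    "country": ["A", "B"],
    "2019": [100, 250],
    "2020": [110, 240],
})

# Tidy layout: melt the year columns into (year, value) pairs.
tidy = messy.melt(id_vars="country", var_name="year", value_name="cases")
print(tidy)
```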

5. Magic metrics suggesting a startup probably has product/market fit, from Andrew Chen

A must-follow Growth leader!

  1. Cohort retention curves flatten (stickiness; see the sketch below)
  2. Actives/Registered > 25% (validates TAM)
  3. Power user curve showing a smile
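Here is a rough sketch (mine, not Andrew Chen’s) of how you might check the first two signals from a toy events table; the table and column names are hypothetical.

```python
import pandas as pd

events = pd.DataFrame({
    "user_id":     [1, 1, 1, 2, 2, 3],
    "signup_week": [0, 0, 0, 0, 0, 1],
    "active_week": [0, 1, 2, 0, 2, 1],
})

# Cohort retention curve: share of each signup cohort active in each later week.
cohort_size = events.groupby("signup_week")["user_id"].nunique()
active = events.groupby(["signup_week", "active_week"])["user_id"].nunique()
retention = active.div(cohort_size, level="signup_week")
print(retention)  # a curve that flattens (instead of decaying to zero) suggests stickiness

# Actives / Registered ratio for a given week (> 25% per the rule of thumb above).
registered = events["user_id"].nunique()
actives_week_2 = events.loc[events["active_week"] == 2, "user_id"].nunique()
print(actives_week_2 / registered)
```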


Thanks for reading! Now it’s your turn: Which article did you love the most and why?

All things Data Engineering & Data Science Newsletter #8


(if this newsletter was forwarded to you then you can subscribe here: https://insightextractor.com/)

The goal of this newsletter is to promote continuous learning for data science and engineering professionals. To achieve this goal, I’ll be sharing articles across various sources that I found interesting. The following 5 articles made the cut for today’s newsletter.

What is a data lake?

Good article on the basics of data lake architecture on Guru99 here

Data quality at Airbnb

Really good framework on how to think about data quality systematically, through examples and a mental model from Airbnb here

Monetization vs growth is a false choice

Good article from Reforge on the monetization vs. growth mental model here

Performance Tuning SQL queries

Really good basic post on tuning SQL queries here
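As a tiny illustration of one common tuning step (my own sketch, not from the post), here is how adding an index changes the query plan so a filtered query no longer scans the whole table. It uses Python’s built-in sqlite3; the table and index names are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (user_id INTEGER, event_type TEXT, ts TEXT)")

query = "SELECT COUNT(*) FROM events WHERE user_id = ?"

# Before: the plan shows a full table scan.
print(conn.execute("EXPLAIN QUERY PLAN " + query, (42,)).fetchall())

# After: the planner can use an index on the filtered column.
conn.execute("CREATE INDEX idx_events_user_id ON events(user_id)")
print(conn.execute("EXPLAIN QUERY PLAN " + query, (42,)).fetchall())
```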

Improving conversion rates through A/B testing

Good mental model to run effective A/B testing to improve metrics such as conversion rate here
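As a companion sketch (not from the article), here is a quick two-proportion z-test for judging whether the conversion-rate difference between a control and a variant is statistically meaningful; the counts below are made up.

```python
from math import sqrt
from scipy.stats import norm

control_conversions, control_visitors = 200, 10_000   # 2.0%
variant_conversions, variant_visitors = 245, 10_000   # 2.45%

p1 = control_conversions / control_visitors
p2 = variant_conversions / variant_visitors
pooled = (control_conversions + variant_conversions) / (control_visitors + variant_visitors)

se = sqrt(pooled * (1 - pooled) * (1 / control_visitors + 1 / variant_visitors))
z = (p2 - p1) / se
p_value = 2 * norm.sf(abs(z))  # two-sided

print(f"lift = {p2 - p1:.4f}, z = {z:.2f}, p-value = {p_value:.4f}")
```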

Source: Different Media Variations for A/B testing

Thanks for reading! Now it’s your turn: Which article did you love the most and why?

All things data engineering & science newsletter #7


(if this newsletter was forwarded to you then you can subscribe here: https://insightextractor.com/)

The goal of this newsletter is to promote continuous learning for data science and engineering professionals. To achieve this goal, I’ll be sharing articles across various sources that I found interesting. The following 5 articles made the cut for today’s newsletter.

1. Why a data scientist is not a data engineer?

Good post on the difference between a data engineer and a data scientist and why you need both roles on a data team. I chuckled when one of the sections explained why data engineering != Spark, since I completely agree that these roles should not be boxed into just one or two tools! Read the full post here

2. Correlation vs Causation:

1 picture = 1000 words!

Image Source
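To make the point concrete, here is a tiny simulation (my own illustration, not from the source image) where two variables are strongly correlated only because both depend on a hidden confounder; all numbers are made up.

```python
import numpy as np

rng = np.random.default_rng(0)
temperature = rng.normal(25, 5, size=10_000)  # the hidden confounder

ice_cream_sales = 2.0 * temperature + rng.normal(0, 5, size=10_000)
sunburns        = 0.5 * temperature + rng.normal(0, 5, size=10_000)

# Strong correlation, even though neither variable causes the other.
print(np.corrcoef(ice_cream_sales, sunburns)[0, 1])
```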
3. Best Practices from Facebook’s growth team:

Read Chamath Palihapitiya’s and Andy Johns’s responses to this Quora question here

4. Simple mental model for handling “big data” workloads
Image Source
5. Five things to do as a data scientist in your first 90 days that will have a big impact.

Eric Weber gives 5 tips on what to do as a new data scientist to have a big impact. Read here

Thanks for reading! Now it’s your turn: Which article did you love the most and why?

Data Engineering and Data Science Newsletter #6


The goal of this Insight Extractor’s newsletter is to promote continuous learning for data science and engineering professionals. To achieve this goal, I’ll be sharing articles across various sources that I found interesting. The following 5 articles made the cut for today’s newsletter.

1. How do you measure Word of mouth for growth analytics?

Some really good research and methodologies on how to measure word of mouth for growth analytics. Read here

Source
2. Lean data science

Really good insights, like “measure business performance and not model performance”, with the end goal of delivering business value instead of focusing too much on the algorithm. Read here

3. Good data storytelling: Emoji use in the new normal

Read this to get inspired about how to tell stories through data; really well done! Go here

Source
4. Why is Data engineering important?

Good post that explains the importance of data engineering here

Source
5. Five things you should know about Data engineering career

This is a good post to read along with the article above on the importance of data engineering. Both articles give you a good mental model to explain the role and to assess whether this is the right fit for you if you are considering this career track. Read here

Thanks for reading! Now it’s your turn: Which article did you love the most and why?

Data Engineering and Data Science Newsletter #4


The purpose of this Insight Extractor’s newsletter is to promote continuous learning for data science and engineering professionals. To achieve this goal, I’ll be sharing articles across various sources that I found interesting. The following articles made the cut for today’s newsletter.

1. What does a Business Intelligence Engineer (BIE) do in Amazon?

Have you wondered what analytics professionals at top tech companies work on? Are you job hunting and wondering which data roles (data engineer, data scientist, or BI engineer) at Amazon are a great fit for your profile? If so, read Jamie Zhang’s (Sr Business Intelligence Engineer at Amazon) post here

2. What are the 2 Data & Analytics Maturity models that you should absolutely know about?

If you have read my blog, you know that I am a fan of mental models. So, here are 2 mental models (frameworks) shared by Greg Coquillo that are worth reading/digesting here

3. Using Machine Learning to Predict Value of Homes On Airbnb

Really good case study by Airbnb Data scientist Robert Chang here

4. How does Netflix measure product success?

Really good post on how to define metrics to prove or disprove your hypotheses and measure progress in a quick and simple manner. To do this, the author, Gibson Biddle, shares a mechanism of proxy metrics and it’s a really good approach. You can read the post here

Once you read the post above, I also suggest learning about leading vs. lagging indicators. It’s a similar approach and something that all data teams should strive to build for their customers.

5. Leading vs lagging indicators

Kieran Flanagan and Brian Balfour talk about why your north star metric should be a leading indicator and, if it’s not, how to think about it. Read about it here

Thanks for reading! Now it’s your turn: Which article did you love the most and why?

Data Maturity Mental Model Screenshot:

Source

Insight Extractor’s Data Engineering and Science Newsletter #3


The purpose of this newsletter is to promote continuous learning for data science and engineering professionals. To achieve this goal, I’ll be sharing articles across various sources that I found interesting. Following articles made the cut for today’s newsletter:

1. What I love about Scrum for Data Science

I love the Scrum mechanism for all data roles: data engineering, data analytics, and data science. The author (Eugene) shares his perspective based on his experiences. I love the quote below from the blog, and you can read the full post here

Better to move in the right direction, albeit slower, than fast on the wrong path.

Source

2. Building Analytics at 500px:

One of the best articles on an end-to-end analytics journey at a startup, by Samson Hu. Must read! Go here (Note that analytics architectures have changed since this post was published in 2015, so read it for the mental model rather than the exact tools mentioned in the post.)

3. GO-FAST: The Data Behind Ramadan:

A great example of data storytelling from Go-Jek BI team lead Crystal Widjaja. Read here

4. Why Robinhood uses Airflow:

Airflow is a popular data engineering tool, and this post provides really good context on its benefits and how it stacks up against other tools. Read here
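For readers who haven’t seen Airflow, here is a minimal sketch of what a DAG looks like (Airflow 2.x style): Airflow schedules tasks and tracks the dependencies between them. The DAG id, task names, and logic below are made up for illustration.

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator


def extract():
    print("pull raw data")


def load():
    print("load into the warehouse")


with DAG(
    dag_id="example_daily_pipeline",
    start_date=datetime(2021, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    extract_task = PythonOperator(task_id="extract", python_callable=extract)
    load_task = PythonOperator(task_id="load", python_callable=load)

    extract_task >> load_task  # run extract before load
```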

5. Are dashboards dead?

Every new presentation-layer format in the data field leads to experts questioning the value of dashboards. With the rise of Jupyter notebooks, most vendors have now added “notebooks” functionality, and with that comes the follow-up question of whether dashboards are dead. Here’s one such article. Read here

I am still not personally convinced that dashboards are “dead”, but they should complement the other presentation formats that are out there. The post does make good points against dashboards (e.g., data is going portrait mode), and you should be aware of those to ensure that you are picking the right format for your customers. The author is also biased, since they work for a data vendor that is betting big on notebooks, so you might want to account for that bias while reading. Also, I had written about “Are dashboards dead?” in the context of chat-bots in 2016, and that hypothesis turned out to be true; you can read that here

Data is going portrait mode! Source

Thanks for reading! Now it’s your turn: Which article did you love the most and why?

Four Tenets for effective Metrics Design


The goal of this blog post is to provide four tenets for effective metrics design.

Four Tenets for effective Metrics Design

What is a tenet?

A tenet is a principle honored by a group of people.

Why is effective metrics design important?

Metrics help with business decision-making. Picking the right metrics increases the odds of making decisions through data rather than gut/intuition, which can be the difference between success and failure.

Four Tenets for effective metrics design:

  1. We will prioritize quality over quantity of metrics: Prioritizing quality over quantity is important because if teams are tracking too many metrics, it’s hard for decision-makers to swarm on the areas that matter most. Having many metrics also decreases the odds of each metric meeting the bar for quality. If, instead, you have a few metrics that are well thought out and meet the other tenets listed in this post, it increases the odds of building a solid data-driven culture. I am not being prescriptive about the right number of metrics; you should experiment and figure that out. However, I can give you a range: fewer than 3 key metrics is probably too few, and more than 15 is a sign that you need to trim the list.
  2. We will design metrics that are behavior changing (aka actionable): A litmus test for this: ask your business decision-makers to articulate what they will do if the metric 1) goes up N% (let’s say 5%), 2) stays flat, or 3) goes down N%. They should have a clear answer for at least two of the three scenarios above. If they can’t map a behavior change or action to the metric, then it is not as important as you think. This is a sign that you can cut it from your “must-have” metrics list. That doesn’t mean you stop tracking it, but it gives you a framework to prioritize other metrics over it, or to iterate on the metric design until you can define it in a way that is behavior changing.
  3. We will design metrics that are easy to understand: If your metrics are hard to understand, it’s harder to take action on them, so ease of understanding is a prerequisite for behavior-changing metrics. Beyond increasing the odds that a metric is actionable, you are also making it appeal to a wider audience across your teams instead of just the key business decision makers. Having a wide group of people understand your metrics is key to a solid data-driven culture.
  4. We will design metrics that are easy to compare: Metrics that are easy to compare across time periods, customer segments, and other business constructs are easier to understand and act on. For example, if I tell you that we had 1,000 paying customers last week and 1,000 this week, that doesn’t give you enough signal about whether that’s good or bad. But if I share that last week our conversion rate was 2.3% and this week it is 2.1%, then you know something needs to be fixed in your conversion funnel, given the 20 bps drop. Ratios and rates are easy to compare, so one tactical tip: to make your metrics easy to compare, see whether a ratio or rate makes sense in your case. And if your metrics are easy to compare, that also increases the odds of them being behavior changing, just as the example shows (see the short sketch after this list).
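Here is the short sketch referenced in tenet 4: the same comparison expressed in code. The visitor counts are made up so that they reproduce the 2.3% and 2.1% rates from the example.

```python
# Counts alone are hard to compare; a rate makes the change obvious.
last_week = {"visitors": 43_478, "paying": 1_000}   # visitor counts are made up
this_week = {"visitors": 47_619, "paying": 1_000}

rate_last = last_week["paying"] / last_week["visitors"]   # ~2.3%
rate_this = this_week["paying"] / this_week["visitors"]   # ~2.1%

change_bps = (rate_this - rate_last) * 10_000   # 1 bps = 0.01 percentage points
print(f"{rate_last:.1%} -> {rate_this:.1%} ({change_bps:.0f} bps)")
```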

Conclusion:

In this blog post, you learned four tenets for effective metric design.

What are your tips for picking good metrics? Would love to hear your thoughts!

Data Culture Mental Model


What is Data Culture?

First, let’s define culture: “The set of shared values, goals, and practices that characterizes a group of people” Source

Now, building on top of that to define data culture: What is the set of shared values? That decisions will be made based on insights generated through data. And the group of people represents all decision makers in the organization. So, in other words:

An org that has a great data culture will have a group of decision makers that uses data & insights to make decisions.

Why is building data culture important?

There are two ways to make decisions: one that uses data and one that doesn’t. My hypothesis is that decisions made through data are less wrong. To make this happen in your org, you need to have a plan. In the sections below, I’ll share the key ingredients and a mental model for building a data culture.

What are the ingredients for a successful data culture?

It’s the 3 P’s: Platform, Process, and People, and continuously iterating on and improving each of them to improve data culture.

How to build data culture?

Here’s a mental model for a leader within an org:

  1. Understand data needs and prioritize
  2. Hire the right people
  3. Set team goals and define success
  4. Build something that people use
  5. Iterate on the data product and make it better
  6. Launch and communicate broadly
  7. Provide Training & Support
  8. Celebrate wins and communicate progress against goals
  9. Continue to build and identify next set of data needs

Disclaimer: The opinions are my own and don’t represent my employer’s view.