All things Data Newsletter #15 (#dataengineering #datascience #data #analytics)

(if this newsletter was forwarded to you then you can subscribe here: https://insightextractor.com/)

The goal of this newsletter is to promote continuous learning for data science and engineering professionals. To achieve this goal, I’ll be sharing articles across various sources that I found interesting. The following 5 articles made the cut for today’s newsletter.

(1) Scaling data

Fantastic article by Crystal Widjaja on scaling data. It shares a really good framework for building analytics maturity and how to think about building capabilities to navigate each stage. Must read! Here

Image: three stages
Image Source: reforge

(2) Building startup’s data infrastructure in 1-Hour

Good video that touches on multiple tools. Watch here: https://www.youtube.com/watch?v=WOSrRTaNIm0 (it’s a little outdated since it was shared in 2019, two years ago, but the architecture is still helpful)

(3) Analytics lesson learned

If you haven’t read Lean Analytics, I recommend it! After that, you should read this free companion, which covers 12 good analytics case studies. Read here

(4) Organizing data teams

How do you organize data teams? Completely centralized under a data leader? Or decentralized, reporting into leaders of business functions? Some good thoughts here

(5) Metrics layer is a missing piece in modern data stack

This is a good article that encourages you to think about adding a metrics layer to your data stack. In the last newsletter, I also shared an article about Airbnb’s Minerva metrics layer, and this article does a good job of providing additional reasons to build something similar. Read here

Thanks for reading! Now it’s your turn: Which article did you love the most and why?

All things Data Newsletter #14 (#dataengineering #datascience #data #analytics)

(if this newsletter was forwarded to you then you can subscribe here: https://insightextractor.com/)

The goal of this newsletter is to promote continuous learning for data science and engineering professionals. To achieve this goal, I’ll be sharing articles across various sources that I found interesting. The following 5 articles made the cut for today’s newsletter.

(1) Analytics is a mess

Fantastic article highlighting the importance of the “creative” side of analytics. It’s not always structured, and that is also what makes it fun. Read here

(2) Achieving metric consistency at Scale — Airbnb Data

This is a great case study shared by Airbnb’s data team on how they achieved metrics consistency at scale. Read here

(3) Achieving metric consistency & standardization — Uber Data

Another great read on metrics standardization, this time at Uber. As you can see, it’s a recurring problem at different organizations once they hit a certain growth threshold. In the initial growth stage, there’s a lot of focus on enabling folks to look at metrics in a manner that’s optimized for speed. Over time, teams go in different directions and end up defining the same thing in different ways, which doesn’t scale, so speed needs to be balanced with consistency and standardization. It’s a problem worth solving, and if you are interested, please read this article here
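To make the idea of metric consistency concrete, here is a minimal sketch of a central metric registry in Python. It is not how Uber or Airbnb built their systems; the `Metric` class, the `METRICS` registry, the `compile_metric` helper, and the table and column names are all hypothetical and invented for illustration. The point is only that each metric is defined once, and every team generates its query from that single definition.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Metric:
    """One shared definition of a metric (hypothetical structure)."""
    name: str
    expression: str      # SQL aggregate that computes the metric
    source: str          # table the metric is computed from
    filters: tuple = ()  # WHERE conditions that are part of the definition


# Hypothetical registry: the single place where metrics are defined.
METRICS = {
    "active_users": Metric(
        name="active_users",
        expression="COUNT(DISTINCT user_id)",
        source="events",
        filters=("event_type = 'session_start'",),
    ),
    "gross_revenue": Metric(
        name="gross_revenue",
        expression="SUM(amount)",
        source="orders",
    ),
}


def compile_metric(name, group_by=()):
    """Build a SQL string for a registered metric, optionally grouped."""
    metric = METRICS[name]
    select_cols = list(group_by) + [f"{metric.expression} AS {metric.name}"]
    sql = f"SELECT {', '.join(select_cols)} FROM {metric.source}"
    if metric.filters:
        sql += " WHERE " + " AND ".join(metric.filters)
    if group_by:
        sql += " GROUP BY " + ", ".join(group_by)
    return sql


if __name__ == "__main__":
    print(compile_metric("active_users", group_by=("country",)))
```

Two teams that both call `compile_metric("active_users", ...)` get the same filter and the same aggregate, so they cannot drift apart on what "active" means without changing the registry itself.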

(4) Where is data engineering going in 5 years?

A good short post by Zach Wilson on LinkedIn talking about where data engineering is going over the next few years. Not surprised to see Data privacy in there! Read others here

(5) 3 foundations of successful data science team

An Amazon leader (Elvis Dieguez) talks about the 3 foundational pillars of a successful data team. These are 1) data warehouse 2) automated data pipelines 3) self-service analytics tool. Read here

Thanks for reading! Now it’s your turn: Which article did you love the most and why?

All things data newsletter #13 (#dataengineer #datascience)

(if this newsletter was forwarded to you then you can subscribe here: https://insightextractor.com/)

The goal of this newsletter is to promote continuous learning for data science and engineering professionals. To achieve this goal, I’ll be sharing articles across various sources that I found interesting. The following 5 articles made the cut for today’s newsletter.

(1) The Modern Data Stack

Amazing article by Tristan Handy explaining the modern data stack. If you are familiar with tools such as Looker, Redshift, Snowflake, BigQuery, FiveTran, DBT, etc. but wondered how they fit into an overall architecture, then this is a must read! Read here

Image Source: GetDBT by Tristan Handy

(2) How can data engineering teams be productive?

Good mental model and tips to build a productive data engineering team. Read here

(3) Why is future of Business Intelligence open source?

From the founder of Apache Superset, on why he believes that the future of BI is open source. Read here.

(This is also a great marketing pitch for Apache Superset, so please read this with a grain of salt and be aware of the author’s bias on this topic)

(4) How Data and Design can work better together?

Diagnose with data and treat with design. Great article by Julie Zhuo here

(5) Zach Wilson believes that standups can be less productive in data engineering teams compared to software engineering teams

Interesting observations on his LinkedIn thread here

Thanks for reading! Now it’s your turn: Which article did you love the most and why?

Framework for onboarding as an Analytics leader in a new team:

1. Start with people:

On a new team, start with meeting people. This includes your team, stakeholders and cross-functional partners. Ask them about the company, product, team, help they need and seek advice. Understand the career growth plans for every member of your team.

2. Understand product/company:

Read docs. Ask questions (lots of them). Attend cross-functional meetings. Try out the product yourself. Dig deeper to understand goals and success metrics of the products and company. I recommend creating a shared live doc where you invite other folks to add their comments & suggestions.

3. Build out team vision and roadmap:

Document customer pain points. Map that against the projects that your team is executing. Learn about the top successes and misses. Articulate team vision. Build a roadmap. Iterate with partners and get alignment with leadership.

4. Focus on Impact:

Identify projects in the first 90 days that will deliver impact early. Stay focused on long term vision and impact. Keep learning. Get alignment with the leadership on how success will be measured. Roll up your sleeves and start delivering what the team & customers need most.

#analytics #leadership #data #team

Originally posted on LinkedIn here.

All things data newsletter #12 (#dataengineer #datascience)

(if this newsletter was forwarded to you then you can subscribe here: https://insightextractor.com/)

The goal of this newsletter is to promote continuous learning for data science and engineering professionals. To achieve this goal, I’ll be sharing articles across various sources that I found interesting. The following 5 articles made the cut for today’s newsletter.

Why Dropbox picked Apache Superset as its data exploration tool

Apache Superset is gaining momentum, and if you want to understand the reasons behind that, you can start by reading this article here

Growth: Adjacent User Theory

I love the framing via this LinkedIn post here where Nimit Jain says that Great Growth PM output looks like “We discovered 2 new user segments who are struggling to proceed at 2 key steps in the funnel and simplified the product for them via A/B experiments. This lead to conversion improvement of 5-10% at these steps so far. We are now working to figure the next segment of users to focus on.”; you can read about the Adjacent user theory here

SQL window functions

Need an intro to SQL window functions? Read this

Luigi vs Airflow

Really good matrix comparing 2 popular ETL workflow platforms. Read here

A data engineer’s point of view on data democratization

If more people can easily access data that was previously not accessible to them, then that’s a good thing. This is a good read on the various things to consider. Read here

Apache Superset growth within Dropbox:

Image: Superset adoption data graphics
Image Source: Dropbox Tech Blog

Thanks for reading! Now it’s your turn: Which article did you love the most and why?

All things data newsletter #11 (#dataengineer, #datascience)

(if this newsletter was forwarded to you then you can subscribe here: https://insightextractor.com/)

The goal of this newsletter is to promote continuous learning for data science and engineering professionals. To achieve this goal, I’ll be sharing articles across various sources that I found interesting. The following 5 articles made the cut for today’s newsletter.

1. AWS re:Invent ML, Data and Analytics announcements

Really good recap of all ML, Data and Analytics announcements at AWS re:Invent 2020 here

2. How to build production workflow with SQL modeling

A really good example of how data engineering at Shopify applied software engineering best practices to analytics code. Read here

3. Back to basics: What are different data pipeline components and types?

Must know basic concepts for every data engineer here

4. Back to basics: SQL window functions

I was interviewing a senior candidate earlier this week and it was unfortunate to see basic mistakes while writing SQL window functions. Don’t let that happen to you. Good tutorial here
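As a quick refresher, here is a small sketch of two common window functions, `ROW_NUMBER()` and a running `SUM()`, run from Python against an in-memory SQLite database. The `orders` table and its rows are hypothetical, and the example assumes your Python ships with SQLite 3.25 or newer (the first version with window function support).

```python
import sqlite3

# Hypothetical orders, invented for illustration: (customer, order_date, amount).
ROWS = [
    ("alice", "2021-01-01", 30),
    ("alice", "2021-01-05", 20),
    ("bob", "2021-01-02", 50),
    ("bob", "2021-01-03", 10),
    ("bob", "2021-01-09", 40),
]

QUERY = """
SELECT
    customer,
    order_date,
    amount,
    -- Sequence number of each order within its customer, by date.
    ROW_NUMBER() OVER (PARTITION BY customer ORDER BY order_date) AS order_seq,
    -- Running total per customer, with an explicit frame.
    SUM(amount) OVER (
        PARTITION BY customer
        ORDER BY order_date
        ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
    ) AS running_total
FROM orders
ORDER BY customer, order_date
"""


def run_window_query():
    """Load the sample rows into an in-memory database and run the query."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE orders (customer TEXT, order_date TEXT, amount INTEGER)"
        )
        conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", ROWS)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for row in run_window_query():
        print(row)
```

Two things that tend to trip people up: `PARTITION BY` restarts the calculation for each customer without collapsing rows the way `GROUP BY` does, and leaving out `ORDER BY` inside `OVER (...)` turns the running total into a plain per-partition total.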

5. 300+ data science interview questions

Good library of data science interview questions and answers

Thanks for reading! Now it’s your turn: Which article did you love the most and why?

Five Tenets for effective data visualization

A tenet is a principle honored by a group of people. As a reader of this blog, you work with data, and data visualization is an important element in your day-to-day work. So, to help you build effective data visualizations, I created the tenets below, which are simple to follow. This work is based on multiple sources and I’ll reference them below.

Five Tenets for effective data visualization:

  1. We will strive to understand customer needs
  2. We will tell the truth
  3. We will bias for simplicity
  4. We will pick the right chart
  5. We will select colors strategically

Examples for each tenet are listed below:

  • We will strive to understand customer needs

Defining and knowing your audience is very important before diving into the other tenets. Doing this will increase your probability of delivering an effective data visualization.

h/t to Mike Rubin for suggesting this over on LinkedIn here

  • We will tell the Truth

We won’t be dishonest with data. See an example below where Fox News deliberately started the bar chart y-axis at a non-zero number to make the delta look way higher than it actually is.

Source: Link
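To see how much a truncated y-axis can exaggerate a difference, here is a small arithmetic sketch. The numbers are hypothetical and are not the values from the chart referenced above; `drawn_height_ratio` is a helper name invented for this example. It compares how tall two bars are drawn relative to each other for a given axis starting point.

```python
def drawn_height_ratio(smaller, larger, axis_start=0.0):
    """Return how many times taller the larger bar is drawn than the smaller one.

    Bars are drawn from axis_start up to their value, so the drawn height of
    a bar is (value - axis_start).
    """
    if axis_start >= smaller:
        raise ValueError("axis_start must be below the smaller value")
    return (larger - axis_start) / (smaller - axis_start)


if __name__ == "__main__":
    # Hypothetical values: the real difference is 10%.
    honest = drawn_height_ratio(100, 110)                    # axis starts at 0
    truncated = drawn_height_ratio(100, 110, axis_start=90)  # axis starts at 90
    print(f"axis at 0:  larger bar is drawn {honest:.2f}x as tall")
    print(f"axis at 90: larger bar is drawn {truncated:.2f}x as tall")
```

With the axis starting at zero, the larger bar is drawn 1.1 times as tall, which matches the data. With the axis starting at 90, the same two values produce a bar that is twice as tall, even though nothing about the data has changed.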

  • We will bias for Simplicity

3-D charts increase complexity for the end-users. So we won’t use something like this and instead opt for simplicity.

  • We will pick the right chart

I have linked some resources here

  • We will select colors strategically

Source here

Conclusion:

In this post, I shared five tenets that will help you build effective data visualization.

Data Culture Mental Model.

What is Data Culture?

First, let’s define what culture is: “The set of shared values, goals, and practices that characterizes a group of people” Source

Now, building on that to define data culture: what is the set of shared values? That decisions will be made based on insights generated through data. And the group of people is all the decision makers in the organization. In other words:

An org that has a great data culture will have a group of decision makers that uses data & insights to make decisions.

Why is building data culture important?

There are two ways to make decisions: one that uses data and one that doesn’t. My hypothesis is that decisions made through data are less wrong. To make this happen in your org, you need to have a plan. In the sections below, I’ll share the key ingredients and a mental model to build a data culture.

What are the ingredients for a successful data culture?

It’s the 3 P’s: Platform, Process and People, plus continuously iterating on each of the P’s to improve data culture.

How to build data culture?

Here’s a mental model for a leader within an org:

  1. Understand data needs and prioritize
  2. Hire the right people
  3. Set team goals and define success
  4. Build something that people use
  5. Iterate on the data product and make it better
  6. Launch and communicate broadly
  7. Provide Training & Support
  8. Celebrate wins and communicate progress against goals
  9. Continue to build and identify next set of data needs

Disclaimer: The opinions are my own and don’t represent my employer’s view.

Two great posts on DAU/MAU and Measuring Power Users

Two great posts from Andrew Chen. Links below:

These posts were perfectly timed for me, as we started thinking about Annual Planning for the Alexa Voice Shopping org (Amazon) this week. I was researching which metrics to use to measure the things our business cares most about, and how to set the right benchmarks/goals for the org, and the posts below were super helpful. So if you are in tech and you care about 1) measuring frequency of usage 2) measuring the most engaged cohort, then you should take some time to read these posts.

Power user curve 

DAU/MAU is an important metric to measure engagement, but here’s where it fails
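If you want to play with these two ideas before reading the posts, here is a minimal sketch in plain Python. It assumes a common definition (average daily active users over a period divided by the distinct users active in that period) and counts, for the power user curve, how many users were active on exactly 1, 2, 3, ... days. The event data and the function names are hypothetical, and the posts themselves may define things slightly differently.

```python
from collections import Counter
from datetime import date

# Hypothetical activity log: (user_id, day the user was active).
EVENTS = [
    ("u1", date(2021, 3, 1)), ("u1", date(2021, 3, 2)), ("u1", date(2021, 3, 3)),
    ("u2", date(2021, 3, 1)),
    ("u3", date(2021, 3, 2)), ("u3", date(2021, 3, 3)),
    ("u1", date(2021, 3, 3)),  # a second event on the same day counts once
]


def dau_mau(events, days_in_period):
    """Average daily actives divided by distinct actives over the period."""
    active_days = {(user, day) for user, day in events}  # de-duplicate
    users = {user for user, _ in active_days}
    if not users or days_in_period <= 0:
        return 0.0
    average_dau = len(active_days) / days_in_period
    return average_dau / len(users)


def power_user_curve(events):
    """Map number of active days -> number of users active exactly that often."""
    active_days = {(user, day) for user, day in events}
    days_per_user = Counter(user for user, _ in active_days)
    return dict(sorted(Counter(days_per_user.values()).items()))


if __name__ == "__main__":
    print(f"ratio over a 3-day window: {dau_mau(EVENTS, 3):.2f}")
    print("power user curve:", power_user_curve(EVENTS))
```

With `days_in_period=30` and a month of events, the first function gives the usual DAU/MAU ratio. The second shows the shape behind that single number: two products with the same ratio can have very different distributions of casual and highly engaged users.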

Cheers!

Excel 2013: Display hidden rows columns and data on an Excel Chart

If you hide rows, columns and data in Excel, the chart that uses this data also hides it. While this is the default behavior, you can override it by following the steps below.

Let’s reproduce the behavior first.

I have a simple Excel chart like the one shown below:

Image: Excel Chart Hide Data 1

Now, if I hide the data that is selected for this chart, the chart stops showing it as well:

Image: Excel Chart Hide Data 2

To fix this, so that the cells (rows, columns and data) stay hidden but the chart still shows the data, follow these steps:

  1. Select the chart
  2. Go to Chart Tools > Design > Select Data
  3. Click on Hidden and Empty Cells
  4. Check the Show data in hidden rows and columns check-box
  5. Go back to Excel and you should now see the data on the chart even though the data is hidden

Hope that helps!