ALL THINGS DATA NEWSLETTER #17


(if this newsletter was forwarded to you then you can subscribe here: https://insightextractor.com/)

This newsletter aims to promote continuous learning for data science and engineering professionals. To achieve this goal, I’ll share articles from various sources I found interesting. The following 5 articles/videos made the cut for today’s newsletter.

1. Data Contracts 101 by Aurimas Griciūnas

A Data Contract is an agreement between Data Producers and Data Consumers on the schema, SLAs, semantics, lineage, and other details of the data being produced. Data Contracts should be enforced to ensure data quality, prevent unexpected outages, establish ownership of data, improve scalability, and reduce the intermediate data handover layer. An example implementation of Data Contract enforcement involves pushing schema changes through a git repository, validating data against schemas in the Schema Registry, pushing validated data to a Validated Data Topic, validating data against additional SLAs, and alerting Producers and Consumers to any SLA breaches. Read more here
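The validation step in that flow can be sketched in a few lines. This is a minimal illustration, not the article's implementation: the schema format, field names, and dict-based contract below are my own assumptions, standing in for a real Schema Registry check that would run before data reaches the validated topic.

```python
# Hypothetical contract: a field-to-type mapping the producer has declared.
# A real setup would fetch this from a Schema Registry instead.
SCHEMA = {
    "user_id": int,
    "event_type": str,
    "timestamp": float,
}

def validate(record: dict, schema: dict) -> list:
    """Return a list of contract violations (empty list means valid)."""
    errors = []
    for field, expected_type in schema.items():
        if field not in record:
            errors.append("missing field: " + field)
        elif not isinstance(record[field], expected_type):
            errors.append("bad type for " + field)
    return errors

good = {"user_id": 42, "event_type": "click", "timestamp": 1.7e9}
bad = {"user_id": "42", "event_type": "click"}  # wrong type, missing field
print(validate(good, SCHEMA))  # []
print(validate(bad, SCHEMA))
```

Records that fail would be routed to a dead-letter topic and trigger the producer/consumer alerting the article describes.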

2. A brief history of Data at Coinbase by Michael Li

The article provides a brief history of data and its importance in the development of Coinbase, a cryptocurrency exchange platform. The author explains how the concept of data has evolved over time and how Coinbase has utilized data-driven decision-making to improve its platform and expand its user base. The article also discusses the potential of Web3, a new decentralized web infrastructure, and how it can revolutionize the way data is stored, shared, and used. The author concludes by emphasizing the importance of data in the growth of Coinbase and the potential of Web3 to transform the future of data.
Read more here

3. How to use the Snowflake Query Profile by Ian Whitestone

The article explains how to use the Snowflake Query Profile, a feature of the Snowflake cloud data platform, to diagnose and optimize SQL queries. The Query Profile provides detailed information about the query execution plan, including the amount of time spent on each operation, the number of rows processed, and the resources consumed. The article walks through the steps of running a query and analyzing the Query Profile to identify potential bottlenecks or areas for improvement. The author also provides tips for optimizing queries based on the information provided by the Query Profile. Overall, the article offers a useful guide for developers and data analysts looking to improve the performance of their Snowflake SQL queries. Read more here

4. Building Modern Data Teams by Pedram Navid:

The article discusses the characteristics of modern data teams and the key roles involved in building and managing data infrastructure. The author argues that data teams should be cross-functional, collaborative, and focused on delivering business value through data insights. The article identifies several key roles in modern data teams, including data engineers, data analysts, data scientists, and product managers. The author provides an overview of the responsibilities and skills required for each role and emphasizes the importance of communication and collaboration between team members. The article also highlights some of the challenges faced by data teams, such as data quality and security, and provides tips for overcoming these challenges. Overall, the article provides a useful perspective on the structure and function of modern data teams. Read more here

5. How to Prioritize Analytical Work by Elvis Dieguez

The article provides tips on how to prioritize analytical work effectively. The author suggests that prioritization should be based on the business impact of the analytical work, as well as its level of complexity and the urgency of the request. The article recommends creating a prioritization matrix that takes these factors into account and prioritizing work based on its placement in the matrix. The author also emphasizes the importance of communication and collaboration with stakeholders to ensure that priorities are aligned with business needs. Additionally, the article provides some tips for managing a backlog of analytical work and for tracking progress and results. Overall, the article offers practical advice for data analysts and other professionals responsible for managing analytical workloads. Read more here

Thanks for reading! Now it’s your turn: Which article did you love the most and why?

3 prioritization frameworks


Time and energy are finite resources and it’s important to use them effectively and efficiently. This requires having a good prioritization framework. In this post, I’ll share 3 frameworks that I have frequently used to prioritize.

1. Eisenhower Matrix: the Urgent vs. Important matrix

Source

2. Cost-Benefit Matrix

A similar 2×2 matrix that is equally relevant; it helps you decide whether to delete (low benefit, high cost), defer (low benefit, low cost), plan (high benefit, high cost), or do (high benefit, low cost).
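As a toy sketch, the four quadrants can be written as a simple lookup table (the "high"/"low" string labels are my own shorthand for the axes above):

```python
def quadrant(benefit: str, cost: str) -> str:
    """Map a (benefit, cost) pair, each 'high' or 'low', to an action."""
    actions = {
        ("low", "high"): "delete",   # low benefit, high cost
        ("low", "low"): "defer",     # low benefit, low cost
        ("high", "high"): "plan",    # high benefit, high cost
        ("high", "low"): "do",       # high benefit, low cost
    }
    return actions[(benefit, cost)]

print(quadrant("high", "low"))  # do
print(quadrant("low", "high"))  # delete
```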

3. Scoring Models (Weighted Prioritization Matrix):

If you want to prioritize a decision, you can list all the factors and their weights to arrive at a ranked list. E.g. choosing a restaurant:

Depending on your focus, you might also have a specific scoring model or framework. For example, in Product, the RICE framework is pretty common; it scores items based on Reach, Impact, Confidence, and Effort:
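As a rough sketch of how RICE scoring works, the score is (Reach × Impact × Confidence) / Effort; the backlog items and numbers below are made up for illustration:

```python
def rice_score(reach: float, impact: float, confidence: float, effort: float) -> float:
    """RICE score = (Reach * Impact * Confidence) / Effort."""
    return (reach * impact * confidence) / effort

# Hypothetical backlog items with made-up estimates.
backlog = {
    "dashboard revamp": rice_score(reach=500, impact=2, confidence=0.8, effort=4),
    "new data source": rice_score(reach=200, impact=3, confidence=0.5, effort=2),
}
# Highest score first: that's the prioritized order.
for item, score in sorted(backlog.items(), key=lambda kv: kv[1], reverse=True):
    print(item, score)
```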

Source

I hope these frameworks are helpful for you to think through your priorities!

What does an Analytics People Manager spend their time on?


A few folks recently asked me where I allocate my time as a people manager of a double-digit (10+) analytics team (data engineers, BI engineers, data scientists) at Amazon. There are 5 buckets, and the allocation varies week to week depending on priorities:

  1. People management activities: This bucket includes tasks where you work backwards from keeping and growing the folks on your team, e.g. weekly 1:1s, team meetings, career growth syncs, the promotion process, etc.
  2. Hiring & team building activities: This bucket includes tasks related to hiring new folks or backfilling existing roles on your team. This also includes tasks to continue to have a culture & structure on your team that other folks would love to be part of.
  3. Partnerships and stakeholder management: This bucket includes tasks to build partnerships and trust with the teams that your team’s success depends on. This also includes proactively managing relationships and obsessing over the needs of your stakeholders by meeting them, being in org-level forums, etc.
  4. Building & driving tech vision: This bucket includes a) forming a tech vision for your team that you can work backwards from and b) putting mechanisms in place to drive towards that vision and empower the team to be effective and efficient. This could take various forms, but roadmaps, annual planning, head-count planning, prioritization mechanisms, tech architecture reviews, etc. are part of it.
  5. Strategic Initiatives: I always have one strategic initiative or another going on. These are initiatives that will deliver a step-change improvement for the company, the org, and the team.

These buckets can be further broken down into people (people management, hiring), process (partnerships, stakeholder management), and platform (tech vision, strategic initiatives), which maps very well to the mental model for building data-driven companies that I have written about before here: https://insightextractor.com/2015/12/29/building-data-driven-companies-3-ps-framework/

Hope that helps! I’d love your feedback; comments below!

All things data newsletter #16


(if this newsletter was forwarded to you then you can subscribe here: https://insightextractor.com/)

The goal of this newsletter is to promote continuous learning for data science and engineering professionals. To achieve this goal, I’ll be sharing articles across various sources that I found interesting. The following 5 articles/videos made the cut for today’s newsletter.

(1) Data & AI landscape 2020

Really good review of the year 2020 in the data & AI landscape. Look at all those logos representing the many companies tackling various data and AI challenges; it’s an exciting time to be in data! Read here

2020 Data and AI Landscape
Image Source

(2) Self-Service Analytics

Tooling is the easy part; it’s the follow-up steps that are needed to truly achieve a culture that is independently data-driven. Read here

(3) What is the difference between data pipeline and ETL?

Really good back-to-basics video on the difference between a data pipeline and ETL.

(4) Delivering High Quality Analytics at Netflix

I loved this video! It talks about how to ensure data quality throughout your data stack.

(5) Introduction to data lakes and analytics on AWS

I have another great YouTube video for you. This one introduces you to various AWS data and analytics tools.

Thanks for reading! Now it’s your turn: Which article did you love the most and why?

Three stories about my managerial journey published on Plato


I wanted to share 3 stories that Plato (an engineering leadership mentorship platform) recently published about my managerial journey. They capture some learnings in career growth, productivity, team process, and sharing a team vision. Links below.

(1) How to drive a team vision as First-time manager?

Paras recalls how he successfully drove a team vision as a first-time manager who took over a team without a vision or roadmap. Read here: https://www.platohq.com/resources/how-to-drive-a-team-vision-as-a-first-time-manager-1086187395

(2) How to unlock the potential of your average engineer?

Paras discusses how to unlock the potential of an average-performing engineer and encourage them to be more proactive and autonomous. Read here: https://www.platohq.com/resources/how-to-unlock-the-potential-of-your-average-engineer-658984829

(3) Managing career growth of your reports

Paras describes how he approaches the career growth of his reports by dedicating time exclusively to talking about their careers. Read here: https://www.platohq.com/resources/managing-career-growth-of-your-reports-1306933855

All things Data Newsletter #15 (#dataengineering #datascience #data #analytics)


(if this newsletter was forwarded to you then you can subscribe here: https://insightextractor.com/)

The goal of this newsletter is to promote continuous learning for data science and engineering professionals. To achieve this goal, I’ll be sharing articles across various sources that I found interesting. The following 5 articles made the cut for today’s newsletter.

(1) Scaling data

Fantastic article by Crystal Widjaja on scaling data. It shares a really good framework for building analytics maturity and how to think about building capabilities to navigate each stage. Must read! Here

Image Source: reforge

(2) Building startup’s data infrastructure in 1-Hour

Good video that touches on multiple tools. Watch here: https://www.youtube.com/watch?v=WOSrRTaNIm0 (it’s a little dated, since it was shared in 2019, but the architecture is still helpful)

(3) Analytics lesson learned

If you haven’t read Lean Analytics, I recommend it! After that, you should read this free companion, which covers 12 good analytics case studies. Read here

(4) Organizing data teams

How do you organize data teams? Completely centralized under a data leader? Or decentralized, reporting into the leaders of business functions? Some good thoughts here

Image Source

(5) Metrics layer is a missing piece in modern data stack

This is a good article that encourages you to think about adding a metrics layer to your data stack. In the last newsletter, I also shared an article about Airbnb’s Minerva metrics layer, and this article does a good job of providing additional reasons to build something similar. Read here

Thanks for reading! Now it’s your turn: Which article did you love the most and why?

All things Data Newsletter #14 (#dataengineering #datascience #data #analytics)


(if this newsletter was forwarded to you then you can subscribe here: https://insightextractor.com/)

The goal of this newsletter is to promote continuous learning for data science and engineering professionals. To achieve this goal, I’ll be sharing articles across various sources that I found interesting. The following 5 articles made the cut for today’s newsletter.

(1) Analytics is a mess

Fantastic article highlighting the importance of the “creative” side of analytics. It’s not always structured, and that is also what makes it fun. Read here

(2) Achieving metric consistency at Scale — Airbnb Data

This is a great case study shared by Airbnb’s data team on how they achieved metrics consistency at scale. Read here

Image Source

(3) Achieving metric consistency & standardization — Uber Data

Another great read on metrics standardization, this time at Uber. As you can see, it’s a recurring problem at different organizations once they hit a certain growth threshold. It occurs because, in the initial growth stage, the focus is on enabling folks to look at metrics in a manner that’s optimized for speed. After a certain stage, this needs to be balanced with consistency: teams may have gone in different directions and defined the same thing in different ways, and that no longer scales. This is where metric consistency and standardization come in. It’s a problem worth solving, and if you are interested, please read the article here

(4) Where is data engineering going in 5 years?

A good short post by Zach Wilson on LinkedIn about where data engineering is going over the next few years. Not surprised to see data privacy in there! Read the rest here

(5) 3 foundations of successful data science team

An Amazon leader (Elvis Dieguez) talks about the 3 foundational pillars of a successful data team: 1) a data warehouse, 2) automated data pipelines, and 3) a self-service analytics tool. Read here

Thanks for reading! Now it’s your turn: Which article did you love the most and why?

All things data newsletter #13 (#dataengineer #datascience)


(if this newsletter was forwarded to you then you can subscribe here: https://insightextractor.com/)

The goal of this newsletter is to promote continuous learning for data science and engineering professionals. To achieve this goal, I’ll be sharing articles across various sources that I found interesting. The following 5 articles made the cut for today’s newsletter.

(1) The Modern Data Stack

Amazing article by Tristan Handy explaining the modern data stack. If you are familiar with tools such as Looker, Redshift, Snowflake, BigQuery, FiveTran, DBT, etc. but wondered how each of them fits into an overall architecture, then this is a must read! Read here

Image Source: GetDBT by Tristan Handy

(2) How can data engineering teams be productive?

Good mental model and tips to build a productive data engineering team. Read here

(3) Why is future of Business Intelligence open source?

From the founder of Apache Superset on why he believes the future of BI is open source. Read here.

(This is also a great marketing pitch for Apache Superset, so please take it with a grain of salt and be aware of the author’s bias on this topic.)

(4) How Data and Design can work better together?

Diagnose with data and treat with design. Great article by Julie Zhuo here

(5) Zach Wilson believes that standups can be less productive in data engineering teams compared to software engineering teams

Interesting observations on his LinkedIn thread here

Thanks for reading! Now it’s your turn: Which article did you love the most and why?

Framework for onboarding as an Analytics leader in a new team:

1. Start with people:

On a new team, start by meeting people. This includes your team, stakeholders, and cross-functional partners. Ask them about the company, product, and team; ask what help they need; and seek their advice. Understand the career growth plan for every member of your team.

2. Understand product/company:

Read docs. Ask questions (lots of them). Attend cross-functional meetings. Try out the product yourself. Dig deeper to understand the goals and success metrics of the product and company. I recommend creating a shared live doc where you invite other folks to add their comments & suggestions.

3. Build out team vision and roadmap:

Document customer pain points. Map that against the projects that your team is executing. Learn about the top successes and misses. Articulate team vision. Build a roadmap. Iterate with partners and get alignment with leadership.

4. Focus on Impact:

Identify projects in the first 90 days that will deliver impact early. Stay focused on the long-term vision and impact. Keep learning. Get alignment with leadership on how success will be measured. Roll up your sleeves and start delivering what the team & customers need most.

#analytics #leadership #data #team

Originally posted on LinkedIn here.

All things data newsletter #12 (#dataengineer #datascience)


(if this newsletter was forwarded to you then you can subscribe here: https://insightextractor.com/)

The goal of this newsletter is to promote continuous learning for data science and engineering professionals. To achieve this goal, I’ll be sharing articles across various sources that I found interesting. The following 5 articles made the cut for today’s newsletter.

Why did Dropbox pick Apache Superset as its data exploration tool?

Apache Superset is gaining momentum, and if you want to understand the reasons behind that, you can start by reading this article here

Growth: Adjacent User Theory

I love the framing in this LinkedIn post here, where Nimit Jain says that great Growth PM output looks like: “We discovered 2 new user segments who are struggling to proceed at 2 key steps in the funnel and simplified the product for them via A/B experiments. This lead to conversion improvement of 5-10% at these steps so far. We are now working to figure the next segment of users to focus on.” You can read about the Adjacent User Theory here

SQL window functions

Need an intro to SQL window functions? Read this
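As a quick, runnable taste of what window functions do, here is a toy example (the sales table is made up) using Python’s bundled sqlite3 module, which supports window functions as of SQLite 3.25+. Note how SUM(...) OVER and RANK() OVER compute per-partition values without collapsing the rows the way GROUP BY would:

```python
import sqlite3

# In-memory database with a made-up sales table.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sales (region TEXT, amount INTEGER);
    INSERT INTO sales VALUES ('east', 100), ('east', 200), ('west', 50);
""")

# region_total: per-region sum attached to every row.
# rnk: rank of each sale within its region, largest first.
rows = conn.execute("""
    SELECT region,
           amount,
           SUM(amount) OVER (PARTITION BY region) AS region_total,
           RANK() OVER (PARTITION BY region ORDER BY amount DESC) AS rnk
    FROM sales
""").fetchall()

for row in rows:
    print(row)
```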

Luigi vs Airflow

Really good matrix comparing 2 popular ETL workflow platforms. Read here

A data engineer’s point of view on data democratization

If more people can easily access data that was previously inaccessible to them, that’s a good thing. This is a good read on the various things to consider; read here

Apache Superset growth within Dropbox:

Image Source: Dropbox Tech Blog

Thanks for reading! Now it’s your turn: Which article did you love the most and why?