The goal of this newsletter is to promote continuous learning for data science and engineering professionals. To achieve this goal, I’ll be sharing articles across various sources that I found interesting. The following 5 articles made the cut for today’s newsletter.
(1) Scaling data
Fantastic article by Crystal Widjaja on scaling data. It shares a really good framework for building analytics maturity and how to think about building capabilities to navigate each stage. Must read! Here
(2) Building a startup’s data infrastructure in 1 hour
Good video that touches on multiple tools. Watch here: https://www.youtube.com/watch?v=WOSrRTaNIm0 (it’s a little outdated since it was shared in 2019, which is 2 years ago, but the architecture is still helpful)
(3) Analytics lessons learned
If you haven’t read Lean Analytics, I recommend it! After that, you should read this free companion, which covers 12 good analytics case studies. Read here
(4) Organizing data teams
How do you organize data teams? Completely centralized under a data leader? Or decentralized, reporting into the leaders of business functions? Some good thoughts here
(5) The metrics layer is a missing piece in the modern data stack
This is a good article that encourages you to think about adding a metrics layer to your data stack. In the last newsletter, I also shared an article about Airbnb’s Minerva metrics layer, and this article does a good job of providing additional reasons to build something similar. Read here
Thanks for reading! Now it’s your turn: Which article did you love the most and why?
The goal of this newsletter is to promote continuous learning for data science and engineering professionals. To achieve this goal, I’ll be sharing articles across various sources that I found interesting. The following 5 articles made the cut for today’s newsletter.
1. Architecture for Telemetry data
A good reminder that the software architecture for capturing telemetry data can be significantly simplified. Read here
2. 5 popular job titles for data engineers
This post here lists 5 popular job titles, including data engineer, data architect, and data warehouse engineer. I think analytics engineer is missing from that list, but it’s a good post nonetheless. I hope that we get some consolidation and standardization of these job titles over the next few cycles.
3. [Podcast] startup growth strategy and building Gojek data team – Crystal Widjaja
A must-read technical whitepaper from the legendary Hadley Wickham. These principles form the foundation on which R gained a lot of adoption momentum, and the Python community uses similar tenets. Must read! here and here
5. Magic metrics indicating a startup probably has product/market fit, from Andrew Chen
A must-follow growth leader! Magic metrics indicating a startup probably has product/market fit:
1) Cohort retention curves that flatten (stickiness)
2) Actives/reg > 25% (validates TAM)
3) Power user curve showing a smile, with a big concentration of engaged users (you grow out from this strong core)
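To make the first two metrics concrete, here is a minimal Python sketch. The 25% threshold comes from the list above; everything else (the function names, the 3-period window, the 2-point tolerance for calling a curve "flat", and the sample numbers) is my own assumption for illustration, not a definition from Andrew Chen.

```python
from typing import Sequence

ACTIVES_TO_REG_THRESHOLD = 0.25  # the "> 25%" threshold from the list above


def actives_to_registered(active_users: int, registered_users: int) -> float:
    """Share of registered users who are active."""
    if registered_users <= 0:
        raise ValueError("registered_users must be positive")
    return active_users / registered_users


def retention_flattens(
    retention: Sequence[float], window: int = 3, tolerance: float = 0.02
) -> bool:
    """Rough flatness check: True if retention over the last `window` periods
    stays within `tolerance` of itself. Both defaults are illustrative
    assumptions, not standard values."""
    if len(retention) < window:
        return False
    tail = retention[-window:]
    return max(tail) - min(tail) <= tolerance


# Made-up cohort: share of the cohort still active in each period.
sticky_cohort = [1.0, 0.55, 0.42, 0.36, 0.35, 0.35, 0.34]
leaky_cohort = [1.0, 0.8, 0.6, 0.4, 0.2, 0.1, 0.05]

ratio = actives_to_registered(active_users=3_000, registered_users=10_000)
passes_actives_check = ratio > ACTIVES_TO_REG_THRESHOLD
sticky_is_flat = retention_flattens(sticky_cohort)
leaky_is_flat = retention_flattens(leaky_cohort)
```

In practice you would want to eyeball the actual curves as well; a single tolerance number is only a rough stand-in for "the curve flattens".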
Think of a “continuum” as something you start and never stop improving upon. In my mind, the Business Analytics Continuum is a continuous investment of resources to take business analytics capabilities to the next level. So what are these levels?
Here is a visual representation of the concept:
First, let’s define culture: “The set of shared values, goals, and practices that characterizes a group of people” Source
Now, building on that to define data culture: what is the set of shared values? That decisions will be made based on insights generated from data. And the group of people is all the decision makers in the organization. So in other words:
An org that has a great data culture will have a group of decision makers who use data & insights to make decisions.
Why is building data culture important?
There are two ways to make decisions: one that uses data and one that doesn’t. My hypothesis is that decisions made through data are less wrong. To make this happen in your org, you need to have a plan. In the sections below, I’ll share the key ingredients and a mental model for building a data culture.
What are the ingredients for a successful data culture?
An interesting video that’s a great reminder of how analytics is a game-changer when applied correctly. The video shows how small clubs use analytics to compete with big clubs and not only stay relevant but grow in the process.
A similar analogy can be drawn for startups (or early-to-mid-stage products inside big companies), which can use analytics to compete with incumbents in the market.
Let me know what you think. What’s your favorite analogy to help explain why analytics is useful to your org?
These posts were perfectly timed for me, as we started thinking about annual planning for the Alexa Voice Shopping org (Amazon) this week. As part of my research into which metrics to use to measure what our business cares about most, and how to set the right benchmarks/goals for the org, the posts below were super helpful. So if you are in tech and you care about 1) measuring frequency of usage and 2) measuring the most engaged cohort, then you should take some time to read these posts.
I am honored to get the PASS outstanding volunteer award again for June 2017! It’s been so much fun helping grow the chapter from 1K to 10K members within the last 4 years. The PASS HQ Team & Dan English (Group Lead) were great to work with, and there’s so much more growth left for the next few years! The group was recently classified as a “tier-1” group and got new sponsors, which means the group has some funding to pursue paid growth opportunities that weren’t accessible before.
So since the group has the perfect platform to continue growing and we have a really good process in place to keep our growth flywheel running, I figured it’s a great time to step down. Over the past few years, my career moved me from Business Intelligence -> Analytics -> Data Science, and along with that, I have slowly moved away from Microsoft-centric architectures too. I started out working for a Microsoft Gold Partner, then worked for an open-source-heavy shop at a startup-mode organization in Silicon Valley, and now I work in an organization that uses a little bit of everything, something like the best of both worlds. So there’s a much bigger gap now between where my career is taking me and the mission of the business analytics virtual group. They don’t perfectly align anymore, and even though it’s a very rewarding experience, after some reflection, I figured the group deserves a leader whose mission aligns better than mine does.
— PASS Community Team (@Community_PASS) July 5, 2017
There’s an open position for new volunteers on the virtual group, so if you’d like to be involved, reach out to Dan English through the group’s website: http://bavc.pass.org/
If you are a data science professional and haven’t heard about bots, you will soon! Most of the big vendors (Microsoft, Qlik, etc.) have started adding capabilities and have shown some signs of serious product investment in this category. So, let’s step back and reflect: how will bots impact the adoption of data platforms, and why should you care?
So, let’s start with this question: What do you need to drive a data-driven culture in an organization? You need to focus on three areas to be successful:
Data (you need to access data from multiple sources, merge/join it, clean it and store it in a central location)
Modeling layer/Algorithm layer (you need to add business logic, transform data and/or add machine learning algorithms to add business value to your data)
Workflow (you need to embed data & insights in the business user’s workflow OR provide data/insights when they are in their decision-making process)
Over the past few years, there was a really strong push for “self-service”, which was good for data professionals. A data team builds a platform for analysts and business users to self-serve whenever they need data, so instead of focusing on one-off requests, the team can focus on continuously growing the central data platform and satisfy a lot of requests at once. This is all great. Any business with more than 50-ish employees should have a self-service platform, and if yours doesn’t, consider building one. All the jazz comes after this! Data science, machine learning, predictive modeling, etc. are much easier if you have a solid data platform (aka data warehouse, operational data store) in place! Of course, I am talking at a pretty high level and there are nuances and details that we could go into, but self-service was meant for business users and power users to “self-serve” their data needs, which is great!
Now, there is one problem with that! Self-service platforms don’t do a great job at the third piece, which is “workflow”. They are not embedded in every business user’s workflow, and the management team doesn’t always get the insights when they need to make a decision. Think of it this way: since it’s a self-serve platform, users will turn to it to react to business problems and might not have the chance to be proactive. OK, that may seem vague, so let me give you an example.
Let’s take the simple business workflow of a sales professional.
She has a call coming up with one of her key customers since their account is about to expire. So she logs into the CRM (customer relationship management) software to learn about the customer. She looks at some information in the CRM system and then wants to learn about that customer’s product usage over the last 12 months.
She opens a new browser tab and logs into the data platform. Takes about 10 minutes to navigate to the data model/app that has that information. Filters the data to the customer of interest and a chart comes up.
Goes back to the CRM system. Needs something else so goes back to the data platform. That searching takes another 10 minutes!
Wasn’t that painful? Having to switch between multiple applications and wasting 10 minutes each time just to answer a simple question. Business users will do this if the question is critical, but they will ignore your platform if it’s not business-critical.
So to improve your data-driven culture, you need to think about your business users’ workflow and think of ways to integrate data/insights into it. This is probably one of the most underrated things, and it has exponential payoffs!
So how do bots fit into all of this? We talked about how workflows are important, right? To address this, tools have had data alerts and embedded reports features, which work too, but now we have a new thing called “bots” that enables deeper integration and helps you embed data/insights into a business user’s workflow.
Imagine this: in the previous example, instead of logging into the data platform, the business user could just ask a question in one of the chat applications: show me the product usage of customer x. And a chart shows up. Boom! Saved 10 minutes, but more importantly, by removing friction and adding delight, we gained a loyal user who is going to be more data-driven than ever before!
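To give a feel for how little is needed to get started, here is a minimal Python sketch of the question-answering part of such a bot. It is only an illustration under my own assumptions: the customer names, the usage numbers, and the function names are all hypothetical, the data lives in an in-memory dictionary instead of a real data platform, and there is no chat integration or charting. It just parses the question and replies with text.

```python
import re
from typing import Optional

# Hypothetical monthly product-usage counts per customer. In a real bot this
# lookup would query the data platform instead of a hard-coded dictionary.
USAGE_BY_CUSTOMER = {
    "acme": [120, 135, 150],
    "globex": [80, 60, 75],
}

# Matches the question from the example above, e.g.
# "show me the product usage of customer acme".
QUESTION_PATTERN = re.compile(
    r"show me the product usage of customer\s+(?P<customer>[\w-]+)",
    re.IGNORECASE,
)


def parse_customer(message: str) -> Optional[str]:
    """Extract the customer name from a chat message, or None if no match."""
    match = QUESTION_PATTERN.search(message)
    return match.group("customer").lower() if match else None


def answer(message: str) -> str:
    """Build the bot's text reply for a chat message."""
    customer = parse_customer(message)
    if customer is None:
        return "Sorry, try: show me the product usage of customer <name>"
    usage = USAGE_BY_CUSTOMER.get(customer)
    if usage is None:
        return f"No usage data found for customer {customer}."
    return f"Product usage for {customer}: " + ", ".join(str(n) for n in usage)


reply = answer("Show me the product usage of customer Acme")
```

The point of the sketch is the shape of the interaction: the user stays in the chat tool, and the bot does the navigating and filtering that took 10 minutes in the data platform.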
This is not fiction! Here’s a Slack bot that a vendor built that does what I just talked about:
So to wrap up, I think bots could have a tremendous impact on the adoption of data platforms, as they enable data professionals to work on the third pillar, “workflow”, to further empower business users.
And the increase in data consumption is great for both data engineers and data scientists. It’s great for data engineers because people might ask more questions and you might have to integrate more data sources. It’s great for data scientists because if more people ask questions then, over time, they will get to asking bigger and bolder questions, and you will be looped into those projects to help solve them.
What do you think? Do you think bots will impact the adoption of data platforms? If so, how? If not, why not? I am looking forward to hearing what you have to say! Please add your comments below.