All things data newsletter #13 (#dataengineer #datascience)


(if this newsletter was forwarded to you then you can subscribe here: https://insightextractor.com/)

The goal of this newsletter is to promote continuous learning for data science and engineering professionals. To achieve this goal, I’ll be sharing articles across various sources that I found interesting. The following 5 articles made the cut for today’s newsletter.

(1) The Modern Data Stack

Amazing article by Tristan Handy explaining the modern data stack. If you are familiar with tools such as Looker, Redshift, Snowflake, BigQuery, FiveTran, DBT, etc. but have wondered how each of them fits into an overall architecture, then this is a must-read! Read here

Image Source: GetDBT by Tristan Handy

(2) How can data engineering teams be productive?

Good mental model and tips to build a productive data engineering team. Read here

(3) Why is the future of Business Intelligence open source?

The founder of Apache Superset explains why he believes that the future of BI is open source. Read here.

(This is also a great marketing pitch for Apache Superset, so please read it with a grain of salt and be aware of the author’s bias on this topic)

(4) How Data and Design can work better together?

Diagnose with data and treat with design. Great article by Julie Zhuo here

(5) Zach Wilson believes that standups can be less productive in data engineering teams compared to software engineering teams

Interesting observations on his LinkedIn thread here

Thanks for reading! Now it’s your turn: Which article did you love the most and why?

All things data newsletter #10 (#dataengineer #datascience)


(if this newsletter was forwarded to you then you can subscribe here: https://insightextractor.com/)

The goal of this newsletter is to promote continuous learning for data science and engineering professionals. To achieve this goal, I’ll be sharing articles across various sources that I found interesting. The following 5 articles made the cut for today’s newsletter.

1. Architecture for Telemetry data

A good reminder that the software development architecture can be significantly simplified for capturing telemetry data here

2. 5 popular job titles for data engineers

This post here lists 5 popular job titles: data engineer, data architect, data warehouse engineer — I think Analytics engineer is missing in that list but a good post nonetheless. I hope that we get some consolidation and standardization of these job titles over the next few cycles.

3. [Podcast] startup growth strategy and building Gojek data team – Crystal Widjaja

Really good podcast, highly recommended! here

4. Tenets for data cleaning

A must-read technical whitepaper from the legendary Hadley Wickham. These principles form the foundation on which R gained much of its momentum for adoption, and the Python community uses similar tenets. Must read! here and here

5. Magic metrics indicating that a startup probably has product/market fit, from Andrew Chen

A must-follow Growth leader!

  1. Cohort Retention curves flatten (stickiness)
  2. Actives/Reg > 25% (validates TAM)
  3. Power user curve showing a smile


Thanks for reading! Now it’s your turn: Which article did you love the most and why?

Four Tenets for effective Metrics Design


The goal of this blog post is to provide four tenets for effective metrics design.


What is a tenet?

A tenet is a principle honored by a group of people.

Why is effective metrics design important?

Metrics help with business decision-making. Picking the right metrics increases the odds that decisions are made with data rather than gut or intuition, which can be the difference between success and failure.

Four Tenets for effective metrics design:

  1. We will prioritize quality over quantity of metrics: If teams track too many metrics, it’s hard for decision-makers to swarm on the areas that are most important. Tracking many metrics also decreases the odds of each metric meeting the bar for quality. If you have a few metrics that are well thought out and meet the other tenets listed in this post, you increase the odds of having a solid data-driven culture. I am not being prescriptive about the right number of metrics, and you should definitely experiment to figure that out. However, I can give you a range: fewer than 3 key metrics might be too few, and more than 15 is a sign that you need to trim the list.
  2. We will design metrics that are behavior changing (aka actionable): A litmus test for this is to ask your business decision-makers to articulate what they will do if the metric 1) goes up N% (let’s say 5%) 2) stays flat 3) goes down N%. They should have a clear answer for at least two of the three scenarios above. If they can’t map a scenario to a behavior change or action, then this metric is not as important as you think, and that is a sign that you can cut it from your “must-have” metrics list. This doesn’t mean that you stop tracking it, but it gives you a framework to prioritize other metrics over it, or to iterate on the metric’s design until it is behavior changing.
  3. We will design metrics that are easy to understand: If your metrics are hard to understand, then it’s harder to take action on them, so this is a prerequisite for making your metrics behavior changing. Besides increasing the odds of your metrics being actionable, you also make them appeal to a wider audience in your teams rather than just key business decision-makers. Having a wide group of people understand your metrics is key to having a solid data-driven culture.
  4. We will design metrics that are easy to compare: Metrics that are easy to compare across time periods, customer segments and other business constructs are easier to understand and act on. For example, if I tell you that we had 1000 paying customers last week and this week, that doesn’t give you enough signal on whether that’s good or bad. But if I share that our conversion rate was 2.3% last week and is 2.1% this week, then you know that something needs to be fixed in your conversion funnel given the 20 bps drop. Ratios and rates are easy to compare, so one tactical tip is to see if a ratio or rate makes sense in your case. Also, if your metrics are easy to compare, that increases the odds of them being behavior changing, just as the example shows.
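
To make the fourth tenet concrete, here is a minimal Python sketch of the conversion-rate comparison. The visitor counts are made-up numbers chosen so that the rates land near 2.3% and 2.1%, and the function names are my own, not from any library.

```python
def conversion_rate(conversions: int, visitors: int) -> float:
    """Return conversions / visitors as a fraction (0.021 means 2.1%)."""
    if visitors <= 0:
        raise ValueError("visitors must be positive")
    return conversions / visitors


def change_in_bps(previous_rate: float, current_rate: float) -> float:
    """Return the change between two rates in basis points.

    1 bps = 0.01 percentage points, so a fraction difference is scaled by 10,000.
    """
    return round((current_rate - previous_rate) * 10_000, 2)


# Hypothetical data: the same 1,000 paying customers in both weeks,
# but more visitors this week, so the conversion rate fell.
last_week = conversion_rate(1_000, 43_478)   # about 2.3%
this_week = conversion_rate(1_000, 47_619)   # about 2.1%

print(f"last week: {last_week:.1%}, this week: {this_week:.1%}")
print(f"change: {change_in_bps(last_week, this_week)} bps")
```

The raw count (1,000 customers in both weeks) hides the drop, while the rate surfaces it as a change of roughly 20 bps.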

Conclusion:

In this blog post, you learned about effective metric design.

What are your tips for picking good metrics? Would love to hear your thoughts!

What analytics data gives you the most actionable advice to improve your blog?


Someone asked on Quora: What analytics data gives you the most actionable advice to improve your blog? so here’s my answer:

I have been blogging about analytics for the past few years, and this question is at the intersection of blogging and analytics, so let me give it a shot:

It depends on two things: 1) Your goal for running the blog 2) Age of the blog

#1: Your goal


First, let’s talk about your goal for running the blog. It’s important to define this because it determines which metrics you will monitor and take action to improve.

Let’s say that the goal of your blog is to monetize using ads. Your key performance indicator (KPI) will then be monthly ad revenue, which is the product of three things: number of people visiting the blog x % of visitors clicking on ads x average revenue per ad click. You can improve any of the three: market your blog to increase the number of visitors, work on ad placement to increase the % of visitors clicking on ads, and try different ad networks to see which one pays you the most per click.
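
As a rough illustration of that breakdown, the Python sketch below computes monthly ad revenue from the three factors and shows what happens when each one improves by 10%. All numbers are hypothetical and the function name is my own.

```python
def monthly_ad_revenue(visitors: float, ad_click_rate: float, revenue_per_click: float) -> float:
    """Monthly ad revenue = visitors x share of visitors clicking ads x revenue per click."""
    return visitors * ad_click_rate * revenue_per_click


# Hypothetical baseline numbers, for illustration only.
baseline = {"visitors": 10_000, "ad_click_rate": 0.02, "revenue_per_click": 0.50}

# Improve one lever at a time by 10% and compare the resulting revenue.
scenarios = {"baseline": monthly_ad_revenue(**baseline)}
for lever in baseline:
    improved = dict(baseline)
    improved[lever] = improved[lever] * 1.10
    scenarios[f"{lever} +10%"] = monthly_ad_revenue(**improved)

for name, revenue in scenarios.items():
    print(f"{name}: ${revenue:,.2f} per month")
```

Because the three factors multiply, a 10% lift in any one of them produces the same 10% lift in revenue, so the practical question is which lever is cheapest for you to move.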

Let’s take one more example. Like me, your goal might be to use your blog for “exposure”, which helps build credibility in the field that you work in. In this case, the KPI I look at is Monthly New Visitors. I drill down further to see which marketing channels are driving changes in it. That helps me identify channels that I can double down on and those where I can reduce investment. For example, I found that Social was not performing that well but Search was working great, so I started investing time in following SEO principles and spent less time posting on social.

So the first step: define your goal, and make sure your KPI aligns with it.

#2: Age of your blog:

  • Early: At this stage, you need to explore whether you can achieve what you set out to through blogging. Let’s say you wanted to earn money online. In the first few weeks/months, you need to figure out if it’s possible. Can you get enough traffic to earn what you wanted? Yes? Great! If not, blogging might not be the answer and your energy is being wasted. Figure this out sooner rather than later, and take the first few weeks/months to make sure blogging helps you achieve your goal.
  • Mid: By this stage, you should know how blogging is helping you achieve your goal, so it’s time to pick the one metric that matters! If your goal was to earn money using ads, then go for monthly ad revenue and set up systems to track it. Google Analytics is a great starting point. Also ask for qualitative feedback at this stage: ask your friends, ask on social, get comments, do guest blogging on popular platforms and see if you get engagement. Focus on qualitative feedback, since you won’t yet have enough visitors to analyze quantitative data.
  • Late: In this stage, you have the data and the blog is starting to get momentum. Don’t stop qualitative feedback loops but now start looking at quantitative data too. Figure out the underlying driving forces that move the needle on your KPI. Focus on improving those!

TL;DR: Define your “why” and then pick a metric. Then use a combination of qualitative and quantitative data to improve the underlying factors that drive the metric.

VIEW ON QUORA

What are some of the most important resources a Data analyst needs to know about?


This question was asked on Quora and here’s my answer:

I will list resources broken down by three categories.

  1. Business Knowledge: As a data analyst, you need to have at least basic knowledge of the business areas that you are helping with. For example, if you are doing Marketing Analytics, then understanding basic concepts in marketing will make you more effective. You can pick this up in one of three ways:
    • On-the-job: Pick up knowledge by interacting with business people and using internal knowledge bases.
    • Online resources: Pick up the basics of marketing by taking a beginner’s course online on a platform like Coursera, or from resources like this: Business Concepts – Bootcamp | PrepLounge.com
    • College/University: If you are at a college/university then you can either audit a course or depending on your major/minor, core business courses might just be part of the curriculum
  2. Communication skills:
    • Public Speaking: Toastmasters is a great resource. If you don’t have access to a local Toastmasters club, you should be able to find a course online. Check out Coursera.
    • Data Storytelling: Just listening to someone like Hans Rosling can be very inspiring! See: The best stats you’ve ever seen. Also, if you search for storytelling with data on YouTube, you will see a few good talks: storytelling with data – YouTube
    • Problem structuring: If you are able to break down a problem into its core components to identify the root cause, you will not only increase your speed to insight, but your structure will also help you communicate it more effectively. Learn to break down your problems and use that in communicating your data analysis approach. Imagine this list without the three high-level categories: wouldn’t it look like I am throwing random resources at you? By giving it a structure (Tech, Biz, Communication), I am able to organize it and also communicate it to you more effectively. More here: Structure your Thoughts – Bootcamp | PrepLounge.com
  3. Tech skills: Read Akash Dugam’s answer to What are some of the most important resources a Data analyst needs to know about? It’s a nice list. Also, check this out: Learn #Data Analysis online – free curriculum

A great data analyst will focus on all areas and a good data analyst might just focus on tech. Hope that helps!

VIEW QUESTION ON QUORA

Five actions that you can take if you measure your analytics/business-intelligence solution usage:


Summary:

In this post, I am going to share five actions that you can take if you measure your analytics/business-intelligence solution usage:

Five actions!

I highly encourage business stakeholders & IT managers to consider measuring the usage of their analytics/business-intelligence solutions. From a technical standpoint, it shouldn’t be a difficult problem, since most analytics & business intelligence tools will give you user activity logs. So, what’s the benefit of measuring usage? Well, in short, it’s like “eating at your own restaurant”: if you’re trying to spread a culture of data-driven decision-making in your organization, you need to lead by example! One way you can achieve that is by building a tiny Business Intelligence solution that measures user activity on top of your analytics/business-intelligence solution. If you decide to build that, then here are five actions that you can take based on your usage activity:

Let’s broadly classify them in two main categories: Pro-active & Reactive actions.

A. Pro-active actions:

1. Identify “Top” users and get qualitative feedback from them. Understand why they find it valuable & find a way to spread their story to others in the organization

2. Reach out to users who were once active users but lately haven’t logged into the system. Figure out why they stopped using the system.

3. Reach out to inactive users who have never used the system. It’s easy to find inactive users by comparing your user list with the usage activity logs. Once you have done that, figure out the root cause: a. lack of training/documentation b. unfriendly/hard-to-use system c. difficult to navigate. Once you have identified the root cause, fix it!
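
The three pro-active actions above boil down to segmenting users from an activity log. Here is a minimal Python sketch of that idea. It assumes you can export a user list and a log of (user, date) events from your tool. The function name, the 30-day cutoff and the sample data are all hypothetical.

```python
from collections import Counter
from datetime import date, timedelta


def segment_users(user_list, activity_log, today, lapsed_after_days=30, top_n=3):
    """Split users into top, lapsed and never-active groups.

    activity_log is an iterable of (user, date) events; the 30-day cutoff
    and top_n are illustrative defaults, not recommendations.
    """
    events_per_user = Counter(user for user, _ in activity_log)

    # Most recent activity date per user.
    last_seen = {}
    for user, day in activity_log:
        if user not in last_seen or day > last_seen[user]:
            last_seen[user] = day

    cutoff = today - timedelta(days=lapsed_after_days)
    return {
        "top": [user for user, _ in events_per_user.most_common(top_n)],
        "lapsed": sorted(u for u, day in last_seen.items() if day < cutoff),
        "never_active": sorted(set(user_list) - set(last_seen)),
    }


# Hypothetical sample data.
users = ["ana", "ben", "cho", "dev"]
log = [
    ("ana", date(2024, 6, 24)), ("ana", date(2024, 6, 26)), ("ana", date(2024, 6, 28)),
    ("ben", date(2024, 3, 1)),
    ("cho", date(2024, 6, 20)), ("cho", date(2024, 6, 27)),
]
segments = segment_users(users, log, today=date(2024, 6, 30), top_n=2)
print(segments)
```

Each group maps to one action: interview the top users, ask the lapsed users why they stopped, and find out what is blocking the users who never logged in.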

B. Reactive actions:

4. If the usage trend is going down, then alert your business stakeholders about it and find the root cause so that you can fix it.

Possible root causes:

– IT System Failure? Fix: make sure that problem in the system never happens again!

– Lack of documentation/training? Fix: Increase the number of training sessions & improve the documentation


5. Use the usage data to prove the ROI of your analytics/business-intelligence solution. It can help you secure sponsorship for your future projects!
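
For action 4, one simple way to flag a downward usage trend is to compare the average of the most recent weeks against the weeks before them. The Python sketch below does that. The 4-week window, the 10% threshold and the sample counts are assumptions for illustration, so tune them to your own usage patterns.

```python
def usage_is_declining(weekly_active_users, window=4, drop_threshold=0.10):
    """Return True if the recent window's average is down by at least
    drop_threshold (as a fraction) versus the window before it."""
    if len(weekly_active_users) < 2 * window:
        raise ValueError("need at least 2 * window weeks of data")

    previous = weekly_active_users[-2 * window:-window]
    recent = weekly_active_users[-window:]
    previous_avg = sum(previous) / window
    recent_avg = sum(recent) / window

    if previous_avg == 0:
        return False  # nothing to decline from
    return (previous_avg - recent_avg) / previous_avg >= drop_threshold


# Hypothetical weekly active user counts, oldest first.
weekly_counts = [100, 102, 98, 100, 90, 85, 80, 75]
if usage_is_declining(weekly_counts):
    print("Usage is trending down: alert stakeholders and look for the root cause.")
```

Averaging over a window keeps a single slow week (a holiday, for example) from triggering an alert.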

Conclusion:

In this post, you saw five actions that you can take if you measure the usage activity of your analytics/business-intelligence solution.

I hope this was helpful! I had mentioned user training in this article and so if you want to learn a little bit more about it, here are a couple of my posts:

1. http://parasdoshi.com/2014/05/05/presented-at-sqlsat-305-dallas-ba-edition/

2. http://parasdoshi.com/2014/05/07/how-to-train-your-users-to-create-their-own-business-intelligence-reports-5-of-5-post-training/