All things data newsletter #11 (#dataengineer, #datascience)

(If this newsletter was forwarded to you, you can subscribe here: https://insightextractor.com/)

The goal of this newsletter is to promote continuous learning for data science and engineering professionals. To achieve this goal, I’ll be sharing articles from various sources that I found interesting. The following 5 articles made the cut for today’s newsletter.

1. AWS re:Invent ML, Data and Analytics announcements

A really good recap of all the ML, data and analytics announcements at AWS re:Invent 2020 here

2. How to build a production workflow with SQL modeling

A really good example of how a data engineering team at Shopify applied software engineering best practices to analytics code. Read here

Image Source

3. Back to basics: What are different data pipeline components and types?

Must-know basic concepts for every data engineer here

4. Back to basics: SQL window functions

I was interviewing a senior candidate earlier this week, and it was unfortunate to see basic mistakes while writing SQL window functions. Don’t let that happen to you. A good tutorial here
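As a quick refresher, here is a minimal sketch using Python’s built-in sqlite3 module (SQLite supports window functions from version 3.25). The `sales` table and its rows are hypothetical. The query ranks each sale within its region and computes a running total per region: PARTITION BY restarts the window for every region, and the ORDER BY inside OVER decides both the ranking and the order in which the running total accumulates.

```python
import sqlite3

def ranked_sales():
    # In-memory database with a small, hypothetical sales table.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (region TEXT, rep TEXT, amount INTEGER)")
    conn.executemany(
        "INSERT INTO sales VALUES (?, ?, ?)",
        [("east", "ana", 300), ("east", "bo", 500), ("east", "cy", 200),
         ("west", "di", 400), ("west", "ed", 100)],
    )
    # PARTITION BY restarts the window for each region;
    # ORDER BY inside OVER controls the rank and the running-total order.
    query = """
        SELECT region, rep, amount,
               ROW_NUMBER() OVER (PARTITION BY region ORDER BY amount DESC) AS rank_in_region,
               SUM(amount) OVER (PARTITION BY region ORDER BY amount DESC) AS running_total
        FROM sales
        ORDER BY region, rank_in_region
    """
    rows = conn.execute(query).fetchall()
    conn.close()
    return rows

for row in ranked_sales():
    print(row)
# ('east', 'bo', 500, 1, 500)
# ('east', 'ana', 300, 2, 800)
# ('east', 'cy', 200, 3, 1000)
# ('west', 'di', 400, 1, 400)
# ('west', 'ed', 100, 2, 500)
```

One subtle point worth knowing for interviews: with an ORDER BY in OVER and no explicit frame, the default frame is RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW, so rows tied on `amount` would share the same running total. Add ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW if you want a strict row-by-row total.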

5. 300+ data science interview questions

A good library of data science interview questions and answers

Thanks for reading! Now it’s your turn: Which article did you love the most and why?

Five Tenets for effective data visualization

A tenet is a principle honored by a group of people. As a reader of this blog, you work with data, and data visualization is an important element of your day-to-day work. So, to help you build effective data visualizations, I created the simple-to-follow tenets below. This work draws on multiple sources, which I reference below.

Five Tenets for effective data visualization:

  1. We will strive to understand customer needs
  2. We will tell the truth
  3. We will bias for simplicity
  4. We will pick the right chart
  5. We will select colors strategically

Examples for each tenet are listed below:

  • We will strive to understand customer needs

Defining and knowing your audience is very important before diving into the other tenets. Doing this will increase your probability of delivering an effective data visualization.

h/t to Mike Rubin for suggesting this over on LinkedIn here

  • We will tell the truth

We won’t be dishonest with data. See the example below, where Fox News deliberately started the bar chart’s y-axis at a non-zero number to make the delta look much larger than it actually is.

Source: Link
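To see why a non-zero baseline exaggerates the delta, here is a small worked sketch with hypothetical values (not the numbers from the chart above). A bar’s drawn height is its value minus the axis baseline, so the ratio of drawn heights can be far larger than the ratio of the actual values.

```python
def apparent_ratio(a: float, b: float, baseline: float = 0.0) -> float:
    """Ratio of drawn bar heights for values a and b when the y-axis starts at baseline."""
    if min(a, b) <= baseline:
        raise ValueError("both values must be above the axis baseline")
    # A bar's drawn height is its distance from the baseline, not from zero.
    return (b - baseline) / (a - baseline)

# Hypothetical values: 55 is only 10% larger than 50.
honest = apparent_ratio(50.0, 55.0)                   # y-axis starts at zero
truncated = apparent_ratio(50.0, 55.0, baseline=48.0)  # y-axis starts at 48
print(f"honest: {honest:.2f}x, truncated: {truncated:.2f}x")
# honest: 1.10x, truncated: 3.50x
```

With the axis starting at 48, the taller bar is drawn 3.5 times as tall as the shorter one, even though the underlying values differ by only 10%.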

  • We will bias for simplicity

3-D charts add complexity for end users, so we won’t use charts like this and will opt for simplicity instead.

  • We will pick the right chart

I have linked some resources here

  • We will select colors strategically

Source here

Conclusion:

In this post, I shared five tenets that will help you build effective data visualization.

Data Culture Mental Model

What is Data Culture?

First, let’s define culture: “The set of shared values, goals, and practices that characterizes a group of people” Source

Now, building on that definition for data culture: what is the set of shared values? Decisions will be made based on insights generated through data. And the group of people represents all decision makers in the organization. In other words:

An org that has a great data culture will have a group of decision makers who use data & insights to make decisions.

Why is building data culture important?

There are two ways to make decisions: one that uses data and one that doesn’t. My hypothesis is that decisions made with data are less wrong. To make this happen in your org, you need a plan. In the sections below, I’ll share the key ingredients and a mental model for building a data culture.

What are the ingredients for a successful data culture?

It’s the 3 P’s: Platform, Process and People, plus continuously iterating on and improving each of the P’s to improve the data culture.

How to build data culture?

Here’s a mental model for a leader within an org:

  1. Understand data needs and prioritize
  2. Hire the right people
  3. Set team goals and define success
  4. Build something that people use
  5. Iterate on the data product and make it better
  6. Launch and communicate broadly
  7. Provide Training & Support
  8. Celebrate wins and communicate progress against goals
  9. Continue to build and identify next set of data needs

Disclaimer: The opinions are my own and don’t represent my employer’s view.

Two great posts on DAU/MAU and Measuring Power Users

Two great posts from Andrew Chen. Links below:

These posts were perfectly timed for me, as we started annual planning for the Alexa Voice Shopping org (Amazon) this week. As part of my research into which metrics best measure the things our business cares about most, and into setting the right benchmarks/goals for the org, the posts below were super helpful. So if you are in tech and care about 1) measuring frequency of usage and 2) measuring the most engaged cohort, you should take some time to read these posts.

Power user curve 

DAU/MAU is an important metric to measure engagement, but here’s where it fails
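For readers who want the two ideas in concrete form, here is a minimal sketch assuming a hypothetical activity log of (user_id, date) pairs and a 30-day window. DAU/MAU is computed as the average daily active users over the window divided by the users active at least once in it; the power user curve (the “L30” histogram from Andrew Chen’s post) counts how many users were active on exactly 1, 2, ..., 30 days.

```python
from collections import Counter, defaultdict
from datetime import date, timedelta

def engagement_metrics(events, start: date, days: int = 30):
    """Return (dau_over_mau, power_user_curve) for the window [start, start + days)."""
    end = start + timedelta(days=days)
    active_days = defaultdict(set)  # user_id -> set of dates with any activity
    for user_id, day in events:
        if start <= day < end:
            active_days[user_id].add(day)

    mau = len(active_days)  # users active at least once in the window
    if mau == 0:
        return 0.0, {}
    # Average DAU: total active user-days divided by the number of days in the window.
    avg_dau = sum(len(d) for d in active_days.values()) / days
    # Power user curve: number of users active on exactly k days, for k = 1..days.
    counts = Counter(len(d) for d in active_days.values())
    curve = {k: counts.get(k, 0) for k in range(1, days + 1)}
    return avg_dau / mau, curve

# Hypothetical log: user "a" is active every day, "b" on 3 days, "c" on 1 day.
start = date(2020, 11, 1)
events = [("a", start + timedelta(days=i)) for i in range(30)]
events += [("b", start + timedelta(days=i)) for i in (0, 7, 14)]
events += [("c", start)]
ratio, curve = engagement_metrics(events, start)
print(f"DAU/MAU = {ratio:.2f}")                 # DAU/MAU = 0.38
print({k: v for k, v in curve.items() if v})   # {1: 1, 3: 1, 30: 1}
```

Averaging DAU over the window is one common convention; some teams instead divide a single day’s DAU by the trailing 30-day MAU. The curve shows what the single ratio hides: here one very engaged user and two light users produce the same 0.38 that a very different mix of users could also produce.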

Cheers!

Excel 2013: Display hidden rows, columns and data on an Excel chart

If you hide rows, columns or data in Excel, any chart that uses this data hides it too. This is the default behavior, but you can override it by following the steps below.

Let’s reproduce the behavior first.

I have a simple Excel chart, shown below:

Excel Chart Hide Data 1

Now, if I hide the data selected for this chart, the chart stops showing it as well:

Excel Chart Hide Data 2

To keep the cells (rows, columns and data) hidden but still have the chart show them, follow these steps:

  1. Select the chart
  2. Under Chart Tools > Design, click Select Data
  3. Click Hidden and Empty Cells
  4. Check the Show data in hidden rows and columns check box
  5. Go back to Excel; you should now see the data on the chart even though the data is hidden

Hope that helps!

#PowerBI idea: Enable #PowerQuery Excel add-in for Mac/Apple/iOS

As a data professional, you will invariably spend a lot of time on data cleaning and transformation, and often you might be doing that work in Excel. If so, check out Power Query if you haven’t already! It will save you a LOT of time and unlock Jedi powers that you didn’t know you had!

BUT…

if you are using a Mac (and a lot of data scientists and data analysts are on this platform), then you are unfortunately out of luck! So for Mac users out there, I shared this feedback on the official Power BI ideas site, where it has 50 comments & 337 votes (as of 6/16/17). If you are a Mac user, I encourage you to check it out and vote! Microsoft does take the ideas site seriously, and its roadmap is heavily influenced by it.

URL: https://ideas.powerbi.com/forums/265200-power-bi-ideas/suggestions/7157571-enable-power-query-excel-add-in-for-mac-apple-ios

How will bots impact the adoption of data platforms?

If you are a data science professional and haven’t heard about bots, you will soon! Most of the big vendors (Microsoft, Qlik, etc.) have started adding capabilities and have shown signs of serious product investment in this category. So, let’s step back and reflect: how will bots impact the adoption of data platforms, and why should you care?

So, let’s start with this question: What do you need to drive a data-driven culture in an organization? You need to focus on three areas to be successful:

  1. Data (you need to access it from multiple sources, merge/join it, clean it and store it in a central location)
  2. Modeling Layer/Algorithm layer (you need to add business logic, transform data and/or add a machine learning algorithm to add business value to your data)
  3. Workflow (you need to embed data & insights in business users’ workflows OR provide data/insights when they are in their decision-making process)

Over the past few years, there was a really strong push for “self-service”, which was good for data professionals. A data team builds a platform for analysts and business users to self-serve whenever they need data, so instead of focusing on one-off requests, the team can focus on continuously growing the central data platform and satisfying a lot of requests at once. This is all great. Any business with more than 50-ish employees should have a self-service platform, and if it doesn’t, consider building one. All the jazz comes after this! Data science, machine learning, predictive modeling etc. are much easier if you have a solid data platform (aka data warehouse, operational data store) in place! Of course, I am talking at a pretty high level and there are nuances and details we could go into, but self-service platforms were meant for business users and power users to “self-serve” their data needs, which is great!

Now, there is one problem with that! Self-service platforms don’t do a great job at the third piece, “workflow”: they are not embedded in every business user’s workflow, and the management team doesn’t always get the insights when they need to make a decision. Think of it this way: since it’s a self-service platform, users will turn to it to react to business problems and might not get the chance to be proactive. OK, that may seem vague, so let me give you an example.

Let’s take the simple business workflow of a sales professional.

  1. She has a call coming up with one of her key customers since their account is about to expire. So she logs into the CRM (customer relationship management) software to learn about the customer. She looks at some information in the CRM system and then wants to learn about the customer’s product usage over the last 12 months.
  2. She opens a new browser tab and logs into the data platform. It takes about 10 minutes to navigate to the data model/app that has that information. She filters the data to the customer of interest and a chart comes up.
  3. She goes back to the CRM system. She needs something else, so she goes back to the data platform. That search takes another 10 minutes!

Wasn’t that painful? She had to switch between multiple applications and waste 10 minutes each time just to answer a simple question. Business users will put up with this when the question is critical, but they will ignore your platform when it’s not business-critical.

So to improve data-driven culture, you need to think about your business users’ workflows and find ways to integrate data/insights into them. This is probably one of the most underrated things, and it has exponential payoffs!

So how do bots fit into all of this? We talked about how important workflows are. To address this, tools have offered data alerts and embedded reports, which work too, but now we have a new thing called “bots” that enables deeper integration and helps you embed data/insights in a business user’s workflow.

Imagine this: in the previous example, instead of logging into the data platform, the business user could just ask a question in one of the chat applications: show me the product usage of customer x. And a chart shows up. Boom! That saved 10 minutes, but more importantly, by removing friction and adding delight, we gained a loyal user who is going to be more data-driven than ever before!
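To make the idea concrete, here is a toy sketch of the core loop of such a bot, assuming a hypothetical in-memory usage table and a single regular-expression pattern for the one question in the example. A real bot would sit behind a chat platform’s API and query the data platform instead; none of the names here come from any vendor’s product.

```python
import re

# Hypothetical monthly product-usage numbers that a data platform might expose.
USAGE = {
    "acme": [120, 135, 160, 150],
    "globex": [40, 38, 45, 52],
}

# The one supported intent: "show me the product usage of customer <name>".
PATTERN = re.compile(r"show me the product usage of customer (\w+)", re.IGNORECASE)

def answer(message: str) -> str:
    """Turn a chat message into a short text reply (a stand-in for a chart)."""
    match = PATTERN.search(message)
    if not match:
        return "Sorry, I can only answer: show me the product usage of customer <name>"
    customer = match.group(1).lower()
    usage = USAGE.get(customer)
    if usage is None:
        return f"No usage data found for customer '{customer}'."
    # A text bar per month keeps the sketch dependency-free; a real bot would post a chart image.
    bars = " ".join("#" * (v // 20) for v in usage)
    return f"Product usage for {customer}: {usage} | {bars}"

print(answer("Show me the product usage of customer Acme"))
# Product usage for acme: [120, 135, 160, 150] | ###### ###### ######## #######
```

The point of the design is that the question arrives where the user already works (the chat window), and the bot does the navigation and filtering that took 10 minutes in the self-service platform.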

This is not fiction! Here’s a Slack bot that a vendor built that does what I just talked about:

Product Usage Bots

So to wrap up, I think bots could have a tremendous impact on the adoption of data platforms, as they enable data professionals to work on the third pillar, “workflow”, to further empower business users.

And the increase in data consumption is great for both data engineers and data scientists. It’s great for data engineers because people might ask more questions and you might have to integrate more data sources. It’s great for data scientists because if more people ask questions, then over time they will get to asking bigger and bolder questions, and you will be looped into those projects to help answer them.

What do you think? Do you think bots will impact the adoption of data platforms? If so, how? If not, why not? I am looking forward to hearing what you have to say! Please add your comments below.

-Paras Doshi

Is it too late to become a good Data Scientist?

If you’re looking for a career change, that’s never too late!

If you’re looking to learn something new, that’s never too late!

If you’re looking to continue learning and go deeper in data science, that’s never too late!

If you don’t like software engineering and want to switch to something else, that’s never too late!

But if you are after the “Data Science” gold rush, then you did miss the first wave! You are late.

But seriously, you should apply first-principles thinking to your career strategy and ideally not jump to whatever’s “hot” because by the time you get on that train, it’s usually too late.

VIEW QUESTION ON QUORA

As a data scientist, are you dissatisfied with your career? Why?

As a data scientist, I am not dissatisfied. I love what I do!

But I might have gotten lucky, since I got into this for the right reasons. I was looking for a role that had a little bit of both tech & business, and a few years back, Business Intelligence and Data Analysis seemed like a great place to start. So I did that for a while. Then the industry evolved, the analytics maturity of the companies I worked for evolved too, and so I worked on building predictive models and became what they now call a “data scientist”.

It doesn’t mean that data science is the right role for everyone.

One of my friends feels that it’s not that “technical” and doesn’t like this role. He is more than happy with data engineer role where he gets to build stuff and dive deeper into technologies.

One of my other friends doesn’t like that you don’t own business/product outcomes and prefers a product manager role (he has worked as a data analyst for a while now and is working on transitioning away).

So, just based on the empirical data that I have, data science might not be an ideal path for everyone.

Hope that helps!

VIEW QUESTION ON QUORA

How do you become a good data analyst?

This was asked on Quora and here’s my reply:

You can become a great data analyst by continuously improving the analytics maturity of the company/start-up that you work for:

[Go to my blog for more context on the picture above]

If you create a bunch of reports that help answer what happened, then try to help business users with why it happened. [Example: Instead of just sending website traffic info, add why the traffic spikes (ups/downs) are happening]

If you are working on building a bunch of models that answer why questions, then try to help build predictive models next. [Example: You have been working on a model that helped you answer why customers churned. Now build upon that and predict which customers will churn next]

If you already do analytics and data science well, are answering the what, why and what’s-next questions, and you’re killing it, then figure out how you can help business owners take action, or make it easier than ever before to act on your data/recommendations.

Other answers to this question are covered directly or indirectly if you do this:

  1. You will have to pick the right tool for the job
  2. You will have to continuously keep learning (by taking online courses and/or watching YouTube)
  3. Don’t just be a data analyst; be a thought partner to business owners and, if possible, transition into a role that helps you own business outcomes.

Hope that helps!

VIEW QUESTION ON QUORA