What analytics data gives you the most actionable advice to improve your blog?

Standard

Someone asked on Quora: What analytics data gives you the most actionable advice to improve your blog? so here’s my answer:

I have been blogging about Analytics for past few years and this question is at the intersection of both so let me give it a shot:

It depends on two things: 1) Your goal for running the blog 2) Age of the blog

#1: Your goal

why blog analytics

First let’s talk about your goal for running the blog. It’s important to define this as this would help set the metrics that you will monitor and take actions to improve it.

Let’s say that the goal of your blog is to earn is to monetize using ads. So your key performance indicator (KPI) will be monthly ad revenue. In that case you can improve by one of the three things: Number of People visiting the blog x % of visitors clicking on ads x average revenue per ad click. You can work on marketing your blog to increase number of people visiting the blog. Then you can work on ad placement on your blog to increase % of visitors clicking the ad and then you can work on trying different ad networks to see which one pays you the most per click.

let’s take one more example. Like me if your goal is to use your blog for “exposure” which helps me build credibility in the field that I work in. In this case, the KPI i look at is Monthly New Visitors. I drill down further to see which marketing channels are driving that change. That helps me identify channels that I can double down on and reduce investments in other areas. For example: I found that Social is not performing that great but Search has been working great — I started investing time in following SEO principles and spent less time on posting on social.

So first step: Define your goal and your KPI needs to align with that.

#2: Age of your blog:

  • Early: Now at this stage, you will need to explore whether you can achieve what you set out to using blogging. So let’s say you wanted to earn money online. In first few weeks/months, you need to figure out if it’s possible. Can you get enough traffic to earn what you wanted? yes? Great! If not, blogging might not be the answer and eventually all your energy is being wasted. Figure this out sooner rather than later — and take first few weeks/months to make sure blogging helps you achieve your goal.
  • Mid: By this stage, you should know how blogging is helping you achieve your goal. So it’s time to pick one metric that matters! So if your goal was to earn money using ads then go for Monthly ad revenue and set up systems to track this. Google Analytics will be a great starting point. Also, at this stage, you should be asking for qualitative feedback. Ask your friends, ask on social, get comments, do guest blogging on popular platforms and see if you get engagement — basically focus on qualitative feedback since you won’t have enough visitors that you can analyze quantitative data.
  • Late: In this stage, you have the data and the blog is starting to get momentum. Don’t stop qualitative feedback loops but now start looking at quantitative data too. Figure out the underlying driving forces that move the needle on your KPI. Focus on improving those!

TL;DR: Define your “why” and then pick a metric— then use combination of qualitative and quantitative data to improve the underlying driving factors to improve the metric.

VIEW ON QUORA

Introduction to SQL Performance Tuning for Data Analysts

Standard

Introduction:

SQL is a common language used by data analysts (and even business users!) for data analysis — one of the reasons is popular is because it’s not that hard to pick it up. Sure, there is some learning curve especially if you don’t have a computer programming background but once you learn some basic commands, you will be able to apply it and answer a lot of questions. So it does give you lot of power! But sometimes you run into issues where your SQL queries are taking forever to complete and you wonder why that’s the case. In this post, I am going to introduce you to performance tuning that will help to troubleshoot next time you run into performance problems.

Performance Tuning SQL

Performance Tuning SQL

Performance tuning Hierarchy

your queries are slow due to one of the three reasons listed below:

#1: SQL Query optimization

#2: Database software, environment & optimization

#3. Hardware

You should start at Level 1 which is query optimization and then work your way down to other levels. This post will focus on SQL Query optimization as that is something you can control and it is also the most common root cause. Let’s focus on this first and then we will explore other options.

Performance Tuning SQL Queries

Depending on your skill level, you can look at a lot of things. But for the purpose of this blog post lets say you have beginner – intermediate SQL Knowledge and with that, you can look at following things:

  1. Size of your tables: If you are querying tables with millions of rows then that is going to slow down your queries. You can start off with limiting the amount of rows that you are working with using SQL clauses like LIMIT/TOP (depends on database system you are using) that will reduce the number of rows that database works with. This is great for exploratory analysis where you don’t need to look at all rows. Also, you should consider using “where” clause when it’s applicable. So let’s say you just care about a particular product category OR a particular product category then put where clauses where it’s applicable.
  2. Complex joins & aggregations: If you trying to join tables in such a way that returns a large number of rows then it’s going to be slow! If possible you can apply step 1 (limit your rows) to this as well — so let’s say you have two tables that are trying to join but you don’t need everything from table 1 then consider putting where clause on table 1. That would help. Also, if you are using aggregations along with joining tables then you could consider doing aggregations on individual tables first and put them in a subquery and then use them to join with other tables. Here’s an example where you are aggregating the table before joining them with other tables:

    SELECT cat.product_category,
    sub.*
    FROM (
    SELECT p.product_name,
    COUNT(*) AS products
    FROM Product p
    GROUP BY 1
    ) sub
    JOIN ProductCategory cat
    ON cat.product_name = sub.product_name

  3. Query plan: Also, how do you know which statement is a bottleneck in your queries. You could consider running them individually and see but even if that doesn’t help then you can use something called a “Query plan” — depending on the database system you are using the commands can differ but you can try searching the help section for that. It’s called Query/Execution plans and they help you see the order in which the query will be executed. Also, it will have “estimated” time to run stats (which may or may not be accurate) but still good starting point to see how long it might take for complex queries especially as you make changes, you can continuously evaluate without having to run the query. There’s a bit of learning curve on how to read execution/query plans but they are a great way to check the bottlenecks in a query that you wrote. You can try using a command like “EXPLAIN” before your query and check your database help section to see if that’s not the command.

Database software, Environment, Optimization & Hardware:

Let’s say you have tried everything you could to tune your SQL queries then it’s move to explore other options:

#2: Database software, environment & optimization: This is usually owned by Dev Ops or IT team and you will have to work with them. Depending on your team size, there might be a DBA, System Admin or DevOps engineer responsible for this. Here’s few things you should check out along with the IT team:

  • Are there a lot of users running queries at the same time?
  • Upgrade to database software and/or perform database optimization (e.g. indexing)
  • Consider evaluating a database that supports analytics better like Vertica, Redshift, Teradata, Google BigQuery, Microsft Azure SQL Data Warehouse among others — if you have 20+ users hitting a database for querying and have tables with 25M+ rows then this is worth evaluating! Your mileage may vary (depending on your hardware) but I am sharing these thresholds just so you have a starting point. These databases are different in architecture compared to something like MySQL which will bog down as you start to scale analytics in your org.
  • Are you querying a production database that also used to support other apps? If so, consider requesting a copy of a database to work with. IT should be able to set up a copy that gets refreshed let’s say nightly and that should help. You should then route all SQL users to this database and restrict access to production database.

#3: Hardware:

Since database is a software, it is constrained by the resources that it is allocated at hardware level just like any other software. You should dive deeper into this if #1 & #2 don’t work out — This is not the most common root-cause but as a rule of thumb, you should be scaling your hardware resources as other systems are scaled too. if that’s not done regularly then you will hit hardware issues. Also, don’t just upgrade your hardware, as I referred to earlier in #2, consider looking into databases that are better for analytics like vertica, redshift, bigquery etc. Compare all options & do an ROI analysis as upgrading hardware is usually a “duct-tape” solution and you will run into it again if you continue to grow.

Conclusion:

So you now have a framework which should help you when you run into SQL performance problems! Now it’s your turn, I would love to hear about what you did when you ran into performance problems in any of your data analysis project.

How do I get experience as an entry-level Data analyst?

Standard

It’s a three-step process:

  1. Figure out where (location) you want to work and who (company) you want to work for.
  2. Note the “skills” required in job Descriptions at companies in your desired location(s) > find common themes from job descriptions > Pick up those skills if you don’t have them already!
  3. Start Applying!
    • Getting a job is a function of Number of Job Applications and your conversion rate (Offers Received/#of Job Applications). Optimizing # of Job Applications is easy — you just need to apply to as many jobs as you could. To improve conversion rate, you would need to do number of things: clear HR/Culture-fit rounds, clear TECH rounds, create a portfolio of projects to talk about, etc.
    • You could also consider applying for internships to get experience. This should help you land full-time roles.

Related Answer: Paras Doshi’s answer to How do I prepare myself to be a data analyst?

VIEW QUESTION ON QUORA

Back to basics: continuous Vs. Discrete variables and their importance in Data Visualization.

Standard

Take a look at the following chart, do you see any issues with it?

month trend chart line chart string to date

Notice that the month values are shown as “distinct” values instead of shown as a “continuous” values and it misleads the person looking at the chart.  Agree? Great! You already know based on your instincts what continuous and discrete values are, it’s just that we will need to label what you already know.

In the example used above, the “Date & Time” shown as a “Sales Date” is a continuous value since you can’t never say the “Exact” time that the event occurred…1/1/2008 22 hours, 15 minutes, 7 seconds, 5 milliseconds…and it goes on…it’s continuous.

But let’s say you wanted to see Number of Units Sold Vs Product Name. now that’s countable, isn’t it? You can say that we sold 150 units of Product X and 250 units of product Y. In this case, Units sold becomes discrete value.

The chart shown above was treating Sales Date as discrete values and hence causing confusion…let’s fix it since now you the difference between continuous and discrete variables:

Statistics Discrete Continuos Variable Data Visualization

Conclusion:

To develop effective data visualizations, it’s important to understand the data types of your data. In this post, you saw the difference between continuous and discrete variables and their importance in data visualization.