I have two resources that I sometimes use to pick the right graph or chart for a data visualization.
#1: Chart Suggestions:
#2: Online Tool
(By Juice Labs)
Someone asked on Quora how to learn and explore the field of Big Data algorithms. They also mentioned having some background in Python and wanted ideas for a good project to work on. With that context, here is my reply:
There are two broad roles available in the Data/Big Data world:
*Smaller companies (or startups) tend to have roles where small teams (or just one person) do it all, so the distinction is not that apparent.
Now, given your background in Python and programming, you might be a great fit for “Data engineer” roles. I would recommend learning Apache Spark (since you can use Python code) and starting to build data pipelines. As you work with it a little more, you can learn how to build and deploy end-to-end machine learning projects with Python and Apache Spark. If you acquire these skills and keep learning, I am sure you will end up with a good project.
Hope that helped and good luck!
There are a few places I can think of where data scientists hang out online:
Someone asked on Quora: What analytics data gives you the most actionable advice to improve your blog? Here’s my answer:
I have been blogging about analytics for the past few years, and this question is at the intersection of both, so let me give it a shot:
It depends on two things: 1) your goal for running the blog and 2) the age of the blog.
#1: Your goal
First, let’s talk about your goal for running the blog. It’s important to define this because it determines the metrics you will monitor and act on.
Let’s say that the goal of your blog is to monetize using ads. Your key performance indicator (KPI) will then be monthly ad revenue, which breaks down into three factors: number of people visiting the blog x % of visitors clicking on ads x average revenue per ad click. You can improve any of the three. You can market your blog to increase the number of visitors, work on ad placement to increase the % of visitors clicking on ads, and try different ad networks to see which one pays you the most per click.
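The multiplication above can be sketched in a few lines of Python. The function name and all the numbers are made up for illustration; they are not from a real blog.

```python
def monthly_ad_revenue(visitors, click_rate, revenue_per_click):
    """Monthly ad revenue = visitors x share of visitors who click x revenue per click."""
    return visitors * click_rate * revenue_per_click

# Hypothetical baseline: 10,000 visitors, 2% click an ad, $0.50 per click.
baseline = monthly_ad_revenue(10_000, 0.02, 0.50)

# Change one factor at a time to see how much each one moves the KPI.
more_traffic = monthly_ad_revenue(12_000, 0.02, 0.50)      # marketing the blog
better_placement = monthly_ad_revenue(10_000, 0.03, 0.50)  # ad placement
better_network = monthly_ad_revenue(10_000, 0.02, 0.60)    # a different ad network

print(baseline, more_traffic, better_placement, better_network)
```

Because the three factors multiply, a given percentage lift in any one of them lifts revenue by the same percentage, so it usually makes sense to work on whichever is cheapest to improve.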
Let’s take one more example. Say that, like me, your goal is to use your blog for “exposure”, which helps me build credibility in the field I work in. In this case, the KPI I look at is monthly new visitors. I drill down further to see which marketing channels are driving changes in it. That helps me identify channels I can double down on and channels where I can reduce my investment. For example, I found that Social was not performing that well but Search was working great, so I started investing time in following SEO principles and spent less time posting on social.
So, the first step: define your goal and make sure your KPI aligns with it.
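Here is a rough sketch of that kind of drill-down in Python. The channel names and visitor counts are invented for illustration, and `channel_changes` is a hypothetical helper, not part of any analytics tool.

```python
# Hypothetical monthly new visitors by marketing channel (made-up numbers).
last_month = {"Search": 1200, "Social": 400, "Direct": 300}
this_month = {"Search": 1500, "Social": 380, "Direct": 310}

def channel_changes(previous, current):
    """Return (channel, change in new visitors) pairs, largest gain first."""
    channels = set(previous) | set(current)
    changes = {ch: current.get(ch, 0) - previous.get(ch, 0) for ch in channels}
    return sorted(changes.items(), key=lambda item: item[1], reverse=True)

for channel, change in channel_changes(last_month, this_month):
    print(f"{channel}: {change:+d}")
```

Channels at the top of the list are candidates to double down on; channels at the bottom are where you might spend less time.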
#2: Age of your blog:
TL;DR: Define your “why” and then pick a metric. Then use a combination of qualitative and quantitative data to improve the underlying factors that drive that metric.
SQL is a common language used by data analysts (and even business users!) for data analysis. One of the reasons it is popular is that it’s not that hard to pick up. Sure, there is a learning curve, especially if you don’t have a programming background, but once you learn some basic commands you will be able to apply them and answer a lot of questions. So it does give you a lot of power! But sometimes your SQL queries take forever to complete and you wonder why. In this post, I am going to introduce you to performance tuning, which will help you troubleshoot the next time you run into performance problems.
Your queries are slow for one of the three reasons listed below:
#1: SQL Query optimization
#2: Database software, environment & optimization
#3: Hardware
You should start at #1, which is query optimization, and then work your way down to the other levels. This post focuses on SQL query optimization because it is something you can control and it is also the most common root cause. Let’s focus on this first and then explore the other options.
Depending on your skill level, you can look at a lot of things. For the purpose of this blog post, let’s say you have beginner to intermediate SQL knowledge. With that, you can look at the following things:
SELECT cat.product_category,
       sub.*
  FROM (
        SELECT p.product_name,
               COUNT(*) AS products
          FROM Product p
         GROUP BY 1
       ) sub
  JOIN ProductCategory cat
    ON cat.product_name = sub.product_name
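To see how a database actually executes a query like the one above, you can ask it for the query plan. Here is a minimal sketch using Python's built-in sqlite3 module with a tiny in-memory copy of the two tables. The rows are made up, and `EXPLAIN QUERY PLAN` is SQLite's syntax; other databases have their own `EXPLAIN` variants with different output, so treat this as an illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Product (product_name TEXT);
    CREATE TABLE ProductCategory (product_name TEXT, product_category TEXT);
    -- Made-up sample rows, only so the query has something to run against.
    INSERT INTO Product VALUES ('pen'), ('pen'), ('notebook');
    INSERT INTO ProductCategory VALUES ('pen', 'stationery'), ('notebook', 'paper');
""")

QUERY = """
    SELECT cat.product_category, sub.*
    FROM (
        SELECT p.product_name, COUNT(*) AS products
        FROM Product p
        GROUP BY p.product_name
    ) sub
    JOIN ProductCategory cat
      ON cat.product_name = sub.product_name
"""

# Ask SQLite how it plans to run the query (scans, temp tables, join order).
plan = conn.execute("EXPLAIN QUERY PLAN " + QUERY).fetchall()
for step in plan:
    print(step)

# Run the query itself; sorted() just makes the row order predictable.
rows = sorted(conn.execute(QUERY).fetchall())
print(rows)
```

Comparing the plan before and after a change to the query is a quick way to check whether the change did anything, before you start timing it on real data.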
Let’s say you have tried everything you could to tune your SQL queries. Then it’s time to explore the other options:
#2: Database software, environment & optimization: This is usually owned by the DevOps or IT team, and you will have to work with them. Depending on your team size, there might be a DBA, system admin or DevOps engineer responsible for this. Here are a few things you should check out along with the IT team:
#3: Hardware:
Since a database is software, it is constrained by the hardware resources allocated to it, just like any other software. You should dive deeper into this if #1 and #2 don’t work out. This is not the most common root cause, but as a rule of thumb you should scale your hardware resources as your other systems scale. If that’s not done regularly, you will hit hardware issues. Also, don’t just upgrade your hardware: as I mentioned in #2, consider databases that are better suited to analytics, like Vertica, Redshift, BigQuery, etc. Compare all options and do an ROI analysis, because upgrading hardware is usually a “duct-tape” solution and you will run into the same problem again if you continue to grow.
So you now have a framework that should help when you run into SQL performance problems! Now it’s your turn: I would love to hear what you did when you ran into performance problems in any of your data analysis projects.