As a student preparing for data analyst and data science roles, should I generalize or specialize?


This question was posted on the Springboard forum.

Here’s my answer:

It depends on your target industry and where it is in its lifecycle.

The industry lifecycle has four stages: Startup, Growth, Maturity, and Decline.

[Figure: Industry lifecycle]

Generalization works well in the earlier stages. If you are targeting jobs at startups, generalize: you should know enough about a lot of things.

T-shaped professionals are a great fit for the Growth stage. They specialize in one area but still know enough about a lot of things. For example, a Sr. Growth/Marketing Analyst knows enough about analytics and data science to be dangerous but specializes in marketing.

Specialization works well in mature industries, where specialists know a lot about a few things. For example, statisticians in the insurance industry have made careers out of building risk models.

In how many dimensions (Vs) is Big Data commonly defined?


Asked on Quora:

When reading about Big Data, most sources start with the definition from Gartner analyst Doug Laney (3Vs). IBM often uses four dimensions by adding veracity. Some people use six or even up to 12 dimensions. I am wondering: what is the most frequently used definition?


Here’s my “working” definition of Big Data: if your existing 1) tools and 2) processes don’t support your data analysis needs, then you have a Big Data problem.

You can add as many V’s as you want, but it all ties back to the notion that you need bigger and better tools and processes to support your data analysis needs as you grow.


#1. Social media data is BIG! It’s text (variety), it’s much bigger in size (volume), and it’s all coming in very fast (velocity). AND the business wants to analyze customer sentiment on social media. OK, so we have a 3V problem and need a solution to support it. Maybe Hadoop is the answer. Maybe not. But you do have a “Big Data” problem.

#2. Your customer database is broken. It doesn’t have the right addresses. Google and Alphabet show up as two separate companies when they should be just one. Employee counts are outdated. All of these problems confuse your business users, and they don’t TRUST the data anymore. You have a veracity problem, and so you have a BIG Data problem.

Everyone has a BIG DATA problem. It just depends on what their “V’s” are, AND in most cases “tools” alone will not solve the issue. You need PEOPLE and PROCESS to solve that. Here’s my ranking of the key ingredients for solving BIG Data problems: 1) PEOPLE, 2) PROCESS, 3) PLATFORM (tools).


How do I learn SQL for data analysis?


Step 1:

This is a good starting point: SQL School Table of Contents

OR, this: Learn SQL

Both of these resources were put together by analytics vendors and are targeted at beginners.

Step 2:

Review this Quora Thread: How do I learn SQL?

Participate in competitions like this: Solve SQL Code Challenges

Step 3:

If you’d like to go more in depth, check out a few books:

  1. Head First SQL
  2. Learn SQL the Hard Way
  3. Certification books/material from a database vendor
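If you want somewhere to practice the queries these resources teach without installing a database server, one option (assuming you have Python installed; this is not one of the resources above) is Python’s built-in sqlite3 module. The table name and numbers below are made up for illustration; the query shows the SELECT / GROUP BY / ORDER BY pattern that shows up constantly in data analysis.

```python
import sqlite3

# In-memory database: nothing is written to disk.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, quarter TEXT, amount REAL)")

# Hypothetical practice data (illustrative only).
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [
        ("North", "2017-Q1", 100.0), ("North", "2017-Q2", 120.0),
        ("South", "2017-Q1", 80.0), ("South", "2017-Q2", 70.0),
    ],
)

# Total sales per region, largest first.
rows = conn.execute(
    """
    SELECT region, SUM(amount) AS total
    FROM sales
    GROUP BY region
    ORDER BY total DESC
    """
).fetchall()

for region, total in rows:
    print(region, total)

conn.close()
```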

Hope that helps!


Data analytics vs. Data science vs. Business intelligence: what are the key differences/distinctions?


These terms are often used interchangeably since all of them involve working with data to find actionable insights. But I like to differentiate them based on the type of question you’re asking:

  • What:

What are my sales numbers for this quarter?

What is the profit for this year to date?

What are my sales numbers over the past 6 months?

What did sales look like in the same quarter last year?

All of these questions report on facts, and the tools that help you build the data models and reports to answer them can be classified as “Business Intelligence” tools.
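A “What” question usually boils down to a straight aggregation over facts. As a hypothetical sketch (made-up transactions, plain Python standing in for a BI tool), here is how “sales this quarter” and “the same quarter last year” reduce to a group-and-sum:

```python
from collections import defaultdict

# Hypothetical transactions: (quarter, sale amount). Illustrative data only.
transactions = [
    ("2016-Q2", 80.0), ("2016-Q2", 55.0),
    ("2017-Q1", 90.0), ("2017-Q1", 50.0),
    ("2017-Q2", 100.0), ("2017-Q2", 55.0),
]

# Group by quarter and sum: the core of most "What" reports.
sales_by_quarter = defaultdict(float)
for quarter, amount in transactions:
    sales_by_quarter[quarter] += amount

print("Sales this quarter (2017-Q2):", sales_by_quarter["2017-Q2"])
print("Same quarter last year (2016-Q2):", sales_by_quarter["2016-Q2"])
```

Note that the code only reports the numbers; it says nothing about why they differ, which is the next type of question.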

  • Why:

Why are my sales higher this quarter compared to last quarter?

Why are we seeing an increase in sales over the past 6 months?

Why are we seeing a decrease in profit over the past 6 months?

Why is profit this quarter lower than in the same quarter last year?

All of these questions try to figure out why something happened. A data analyst typically takes a stab at these. He or she might use the existing Business Intelligence platform to pull data and/or merge in other data sets, then apply data analysis techniques to answer the “why” question and help the business user get to an actionable insight.
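One common first step for a “why” question is to break the overall change down by segment and see which segment drives it. A minimal sketch, assuming hypothetical regional sales figures for two quarters (the regions and numbers are made up):

```python
# Hypothetical sales by region for two quarters (illustrative numbers only).
last_quarter = {"North": 60.0, "South": 50.0, "West": 30.0}
this_quarter = {"North": 62.0, "South": 48.0, "West": 45.0}

total_change = sum(this_quarter.values()) - sum(last_quarter.values())

# Change per region.
contributions = {
    region: this_quarter[region] - last_quarter[region] for region in last_quarter
}

# List the biggest movers first, with each region's share of the overall change.
for region, delta in sorted(contributions.items(), key=lambda kv: abs(kv[1]), reverse=True):
    print(f"{region}: {delta:+.1f} ({delta / total_change:.0%} of the total change)")
```

In this made-up data, West accounts for the entire increase while North and South roughly cancel out, which tells the analyst where to dig next.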

  • What’s next:

What will be my sales forecast for next year?

What will be our profit next year for Scenario A, B & C?

Which customers will cancel/churn next quarter?

Which new customers will become high-value customers?

All of these questions try to “predict” what will happen next (based on historical data and patterns). Sometimes, you don’t know the questions in the first place, so there’s a lot of proactive thinking going on, and usually a “data scientist” is doing that. Other times, you start with a high-level business problem and form a “hypothesis” to drive your analysis. All of these can be classified under “data science”.

As you can see, as we progress from What -> Why -> What’s next, the level of sophistication needed for the analysis also increases. So an organization needs a combination of people, process, and technology platform to go from Business Intelligence maturity all the way to data science capabilities.
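Prediction can get sophisticated quickly, but the simplest version of “what will sales look like next?” is extrapolating a trend from historical data. This sketch fits an ordinary least-squares straight line to a hypothetical six-month sales series and projects it forward; real forecasting would also need to consider seasonality, uncertainty, and validation on held-out data.

```python
def linear_trend_forecast(series, steps_ahead):
    """Fit y = intercept + slope * t by least squares (t = 0, 1, ...) and extrapolate.

    Requires at least two observations.
    """
    n = len(series)
    t_mean = (n - 1) / 2
    y_mean = sum(series) / n
    slope = (
        sum((t - t_mean) * (y - y_mean) for t, y in enumerate(series))
        / sum((t - t_mean) ** 2 for t in range(n))
    )
    intercept = y_mean - slope * t_mean
    # Forecast periods n, n+1, ... (the periods after the last observation).
    return [intercept + slope * t for t in range(n, n + steps_ahead)]


# Hypothetical monthly sales for the past six months (illustrative only).
monthly_sales = [100.0, 104.0, 110.0, 113.0, 118.0, 121.0]
print(linear_trend_forecast(monthly_sales, 3))
```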

Here’s a related blog post that I wrote on this a while back: Business Analytics Continuum – Insight Extractor – Blog


And you can check out other things I write about here: Insight Extractor – Blog – Paras Doshi’s Blog on Analytics, Data Science & Business Intelligence.


Looker vs Tableau: How would you compare them in terms of price & capabilities?


Someone asked this on Quora, so here’s my response: This is a great question, one that I worked through when I led Analytics at Kiva.org last year, so I am happy to add my perspective on Looker vs Tableau:


Let’s talk about capabilities first and then price.

Capabilities — Looker vs Tableau

Even though both of these tools are classified as Business Intelligence tools, they have pretty clear product differentiation. In this section, I will share the three main components of Business Intelligence platforms and then map them back to the core strengths of each product.

Business Intelligence platforms typically have three main components:

  1. Data Collection, Storage & Access
  2. Data Modeling
  3. Data Visualization

#1) Data Collection, Storage & Access: Neither of these tools does data collection and storage. You will need infrastructure to collect data and store it; typically it is stored in databases, and you can access data from databases using SQL. You will need to connect to these data sources from either tool to access the data. Note that on the surface, it might look like Tableau supports more data sources than Looker, but there might be a workaround to get your data into one of the data sources that Looker supports and take it from there, so I am not awarding extra points to Tableau for this. Also, I am personally a big proponent of using analytic databases like Redshift, Vertica, BigQuery and Azure DW for analytics applications, which Looker and Tableau both support, so I’m calling it a tie here!

#2) Data Modeling: This is Looker’s core strength by a wide margin! Why? Because of LookML, their data modeling layer, which I am super impressed by after using it for a while now! So let’s chat about what a data modeling layer means and why you should care.

Data modeling (in this context) means creating data models that take your raw data as input, then clean, combine, curate and convert it so that it’s ready for data analysis.

Why is this important? Not everyone can clean, curate, combine and convert raw data into analysis-friendly data assets. That’s what data analysts are trained in and specialize in. Maybe in the future we will have tools that do that, OR maybe we will see plug-and-play (aka turnkey) solutions for a few key analysis needs, but for now, you need data analysts who can create these data models.
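To make “clean, combine, curate and convert” concrete, here is a small hypothetical sketch in plain Python. It is not LookML and not how Looker works internally; it just turns messy raw order rows into one analysis-ready row per customer, using the same kind of Google/Alphabet duplicate described in the Big Data answer above. The alias map, customer table, and all data are made up.

```python
# Hypothetical raw orders: messy customer names, amounts stored as text.
raw_orders = [
    {"customer": " Alphabet Inc ", "amount": "1200"},
    {"customer": "Google", "amount": "800"},
    {"customer": "acme corp", "amount": "300"},
    {"customer": "ACME Corp", "amount": None},  # missing amount
]

# Hypothetical curated mapping: both names point to the same company ID.
aliases = {"alphabet inc": "C1", "google": "C1", "acme corp": "C2"}
customers = {
    "C1": {"name": "Alphabet", "segment": "Enterprise"},
    "C2": {"name": "Acme", "segment": "SMB"},
}


def build_revenue_model(orders):
    """Return one analysis-ready row per customer with total revenue."""
    revenue = {}
    for row in orders:
        if row["amount"] is None:  # clean: drop rows with no amount
            continue
        customer_id = aliases[row["customer"].strip().lower()]  # clean + combine duplicates
        revenue[customer_id] = revenue.get(customer_id, 0.0) + float(row["amount"])  # convert
    # curate: attach customer attributes so analysts don't have to re-join
    return [
        {
            "customer": customers[cid]["name"],
            "segment": customers[cid]["segment"],
            "revenue": total,
        }
        for cid, total in sorted(revenue.items())
    ]


for row in build_revenue_model(raw_orders):
    print(row)
```

The point of publishing a model like this on a shared platform is that every analyst gets the same de-duplicated revenue number instead of rebuilding the logic ad hoc.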

Now there are two ways to create data models:

You can create them on-the-fly (ad-hoc) OR you can publish all of these data models on a platform (like Looker).

There are all sorts of issues with doing it on the fly. It works for small teams (fewer than 20–30 people), but beyond that you need some process in place. For instance, you can’t automate data models that you need often, so that’s wasted time. Also, you can’t easily share these models with others, which creates a single point of failure: if the analyst is sick or on vacation, then no one gets “insights” from data and the world stops spinning. Yada yada yada… So self-service makes sense once you have a few business users who want to consume data.

So what does a self-service platform bring to the table? It helps data analysts build these wonderful analysis-friendly data models and publish them so everyone in the org who cares can access them. Consumers can then focus on the analysis and not worry about the not-so-fun part of making the data ready for analysis. This also brings all sorts of other benefits: standardized metric definitions, trusted data sources, better collaboration among analysts, a speedier model-delivery process, getting out of Excel hell and what not!

Think of it this way: if all your key data models are available on your self-service platform, then your data analysts can focus on 1) advanced stuff = more $$$ and 2) building more data models (so eventually they can do more advanced stuff later = more $$$!)

This is where Looker fits in. Looker is great at this data modeling thing; its platform is amazing for anyone looking to solve this problem. You can also do data visualization on top of it and build dashboards.

Alright, moving on:

#3) Data Visualization: This is Tableau’s forte! No one does data viz better than Tableau, at least right now. Other vendors are investing significant resources in this and are getting close, but Tableau is still the leader in this space.

Having said that, let’s map this back to how it helps business users and analysts:

Business users and self-service environments:

Tableau is not great at the data modeling thing. Yes, you can do basic cleaning, combining, curating and converting, but it doesn’t work well for intermediate to advanced needs. So if you already have a self-service data modeling layer in place that Tableau can connect to and you are looking for a data visualization layer, then go for Tableau! You will be able to create some amazing visuals, dashboards and stories that will WOW your business users! But to make sure this scales, you need to seriously think about how to either 1) overcome the limitations in Tableau’s data modeling layer OR 2) use some other tool to build the data modeling layer and connect Tableau to it.

Pro Tip: I highly recommend trying out trials of these products and seeing what works best!


Tableau shines at data discovery! While this certainly helps business users, it’s best leveraged by analysts: when they work on ad-hoc data analysis projects (one-time, strategic in nature), they can be much more effective and discover the underlying trends and patterns in their data by visualizing it in Tableau.

So with that context, you might be wondering: what tool did I champion and implement at Kiva?

It’s public knowledge that Kiva is a Looker customer, because Kiva’s logo is on Looker’s website, so I can share this.

After evaluating 30+ tools (including Tableau), I ended up championing Looker and eventually leading the initial sprints to implement it at Kiva, because the goals and vision we had for Kiva’s data and analytics platform aligned better with having a data modeling layer that met Kiva’s needs. So you need to figure out your goals and vision first and then choose tools within that framework.

Pro Tip #1: It’s insanely hard to figure out what your goals and vision for analytics in an org should be. To figure this out, you might want to chat with organizations in the same industry at the same size and stage. Ask them what they use and whether it worked for them. Ask them about their return on investment. This is a great way to get external feedback, but you still need to figure out your internal needs and prioritize them.

Pro Tip #2: Both of these tools have amazing reviews! You will see them ranked highly in analyst reports too. This is great, but it’s more important than ever to clearly define what your organization needs, then map that back to the core strengths of these products (or any other tool, for that matter) and go from there!

[I am happy to help evaluate the right tool for your needs; feel free to contact me: Let’s Connect! – Insight Extractor – Blog]

Pricing — Looker vs Tableau

I can’t share Looker’s pricing because it’s not public, I apologize! You need to contact them to get a quote.

However, you can anchor against Tableau’s pricing, which is public: Buy Tableau | Tableau Webstore

Your analysts and power users will need Tableau Desktop (Personal or Professional edition, $1K and $2K respectively, a one-time cost). Beyond that, the price varies depending on your deployment model, cloud or self-hosted:

[Figure: Looker vs Tableau pricing]

*Note that Tableau Online uses a subscription model, so you can definitely start small, say with 5 business users in one department, and take it from there. If you grow, you can later look at other tools like Looker. (If you are growing rapidly, account for the non-trivial time needed to migrate from one platform to another; it might be worth picking the right tool from the get-go.)

Pro Tip: I encourage you to think about building an ROI model too. You know, use some analytics for your analytics projects 😉 I apologize, couldn’t resist! Anyhow, the point is that instead of just thinking about the “cost”, think about the value-add and anchor your investment figure to that. There’s a reason some analytics tools are priced at, let’s say, $1,000 while others are priced at $100,000: they have different value propositions, and if you know how to extract value from a tool and can project it, then you can get a better ROI!
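A bare-bones version of that ROI framing is below. The tool names and every cost and value figure are hypothetical (they are not real prices for Looker, Tableau, or any other product); the sketch only shows why the cheaper tool is not automatically the better investment.

```python
def roi(annual_value, annual_cost):
    """Return on investment as a fraction: (value - cost) / cost."""
    return (annual_value - annual_cost) / annual_cost


# Hypothetical options: all figures are made up for illustration.
options = {
    "Tool A": {"cost": 1_000, "value": 3_000},
    "Tool B": {"cost": 100_000, "value": 400_000},
}

for name, figures in options.items():
    print(
        f"{name}: cost ${figures['cost']:,}, projected value ${figures['value']:,}, "
        f"ROI {roi(figures['value'], figures['cost']):.0%}"
    )
```

The hard part in practice is the “projected value” line, which is exactly where the value-add thinking above comes in.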

Hope that helps! If I can be of any further help, email me or comment here! Let’s Connect! – Insight Extractor – Blog


Introduction to Goal Seek & Solver capabilities in Excel:


What-if analysis is a pretty common type of analysis done by decision makers. Often, they just create simple Excel tables and adjust the variables manually until they get an answer that works. But instead of doing it manually, you can use features in Excel that make your life much easier and your analysis much more accurate. The goal of this blog post is to introduce you to the Goal Seek and Solver features for doing what-if analysis in Excel.

#1. Goal Seek:

Let’s say you are the CEO of an e-commerce startup and are wondering which factors to focus on to increase revenue. Here’s what the data (assume per month) looks like when you start out:

You want to increase revenue from $125K to $150K. The three levers you can pull are website visitors, conversion rate, and revenue per customer (Total Revenue = Website Visitors × Conversion Rate × Revenue per Customer).

Now you could manually tweak the values of these variables until you get to $150K, but as I promised earlier, there’s a better way!

Let’s start with Goal Seek.

You need to set two things for Goal Seek:

a. Your goal, which in this case is $150K

b. The variable that needs to change to achieve that goal. Note that you can specify only one variable, so you need to choose which of the three levers above to focus on. Let’s say you want to focus on conversion rate.

Once you have these two things, go to the Data tab in Excel and choose What-If Analysis > Goal Seek:

Now, specify the values. For this example, we want to figure out what the new conversion rate should be so that revenue reaches $150K. In the Goal Seek dialog, that means setting the Total Revenue cell to the value 150000 by changing the Conversion Rate cell:

After entering the values, you will see the status. Click “OK” to keep the solution or “Cancel” to go back to what you had:

Perfect! So you need to increase the conversion rate from 1.25% to 1.5% to reach the goal you set!
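Goal Seek repeatedly adjusts one input cell until a formula cell hits the target value. The sketch below mimics that idea with bisection; it is a stand-in, not Excel’s internal algorithm. The post’s screenshot with the starting values is not reproduced here, so 1,000,000 visitors and $10 revenue per customer are assumed numbers, chosen only because at a 1.25% conversion rate they give the $125K starting revenue.

```python
# ASSUMED starting values: the original screenshot is not available.
# They are chosen so that 1.25% conversion gives $125K of monthly revenue.
VISITORS = 1_000_000
REVENUE_PER_CUSTOMER = 10.0
TARGET_REVENUE = 150_000


def revenue(conversion_rate):
    """Monthly revenue = visitors x conversion rate x revenue per customer."""
    return VISITORS * conversion_rate * REVENUE_PER_CUSTOMER


def goal_seek(f, target, lo, hi, tol=1e-12, max_iter=200):
    """Find x in [lo, hi] with f(x) close to target, by bisection.

    Assumes f is continuous and increasing on [lo, hi].
    """
    for _ in range(max_iter):
        mid = (lo + hi) / 2
        if f(mid) < target:
            lo = mid  # too little revenue: the answer is above mid
        else:
            hi = mid  # enough revenue: the answer is at or below mid
        if hi - lo < tol:
            break
    return (lo + hi) / 2


print(f"Starting revenue: ${revenue(0.0125):,.0f}")
new_rate = goal_seek(revenue, TARGET_REVENUE, 0.0, 1.0)
print(f"Conversion rate needed: {new_rate:.2%}")
```

Because revenue is linear in conversion rate here, you could also solve it directly: $125K / 1.25% means visitors × revenue per customer = 10,000,000, so the required rate is 150,000 / 10,000,000 = 1.5%, whatever the individual values are. An iterative search like Goal Seek earns its keep when the formula is too tangled to rearrange by hand.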

#2. Solver add-in:

So, you worked on improving the conversion rate for the next month or two, and you and your team found that it’s getting really hard to increase it above 1.35%. You also found that you can move the needle on the other variables (website visitors and revenue per customer) with less effort. Goal Seek only lets you change one variable, so if you have more than one, it doesn’t serve the purpose that well! That is where the Solver add-in helps.

Think of Solver as an advanced Goal Seek where more than one cell can change. You can also set constraints on the values that those changing cells can take.

Now, for our scenario, the conversion rate is at 1.35%, but you want to see the changes you can make to website visitors and revenue per customer to reach $150K.

You also know that you can’t go above 1,100,000 website visitors per month, and revenue per customer can’t exceed $11.

You will need to enable the Solver add-in in Excel; once you do, you will see it in the Data tab.

Once you have it, open it and fill in the information needed in the dialog box:

a. Set Objective: Total Revenue, with Value Of: 150000

b. By Changing Variable Cells: Website Visitors and Revenue per Customer

c. Subject to the Constraints: Website Visitors <= 1,100,000 and Revenue per Customer <= $11

After that, click Solve.

If it finds a solution, it shows it in your worksheet and gives you the option to keep the Solver solution or restore the original values:

For our scenario, it suggests that with website visitors at 1,010,101 and revenue per customer at $11, we should hit our goal.

Click OK when you’re done.
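Why does Solver land on $11 and 1,010,101 visitors? With the conversion rate fixed at 1.35%, revenue is visitors × 0.0135 × revenue per customer, so many combinations reach $150K and Solver simply returns one feasible point. The sketch below does not reproduce Solver’s search algorithm; it only checks the arithmetic and computes the whole feasible range under the two constraints from the post, treating the revenue-per-customer limit as “at most $11”.

```python
# Values from the post: conversion rate fixed at 1.35%, $150K goal,
# visitors capped at 1,100,000, revenue per customer capped at $11.
CONVERSION_RATE = 0.0135
TARGET_REVENUE = 150_000
MAX_VISITORS = 1_100_000
MAX_REVENUE_PER_CUSTOMER = 11.0


def visitors_needed(revenue_per_customer):
    """Visitors required to hit the target at a given revenue per customer."""
    return TARGET_REVENUE / (CONVERSION_RATE * revenue_per_customer)


# Lowest revenue per customer that still reaches the goal with visitors at their cap.
min_revenue_per_customer = TARGET_REVENUE / (CONVERSION_RATE * MAX_VISITORS)

# Solver's reported answer: revenue per customer at its $11 cap.
visitors = visitors_needed(MAX_REVENUE_PER_CUSTOMER)
within_cap = visitors <= MAX_VISITORS

print(
    f"Feasible revenue per customer: ${min_revenue_per_customer:.2f} "
    f"to ${MAX_REVENUE_PER_CUSTOMER:.2f}"
)
print(
    f"At ${MAX_REVENUE_PER_CUSTOMER:.2f} per customer: {visitors:,.0f} visitors "
    f"(within cap: {within_cap})"
)
```

Any revenue per customer between about $10.10 and $11, paired with the matching visitor count, meets both constraints; Solver’s answer sits at the $11 end of that range.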


In this post, we saw how to use Goal Seek and the Solver add-in in an e-commerce scenario, but these techniques can be applied to a wide variety of data analysis problems that lend themselves to “what-if” analysis.

Hope this was helpful! I would love to hear how you will use this in your work. Or, if you already use it, what do you use it for?

Can I be a data analyst at a tech company without a degree in computer science?


Yes. A CS degree is not a must-have to work as a Data Analyst. In fact, a lot of people come from non-CS backgrounds and succeed in this role!

Let’s look at the pros and cons of having a computer science (CS) degree; this should help you evaluate where you stand:


Pros of having a CS-degree:

  • If the data analyst position requires a CS degree, then you qualify! Fortunately, this is not that common: postings usually say a bachelor’s in CS, business administration or a related field is required, so as long as you have a bachelor’s degree for positions that require one, you should be fine.
  • You might already have the basic tech skills needed for data analysis jobs, and the CS degree can help validate that.
  • You can pick up new tech concepts and tools faster. With a CS background, it’s easier to pick up new concepts and tools, and you need to do that continuously to stay relevant.

Cons of having a CS-degree:

  • Not enough business problem-solving experience and/or a lack of depth in business knowledge. So if you have a degree in business, you come out ahead! Especially if your background aligns with the role: for example, if you focused on marketing in your bachelor’s and the role is focused on marketing analytics, then you might have an edge.
  • I have a CS degree, which I followed up with a master’s from a “business school”, so this is just based on my experience: a few CS students (without real-world experience) are inclined to focus on “automation” and the “bleeding edge” instead of on what the problem needs. A lot of data analysis doesn’t need to be, or shouldn’t be, automated, and not every company needs <<insert the latest tech trend here: big data, deep learning>>, but CS students tend to gravitate toward it because that’s what they feel most comfortable with. While that doesn’t stop them from getting the job, it can impede their growth as data analysts within the org.


So as you can see, even if you don’t have a CS degree, you can still find roles that align with your other skills. In fact, you might come out ahead if you can prove that you have the basic quantitative and tech skills needed to get the job done.

Related: Paras Doshi’s answer to How do I prepare myself for a career in Data Analysis?