How does the rise of Power BI & Tableau affect SSRS?

It does affect SSRS adoption, but SSRS (SQL Server Reporting Services) still has a place as long as there’s a need for printer-friendly reporting and self-service vendors don’t have a good solution to meet that need.

Also, SSRS is great for automating operational reports that send out emails with raw data (a list of customers, products, sales transactions, etc.).

I advocate an analytics strategy where we try to satisfy data needs “self-service” first (Power BI, Tableau, Qlik), but if that’s not the optimal solution (e.g., “I need to print it” or “I just need you to send me raw data in Excel”), then I mark it as an SSRS project. This architecture is supported by a central data model (aka operational data store, data mart, data warehouse), which makes it much easier to swap reporting tools in and out as needed, so we are not locked in to one vendor.

About 10–20% of the data requests I see are SSRS projects. If the self-service platforms start adding features that compete with SSRS, I would start using those capabilities and phase out SSRS. But if that doesn’t happen, I will continue using SSRS 🙂

VIEW QUESTION ON QUORA


Let me know what you think in the comments section!

Paras Doshi

This post is sponsored by MockInterview.co. If you are looking for data science jobs, check out 75+ data science interview questions!

5 tests to validate the quality of your data:

Missing Data:

  • Descriptive statistics (e.g., counts per field) can be used to find missing data
  • Tools like SQL, Excel, and R can also be used to look for missing data
  • Some attributes of a field may be missing, e.g., the postal code in an address field
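The missing-data check above can be sketched in a few lines of plain Python. This is a minimal illustration, not a specific tool’s method: the records are hypothetical, and “missing” is assumed to mean an absent key, `None`, or a blank string.

```python
# Count missing values per field in a list of records (hypothetical sample data).
# Assumption: "missing" means the key is absent, the value is None, or a blank string.

def is_missing(value):
    """Treat None and blank/whitespace-only strings as missing."""
    return value is None or (isinstance(value, str) and value.strip() == "")

def missing_counts(records, fields):
    """Return {field: number of records where that field is missing}."""
    counts = {field: 0 for field in fields}
    for record in records:
        for field in fields:
            # record.get() returns None for an absent key, so absent keys count too
            if is_missing(record.get(field)):
                counts[field] += 1
    return counts

customers = [
    {"name": "Ann", "address": "12 Main St", "postal_code": "94105"},
    {"name": "Bob", "address": "9 Oak Ave", "postal_code": ""},
    {"name": "Cy", "address": None, "postal_code": None},
]

print(missing_counts(customers, ["name", "address", "postal_code"]))
# → {'name': 0, 'address': 1, 'postal_code': 2}
```

The same counts could come from a `GROUP BY`/`COUNT` in SQL or `is.na()` in R; the point is simply to count per field rather than eyeballing rows.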

Non-standardized:

  • Check whether all values are standardized: Google, Google Inc. and Alphabet might need to be mapped to a single category, Alphabet
  • Check for different date formats used in the same field (MM/DD/YYYY and DD/MM/YYYY)
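Both standardization checks above can be sketched as follows. This is an illustration under stated assumptions: `CANONICAL_NAMES` is a hypothetical lookup table you would maintain yourself, and the date check only looks at `NN/NN/YYYY` strings. A value whose first part is greater than 12 must be DD/MM, one whose second part is greater than 12 must be MM/DD, and anything else is ambiguous.

```python
import re

# Hypothetical lookup table: normalized name variants -> one canonical value.
CANONICAL_NAMES = {
    "google": "Alphabet",
    "google inc": "Alphabet",
    "alphabet": "Alphabet",
}

def standardize_name(name):
    """Lower-case, drop punctuation, then map known variants to a canonical name."""
    key = re.sub(r"[^a-z0-9 ]", "", name.lower()).strip()
    return CANONICAL_NAMES.get(key, name)  # unknown names pass through unchanged

DATE_PATTERN = re.compile(r"(\d{1,2})/(\d{1,2})/(\d{4})")

def date_format_hints(values):
    """Count which NN/NN/YYYY layout each value must be using.

    First part > 12 -> it can only be a day, so DD/MM.
    Second part > 12 -> it can only be a day, so MM/DD.
    Both parts <= 12 -> ambiguous.
    """
    hints = {"DD/MM": 0, "MM/DD": 0, "ambiguous": 0}
    for value in values:
        match = DATE_PATTERN.fullmatch(value)
        if not match:
            continue  # not in NN/NN/YYYY form; handle separately
        first, second = int(match.group(1)), int(match.group(2))
        if first > 12:
            hints["DD/MM"] += 1
        elif second > 12:
            hints["MM/DD"] += 1
        else:
            hints["ambiguous"] += 1
    return hints

print([standardize_name(n) for n in ["Google", "Google Inc.", "Alphabet", "Microsoft"]])
# → ['Alphabet', 'Alphabet', 'Alphabet', 'Microsoft']
print(date_format_hints(["03/04/2020", "25/12/2020", "12/31/2020"]))
# → {'DD/MM': 1, 'MM/DD': 1, 'ambiguous': 1}
```

If both the `DD/MM` and `MM/DD` counts are non-zero, the field is mixing formats, and the ambiguous rows cannot be fixed without going back to the source.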

Incomplete:

  • Total size of data (# of rows/columns): Sometimes you may not have all the rows you were expecting (e.g., one row for each of your 100k customers); if the counts don’t match, you don’t have the complete dataset at hand
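A completeness check like the one above boils down to comparing what arrived against what was expected. In this sketch the expected customer IDs, the expected column list, and the `customer_id` field name are all hypothetical; in practice they would come from a source system or a data contract.

```python
def completeness_report(rows, expected_ids, expected_columns, id_field="customer_id"):
    """Compare the rows and columns that arrived against what was expected."""
    expected = set(expected_ids)  # materialize once so any iterable works
    present_ids = {row.get(id_field) for row in rows}
    present_columns = set().union(*(row.keys() for row in rows))
    return {
        "expected_rows": len(expected),
        "actual_rows": len(rows),
        "missing_ids": sorted(expected - present_ids),
        "missing_columns": sorted(set(expected_columns) - present_columns),
    }

# Hypothetical extract: we expect one row per customer for customers 1-5.
rows = [
    {"customer_id": 1, "name": "Ann"},
    {"customer_id": 2, "name": "Bob"},
    {"customer_id": 4, "name": "Dee"},
]
report = completeness_report(rows, expected_ids=range(1, 6),
                             expected_columns=["customer_id", "name", "region"])
print(report)
# → {'expected_rows': 5, 'actual_rows': 3, 'missing_ids': [3, 5], 'missing_columns': ['region']}
```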

Erroneous:

  • Outlier: If someone’s age is 250, that’s an outlier, but it’s also an error somewhere in the data pipeline that needs to be fixed; outliers can be detected by creating a quick data visualization
  • Data type mismatch: If a text value appears in a field where all other entries are integers, that’s also an error
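Alongside a quick chart, both errors above can be flagged programmatically. This sketch swaps in a simple plausibility-range rule instead of visual inspection; the 0–120 age range and the sample records are assumptions for illustration, not a universal rule.

```python
def find_errors(records, field="age", low=0, high=120):
    """Flag type mismatches and implausible values in one integer field.

    low/high are an assumed plausibility range for this field.
    Missing values (None) are skipped; the missing-data check covers them.
    """
    type_mismatch, out_of_range = [], []
    for index, record in enumerate(records):
        value = record.get(field)
        if value is None:
            continue
        # bool is a subclass of int in Python, so reject it explicitly
        if not isinstance(value, int) or isinstance(value, bool):
            type_mismatch.append(index)
        elif not low <= value <= high:
            out_of_range.append(index)
    return {"type_mismatch": type_mismatch, "out_of_range": out_of_range}

# Hypothetical records: index 1 is implausible, index 2 has the wrong type.
people = [{"age": 34}, {"age": 250}, {"age": "forty"}, {"age": 61}, {"age": None}]
print(find_errors(people))
# → {'type_mismatch': [2], 'out_of_range': [1]}
```

Returning row indexes (rather than silently dropping rows) keeps the focus on tracing each error back to the step in the pipeline that produced it.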

Duplicates:

  • Duplicates can be introduced into the data (e.g., the same rows repeated in the dataset), so the data needs to be de-duplicated
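Exact-duplicate removal can be sketched as “keep the first occurrence of each row.” This assumes duplicates are rows that match on every field and that all values are hashable; near-duplicates (e.g., “Google” vs “Google Inc.”) need the standardization step first. The transactions are hypothetical.

```python
def deduplicate(rows):
    """Keep the first occurrence of each exact duplicate row; return (unique, duplicates).

    Assumes every value is hashable (str, int, float, None, ...).
    """
    seen = set()
    unique, duplicates = [], []
    for row in rows:
        key = tuple(sorted(row.items()))  # same fields + same values -> same key
        if key in seen:
            duplicates.append(row)
        else:
            seen.add(key)
            unique.append(row)
    return unique, duplicates

transactions = [
    {"txn_id": 101, "amount": 25.0},
    {"txn_id": 102, "amount": 40.0},
    {"txn_id": 101, "amount": 25.0},  # same row loaded twice
]
unique, duplicates = deduplicate(transactions)
print(len(unique), len(duplicates))
# → 2 1
```

Returning the duplicates as well as the clean rows makes it easy to report how many were dropped, which often points to the load step that introduced them.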

Hope that helps!

Paras Doshi


Journal of Statistical Software paper on tidying data:

Data cleaning takes up a lot of time in the data science process. That’s not necessarily a bad thing; time spent cleaning data is worthwhile in most cases. Still, I was researching frameworks that might help me make this process a little faster, and as part of that research I found the Journal of Statistical Software paper by Hadley Wickham, which has a really good framework for “tidying” data, one part of the data cleaning process.

The author does a great job of defining tidy data:

1. Each variable forms a column.
2. Each observation forms a row.
3. Each type of observational unit forms a table.

And then applies it to five common problems with messy datasets:

1. Column headers are values, not variable names.
2. Multiple variables are stored in one column.
3. Variables are stored in both rows and columns.
4. Multiple types of observational units are stored in the same table.
5. A single observational unit is stored in multiple tables.
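To make the first problem (column headers are values) concrete, here is a small plain-Python sketch of the usual fix: stack each value-bearing column into its own row, an operation commonly called melting (pandas exposes it as `pandas.melt`). The table below is hypothetical and only loosely modeled on the paper’s religion/income example; the numbers are made up.

```python
def melt(rows, id_field, var_name, value_name):
    """Turn every non-id column into rows of (id, former column header, cell value)."""
    tidy = []
    for row in rows:
        for column, value in row.items():
            if column == id_field:
                continue
            tidy.append({id_field: row[id_field], var_name: column, value_name: value})
    return tidy

# Hypothetical messy table: income brackets (values) are used as column headers.
messy = [
    {"religion": "A", "<$10k": 27, "$10-20k": 34},
    {"religion": "B", "<$10k": 12, "$10-20k": 27},
]
tidy = melt(messy, id_field="religion", var_name="income", value_name="frequency")
for row in tidy:
    print(row)
# → {'religion': 'A', 'income': '<$10k', 'frequency': 27}
# → {'religion': 'A', 'income': '$10-20k', 'frequency': 34}
# → {'religion': 'B', 'income': '<$10k', 'frequency': 12}
# → {'religion': 'B', 'income': '$10-20k', 'frequency': 27}
```

After melting, each variable (religion, income, frequency) is a column and each observation is a row, which matches the first two rules of tidy data.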

It also contains some sample R code. You can read the paper here: http://vita.had.co.nz/papers/tidy-data.pdf

As a student preparing for data analyst & data science roles, should I generalize or specialize?

This question was posted on the Springboard forum.

Here’s my answer:

It depends on your target industry and where it is in its life cycle.

The industry life cycle has four stages: Startup, Growth, Maturity, and Decline.

[Figure: Industry life cycle]

Generalization is great in the earlier stages. If you are targeting jobs at startups, generalize: you should know enough about a lot of things.

T-shaped professionals are great for the Growth stage. They specialize in something but still know enough about a lot of things, e.g., a Sr. Growth/Marketing Analyst who knows enough about analytics & data science to be dangerous but specializes in marketing.

Specialization is great for mature industries. Specialists know a lot about a few things, e.g., statisticians in the insurance industry who have made careers out of building risk models.

Any advice for moving into data science from business intelligence?

This was asked on Reddit: Any advice for moving into data science from business intelligence?

Here’s my answer:

I come from a business intelligence background and currently work as a Sr. Data Scientist. I found that you need two things to transition into data science:

Data culture: You need a company whose data culture is such that managers/executives ask big questions that need a data science approach to answer. If your end consumers are still asking a bunch of “what” questions, your company might NOT be ready for data science. But if your CEO comes to you and says, “Hey, I got the customer list with the info I asked for, but can you help me understand which of these customers might churn next quarter?”, then you have a data science problem at hand. So, try to find companies that have this culture.

Skills: You also need to upgrade your skills to be able to solve data science problems. BI focuses heavily on technology and automation, so you may need to unlearn a few things. For example, automation is not always important, since you might work on problems where a model’s predictions are needed only a couple of times; trying to automate wouldn’t be optimal in that case. Also, BI relies heavily on tools, but in data science you’ll need deeper domain knowledge and a problem-solving approach along with technical skills.

Also, I personally moved from BI (as a consultant) -> Analytics (as Analytics Manager) -> Data Science (as Sr. Data Scientist), and this path has been super helpful for me. I recommend transitioning into analytics first and then eventually breaking into data science.

Hope that helps!

VIEW THREAD ON REDDIT

In how many dimensions (Vs) is Big Data commonly defined?

Asked on Quora:

When reading about Big Data, one usually starts with Gartner analyst Doug Laney’s definition (3Vs). IBM often uses 4 dimensions by adding veracity. Some people use 6 or up to 12 dimensions. I am wondering: what’s the most frequently used definition?

Answer:

Here’s my “working” definition of Big Data: if your existing 1) tools and 2) processes don’t support your data analysis needs, then you have a Big Data problem.

You can add as many V’s as you want, but it all ties back to the notion that you need bigger and better tools and processes to support your data analysis needs as you grow.

Example:

#1: Social media data is BIG! It’s text (variety), it’s much bigger in size (volume), and it’s all coming in very fast (velocity). AND the business wants to analyze customer sentiment on social media. OK, we have a 3V’s problem and need a solution to support it. Maybe Hadoop is the answer. Maybe not. But you do have a “Big Data” problem.

#2: Your customer database is broken. It doesn’t have the right addresses. Google and Alphabet are showing up as two separate companies when they should be just one. The companies’ employee counts are outdated. All of these problems are confusing your business users, and they don’t TRUST the data anymore. You have a veracity problem, and so you have a BIG Data problem.

Everyone has a BIG DATA problem. It just depends on what their V’s are, AND in most cases tools alone will not solve the issue. You need PEOPLE and PROCESS to solve it. Here’s my ranking of the ingredients that are key to solving BIG Data problems: 1) PEOPLE 2) PROCESS 3) PLATFORM (tools).

VIEW QUESTION ON QUORA

How do I learn SQL for data analysis?

Step 1:

This is a good starting point: SQL School Table of Contents

OR, this: Learn SQL

Both of these resources were put together by analytics vendors and are targeted at beginners.
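While working through a tutorial, it helps to have a local database to practice on. One low-setup option (an assumption, not something the resources above require) is Python’s built-in `sqlite3` module with an in-memory database. The `orders` table and its rows below are hypothetical; the query shows the aggregate-then-sort pattern that much of day-to-day analysis SQL boils down to.

```python
import sqlite3

def build_practice_db():
    """Create an in-memory SQLite database with a small, hypothetical orders table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer TEXT, amount REAL)")
    conn.executemany(
        "INSERT INTO orders (customer, amount) VALUES (?, ?)",
        [("Ann", 120.0), ("Bob", 40.0), ("Ann", 30.0), ("Cy", 75.0)],
    )
    conn.commit()
    return conn

# A typical analysis query: aggregate per group, then sort by the aggregate.
QUERY = """
SELECT customer, COUNT(*) AS num_orders, SUM(amount) AS total_amount
FROM orders
GROUP BY customer
ORDER BY total_amount DESC
"""

conn = build_practice_db()
results = conn.execute(QUERY).fetchall()
for row in results:
    print(row)
# → ('Ann', 2, 150.0)
# → ('Cy', 1, 75.0)
# → ('Bob', 1, 40.0)
```

SQLite’s dialect differs from other databases in places, so treat it as a scratchpad for core `SELECT`/`JOIN`/`GROUP BY` skills rather than a stand-in for any particular vendor.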

Step 2:

Review this Quora Thread: How do I learn SQL?

Participate in competitions like this: Solve SQL Code Challenges

Step 3:

If you’d like to go more in depth, check out a few books:

  1. Head First SQL
  2. Learn SQL the Hard Way
  3. Certification books/material from a database vendor

Hope that helps!

VIEW QUESTION ON QUORA