What’s a good chart making software that can pull online data?

Standard

So essentially you want to build a *live* chart that pulls data from some online data-store (which changes often).

To do that you can do one of three things:

  1. See if they have an API that you can use — if so, you should be able to use that. If not, continue reading…
  2. Build a web scraper on your own. There are tutorials out there that would help you do so in the language of your choice.
    Chart web scraping data
  3. Use a software service like Import.io | Web Data Platform & Free Web Scraping Tool or Web Scraper — or you could find something else. I have used Import[dot]io and was able to build an API using their service — which i used a data-store for my “charts”

Side note: just make sure you are not violating any terms by scraping the website.

VIEW QUESTION ON QUORA

 

How to create a Histogram in Excel?

Standard

Histogram is a powerful data analysis technique — it let’s you quickly see the distribution of the data you have. So in this post, I am going to list the steps to create histogram in Excel.

It’s a two-step process.

  1. Install “Data Analysis Tool Pak” (free Excel add-in)
  2. Format the data and build the histogram

Step 1: Install Data Analysis Tool Pak.

One of the most useful data analysis add-in in excel is not available by default! It’s called “Analysis ToolPak”

To activate it. Go to File > Excel options > Addins > For the manage field, select Excel add-ins

Histogram Manage Excel add-insMake sure that ToolPak is activated and click OK.

Histogram analysis toolpak excel(Also, Solver is a great add-in as well! It’s not in the scope of this article to discuss that add-in but it’s a powerful add-in as well. For instance, it let’s you work on optimization problems)

Step 2: Format Data and build the Histogram

So now let’s format the data.

You need two things to create a Histogram. 1) Data 2) Range

Here’s an example: (I have about 3000 numbers and need to see the distribution)

You could have other fields on the sheet as well but you need at least the data field. Range is optional but I recommend that you specify the Range so that your histogram would have the bins that you specified — otherwise you could have just used a bar chart!

Note that both of them are numerical.

Data Histogram

Now go to Menu Bar > Data > Data Analysis

Data Analysis HistogramOut of the options available, click on Histogram and select the Input Range and Bin Range > after you’re done, click OK.

Data Analysis Histogram ToolpakYou should see a new worksheet with raw data (ready for charting!). Now, create a Bar chart using the raw data and you have your histogram:

Histogram Excel Data AnalysisConclusion:

In this post I listed the steps you can take to create a Histogram in Excel. Note that there are other options as well — like R (hist function) that let’s you build histogram as well so you do have choice of tools but if you want to stick with excel and it’s good enough then you now know how. Cheers!

Related Post: What is the difference between Histogram & Bar Chart?

What are the reasons why developing a data dictionary is so important?

Standard

data dictionary

Let me first define “Data Dictionary” — It’s a document that lists data fields/metrics and their standardized definition to be used across the org.

The key here is: Standardized.

Imagine this:

Imagine that a management team meeting is going on and you have CEO, VP of Sales, VP of Marketing, CFO, COO among others in the room.

Meeting Agenda: why they didn’t hit the $100M profit goal in the first quarter. So each of them start with the reports they had access to.

VP of Sales says they missed it by $5M

CFO says that they missed it by $9M

COO says that they missed it by $7M

VP of Marketing has three different versions on her report and she is confused!

No ONE talks about the “Why” they missed the goal but instead spends next hour reconciling the numbers!

It was a hypothetical scenario but these things happen all the time! Of course it could be any team meeting and the metric could be something else or it could just that someone is working on something on their own and end up spending a lot of time digging through all the metric definitions and trying to makes sense of it all. This is where data dictionary could help! Let’s take this a step further:

What’s one of the most important characteristic of a good data analysis/science?

It needs to be Actionable.

It needs to help business decision makers take action based on the insights that they found or were shared with them. And before they take that decision, business decision makers need the data they can TRUST!

For data to be trusted, it needs to be understood. It needs to have a definition that everyone agrees upon.

This is what data dictionary is for. It lists data fields/metrics and their standardized definition so that everyone in the org understands what the field/metric means and don’t have to worry about aligning their meaning. They could focus on Analyzing and extracting insights that would change the business and the world!

VIEW QUESTION ON QUORA

What is the title these days for a person that assures data quality?

Standard

Question:

What is the title these days for a person that assures data quality?
(I need to hire a person to make sure my data is as good as it can be. They need to inspect the data for issues, create logic for how it can be found and fixed, and finally, court the project through application development for a robust solution to stop it from occurring in the first place.)

Answer:

Quality of the data shouldnt be a responsibility of just one person — ideally, you want all members of the team (and broader business community) to care and own some part of it. But i like the idea of one person owning the “co-ordination” of how this gets done. It might not be a full time gig in a small org but can see this as a full time role in bigger orgs and enterprises. Some titles:

  1. data co-ordinator
  2. Data quality analyst (or just data analyst)
  3. Data steward
  4. Master data management analyst
  5. Data quality engineer (or just data engineer)
  6. Project manager (data quality)
  7. Manager, data quality and master data management

Read the original question on Quora

Building data driven companies — 3 P’s framework.

Standard
Data Driven Comapnies need Process Platform People

Data Driven Companies — 3 P’s framework

People:

To build data-driven organization, you need decision makers to use data instead of anything else. So you need to help built a culture where data-driven decision-making thrives — usually this is most efficient if you have executive buy-in. Example: A CEO who is a stats-junkie! Of course, not every company would have this. It could be that you find yourself in an organization where the CEO is known to make huge bets just using “gut” — in cases like this, an organization could have some of the best platform and processes but unfortunately, it won’t do any good.

Now just having people who make data driven decisions is not enough — you (as a data professional) need to deliver “data” to them. To do that you need 1) Processes 2) Platform. So let’s talk about them:

Platform:

A platform in this context is the data and analytics platform used by the organization to get the data they need, when they need it. If the organization is small (e.g. less than 15 or so) then the platform could be excel and engineers/analyst writing ad-hoc queries but as you grow (= team size expands) then you need better platform to serve the data needs of the organization. Some tools are better than others and you would usually wind up using multiple vendors in your analytics stack — but remember that jut having a great analytics platform is not enough. You need the “people” and the “processes” to go with that. So, with that let’s talk about process:

Process:

Process is everything between Platform and People. Let me expand on this. Here are few things where having a defined process is key for building data-driven organizations.

  1. How to prioritize the analytics request? It will be great to have a process where you/team will work on projects that closely align with the strategic objective of the company
  2. What does the analytics org-structure look like? Do you have analyst embedded in each team or do you have a centralized team or do you go for a hybrid approach?
  3. What is the process to justify investment in analytics?
  4. Which is the “right” metric definition? (There needs to be a process that keeps the metric definition standardized in an organization)
  5. What is the process to clean data? (Maintaining data integrity is key. You could put this on “Platform” bucket as well)
  6. How do users get “help”? (Is there a ticketing system that they should use? Is it just another “IT” ticket? Who responds to tickets? What’s the SLA around analytics queue tickets? etc)
  7. Who owns “analytics”? There needs to be someone on the team owns analytics like analytics manager, VP of analytics and he/she should be reporting to someone on management team (CIO, CFO, COO, Chief of Staff, CEO) who is held responsible as well.

The list goes on…but I hope you get the point. Having a well-defined processes in an organization is important — usually, this stuff gets less attention and org’s/teams tend to focus just on “platform” which might not be the best thing to do.

Having shared the 3 P’s, let me share few tips on

How to go about implementing the framework:

Three tips:

  1. Identify the “P” that has the best ROI
  2. It’s an iterative process!
  3. Refine as needed

On #1. To help you identify the “P” that has the best ROI, your first step could be to create a matrix to help you evaluate where your organizations falls. I have shown an example below:

Building Data Driven companies 3 Ps framework matrix

If you want to build analytics from scratch then you would love working at early stage startups (bottom-right) but if you like advanced stuff (data-science) then Top-right corner is great! Also, For org’s in Top Left where you have the platform and processes but lack data-driven people — it would be wise to crank up your efforts to drive adoption. (since you already have the right platform and process than any additional investment here would yield little to no ROI).

On #2. Understand that it’s an iterative process. You are never done optimizing any of these P’s! It’s a journey and not a destination.

Continuos Improvement Process People Platform

On #3: Just like with other frameworks, you’ll need to refine and adjust this based on your needs. You may have noticed that I focused on “Org-wide” framework but you could be heading up an analytics function for a department and in that case, not all of the things here would help. “People”, “Process” and “Platform” would still apply on a high level but it might just be that you don’t have “control” over the platform. So, you may need to refine/adjust this as needed.

I hope the framework is a great tool for you to think about building data driven companies!

Best,
Paras Doshi

PS: If you like articles like this, don’t forget to sign up for the newsletter!

[VIDEO] Microsoft’s vision for “Advanced analytics” (presented at #sqlpass summit 2015)

Standard

Presented at #sqlpass summit 2015.

SQL Server Reporting services: How to display “There are NO rows” message?

Standard

Problem:

You have a SQL Server reporting services (SSRS) report that has a table which displays some records — but sometimes it can have NO rows; In that case, how to display “There are No rows” message so that it doesn’t confuse the consumer.

Solution:

  1. Open the report in SQL Server Data Tools and go to the “design” tab of your SSRS report
  2. Select your table (do NOT select a cell inside a table. Make sure that the table is selected) SQL Server reporting services NO data rows message
  3. While the “table” is selected, Go the Properties section OR you can use F4
  4. Inside the Properties section, find “No Rows” section and you should see a NoRowsMessage property:SQL Server reporting services NO data rows message v2
  5. Go to the preview tab to make sure it’s working and you should be ready to deploy the change!

That’s it! Hope that helps.

Official reference:  https://msdn.microsoft.com/en-us/library/dd220407.aspx

Author: Paras Doshi