Book Giveaway: Head First Data Analysis — Ends 07/22/2016

<< THIS GIVEAWAY IS CLOSED NOW! Thanks for Participating! >>

Book Giveaway: Head First Data Analysis — A learner’s guide to big numbers, statistics and good decisions!

I love the Head First series. If you haven’t read one of these books, you should: they’re great! So when I learned that there was a Data Analysis one, I had to read it. I bought a copy and skimmed through it.

Now, instead of letting it sit on my shelf, I think it will better serve its purpose if more people read it, so I have decided to run this little experiment.

Rules:

  1. You need to have a US-based address so that I can ship it to you (no cost to you!)
  2. You need to comment on this blog post on or before 07/22/2016 — just put your name & email. I’ll contact you if you win*

*Random selection!

Go!

How to create a Histogram in Excel?

A histogram is a powerful data analysis technique: it lets you quickly see the distribution of the data you have. So in this post, I am going to list the steps to create a histogram in Excel.

It’s a two-step process.

  1. Install the “Analysis ToolPak” (free Excel add-in)
  2. Format the data and build the histogram

Step 1: Install the Analysis ToolPak

One of the most useful data analysis add-ins in Excel is not enabled by default! It’s called “Analysis ToolPak”.

To activate it, go to File > Excel Options > Add-ins. In the Manage field, select Excel Add-ins and click Go.

Make sure that Analysis ToolPak is checked and click OK.

(Solver is a great add-in as well! Discussing it is outside the scope of this article, but it’s a powerful tool. For instance, it lets you work on optimization problems.)

Step 2: Format Data and build the Histogram

So now let’s format the data.

You need two things to create a histogram: 1) Data 2) Range (the bins)

Here’s an example: (I have about 3000 numbers and need to see the distribution)

You could have other fields on the sheet as well, but you need at least the data field. The range is optional, but I recommend that you specify it so that your histogram has the bins that you want. Otherwise you could have just used a bar chart!

Note that both of them are numerical.

Now go to Menu Bar > Data > Data Analysis

Out of the options available, click on Histogram, select the Input Range and Bin Range, and click OK when you’re done.

You should see a new worksheet with the bin frequency table (ready for charting!). Now, create a bar chart from that table and you have your histogram.

Conclusion:

In this post I listed the steps you can take to create a histogram in Excel. Note that there are other options as well, like R (the hist function), that let you build a histogram, so you do have a choice of tools. But if you want to stick with Excel and it’s good enough, then you now know how. Cheers!
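For readers who want to see what the Data + Bin Range step computes, here is a minimal Python sketch using only the standard library. It is not part of the Excel workflow; the function name, the randomly generated sample data, and the bin values are all made up for illustration. It assumes each bin value is an inclusive upper bound and that anything above the last bin falls into a final “More” bucket, which is my understanding of how the Excel tool counts.

```python
from bisect import bisect_left
import random


def excel_style_histogram(data, bin_upper_bounds):
    """Count values per bin, treating each bin value as an inclusive upper bound.

    Values larger than the last bin go into a final "More" bucket.
    """
    bounds = sorted(bin_upper_bounds)
    counts = [0] * (len(bounds) + 1)  # the extra slot is the "More" bucket
    for value in data:
        # bisect_left returns the index of the first bound that is >= value
        counts[bisect_left(bounds, value)] += 1
    labels = [str(b) for b in bounds] + ["More"]
    return list(zip(labels, counts))


# Hypothetical sample: about 3000 numbers between 0 and 100
random.seed(0)
data = [random.randint(0, 100) for _ in range(3000)]

table = excel_style_histogram(data, [20, 40, 60, 80, 100])
for label, count in table:
    # A crude text bar chart: one '#' per 25 values
    print(f"{label:>5} {count:>5} {'#' * (count // 25)}")
```

The printed table plays the same role as the new worksheet in Excel: one row per bin with its frequency, ready to be charted.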

Related Post: What is the difference between Histogram & Bar Chart?

What are the reasons why developing a data dictionary is so important?

Let me first define “Data Dictionary”: it’s a document that lists data fields/metrics and their standardized definitions to be used across the org.

The key here is: Standardized.

Imagine this:

Imagine that a management team meeting is going on and you have the CEO, VP of Sales, VP of Marketing, CFO, and COO, among others, in the room.

Meeting agenda: why they didn’t hit the $100M profit goal in the first quarter. So each of them starts with the reports they have access to.

VP of Sales says they missed it by $5M

CFO says that they missed it by $9M

COO says that they missed it by $7M

VP of Marketing has three different versions on her report and she is confused!

No one talks about why they missed the goal; instead, they spend the next hour reconciling the numbers!

It was a hypothetical scenario, but these things happen all the time! Of course it could be any team meeting, and the metric could be something else. Or it could just be that someone working on their own ends up spending a lot of time digging through all the metric definitions and trying to make sense of it all. This is where a data dictionary could help! Let’s take this a step further:

What’s one of the most important characteristics of good data analysis/science?

It needs to be Actionable.

It needs to help business decision makers take action based on the insights that they found or that were shared with them. And before they make that decision, business decision makers need data they can TRUST!

For data to be trusted, it needs to be understood. It needs to have a definition that everyone agrees upon.

This is what a data dictionary is for. It lists data fields/metrics and their standardized definitions so that everyone in the org understands what each field/metric means and doesn’t have to worry about aligning on its meaning. They can focus on analyzing and extracting insights that would change the business and the world!
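To make the idea concrete, here is a minimal Python sketch of a data dictionary as a lookup structure. Everything in it is hypothetical: the metric name, the wording of the definition, the formula, and the owning team are made up for illustration, and a real data dictionary is often just a shared document or wiki page.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MetricDefinition:
    name: str
    definition: str  # the wording everyone has agreed on
    formula: str     # how the number is computed
    owner: str       # the team that maintains the definition


# Hypothetical entry: one agreed definition per metric, shared across the org
DATA_DICTIONARY = {
    "quarterly_profit": MetricDefinition(
        name="quarterly_profit",
        definition="Revenue minus all costs recognized in the quarter",
        formula="revenue - cost",
        owner="Finance",
    ),
}


def lookup(metric_name):
    """Return the agreed definition, or fail loudly if none exists yet."""
    try:
        return DATA_DICTIONARY[metric_name]
    except KeyError:
        raise KeyError(f"'{metric_name}' has no agreed definition yet") from None


entry = lookup("quarterly_profit")
print(f"{entry.name}: {entry.definition} ({entry.formula}), owned by {entry.owner}")
```

If every report in the meeting scenario pulled “quarterly_profit” from a single entry like this, the discussion could start from one number instead of three.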

VIEW QUESTION ON QUORA

How do I get experience as an entry-level Data analyst?

It’s a three-step process:

  1. Figure out where (location) you want to work and who (company) you want to work for.
  2. Note the skills required in job descriptions at companies in your desired location(s) > find common themes across those job descriptions > pick up those skills if you don’t have them already!
  3. Start Applying!
    • Getting a job is a function of the number of job applications and your conversion rate (offers received / number of job applications). Increasing the number of job applications is easy: you just need to apply to as many jobs as you can. To improve your conversion rate, you need to do a number of things: clear HR/culture-fit rounds, clear tech rounds, create a portfolio of projects to talk about, etc.
    • You could also consider applying for internships to get experience. This should help you land full-time roles.
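
The arithmetic behind step 3 can be sketched in a few lines of Python. The function names and the sample numbers (2 offers from 50 applications) are made up for illustration.

```python
def conversion_rate(offers_received, applications):
    """Offers received divided by the number of job applications."""
    if applications == 0:
        return 0.0  # no applications yet, so nothing to convert
    return offers_received / applications


def expected_offers(applications, rate):
    """Offers you would expect if the conversion rate stayed the same."""
    return applications * rate


# Hypothetical numbers: 2 offers from 50 applications
rate = conversion_rate(offers_received=2, applications=50)
print(f"Conversion rate: {rate:.0%}")
print(f"Expected offers from 100 applications: {expected_offers(100, rate):.1f}")
```

Both levers show up here: sending more applications raises the first argument of expected_offers, while the interview and portfolio work raises the rate.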

Related Answer: Paras Doshi’s answer to How do I prepare myself to be a data analyst?

VIEW QUESTION ON QUORA

What data are data scientists at startups actually analyzing? How is it collected?

Question: What data are data scientists at startups actually analyzing? How is it collected?
(Coming from a web analytics background, I’m wondering what data data scientists at IT companies are actually analyzing. Is it server-side or client-side? Is it collected internally or using some external tool?)

Answer:

Part 1: What are startups analyzing?

It depends on the Business Model and the Stage that they are at.

Business Models: Marketplace, Ecom, SaaS, Media, etc.

Stage: Early, Mid, Late

So let’s say you have a SaaS model and you’re in mid-stage (post product-market fit). Then you would tend to focus on things like engagement, churn, etc. Ideally you should focus on measuring what aligns best with your strategy (instead of capturing everything!).

Let’s take another example. Let’s say you are a late-stage marketplace. You would tend to focus more on the “money”, so you would measure things like transactions, commissions, etc.

I recommend reading the book “Lean Analytics”, as it goes much deeper and is a great starting point for anyone who wants to understand how analytics can help a startup.

Part 2: How is it collected?

Now this also depends on your product. Assuming you’re a tech startup, you would have a web app and/or desktop app and/or mobile app. Your delivery approach plus your measurement needs then determine the “how”. It would invariably be a combination of your transactional data source, web/mobile events stack (Google Analytics, another vendor, or custom), and finance data source, among others.

This post points to 10 other blogs that list their “data” stacks: The Data Infrastructure Meta-Analysis: How Top Engineering Organizations Built Their Big Data Stacks – The Data Point

View Question on Quora

What is the difference between Histogram & Bar Chart?

Histogram vs. Bar Chart:

  1. What the x-axis represents
    • Histogram: The x-axis represents bins. So if you have a continuous variable like age which has values from 0-100 then you can create bins like 0-10, 10-20 and so on (and here bin size = 10). You can change the bin size to analyze the distribution of the data.
    • Bar Chart: The x-axis represents distinct categories from your data.
  2. Type of variable on the x-axis
    • Histogram: The x-axis has a numerical (quantitative) variable.
    • Bar Chart: The variable on the x-axis is usually qualitative.
  3. Order
    • Histogram: The order of the bins is important since it is used to understand the distribution of the data.
    • Bar Chart: The order of the categories in the bar chart doesn’t matter. We can sort it if we want but it’s not needed.
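
The contrast above can be sketched in Python with only the standard library. The function names and the sample ages and colors are made up for illustration. The histogram side groups a numerical variable into ordered bins of a chosen size (keeping empty bins so the shape of the distribution stays visible), while the bar chart side simply counts distinct categories.

```python
from collections import Counter


def histogram_counts(values, bin_size):
    """Group numeric values into ordered bins [start, start + bin_size)."""
    counts = Counter((v // bin_size) * bin_size for v in values)
    first, last = min(counts), max(counts)
    # Walk the bins in order and keep empty ones: bin order matters
    return [
        (start, start + bin_size, counts[start])
        for start in range(first, last + bin_size, bin_size)
    ]


def bar_chart_counts(categories):
    """Count distinct categories; sorting by count is optional, not required."""
    return Counter(categories).most_common()


# Hypothetical data
ages = [3, 7, 12, 18, 25, 41, 44, 47]
colors = ["red", "blue", "red", "green", "red", "blue"]

for start, end, count in histogram_counts(ages, bin_size=10):
    print(f"{start:>3}-{end:<3} {'#' * count}")

for category, count in bar_chart_counts(colors):
    print(f"{category:<6} {'#' * count}")
```

Changing bin_size regroups the same ages into wider or narrower bins, which is the “change the bin size” knob mentioned above; the category counts have no equivalent setting.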

[Video] AI, Deep Learning and Machine Learning

I watched this video over the weekend and wanted to share this very well done presentation by a venture capital (VC) firm with you. That’s why I love following VCs (especially ones that invest in the Data/Analytics theme): they tend to share some amazing insights on where the industry is going.

Abstract:
“One person, in a literal garage, building a self-driving car.” That happened in 2015. Now to put that fact in context, compare this to 2004, when DARPA sponsored the very first driverless car Grand Challenge. Of the 20 entries they received then, the winning entry went 7.2 miles; in 2007, in the Urban Challenge, the winning entries went 60 miles under city-like constraints.

Things are clearly progressing rapidly when it comes to machine intelligence. But how did we get here, after not one but multiple “A.I. winters”? What’s the breakthrough? And why is Silicon Valley buzzing about artificial intelligence again?

From types of machine intelligence to a tour of algorithms, a16z Deal and Research team head Frank Chen walks us through the basics (and beyond) of AI and deep learning in this slide presentation.

URL: http://a16z.com/2016/06/10/ai-deep-learning-machines/