Exploring, filtering and shaping web-based public data using Data Explorer Excel add-in:


Data Explorer let’s you “Explore” (search) for web-based public data. This is a great way to combine data that you may have in your data-sources with public data sources for data analysis purposes. Sometimes your data might not tell you the reason behind the observed trends, so when that happens – you can try to see if a public data-set might give you the much-needed context. Let me give you an Example before we start hands-on w/ data explorer so that you have better understanding of importance of public datasets. Here’s a sample that I found here. So, Here’s a demo:

An auto company is seeing sales trends of Hybrid cars and SUV’s from the sales data-sources. But what is the reason behind that? company data does not show that. Someone hypothesizes that it might be because of gas prices. So they test out the hypothesis by combining gas prices information available via public data. And turns out gas prices might be the driving force of sales trends! SEE:

if the gas prices increase, then the sale of SUV go down and the sale of Hybrids go up:

data analysis combine data with public datasets

You know that public data can be helpful! So how can you search for public data-sets? Well, You can manually search online, ask someone, browse through public data repositories like azure data market (and other data markets), there’s also a public data search engine! OR you can directly search for them via Data Explorer.

Here are the steps:

1) Excel 2010/2013 > Data Explorer Tab > Online Search > type “Tallest Buildings”

excel public data search data explorer2) I selected one of the data-sets that said “Tallest completed building…. ”

excel data from internet

3) Now let’s do some filtering and shaping. Here are the requirements:

– Hide columns: Image, notes & key

– clean columns that has heights data

– Show only city name in location

OK, let’s get to this one by one!

4) Hiding Columns:

Click on Filter & Shape button from the Query Settings:

excel data shaping cleaning

Select Image Column > Right Click > Hide:

excel hide remove columns

Repeat the steps for notes & key column.

Click on DONE

5) clean column that has heights data.

Click on Filter & Shape to open the query editor

A) let’s rename it. Select column > Right Click > Rename to Height > press ENTER

B) let’s remove the values in brackets. Select Column > right click > split column > By delimiter > At each occurrence of the delimiter > Custom and enter “(” > OK

excel split a columnThis should transform the data like this:

excel data explorer split a column

Hide height.2 and rename the height.1 to height

Click on DONE

6) Let’s just have city names in the location column

click on Filter & shape to load query editor:

A) select location > right click > split column > by delimiter > Custom – Enter: ° in the text box like this:

an excel split by delimiter dataclick on OK

Hide Location.2, Location.3, Location.4 & Location.5

Select Location.1 > Right Click > Split Column > by Number of characters > Number of characters: 2 > Once, as far right as possible > OK

cleaning data in excel shaping filtering

Hide Location.1.2 and rename Location.1.1 to Location

One last thing! making sure that the data type of height is numbers.

Select height > change type > number


Select floors > change type > number

click on DONE. Here’s our filtered and shaped data!

filter data excel shape clean


For the purpose of visualization I copied first 20 rows to a separate excel sheet and created a chart:

z excel data visualization

That’s about it for this post. Here are some related Posts on Data Explorer:
Unpivoting data using the data explorer preview for Excel 2010/2013
Merging/Joining datasets in Excel using Data Explorer add-in
Remove Duplicates in Excel Tables using Data Explorer Add-in
Web Scraping Tables using Excel add-in Data Explorer preview:

Your comments are very welcome!

Data Reporting ≠ Data Analysis


One of the key thing I’ve learned is importance of differentiating the concepts of “Data Reporting” and “Data Analysis”. So, let’s first see them visually:

data analysis and data reporting

Here’s the logic for putting Data Reporting INSIDE Data Analysis: if you need to do “analysis” then you need reports. But you do not have to necessarily do data analysis if you want to do data reporting.

From a process standpoint, Here’s how you can visualize Data Reporting and Data Analysis:

data analysis and data reporting process

Let’s thing about this for a moment: Why do we need “analysis”?

We need it because TOOLS are really great at generating data reports. But it requires a HUMAN BRAIN to translate those “data points/reports” into “business insights”. This process of seeing the data points and translating them into business insights is core of what is Data Analysis. Here’s how it looks visually:

Data analysis Data Reporting

Note after performing data analysis, we have information like Trends and Insights, Action items or Recommendations, Estimated impact on business that creates business value.


Data Reporting ≠ Data Analysis

Resource: 12 recorded sessions from the 24hop business analytics edition are online! #passbac #msbi


Recently, PASS hosted a 24hop business analytics event:

And now, the 12 one hour sessions ranging from data visualization, predictive analytics to Big Data are online for you to watch! They also serve as “Trailer” for what you can expect at the PASS Business Analytics conference!

Here’s the URL: http://passbaconference.com/Sessions/SneakPeeks.aspx

And I was following some of these sessions live on the event day – and I can tell you, these sessions are great resources!

Also, I participated in the twitter contest (by Microsoft BI) that was happening along w/ the event – and this is what I got for my win!

24 hop twitter contest prize

hoodie w/ embedded earphones!

That’s about it for this post. Enjoy the recordings!

Found something interesting by exploring a “List of companies by revenue” Data Set:


I like exploring data sets to find interesting patterns from them. To that end, I was exploring a data-set: List of companies by revenue and I added a column to calculate Revenue/Employee to explore the dataset:

And I found an outlier!

Here’s the outlier: Exor

Here’s what it’s interesting:

It’s revenue in 2012 is: 109.15 billion USD

And number of employees is just 40!

Just think of Revenue/Employee !

To put things in perspective, Lets Compare that with its neighbor in the data-set:

Rank | Company | Industry | Revenue in USD billion | Employees

48 Koch Industries Conglomerate 110.00 60000.00
49 EXOR Investment 109.15 40.00
50 Cardinal Health Pharmaceuticals 107.55 40000.00
51 CVS Caremark Retail 107.10 202000.00
52 IBM Computer services 106.92 433362.00

I got to know about this by quickly creating a data visualization to explore the data-set:

list of companies by revenue

And removing Trafigura, Vitol and Exor, this is what we have:

power view excel 2013 rank revenue employees

Observation: oil and gas industry have relatively higher revenue/employee ration.

That’s about it for this post. Thanks for reading about my data exploration!

Quick Note about SAS’s acroynm SEMMA:


SEMMA is an acronym introduced by SAS which stands for:

Sample, Explore, Modify, Model and Assess.

I had recently posted about the Data Mining & Knowledge Discovery Process which had following sequential steps:

Raw Data => cleaning => sampling => Modeling => Testing

SEMMA follows the similar sequential steps as we had seen in the data mining process. So while Data Mining process is applicable to any data mining tool out their, SEMMA helps when you use SAS enterprise miner. In fact, it has helped me quickly find the data mining functions available in SAS tool:

sas sample explore modify model assess

Back to basics: Data Mining and Knowledge Discovery Process


Once in a while I go back to basics to revisit some of the fundamental technology concepts that I’ve learned over past few years. Today, I want to revisit Data Mining and Knowledge Discovery Process:

Here are the steps:

1) Raw Data

2) Data Pre processing (cleaning, sampling, transformation, integration etc)

3) Modeling (Building a Data Mining Model)

4) Testing the Model a.k.a assessing the Model

5) Knowledge Discovery

Here is the visualization:

knowledge discovery process data miningAdditional Note:

In the world of Data Mining and Knowledge discovery, we’re looking for a specific type of intelligence from the data which is Patterns. This is important because patterns tend to repeat and so if we find patterns from our data, we can predict/forecast that such things can happen in future.


In this blog post, we saw the Knowledge Discovery and Data Mining process.

Three V’s of Big Data with Example:


In this blog-post, we would see the Three V’s of Big Data with Example:

1. Volume:

TB’s and PB’s and ZB’s of data that gets created:

From the webinar “How to Walk The Path from BI to Data Science: An interview with Michael Driscoll, data scientist and CEO of Metamarkets” – A global surge in Data

2. Velocity:

The speed at which information flows.

Example: 50 Million tweets per day!

twitter 50 million tweets per day

(This is back in Nov. of 2010 – the number must have increased!)

UPDATE 23 Nov 2012: on, wikipedia it says – 340 million tweets per day!

twitter 2012 340 million tweets per day

3. Variety:

All types of data is now being captured which may be in structured format or not.

Example: Text from PDF’s, Emails, Social network updates, voice calls, web traffic logs, sensor data, click streams, etc

data variety big data

Image courtesy

And this may be followed by other V’s like V for Value.


In this blog-post, we saw Three V’s of Big Data with Example.

Related Posts:

Who on earth is creating “Big data”?

Examples to help clarify what’s unstructured data and what’s structured?