Tableau: Data Cleaning for Geographic Maps


Data cleaning is a major part of any analytics or data-visualization undertaking. If data cleaning is ignored, reporting becomes inaccurate, which leads to suboptimal business decisions.

To that end, if you create a geographic map in Tableau, please check the accuracy of your data by going to:

Menu Bar > Map > Edit Locations

Let me give you some examples:

I had “State/Province” as the geographic role for my variable, but when I created a geographic map, it didn’t show anything for New York State! See before:

data cleaning geographic map before

So what did I do?

I navigated to Menu Bar > Map > Edit Locations:

data cleaning geographic map State

So I fixed it!

data cleaning geographic map Tableau

And After:

data cleaning geographic map after

Note that New York State is now lit up!

In the past, I’ve also entered latitude & longitude when needed. That was when Tableau could not recognize a few US cities and flagged them as “ambiguous” – I entered the latitude & longitude to clean the data:

data cleaning geographic map city
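
The same kind of check can also be scripted before the data ever reaches Tableau. Here is a minimal sketch in Python, assuming a hypothetical reference set of state names and a made-up sample column; the helper name `find_unrecognized` and all values are illustrative and are not taken from the workbook above:

```python
# Hypothetical reference set; a real check would list every state/province.
KNOWN_STATES = {"New York", "New Jersey", "California", "Texas"}


def find_unrecognized(values, known=KNOWN_STATES):
    """Return the distinct values that do not match a known state name."""
    cleaned = {v.strip() for v in values}  # ignore stray whitespace
    return sorted(v for v in cleaned if v not in known)


# Illustrative sample: "NewYork" is a made-up bad value.
sample = ["California", "Texas ", "NewYork", "New Jersey"]
print(find_unrecognized(sample))
```

Anything this returns is a candidate to fix in the source data, or later under Map > Edit Locations.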

Conclusion:

In this post, I described how you should check the data accuracy of a Tableau Geographic Map.

Received President’s Volunteer Service Award!

Received the President’s Volunteer Service Award for the year 2012!

Letter from US President Barack Obama:

letter from Barack Obama presidents volunteer service award

Lapel Pin:

lapel pin president's volunteer service award copy

Certificate:

certificate presidents volunteer service award copy

Related Notes:

– Check out Give Camp – It’s a great way for IT professionals to give back to society. If you do not have a Give Camp in your country, you can always start one 🙂

Met revered APJ Abdul Kalam

My First Five Toastmasters International club speeches:

In this post, I want to share the titles of the first five Toastmasters speeches that I delivered over the past few months:

1. Ice Breaker

2. Glossophobia (Fear of Public Speaking)

3. Get ready to outsource your computing

4. What’s it like for folks who live below the poverty line?

5. What’s it like to start a startup?

Related Post:
Half way through Toastmaster’s competent communicator manual
Want to practice public speaking? Join Toastmaster’s!

How many websites in USA exceed the data collection limitations of Google Analytics?

A little bit of background:

– I was researching the limitations of Google Analytics

– After reading the limitations, I wanted to know – How many websites in the USA exceed the limitations of Google Analytics?

So Here’s the Short Answer:

Only 108 sites exceed this limitation

(as of today)

And Here’s the long answer:

Limitations of Google Analytics. Here’s the URL: http://support.google.com/analytics/bin/answer.py?hl=en&answer=1070983

And I am quoting from the above URL:

Data Collection limit: You should not send more than 10 million hits per month. If you exceed this limit, there is no assurance that the excess hits will be processed.
Data Freshness limit: Sending more than 200,000 visits per day to Google Analytics will result in your reports being refreshed only once per day
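
The two quoted limits are easy to turn into a quick check. This is only a sketch: the thresholds come from the quote above, while the function name `check_limits` and the traffic numbers are made up for illustration:

```python
# Limits as quoted above from the Google Analytics help page.
MAX_HITS_PER_MONTH = 10_000_000
MAX_VISITS_PER_DAY = 200_000


def check_limits(hits_per_month, visits_per_day):
    """Report which of the two quoted limits a site would exceed."""
    return {
        "data_collection": hits_per_month > MAX_HITS_PER_MONTH,
        "data_freshness": visits_per_day > MAX_VISITS_PER_DAY,
    }


# Made-up traffic numbers, for illustration only.
print(check_limits(hits_per_month=12_000_000, visits_per_day=150_000))
```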

To take it further, I wanted to know how many websites in the USA get more than 10 million hits per month. It turns out only 108 websites in the US get that much traffic.
Source: http://www.quantcast.com/top-sites/US?jump-to=108

So from a data collection limit standpoint, only these 100-odd sites would exceed the limitations of Google Analytics.

To put things in perspective: MySpace.com does not exceed the Google Analytics data collection limit:

my space can use google analytics

Conclusion

Just knowing about the data collection limit was not that interesting, but once I combined it with data from another source, it seemed very interesting to me! Anyhoo – In this post, I shared:

> Limitations of Google Analytics

> An answer to: how many websites in the USA exceed the limitations of Google Analytics?

[UPDATE Feb 10th 2013] I made a mistake in correlating data from Quantcast and Google Analytics. Lesson learned: double-check the units when comparing data from two different sources.

Florin Dumitrescu pointed out that Quantcast uses people/month while Google uses hits/month, and the two may NOT always be the same. Sorry about this.
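
To make the unit mismatch concrete, here is a small sketch of the conversion. It assumes an average hits-per-person ratio, which neither source provides; the function names and the sample site are hypothetical:

```python
HITS_LIMIT_PER_MONTH = 10_000_000  # Google Analytics limit quoted earlier


def estimated_hits(people_per_month, hits_per_person):
    """Convert people/month to hits/month under an assumed ratio."""
    return people_per_month * hits_per_person


def hits_per_person_to_reach_limit(people_per_month):
    """Average hits per person at which a site reaches the limit."""
    return HITS_LIMIT_PER_MONTH / people_per_month


# Hypothetical site with 2 million people/month.
print(estimated_hits(2_000_000, 3))
print(hits_per_person_to_reach_limit(2_000_000))
```

In this made-up case, a site with far fewer than 10 million people per month still passes the limit once each person averages more than 5 hits, which is why the two numbers cannot be compared directly.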

Neologism is the new challenge for IT professionals, Here’s why:


What is Neologism?

Neologism means the coining or use of new words – and I believe it’s one of the challenges faced by IT professionals. Nowadays, we put our time & energy into trying to get our heads around “new terms/words/trends”.

Let’s take a couple of examples:

Some time back, we had cloud computing. Nowadays, it’s Big Data. In my mind, Big Data has been coined to mean the following technologies/techniques in different contexts:

Big Data Unstructured External Text Public Data

Note: The above image is just for illustration purposes. It does not comprehensively cover every technology that is now called “Big Data”. Feel free to point it out if you think I missed something important.

And neologism is a challenge because:

1) Generally, it’s a new trend and there is little to no consensus on what it “exactly” means.

2) It means different things in different contexts.

3) Every person can have their own “interpretation” and no one is wrong.

4) It’s a moving target. The definition used today will change in the future. So we always need a “working” definition for these terms.

Now, don’t get me wrong, it’s fun trying to figure out what it all means and trying to gauge whether it matters to me and my organization or not! What do you think – as a person in Information Technology, do you think that neologism is one of the challenges faced by us? Consider leaving a reply in the comment section!

Related Articles:

Want to learn about BigData? read Oreilly’s Book “Planning for BigData”

Quote for Big-Data / Data-Science/ Data-Analysis enthusiasts:

Who on earth is creating “Big data”?

Examples to help clarify what’s unstructured data and what’s structured?

Playing w/ the Occupational Employment Statistics Data-Set:

I found some data-sets on Occupational Employment Statistics on the Bureau of Labor Statistics site, and I played with them to see if I could find something interesting:

A few things about the data & visualizations that I am going to share:

  • US only
  • I downloaded the national-level data, but there’s also state-level data available if you’re interested in drilling down.
  • The reports that you see were created after I got a chance to “clean” the data-set a bit and create a data model that suited basic reporting on top of it.
  • For this blog post, I am going to play w/ May 2010 & 2011 data
  • With the help of the original data-set, you can drill down to get statistics about a particular job category if you want. For this blog post, I am going to share visualizations that correspond to job categories.
  • Click on the images to see higher-resolution versions.

With that, here are some visualizations:

1) Job Category VS mean hourly salary:

1 Job category vs hourly salary mean bureau of labour statistics

2) Job Category VS number of employees:

2 Job category vs number of employees bureau of labour statistics

3) Scatter Plot:

X-Axis: Number of employees

Y-Axis: Wage (Mean Hourly Salary May 2011)

Size of Bubble: Wage (Mean Hourly Salary May 2011)

*Note: This may not be the best approach to create the Scatter Plot as I have used the same value (Mean Hourly Salary May 2011) twice – But since I was just playing w/ it, I went with what I had in the model.

Here’s the visualization:

3 scatter plot number of employees vs mean hourly wage may 2011 employment statistics

Some of the things I observed:

1) I belong to an industry (Computer and Mathematical occupations) that has a relatively high mean hourly wage.

2) Few people work in “farming, fishing & forestry occupations”, and they do not get paid much.

3) Lots of people work in “office administrative support occupations”, and they do not get paid much.

4) Management occupations, Legal occupations and Computer & Mathematical occupations have relatively high mean hourly wages.

Conclusion:

In this post, I played w/ Occupational Employment statistics data-sets and shared some visualizations.

Prepare yourselves for ‘Capped Data Plans’ VS ‘growing cloud computing adoption’ battle.

I love the cloud! And no, I am not a marketer – I am a technologist, and I love the cloud once you subtract the marketing mumbo-jumbo/hype. I would like the cloud to emerge victorious. But I see a roadblock, and that’s a problem. And… I like pointing at problems (Oops!) I like coming up with creative solutions to problems.

So, in this post, I am going to talk about a problem Errrr, probable solution(s) to a potential problem that can adversely affect our lifestyle. To be specific, it can drastically increase the money we pay for Internet access in the future. With growing cloud computing adoption, we will consume more bandwidth because we will have a LOT of data in the cloud. The more data transferred between the cloud and our devices, the higher the bandwidth usage.

Now, today, it’s not a problem. At least not in the USA, as we can live with “LIMITED” mobile data plans (wireless cellular things like 4G, 3G, etc.) combined with “UNLIMITED” home data plans (the wired ones – yes, the Internet made up of fiber cables). So in the future, “IF” we still have UNLIMITED home data plans around, this battle may fortunately never happen.

Wait… But what if there is no UNLIMITED plan?

I know it’s scary. Year by year, Internet usage would keep doubling while the limit (cap) on data usage would be halved.

Wait… That sounds like an inverse Moore’s law!

And what if home data plans looked like this: $2 for every GB? At my current usage, I would be paying a little less than 80 dollars. And I can’t imagine my data usage if all my data were in the cloud. What if I had no local storage and instead kept all my data on some “cloud storage”? I can imagine a bill of $200 per month. And that’s a problem, right? You think so too?
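
The arithmetic behind those numbers can be sketched in a few lines of Python. The $2/GB rate is the hypothetical one from above (so a little under $80 means a little under 40 GB, and $200 means 100 GB); the starting cap of 250 GB and the function names are my own assumptions for illustration:

```python
PRICE_PER_GB = 2.0  # dollars, the hypothetical rate from above


def monthly_bill(usage_gb, price_per_gb=PRICE_PER_GB):
    """Bill for one month of pure pay-per-GB pricing."""
    return usage_gb * price_per_gb


def project(usage_gb, cap_gb, years):
    """Return (year, usage, cap) rows with usage doubling and cap halving yearly."""
    rows = []
    for year in range(years + 1):
        rows.append((year, usage_gb, cap_gb))
        usage_gb *= 2
        cap_gb /= 2
    return rows


print(monthly_bill(100))  # 100 GB at $2/GB
for year, usage, cap in project(usage_gb=40, cap_gb=250, years=3):
    print(year, usage, cap, "over cap" if usage > cap else "under cap")
```

With these assumed starting values, usage passes the cap by the second year, which is the “inverse Moore’s law” squeeze in miniature.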

No!? Unrealistic?! Home data plans cannot be capped?! Here’s an article: http://money.cnn.com/2011/05/17/technology/netflix_canada/index.htm that talks about the challenge Netflix faced in Canada because of “capped home data plans”. In fact, Netflix had to offer a video-quality setting that could help Canadian customers “save” on data usage. Any-who – if home data plans were capped in Canada, I can say this may happen in the USA and other countries too.

And if you still think that capped home data plans are not a possibility, I would like to point out that, not long ago, we had UNLIMITED data plans through wireless cellular services, and now we do not. (And I know – Sprint offers unlimited data plans. But could that not change too?)

So what can we do about it?

– Spread the word. Prepare yourselves for the battle! Contact Imp. People. Contact Gov. etc, etc, etc. (Ok, I tweeted, now what?! Any probable solution?)

Ideally, data should not be capped. (And if it does happen – we can run an Occupy Comcast campaign! Ok – sorry, could not resist it!)

Realistically, how about a reasonable capped home data plan? What’s reasonable? Well, I do not like 2 GB and 5 GB limits. That’s way too small. On the other hand, a capped data plan of 1 TB is too high. It’s virtually unlimited. How about something in between that is reasonable for both sides: we, THE CLOUD USERS, and the Internet service providers (ISPs)? That is what I think – you may have a different perspective on it. If you do, go ahead and post your views as comments.