Machine Learning Algorithm Cheat Sheet:


If you’re getting started with Data Science & Machine Learning, I think this will be a great resource for you. This “cheat sheet” helps you select an algorithm to test, depending on the problem you are trying to solve and the data set that you have.

Download link: (Courtesy: Azure Machine Learning)

Also, even though the cheat sheet was created to help you with the “Azure Machine Learning” product, it’s still valid if you use other machine learning tools.

[Image: Azure Machine Learning Algorithm Cheat Sheet]


Data puking and how T-Mobile alienated a potential customer:


I saw this ad on a highway earlier today, and my reaction was: why would I switch to a network that has just “96%” coverage?

[Image: T-Mobile ad, an example of data puking]

…instead of converting a potential buyer, this ad actually made me more nervous. You know why? It’s a case of what I like to call “data puking”, where you throw a bunch of numbers/stats/data at someone, hoping that they will take action based on it. So what would have helped in this ad? It would have been great to see the number compared against someone else’s. Something like: we have the largest coverage compared to xyz. My ATT connection is spotty in downtown areas, so if the ad had said something like “we have 96% coverage compared to ATT’s 80%”, I would have been much more likely to make the switch.

I wrote about adding benchmarks to your analysis here

Takeaway from this blog: don’t throw data points at your customers. Give them the context and guide them through the actions that you want them to take.

The role of Sentiment Analysis in Social Media Monitoring:


I’ve posted tutorials and resources about the technical side of Sentiment Analysis on this blog. Here are the links, if you need them:

LingPipe (Java based) | Python | R language resource | Microsoft’s tool “Social Analytics”

Apart from these, I’ve used other tools as project requirements dictated, and it’s been fun designing and developing “Sentiment Analysis” projects, primarily for Social Media Monitoring. Having worked with clients on projects that use “Sentiment Analysis”, I reflected on the role it plays in Social Media Monitoring. In this blog post, I am sharing those reflections:

What is Social Media Monitoring?

Social Media Monitoring is a process of “monitoring” conversations happening on social media channels about your brand/company.

Is it NEW? Not really. The idea of monitoring or gathering data about what is being said about a brand/company is not new. Earlier, the sources were newspapers and magazine articles. Now they are the social media channels, including online news, forums and blogs, and thus the name given to this process is “Social Media Monitoring”.

[Image: brand monitoring on social media]

What is Sentiment Analysis?

Sentiment Analysis means analyzing data to categorize it under a “sentiment” (emotion).

Example: is this review saying something positive, negative or neutral about our product?

[Image: sentiment analysis with positive, negative and neutral categories]

Side note: Sentiment analysis is often categorized under “Big Data Analytics”.

What’s the Role of Sentiment Analysis in Social Media Monitoring?

We’ve seen that in social media monitoring, we gather all online conversations about a brand/product/company. Now wouldn’t it be great to take the data that we have and bucket it under “Positive”, “Negative” or “Neutral” categories for further analysis?
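To make the bucketing idea concrete, here is a minimal sketch in Python. It assumes a tiny, hypothetical word list (`POSITIVE_WORDS`, `NEGATIVE_WORDS`) and simply counts matches; it is not how LingPipe or the other tools linked above work, and real projects would use a much larger lexicon or a trained model.

```python
import re
from collections import Counter

# Hypothetical, tiny word lists for illustration only.
POSITIVE_WORDS = {"love", "great", "good", "happy", "excellent"}
NEGATIVE_WORDS = {"hate", "bad", "terrible", "slow", "broken"}


def classify_sentiment(text):
    """Label text as Positive, Negative or Neutral by counting lexicon hits."""
    words = re.findall(r"[a-z']+", text.lower())
    positives = sum(word in POSITIVE_WORDS for word in words)
    negatives = sum(word in NEGATIVE_WORDS for word in words)
    if positives > negatives:
        return "Positive"
    if negatives > positives:
        return "Negative"
    return "Neutral"


def bucket_conversations(conversations):
    """Count how many conversations fall into each sentiment bucket."""
    return Counter(classify_sentiment(text) for text in conversations)


if __name__ == "__main__":
    # Made-up example conversations, not real customer data.
    sample_conversations = [
        "I love the new phone, great battery",
        "Support was terrible and slow",
        "The package arrived on Tuesday",
    ]
    print(bucket_conversations(sample_conversations))
```

Once every conversation carries a label like this, the counts per bucket can be tracked over time or broken down by product or topic.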

So, here are a few questions that can be answered once we have results from sentiment analysis:

1) Are people happy or sad about our product?

2) What do they like about our product?

3) What do they hate about our service?

4) Is there a trend or seasonality in sentiment data?

These are among the business insights that may not be easy to get from plain text data alone.

Thus, sentiment analysis is one of the steps in social media monitoring. It helps analyze the sentiment of all the conversations happening on the social web about a brand/product.

That’s about it for this post. Here’s a related post: Three Data Collection Tips for Social Media Analytics

Your comments are very welcome!

Data Reporting ≠ Data Analysis


One of the key things I’ve learned is the importance of differentiating the concepts of “Data Reporting” and “Data Analysis”. So, let’s first see them visually:

[Image: Data Reporting shown inside Data Analysis]

Here’s the logic for putting Data Reporting INSIDE Data Analysis: if you need to do “analysis”, then you need reports. But you do not necessarily have to do data analysis if all you want is data reporting.

From a process standpoint, here’s how you can visualize Data Reporting and Data Analysis:

[Image: Data Reporting and Data Analysis process]

Let’s think about this for a moment: why do we need “analysis”?

We need it because TOOLS are really great at generating data reports, but it takes a HUMAN BRAIN to translate those “data points/reports” into “business insights”. This process of looking at the data points and translating them into business insights is the core of Data Analysis. Here’s how it looks visually:

[Image: Data Analysis and Data Reporting]

Note that after performing data analysis, we have information such as trends and insights, action items or recommendations, and the estimated impact on the business. This is what creates business value.



Quick Note about SAS’s acronym SEMMA:


SEMMA is an acronym introduced by SAS which stands for:

Sample, Explore, Modify, Model and Assess.

I recently posted about the Data Mining & Knowledge Discovery Process, which had the following sequential steps:

Raw Data => cleaning => sampling => Modeling => Testing

SEMMA follows sequential steps similar to the ones we saw in the data mining process. So while the data mining process applies to any data mining tool out there, SEMMA helps when you use SAS Enterprise Miner. In fact, it has helped me quickly find the data mining functions available in the SAS tool:

[Image: SAS Sample, Explore, Modify, Model, Assess]
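The five SEMMA steps can also be sketched outside of SAS. The Python below is only an illustration of the sequence, using the standard library and synthetic data. The function names are my own, not SAS Enterprise Miner functions, and the “model” is just a straight-line fit.

```python
import random
import statistics


def sample(rows, fraction, seed=0):
    """Sample: draw a random subset of the rows."""
    rng = random.Random(seed)
    k = max(1, int(len(rows) * fraction))
    return rng.sample(rows, k)


def explore(rows):
    """Explore: summarize the input column x."""
    xs = [x for x, _ in rows]
    return {"n": len(xs), "mean_x": statistics.mean(xs), "stdev_x": statistics.pstdev(xs)}


def modify(rows, summary):
    """Modify: standardize x using the summary statistics."""
    scale = summary["stdev_x"] or 1.0
    return [((x - summary["mean_x"]) / scale, y) for x, y in rows]


def model(rows):
    """Model: fit y = a + b * x by ordinary least squares."""
    xs = [x for x, _ in rows]
    ys = [y for _, y in rows]
    mean_x, mean_y = statistics.mean(xs), statistics.mean(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sum((x - mean_x) * (y - mean_y) for x, y in rows) / sxx
    return mean_y - b * mean_x, b


def assess(rows, params):
    """Assess: mean absolute error of the fitted line."""
    a, b = params
    return statistics.mean([abs(y - (a + b * x)) for x, y in rows])


def run_semma(rows, fraction=0.8):
    """Run the five steps in order and return (params, error)."""
    subset = sample(rows, fraction)
    summary = explore(subset)
    prepared = modify(subset, summary)
    params = model(prepared)
    # For brevity the error is measured on the same rows used for fitting;
    # in practice you would assess on held-out data.
    return params, assess(prepared, params)


if __name__ == "__main__":
    # Synthetic data: y = 2x + 1 exactly, so the error should be near zero.
    data = [(float(x), 2.0 * x + 1.0) for x in range(20)]
    print(run_semma(data))
```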

Examples of Machine Generated Data from “Big Data” perspective:


I just researched Machine Generated Data in the context of “Big Data”. Here’s the list I compiled:

– Data sent from Satellites

– Temperature sensing devices

– Flood Detection/Sensing devices

– Web logs

– Location data

– Data collected by Toll sensors (context: Road Toll)

– Phone call records

– Financial

And a Futuristic one:

Imagine sensors on human bodies that continuously “monitor” health. What if we used them to detect diabetes, cancer or other diseases in their early phases? Possible? Maybe!

Interesting Fact:

Machines can generate data “faster” than humans. This characteristic makes it interesting to think about how to analyze machine-generated data and, in some cases, how to analyze it in real time or near real time.
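As a rough illustration of what near-real-time analysis can look like, here is a small Python sketch that keeps a rolling average over a stream of readings and flags the moments when it crosses a threshold. The readings, window size and threshold are hypothetical values picked for the example, not measurements from a real device.

```python
from collections import deque


def rolling_alerts(readings, window=3, threshold=30.0):
    """Yield (index, rolling_mean) whenever the mean of the last
    `window` readings is above `threshold`."""
    recent = deque(maxlen=window)  # keeps only the newest `window` values
    for index, value in enumerate(readings):
        recent.append(value)
        if len(recent) == window:
            mean = sum(recent) / window
            if mean > threshold:
                yield index, mean


if __name__ == "__main__":
    # Hypothetical temperature readings arriving one at a time.
    temperature_stream = [20, 22, 21, 35, 40, 41, 25]
    for index, mean in rolling_alerts(temperature_stream):
        print(f"reading #{index}: rolling mean {mean:.1f} is above the threshold")
```

Because the function is a generator, it processes each reading as it arrives instead of waiting for the full data set.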

Ending Note:

Search for Machine Generated Data and you’ll find much more. It’s worth reading about in the context of Big Data.


Data visualization: Cost of Hard Drive storage space


Here are the visualizations:

1982 – 2009:

[Image: storage cost, 1982 to 2009]

2000 – 2008

[Image: storage cost, 2000 to 2008]

I grabbed data from: And – Thanks!


Storage cost has decreased drastically. Mathematically, storage cost has decreased exponentially. No wonder we can store lots of data for a few dollars, and no wonder the age of Big Data has already arrived!
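One way to check a claim like “decreased exponentially” is to fit a straight line to the logarithm of the cost: if cost = c0 * exp(rate * t), then log(cost) is linear in t. The Python sketch below does that with a plain least-squares fit. The numbers in it are synthetic (a cost that halves every year) and are not the data behind the charts above; `fit_exponential` and `halving_time` are names I made up for this example.

```python
import math


def fit_exponential(years, costs):
    """Fit cost = c0 * exp(rate * (year - years[0])) by least squares on log(cost)."""
    t = [year - years[0] for year in years]
    logs = [math.log(cost) for cost in costs]
    n = len(t)
    mean_t = sum(t) / n
    mean_log = sum(logs) / n
    covariance = sum((a - mean_t) * (b - mean_log) for a, b in zip(t, logs))
    variance = sum((a - mean_t) ** 2 for a in t)
    rate = covariance / variance
    c0 = math.exp(mean_log - rate * mean_t)
    return c0, rate


def halving_time(rate):
    """Years needed for the cost to halve, given a negative rate."""
    return math.log(2) / -rate


if __name__ == "__main__":
    # Synthetic example: a cost that starts at 100 and halves every year.
    years = list(range(2000, 2009))
    costs = [100 * 0.5 ** (year - 2000) for year in years]
    c0, rate = fit_exponential(years, costs)
    print(f"c0 = {c0:.2f}, rate = {rate:.4f}, halving time = {halving_time(rate):.2f} years")
```

With real price data, a roughly constant halving time across the whole range is what an exponential decline would look like.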