Excel Data Mining in Action: Forecasting Twitter Followers for Next Week


OK, so you know I recently installed the Data Mining Excel add-in (see: How to enable Data Mining in EXCEL powered by SQL Server Analysis Services?), and I couldn’t wait to go beyond the samples provided with it. So I decided to start with Forecasting. For this blog post, I downloaded my Twitter stats into Excel. Of course, I had to clean the data and add some computations, which was equally exciting, and I ended up with a data set that had my follower count and also the number of tweets I had posted.
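To give a rough idea of what "adding computations" can look like, here is a minimal Python sketch. It assumes the raw export lists per-day gains (new followers and new tweets) that need to be turned into running totals. That layout, the function name `build_rows`, and every number below are made up for illustration; they are not the actual stats or necessarily the computations done in Excel for this post.

```python
from datetime import date, timedelta


def build_rows(start_day, followers_before, tweets_before, new_followers, new_tweets):
    """Turn per-day gains into rows of (date, total followers, total tweets)."""
    rows = []
    total_followers = followers_before
    total_tweets = tweets_before
    for offset, (gained, tweeted) in enumerate(zip(new_followers, new_tweets)):
        total_followers += gained   # running follower total
        total_tweets += tweeted     # running tweet total
        rows.append((start_day + timedelta(days=offset), total_followers, total_tweets))
    return rows


# Hypothetical numbers for illustration only.
rows = build_rows(
    start_day=date(2012, 7, 23),
    followers_before=400,
    tweets_before=1000,
    new_followers=[1, 0, 2],
    new_tweets=[3, 5, 0],
)

for day, followers, tweets in rows:
    print(day.isoformat(), followers, tweets)
```

Each row then maps onto the three columns used later in the wizard: Date, Total Follower Count, and Total Tweet Count.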

The date range in the data set is 23 July 2012 to 5 Sep 2012. Of course, to get a “better” forecast, you need to feed in more historical data. In my case, the Twitter API didn’t allow me to pull ALL the historical data in one go; let’s not get into the details because that’s not the focus of this blog post. But the rule of thumb is that more historical data generally gives a better forecast. Here are the steps I followed:

1. Loaded the data into Excel 2010. (I am using Twitter as an example here; another real-world scenario would be a sales forecast.) Note that I have kept it simple for the purpose of the demo.

2. Now, let’s create a forecast model.

Go to Data Mining Tab > Data Modeling > Forecast:

[Screenshot: data mining Excel forecast Twitter followers]

3. Forecast Wizard:

a. Getting Started with Forecast Wizard: NEXT

b. Select Source Data, then press NEXT

c. Select input columns. In this case, I selected Date as Time Stamp and Total Follower Count & Total Tweet Count as Input columns.

– Notice the Parameters button? It is used to configure how the (Time Series) algorithm runs. For the purpose of this demo, I am not going to explore that.

d. Finish.

4. It forecast (using the Time Series data mining algorithm) the follower count for the next week. As you can see, it says that on 12 Sep 2012 I would have 438 followers, which is +3 compared to today’s (5 Sep) follower count.

[Screenshot: forecast Twitter followers using Excel Data Mining]

5. A few notes:

a. I selected Total Tweet Count just to show that it can forecast more than one variable at the same time. Here the model used the Date column as the time stamp while forecasting.

b. Of course, this may not happen for REAL, because your follower count can go up or down based on:

  • Tweet (Quality Tweets!) Frequency
  • Number-of-bots-that-decide-to-follow-you (kidding!)
  • Re-tweeting interesting content and replying to your followers. Basically being social!
  • If a tweet gets picked up by someone famous, your count increases
  • Other real life “surprises”..

Here’s the point though: this was just a toy example to show “forecasting” with Excel Data Mining. If I explore it further, I will document my experiences!
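If you want to play with the idea behind step 4 outside Excel, here is a minimal Python sketch of a seven-day forecast. To be clear, this is not the Time Series algorithm the add-in runs; it is a plain straight-line (least-squares) fit swapped in for illustration. The function name `linear_forecast` is hypothetical, and the series is made up: only its last value, 435, matches the post (438 forecast minus the +3 difference).

```python
def linear_forecast(values, steps_ahead):
    """Fit a least-squares line to equally spaced values and extend it."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations to fit a line")
    mean_x = (n - 1) / 2            # mean of the day indexes 0, 1, ..., n-1
    mean_y = sum(values) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(values))
    variance = sum((x - mean_x) ** 2 for x in range(n))
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    # Predict the days that follow the last observation (index n - 1).
    return [intercept + slope * (n - 1 + k) for k in range(1, steps_ahead + 1)]


# Hypothetical daily follower counts; only the final value (435) comes from the post.
recent_followers = [429, 430, 431, 432, 433, 434, 435]

forecast = linear_forecast(recent_followers, steps_ahead=7)
for day_offset, predicted in enumerate(forecast, start=1):
    print(f"day +{day_offset}: {predicted:.0f} followers")
```

A straight line simply keeps extending the recent trend, so on this made-up series it will not reproduce the add-in’s figure of 438; treat it as a way to see the mechanics, not as a substitute for the model.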

And oh, BTW, here’s a nice video by @MarkTabNet and @SolidQ (SolidQ: I work at this amazing company!) on “Microsoft Data Mining Demo — Forecasting (SQL Server 2008 and Excel 2007)”. MarkTabNet is a great resource for data miners, so check it out!