Doing Data Science at Twitter — A great read!

Why is “Doing Data Science at Twitter” a great read?

This is an insider’s perspective from someone working at a company that I classify as having the highest level of analytics maturity. In other words, Twitter is known for applying knowledge gained from data science to its products and business processes.

It’s also important to recognize that every company is different, and the analytics and data science tools, techniques, and processes it implements will vary with its analytics maturity. I love that this was one of the key insights shared in the article.

Also, the article talks about two types of data scientists. I thought it was a great way to classify them because there’s a lot of confusion in the industry around what a data scientist does. With that, here’s the URL:

My two-year journey as a data scientist at Twitter

Paras Doshi

PS: If you like articles like this, don’t forget to sign up for the newsletter!

New Digital Marketing Analytics Report shows social media is not the best source of acquiring customers:


It’s great to see the insights that data can uncover. I saw a nice insight in a report about analyzing customer acquisition channels for e-commerce sites, and in this blog post I am sharing it with you. So what are the top customer acquisition channels for e-commerce sites? The top channels are organic search, email, and paid search. Here’s the report: E-Commerce Customer Acquisition Snapshot

It was not surprising to me to see organic search and email among the top customer acquisition channels, but what surprised me was the relatively poor performance of social media in acquiring customers. Here’s the chart showing the performance of various online channels for acquiring customers:

[Chart: e-commerce analytics, percentage of customers acquired vs. channel]

Data Source:

Note #1: This post is NOT about devaluing the benefits of social media. It comes down to understanding the goals of having a social media presence in the first place. When computing the ROI of social media, there are other factors to consider, like increased brand awareness and customer loyalty. But I posted this data because it’s a great way to show how data can uncover insights, and sometimes they may surprise you.

Note #2: The percentages of customers acquired do not add up to 100% for a year because the data does not include things like direct traffic. The author of the report confirmed this to me over email.

That’s about it for this post. Your comments are very welcome!

Three Data Collection Tips for Social Media Analytics


Data integrity is important, especially if critical business decisions are based on the data. To that end, in this post I’ll write about three data collection tips to help you have accurate data for “social media analytics”. These tips apply to social media analytics irrespective of the tool you are using:

1. Social Media Platform


Select the right social media platforms for capturing data. You do not want to select so few that you miss data. And you do not want to select irrelevant social media platforms, because if you do, you’ll introduce noise in the data. Let me take an example: if your project is focused on the USA only, then you do not need to add “Sina Weibo” (a Chinese social network) to your social media sources.

Now, based on the business need behind your “social media analytics” campaign, you should test all possible social media platforms. You never know who might be talking about things that you are interested in. After you have selected the right social media platforms for your project, let’s go to the next step:

2. “Search Keyword” Selection

Some social media platforms let you collect data via “search keywords”. For example, Twitter allows you to collect data via hashtags and/or keywords. So if you want to collect data about all social media posts mentioning “american airlines”, you should not collect data using a rule that matches either word on its own.

If you select such a rule, it will introduce a LOT of noise, because we’ll collect data about people talking about just “American” PLUS data about people talking about just “airlines”. That’s bad! What you want is rules like these:

1. American AND airlines

2. “American Airlines” (as a phrase)

Now, I can’t stress the importance of selecting the right search keywords enough. Choosing the wrong keywords will add noise, which is bad for analytics. So choose keywords such that you are neither adding noise nor missing out on conversations. There’s no secret formula here; continuous improvement is the way to go!
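Here’s a minimal Python sketch of how the keyword rules above behave differently. It is only an illustration and not how any particular social media tool implements its rules. The function names (`matches_or`, `matches_and`, `matches_phrase`) and the sample posts are made up for this example.

```python
import re

def tokens(text):
    """Lowercase a post and split it into word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())

def matches_or(text, words):
    """Noisy rule: keep the post if ANY keyword appears on its own."""
    found = set(tokens(text))
    return any(w.lower() in found for w in words)

def matches_and(text, words):
    """Rule 1: keep the post only if ALL keywords appear (in any order)."""
    found = set(tokens(text))
    return all(w.lower() in found for w in words)

def matches_phrase(text, phrase):
    """Rule 2: keep the post only if the keywords appear as an exact phrase."""
    toks = tokens(text)
    target = tokens(phrase)
    n = len(target)
    return any(toks[i:i + n] == target for i in range(len(toks) - n + 1))

# Made-up example posts, for illustration only.
posts = [
    "Flying American Airlines to Dallas tonight",
    "Proud to be an American",
    "Which airlines allow pets in the cabin?",
    "American carriers and other airlines raised fares",
]

keywords = ["american", "airlines"]
or_hits = [p for p in posts if matches_or(p, keywords)]
and_hits = [p for p in posts if matches_and(p, keywords)]
phrase_hits = [p for p in posts if matches_phrase(p, "American Airlines")]

print(len(or_hits), len(and_hits), len(phrase_hits))
```

On this sample, the “either word” rule keeps every post, the AND rule keeps only the posts that mention both words, and the phrase rule keeps only the post that actually talks about the airline. Comparing counts like this on a sample of your own data is one way to do the continuous improvement mentioned above.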

3. Language & country Filtering


Social networks are GLOBAL in nature, so it’s important to filter by (or include) countries and languages based on the project that you’re working on. Not doing so would add noise to your data. Also remember to include every country and language that is relevant, because you do not want to miss out on conversations either.
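As a rough sketch, language and country filtering could look like this in Python once the posts are collected. The field names `lang` and `country` are assumptions made for this example; the actual field names depend on the platform or tool you collect from.

```python
def filter_posts(posts, languages, countries):
    """Keep posts whose language AND country are both in the allowed sets."""
    allowed_langs = {lang.lower() for lang in languages}
    allowed_countries = {c.upper() for c in countries}
    return [
        p for p in posts
        if (p.get("lang") or "").lower() in allowed_langs
        and (p.get("country") or "").upper() in allowed_countries
    ]

# Made-up example posts; "lang" and "country" are hypothetical field names.
posts = [
    {"text": "Great flight today", "lang": "en", "country": "US"},
    {"text": "Vuelo retrasado otra vez", "lang": "es", "country": "US"},
    {"text": "Smooth landing in Mumbai", "lang": "en", "country": "IN"},
    {"text": "No location on this one", "lang": "en", "country": None},
]

# A USA-only project that still includes Spanish-language posts.
us_posts = filter_posts(posts, languages=["en", "es"], countries=["US"])
print([p["text"] for p in us_posts])
```

Note that this sketch drops posts with no country information. Depending on your project, you may prefer to keep those, so that you do not miss out on conversations.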


The three data collection tips for social media analytics that I shared in this post are:

1. Select the right social media platforms

2. Select the right search keywords

3. Select the right country and language

Guest Blog: How to measure ROI of Social Media Marketing?



This is a guest blog by Jugal Shah. Jugal is pursuing an MBA with a focus on marketing from a premier university in India. He shares his views on marketing, sales, and strategy via his blog and Facebook. In this post, he briefly comments on “How to measure Social Media Marketing ROI”.

Jugal Shah’s Short post on Measuring Social Media Marketing ROI:

In social media marketing, ROI is not just in monetary terms. So, for social media ROI, my focus would be on:
1) How many people I have reached
2) How many people I have engaged through online activities
3) Becoming a conversation enabler and perception driver

Then focus on:

1) How much increased revenue is due to social media reach (you can do this by tracking referred links)
2) How many leads you generated through social media
3) How social media efforts helped to resolve customer queries/problems and led to more customer satisfaction (remember that customer acquisition costs 10 times more than customer retention).

In a nutshell, it’s of utmost importance to use social media as:

  • conversation enabler
  • perception driver
  • customer retention
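To make the “tracking referred links” idea above a bit more concrete, here’s a minimal Python sketch that sums revenue from orders whose referrer is a social network. Everything in it is an assumption for illustration: the order records, the `referrer` and `revenue` field names, and the short list of social hosts. A real setup would pull this data from your web analytics tool.

```python
from urllib.parse import urlparse

# Assumed (and deliberately short) list of social referrer hosts.
SOCIAL_HOSTS = {"facebook.com", "twitter.com"}

def referrer_host(url):
    """Return the referrer's host name without a leading 'www.'."""
    host = urlparse(url).netloc.lower()
    return host[4:] if host.startswith("www.") else host

def social_revenue_share(orders):
    """Return (revenue from social referrers, its share of total revenue)."""
    total = sum(o["revenue"] for o in orders)
    social = sum(
        o["revenue"] for o in orders
        if referrer_host(o["referrer"]) in SOCIAL_HOSTS
    )
    share = social / total if total else 0.0
    return social, share

# Made-up orders, for illustration only.
orders = [
    {"referrer": "https://www.facebook.com/somepage", "revenue": 40.0},
    {"referrer": "https://twitter.com/someuser/status/1", "revenue": 10.0},
    {"referrer": "https://www.google.com/search?q=shoes", "revenue": 150.0},
]

social, share = social_revenue_share(orders)
print(social, share)
```

This only covers the revenue part of the picture. Reach, engagement, and customer satisfaction from the lists above need their own measures.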


Paras: Jugal, thanks for this post. I am sure this short post will be great food for thought for readers who are interested in digital marketing analytics or analytics in general. Readers, feel free to reach out to him on his blog and/or Facebook page.

Three Data Visualizations I liked this week:


I have been working on creating dashboards for one of my projects. As part of the research, I looked at a few dashboards out there on the interwebs. Here are three of them that I liked:

1. Social Media & Sentiment Analysis:

What I like about this Dashboard is the creative use of Data via Sentiment Analysis:

[Image: sentiment analysis social media dashboard]

2. Microsoft Research’s Viral Search Project:

What a creative way to visualize viral content!

[Image: Microsoft Viral Search visualization of viral social network data]

3. Social Media Analytics Dashboard:

Nice one-page social dashboard!

[Image: social media analytics dashboard]

Do you see the bottom right part of the report that shows engagement levels by post type? If you want to compute it, here’s my blog post on that: Social Media Analytics. Facebook Page Smackdown: Status updates vs Images?
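As a quick illustration of that kind of computation, here’s a minimal Python sketch that averages engagement per post type. It is not necessarily the method used in the linked post. Defining engagement as likes + comments + shares is an assumption, and the sample posts and field names are made up.

```python
from collections import defaultdict

def avg_engagement_by_type(posts):
    """Average engagement (likes + comments + shares) per post type."""
    totals = defaultdict(lambda: [0, 0])  # post type -> [engagement sum, post count]
    for p in posts:
        engagement = p["likes"] + p["comments"] + p["shares"]
        totals[p["type"]][0] += engagement
        totals[p["type"]][1] += 1
    return {t: total / count for t, (total, count) in totals.items()}

# Made-up posts, for illustration only.
posts = [
    {"type": "status", "likes": 10, "comments": 2, "shares": 0},
    {"type": "image", "likes": 30, "comments": 5, "shares": 5},
    {"type": "image", "likes": 20, "comments": 0, "shares": 0},
]

by_type = avg_engagement_by_type(posts)
print(by_type)
```

If your pages differ a lot in audience size, dividing engagement by reach or fan count before averaging may give a fairer comparison.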


Sentiment Analysis using LingPipe on Windows 7:


In this post, I’ll point you to a resource you can use to perform sentiment analysis using LingPipe on Windows. Along with that, I’ll share a couple of issues that I ran into when I was trying to run this demo on Windows 7:

So first up, here’s the resource:

Now here are a couple of issues that I had:

1. Error: could not find or load the main class PolarityBasic

[Screenshot: LingPipe error, could not find or load main class PolarityBasic]

To solve this error, you’ll need to build the files under C:\lingpipe-4.1.0\demos\tutorial\sentiment. We use Ant for this. Let’s see how to do that:

2. Building sentiment.jar using ant jar

After successfully downloading Ant on Windows and setting the ANT_HOME variable to c:\apache-ant-1.8.4, I was still getting the error that ant is not a recognized command.

So I ran the following commands (adjust the paths to match where Ant and the JDK are installed on your machine):

C:\>set ANT_HOME=C:\apache-ant-1.8.1
C:\>set JAVA_HOME=C:\jdk1.6.0_24
C:\>set PATH=%ANT_HOME%\bin;%JAVA_HOME%\bin
C:\>ant -version
// it worked!


Now I ran the following command:

[Screenshot: building sentiment.jar with ant for LingPipe]

3. In the tutorial they used POLARITY_DIR. I didn’t use that; instead I just entered c:\review_polarity because that’s where I unzipped the movie review dataset:

[Screenshot: movie review polarity dataset for sentiment analysis]

Here’s a screenshot of the command that does basic polarity analysis:

[Screenshot: sentiment analysis with LingPipe on Windows]

And Thanks:

How to start Analyzing Twitter Data w/ R?


Over the past few weeks, I have posted notes about analyzing Twitter data w/ R. I’m listing them here:

1. Install R & RStudio

2. R code to download Twitter data

3. Perform Sentiment Analysis on Twitter Data (in R)