Doing Data Science at Twitter — A great read!

Doing Data science at Twitter


Why is “Doing Data Science at Twitter” a great read?

This is an insider’s perspective from someone working at a company that I classify as having the highest level of analytics maturity. In other words, Twitter is known for applying knowledge gained from data science to its products and business processes.

It’s also important to recognize that every company is different, and that the analytics and data science tools, techniques, and processes a company implements will vary with its analytics maturity. I love that this was one of the key insights shared in the article.

The article also talks about two types of data scientists. I thought it was a great way to classify them, because there’s a lot of confusion in the industry about what a data scientist does. With that, here’s the URL:

My two-year journey as a data scientist at twitter

Best,
Paras Doshi

PS: If you like articles like this, don’t forget to sign up for the newsletter!

New Digital Marketing Analytics Report shows social media is not the best source of acquiring customers:


It’s great to see the insights that data can uncover. I saw a nice one in a report analyzing customer acquisition channels for e-commerce sites, and in this blog post I am sharing it with you. So what are the top customer acquisition channels for e-commerce sites? The top channels are organic search, email, and paid search. Here’s the report: E-Commerce Customer Acquisition Snapshot

It was not surprising to me to see organic search and email among the top customer acquisition channels, but what surprised me was the relatively poor performance of social media in acquiring customers. Here’s the chart showing the performance of various online channels for acquiring customers:

[Chart: e-commerce analytics, percentage of customers acquired vs. channel]

Data Source: http://blog.custora.com/2013/06/e-commerce-customer-acquisition-snapshot/

Note #1: This post is NOT about devaluing the benefits of social media; it comes down to understanding the goals of having a social media presence in the first place. When computing the ROI of social media, there are other factors to consider, such as increased brand awareness and customer loyalty. But I posted this data because it’s a great way to show how data can uncover insights, and sometimes they may surprise you.

Note #2: The percentage of customers acquired does not add up to 100% for a year because the data does not include things like direct traffic. The author of the report confirmed this with me over email.

That’s about it for this post. Your comments are very welcome!

Three Data Collection Tips for Social Media Analytics


Data integrity is important, especially if critical business decisions are based on the data. To that end, in this post I’ll write about three data collection tips to help you have accurate data for “social media analytics”. So here are the tips, which apply to social media analytics irrespective of the tool you are using:

1. Social Media Platform


Select the right social media platforms for capturing data. You do not want to select so few that you miss data. And you do not want to select irrelevant social media platforms, because if you do, you’ll introduce noise into the data. Let me take an example: if your project covers the USA only, then you do not need to add “Sina Weibo” (a Chinese social network) to your social media sources.

Now, based on the business need behind your “social media analytics” campaign, you should test all possible social media platforms. You never know who might be talking about the things you are interested in. After you have selected the right social media platforms for your project, let’s go to the next step:

2. “Search Keyword” Selection

Some social media platforms let you collect data via “search keywords”. Twitter, for example, allows you to collect data via hashtags and/or keywords. So if you want to collect all social media posts mentioning “american airlines”, you should not collect data using:

American OR airlines

If you select the above rule, it will introduce a LOT of noise, because we’ll collect posts from people talking about just “American” PLUS posts from people talking about just “airlines”. That’s bad! What you want are rules like these:

1. American AND airlines

2. “American Airlines” (as a phrase)

Now, I can’t stress enough the importance of selecting the right search keywords. Choosing the wrong keywords will add noise, which is bad for analytics. So choose keywords such that you are neither adding noise nor missing conversations. There’s no secret formula here; continuous improvement is the way to go!
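To make the difference between these rules concrete, here is a minimal Python sketch that applies each of them to a handful of posts. The posts are made up for illustration and the helper names (matches_or, matches_and, matches_phrase) are hypothetical; your collection tool will have its own query syntax.

```python
import re

# Hypothetical sample posts, made up for illustration only.
POSTS = [
    "Flying American Airlines to Dallas tomorrow",
    "Proud to be an American",
    "Budget airlines keep adding fees",
    "Which airlines do American travelers prefer?",
]


def words(text):
    """Lower-case a post and split it into a set of words."""
    return set(re.findall(r"[a-z]+", text.lower()))


def matches_or(text):
    """Rule to avoid: american OR airlines."""
    return bool(words(text) & {"american", "airlines"})


def matches_and(text):
    """Better: american AND airlines, anywhere in the post."""
    return {"american", "airlines"} <= words(text)


def matches_phrase(text):
    """Strictest: the exact phrase "american airlines"."""
    return re.search(r"\bamerican\s+airlines\b", text.lower()) is not None


for name, rule in [("OR", matches_or), ("AND", matches_and), ("phrase", matches_phrase)]:
    hits = [p for p in POSTS if rule(p)]
    print(f"{name}: {len(hits)} of {len(POSTS)} posts matched")
```

On this toy data the OR rule matches every post, while the AND rule still lets in one post that is not about the airline. That is why it pays to keep reviewing what your keywords actually collect.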

3. Language & country Filtering


Social networks are GLOBAL in nature, so it’s important to filter by country and language based on the project that you’re working on. Not doing so would add noise to your data. Also remember to include every country and language that matters to the project, because you do not want to miss out on conversations either.
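Here is a small Python sketch of that trade-off. It assumes each collected post already carries a language code and a country code; the Post class, the filter_posts helper, and the sample posts are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Post:
    text: str
    language: str  # language code, e.g. "en"
    country: str   # country code, e.g. "US"


def filter_posts(posts, languages, countries):
    """Keep only posts whose language AND country are in the allowed sets."""
    return [p for p in posts if p.language in languages and p.country in countries]


# Hypothetical posts, made up for illustration only.
posts = [
    Post("Great flight today", "en", "US"),
    Post("Vuelo retrasado otra vez", "es", "US"),
    Post("Lovely service", "en", "GB"),
]

# Too narrow: English only drops the Spanish-language post from the USA.
english_only = filter_posts(posts, {"en"}, {"US"})

# Wider: include every language the project cares about.
english_and_spanish = filter_posts(posts, {"en", "es"}, {"US"})

print(len(english_only), len(english_and_spanish))
```

Filtering removes the noise from other countries, but a filter that is too narrow quietly drops conversations you wanted to hear.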

Conclusion:

The three data collection tips for social media analytics that I shared in this post are:

1. Select the right social media platforms

2. Select the right search keywords

3. Select the right countries and languages

Guest Blog: How to measure ROI of Social Media Marketing?


Introduction:

This is a guest blog by Jugal Shah. Jugal is pursuing an MBA with a focus on marketing from a premier university in India. He shares his views on marketing, sales and strategy via his blog and Facebook. In this post, he briefly comments on “How to measure Social Media Marketing ROI”.

Jugal Shah’s Short post on Measuring Social Media Marketing ROI:

In social media marketing, ROI is not just in monetary terms. So, for social media ROI, my focus would be on:
1) How many people I have reached
2) How many people I have engaged through online activities
3) Becoming a conversation enabler and perception driver

Then focus on:

1) How much increased revenue is due to social media reach (you can do this by tracking referred links)
2) How many leads you generated through social media
3) How social media efforts helped resolve customer queries/problems and led to more customer satisfaction (remember, customer acquisition costs 10 times more than customer retention).

In a nutshell, it’s of utmost importance to use social media as a:

  • conversation enabler
  • perception driver
  • customer retention tool
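The countable parts of this list (reach, engagement, leads, revenue from referred links) can be rolled into a few numbers. Below is a minimal Python sketch; the function name, the metric definitions, and the campaign figures are assumptions made for illustration, not something from Jugal’s post.

```python
def social_media_summary(reach, engaged, leads, referred_revenue, spend):
    """Summarise a campaign from counts you would pull from your own tools."""
    return {
        # Share of the people reached who interacted in some way.
        "engagement_rate": engaged / reach if reach else 0.0,
        # Spend divided by leads generated through social media.
        "cost_per_lead": spend / leads if leads else None,
        # Classic ROI on the revenue traced back to referred links.
        "monetary_roi": (referred_revenue - spend) / spend if spend else None,
    }


# Hypothetical campaign numbers, made up for illustration only.
summary = social_media_summary(
    reach=20000, engaged=1000, leads=50, referred_revenue=3000.0, spend=2000.0
)
print(summary)
```

The softer goals, such as being a conversation enabler or perception driver, do not reduce to a formula like this, which is the point of not looking at ROI in monetary terms alone.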

Conclusion:

Paras: Jugal, thanks for this post. I am sure this short post will be great food for thought for readers who are interested in digital marketing analytics or analytics in general. Readers, feel free to reach out to him on his blog and/or Facebook page.

Three Data Visualizations I liked this week:


I have been working on creating dashboards for one of my projects. As part of the research, I looked at a few dashboards out there on the interwebs. Here are three that I liked:

1. Social Media & Sentiment Analysis:

What I like about this dashboard is the creative use of data via sentiment analysis:

[Image: sentiment analysis social media dashboard]

2. Microsoft Research’s Viral Search Project:

What a creative way to visualize viral content!

[Image: Microsoft Viral Search visualization of viral social network data]

3. Social Media Analytics Dashboard:

Nice one-page social dashboard!

[Image: social media analytics dashboard]

Do you see the bottom-right part of the report that shows engagement levels by post type? If you want to compute it, here’s my blog post on that: Social Media Analytics. Facebook Page Smackdown: Status updates vs Images?
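As a rough idea of what such a computation can look like, here is a minimal Python sketch. It assumes engagement is simply likes + comments + shares per post, which may differ from the definition the dashboard uses; the posts and the function name are hypothetical.

```python
from collections import defaultdict

# Hypothetical page posts, made up for illustration only.
posts = [
    {"type": "status", "likes": 10, "comments": 2, "shares": 0},
    {"type": "image", "likes": 30, "comments": 5, "shares": 5},
    {"type": "image", "likes": 20, "comments": 0, "shares": 0},
    {"type": "status", "likes": 6, "comments": 0, "shares": 0},
]


def average_engagement_by_type(posts):
    """Average (likes + comments + shares) per post, grouped by post type."""
    totals = defaultdict(int)
    counts = defaultdict(int)
    for post in posts:
        totals[post["type"]] += post["likes"] + post["comments"] + post["shares"]
        counts[post["type"]] += 1
    return {kind: totals[kind] / counts[kind] for kind in totals}


print(average_engagement_by_type(posts))
```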

 

Sentiment Analysis using LingPipe on windows 7:


In this post, I’ll point you to a resource that you can use to perform sentiment analysis using LingPipe on Windows. Along with that, I’ll share a couple of issues that I ran into when I was trying to run this demo on Windows 7:

So first up, here’s the resource:

http://alias-i.com/lingpipe/demos/tutorial/sentiment/read-me.html

Now here are a couple of issues that I had:

1. Error: could not find or load the main class PolarityBasic

[Screenshot: LingPipe error “could not find or load main class PolarityBasic”]

To solve this error, you’ll need to build the files under C:\lingpipe-4.1.0\demos\tutorial\sentiment. We use Ant for this. Let’s see how to do that:

2. Building sentiment.jar using ant jar

After successfully downloading Ant on Windows and setting the ANT_HOME variable to C:\apache-ant-1.8.4, I was still getting the error that ant is not a recognized command.

So I ran the following commands:

C:\>set ANT_HOME=C:\apache-ant-1.8.1
C:\>set JAVA_HOME=C:\jdk1.6.0_24
C:\>set PATH=%ANT_HOME%\bin;%JAVA_HOME%\bin
C:\>ant -version
// it worked!

Thanks: http://stackoverflow.com/questions/5607664/installing-ant-ant-home-is-set-incorrectly-on-windows-7
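If you hit the same “not a recognized command” error, a quick check of the environment can save some guessing. Here is a minimal Python sketch; the check_build_tools helper is hypothetical, and it only assumes that Ant and the JDK each keep their executables in a bin directory under ANT_HOME and JAVA_HOME.

```python
import os
import shutil


def check_build_tools(env=None):
    """Return a list of problems that would stop `ant` from running."""
    env = os.environ if env is None else env
    problems = []
    for var in ("ANT_HOME", "JAVA_HOME"):
        home = env.get(var)
        if not home:
            problems.append(f"{var} is not set")
        elif not os.path.isdir(os.path.join(home, "bin")):
            problems.append(f"{var} has no bin directory: {home}")
    # shutil.which searches the given PATH for an executable named ant.
    if shutil.which("ant", path=env.get("PATH", "")) is None:
        problems.append("ant was not found on PATH")
    return problems


for problem in check_build_tools():
    print(problem)
```

An empty result means both variables point at folders with a bin directory and ant is reachable from PATH.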

Now I ran the following command:

[Screenshot: building sentiment.jar with ant]

3. The tutorial uses POLARITY_DIR. I didn’t use that; instead I just passed in c:\review_polarity, because that’s where I unzipped the movie review dataset:

[Screenshot: movie review polarity dataset]

Here’s a screenshot of the command that does basic polarity analysis:

[Screenshot: LingPipe sentiment analysis running on Windows]

And Thanks: http://stackoverflow.com/questions/15010184/lingpipe-and-sentiment-analysis/15011482

How to start Analyzing Twitter Data w/ R?


Over the past few weeks, I have posted notes about analyzing Twitter data w/ R. I’m listing them here:

1. Install R & RStudio

2. R code to download twitter data

3. Perform Sentiment Analysis on Twitter Data (in R)

Hadoop on Windows: How to Browse the Hadoop Filesystem?


This blog post applies to Microsoft® HDInsight Preview on a Windows machine. In this post, we’ll see how you can browse HDFS (the Hadoop Distributed File System).

1. I am assuming Hadoop Services are working without issues on your machine.

2. Now, can you see the Hadoop Name Node Status icon on your desktop? Yes? Great! Open it (in a browser).

3. Here’s what you’ll see:

[Screenshot: Hadoop Name Node Status page]

4. Can you see the “Browse the filesystem” link? Click on it. You’ll see:

[Screenshot: file system browser opened from the Name Node Status page]

5. I’ve used /user/data lately, so let me browse to see what’s inside this directory:

[Screenshot: contents of /user/data]

6. You can also type the location into the text box labeled “Goto”.

7. If you’re on the command line, you can do the same via the command:

hadoop fs -ls /

[Screenshot: Hadoop command line listing the root of the file system]

And if you want to browse the files inside a particular directory:

[Screenshot: Hadoop command line listing the files inside a particular directory]
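If you would rather script the listing than type it, here is a minimal Python sketch that wraps the same command. The helper names are hypothetical, and it assumes the hadoop launcher is on your PATH; when it is not, list_hdfs returns None instead of failing.

```python
import shutil
import subprocess


def hdfs_ls_args(path="/"):
    """Arguments passed to the hadoop launcher for a directory listing."""
    return ["fs", "-ls", path]


def list_hdfs(path="/"):
    """Run the listing and return its text output, or None if hadoop is not on PATH."""
    hadoop = shutil.which("hadoop")
    if hadoop is None:
        return None
    result = subprocess.run(
        [hadoop] + hdfs_ls_args(path), capture_output=True, text=True, check=True
    )
    return result.stdout


# Show the command that would be run for a particular directory.
print("hadoop " + " ".join(hdfs_ls_args("/user/data")))
```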

Official Resource:

HDFS File System Shell Guide

Conclusion

In this post, we saw how to browse the Hadoop file system via the Hadoop command line and the Hadoop Name Node Status page.


Microsoft® HDInsight Preview for Windows: How to create a directory in Hadoop File System?


In this post, we’ll see how to create a directory in the Hadoop file system on the Windows version of HDInsight.

Here are the steps:

1. Make sure you have Microsoft® HDInsight Preview for Windows installed on your machine. Here’s a tutorial: Installing HDInsight (Microsoft’s Hadoop) on windows 7

2. Make sure that the cluster is up and running! To check this, I click on the “Microsoft HDInsight Dashboard” or open http://localhost:8085/ on my machine.

Did you get any “wait for cluster to start..” message? No? Great! Hopefully, all your services are working perfectly and you are good to go now!

3. Let’s start the Hadoop Command Line (can you see the icon on the desktop? Yes? Great! Open that!)

4. The command to create a directory looks like this:

hadoop fs -mkdir /user/data/input

The above command creates /user/data/input

5. Let’s verify that the input directory was created under /user/data

hadoop fs -ls /user/data

[Screenshot: listing of /user/data showing the newly created directory]
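The same check can be done in a script by reading the text that -ls prints. Below is a minimal Python sketch; it assumes each entry line starts with a permission string (directories start with “d”) and ends with the path, and that paths contain no spaces. The sample listing is made up to resemble real output.

```python
def directories_in_listing(listing):
    """Return the directory paths found in `hadoop fs -ls` text output."""
    dirs = []
    for line in listing.splitlines():
        fields = line.split()
        # Directory entries start with a permission string such as "drwxr-xr-x".
        if fields and fields[0].startswith("d"):
            dirs.append(fields[-1])
    return dirs


# Hypothetical output, made up to resemble a listing of /user/data.
sample = """Found 2 items
drwxr-xr-x   - hadoop supergroup          0 2013-03-20 10:15 /user/data/input
-rw-r--r--   1 hadoop supergroup       1024 2013-03-20 10:16 /user/data/notes.txt
"""

print("/user/data/input" in directories_in_listing(sample))
```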

Conclusion:
In this post, we saw how to create a directory in the Hadoop file system (on Windows), and we also saw how to list files and directories using the -ls command.


Neologism is the new challenge for IT professionals, Here’s why:


What is Neologism?

Neologism means the coining or use of new words, and I believe it’s one of the challenges faced by IT professionals. Nowadays, we put our time and energy into trying to get our heads around new terms, words, and trends.

Let’s take a couple of examples:

Some time back, we had cloud computing. Nowadays, it’s Big Data. In my mind, Big Data has been coined to mean the following technologies/techniques under different contexts:

[Image: Big Data, unstructured data, external data, text, public data]

Note: The above image is just for illustration purposes. It does not comprehensively cover every technology that is now called “Big Data”. Feel free to point it out if you think I missed something important.

And neologism is a challenge because:

1) Generally, it’s a new trend and there is little to no consensus on what exactly it means

2) It means different things in different contexts

3) Every person can have their own “interpretation” and no one is wrong.

4) It’s a moving target. The definition used today will change in the future. So we always need a “working” definition for these terms.

Now, don’t get me wrong: it’s fun trying to figure out what it all means and to gauge whether it matters to me and my organization or not! What do you think? As a person in information technology, do you think that neologism is one of the challenges we face? Consider leaving a reply in the comment section!

Related Articles:

Want to learn about BigData? read Oreilly’s Book “Planning for BigData”

Quote for Big-Data / Data-Science/ Data-Analysis enthusiasts:

Who on earth is creating “Big data”?

Examples to help clarify what’s unstructured data and what’s structured?