Beginner’s Guide: Sentiment Analysis using Python on Windows

Standard

This is beginner’s guide to sentiment analysis using Python NLTK on windows. We’ll start w/ installing Python and NLTK and then see how to perform sentiment analysis.

Step 1: Install Python & NLTK

I followed the steps listed on http://nltk.org/install.html

1. Search for python 2.7.3 for windows and install it.

2. Search for Python setup Tools for Windows and install it.

3. Install PIP (for win 64 bit), NLTK and PyYAML.

4. Test installation: Start>All Programs>Python27>IDLE, then type import nltk

Now,

5. Also type:

>>> Import random

6. And also install movie_reviews corpus by typing:

>>>nltk.download()

in the new window that opens, install the movie_reviews corpus.

python nltk download data

Step 2: Sentiment Analysis

I followed the code explained in the NLTK book in the section “document classification” in ch 6 learning to classify text. Here is the section: http://nltk.org/book/ch06.html#document-classification

Using the code I was able to run the Naive Bayes Classifier to categorize text:

python sentiment analysis

Conclusion:

In this post, we learned how to perform sentiment analysis using Python on windwos platform. NLTK supports classifiers other than Naive Bayes, and also there are resources that will help  you increase the accuracy of the classifier. And I hope that this post acts as a starting guide for you!

Related articles

What’s “Naive” about Naive Bayes Machine Learning Algorithm?

Standard

In this post, I’ll post what why does the “Naive Bayes machine learning” algo have the word Naive in it?

So here is the short answer:

It “assumes” that the features are independent. (In other words: There’s no relation between the features that are used while building the model)

Let’s go a little deeper:

First up, few basic pointers.

> It’s a machine learning algorithm used for classification

> It’s based on Bayesian Statistics.

> you can read about it here: http://en.wikipedia.org/wiki/Naive_Bayes_classifier

Now, what do you mean when you mean that it is Naive because it assumes that features are independent?

Let’s take an example:

Suppose, you are building a “credit card approval” model based on Income and CreditScore

(SideNote: For those who do not know what is credit score, here you go: http://en.wikipedia.org/wiki/Credit_score_in_the_United_States)

And you have the following columns in the training data (Note: In machine learning, think of this columns as features)

Income CreditScore Approved
High High Yes
High Medium Yes
Low High Yes
Low Low NO

Here the features are Income & CreditScore and the target of the classification model is Approved.

In real world, there’s some relation between “income” and “creditscore”. Agree? Great! But Naive Bayes doesn’t think so. Let me reiterate the point of this blog post and see if it makes more sense now: it assumes that the features are “independent” and that’s why it is Naive!

I hope this helps. your comments are very welcome!