1. Upload Twitter Text Data into Hadoop on Azure cluster
2. Create a Hive Table and load the data uploaded in step 1 to the Hive Table
3. Analyze data in Hive via Excel Add-in
Before we begin, I assume you have access to Hadoop on azure, Have your sample data (don’t have one? learn from a blog post), familiar with Hadoop ecosystem and know your way around the Hadoop on Azure Dashboard.
Now, Here are the steps involved:
STEP 1: Upload Twitter Text Data into Hadoop on Azure cluster
1. Have your data to be uploaded ready! I am just going to Copy Paste the File from my host machine to the RDP’ed machine. In this case, the machine that I am going is the Hadoop on Azure cluster.
For the purpose of this blog post, I have a text file having 1500 tweets:
2. Open web browser > Go to your cluster in Hadoop on Azure
3. RDP into your Hadoop on Azure cluster
4. Copy-Paste the File. It’s a small data file so this approach works for now.
Step 2: Create a Hive Table and load the data uploaded in step 1 to the Hive Table
1. Stay on the machine that you Remote Desktop (RDP’ed) into.
2. Open the Hadoop command line (you’ll see a icon on your Desktop)
3. switch to Hive:
4. Use the following Hive Commands:
DROP TABLE IF EXISTS TweetSampleTable;
CREATE TABLE TweetSampleTable ( id string, text string, favorited string, replyToSN string, created string, truncated string, replyToSID string, replyToUID string, statusSource string, screenName string );
LOAD DATA LOCAL INPATH ‘C:appsdistexamplesdatatweets.txt’ OVERWRITE INTO TABLE TweetSampleTable;
Note that for the purpose of this blog-post, I’ve chose string as data type for all fields. This is something that depends on the data that you have. If I were building a solution, I would spend some more time choosing the right data type.
Step 3. Analyze data in Hive via Excel Add-in
1. Switch to Hadoop on Azure Dashboard
2. Go to the Hive Console and run the show tables to verify that there is a tweetsampletable.
3. Now if you haven’t, Download and Install the Hive ODBC Driver from the Downloads section of your Hadoop on Azure Dashboard.