Hadoop on Windows: How to Browse the Hadoop Filesystem?

This blog post applies to Microsoft® HDInsight Preview on a Windows machine. In it, we’ll see how you can browse HDFS (the Hadoop Distributed File System).

1. I am assuming the Hadoop services are running without issues on your machine.

2. Can you see the Hadoop Name Node Status icon on your desktop? Yes? Great! Open it (it opens in your browser).

3. Here’s what you’ll see:

[Screenshot: Hadoop Name Node status page]

4. Can you see the “Browse the filesystem” link? Click on it. You’ll see:

[Screenshot: HDFS file browser opened from the Name Node status page]

5. I’ve been using /user/data lately, so let me browse to see what’s inside that directory:

[Screenshot: contents of /user/data]

6. You can also type the location into the text box labeled Goto.

7. If you’re on the command line, you can list the root of the filesystem with this command:

hadoop fs -ls /

[Screenshot: output of hadoop fs -ls /]

And if you want to browse the files inside a particular directory, pass that directory’s path to -ls instead of /:

[Screenshot: hadoop fs -ls output for a specific directory]
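On the command line, browsing a particular directory comes down to passing its path to -ls. Here is a minimal sketch, assuming a POSIX shell (for example Git Bash, or a shell on a Linux Hadoop node) with hadoop on the PATH. The helper name list_hdfs_dir is hypothetical and not part of Hadoop, and /user/data is simply the directory browsed in step 5:

```shell
# Hypothetical helper (not part of Hadoop): list one HDFS directory.
# With no argument it falls back to the filesystem root, as in step 7.
list_hdfs_dir() {
  hadoop fs -ls "${1:-/}"
}
```

Calling list_hdfs_dir /user/data runs hadoop fs -ls /user/data. In the Windows Hadoop Command Line you can skip the helper and type that command directly.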

Official Resource:

HDFS File System Shell Guide

Conclusion

In this post, we saw how to browse the Hadoop filesystem via the Hadoop Command Line and the Hadoop Name Node Status page.

Microsoft® HDInsight Preview for Windows: How to create a directory in Hadoop File System?

In this post, we’ll see how to create a directory in the Hadoop file system on the Windows version of HDInsight.

Here are the steps:

1. Make sure Microsoft® HDInsight Preview for Windows is installed on your machine. Here’s a tutorial: Installing HDInsight (Microsoft’s Hadoop) on windows 7

2. Make sure that the cluster is up and running! To check this, I click on the “Microsoft HDInsight Dashboard” or open http://localhost:8085/ on my machine.

Did you get any “wait for cluster to start..” message? No? Great! Hopefully, all your services are working perfectly and you are good to go now!

3. Let’s start the Hadoop Command Line. (Can you see the icon on the desktop? Yes? Great! Open that!)

4. The command to create a directory looks like this:

hadoop fs -mkdir /user/data/input

The above command creates the directory /user/data/input.

5. Let’s verify that the input directory was created under /user/data:

hadoop fs -ls /user/data

[Screenshot: hadoop fs -ls /user/data output showing the new input directory]
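Steps 4 and 5 can be combined into one small sketch. It assumes a POSIX shell (for example Git Bash, or a shell on a Linux Hadoop node) with hadoop on the PATH, and the helper name ensure_hdfs_dir is hypothetical. It relies on hadoop fs -test -e, which the File System Shell Guide documents as returning 0 when the path exists, so the directory is created only when it is missing and its parent is then listed to verify:

```shell
# Hypothetical helper (not part of Hadoop): create an HDFS directory
# only if it does not exist yet, then list its parent to verify.
ensure_hdfs_dir() {
  target="$1"
  # -test -e exits with 0 when the path already exists
  if ! hadoop fs -test -e "$target"; then
    hadoop fs -mkdir "$target" || return 1
  fi
  # dirname turns /user/data/input into /user/data
  hadoop fs -ls "$(dirname "$target")"
}
```

For the directory used in this post, the call would be ensure_hdfs_dir /user/data/input. In the Windows Hadoop Command Line, the individual hadoop fs commands can be typed one at a time instead.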

Conclusion:
In this post, we saw how to create a directory in the Hadoop file system (on Windows) and how to list files and directories using the -ls command.

Neologism is the new challenge for IT professionals. Here’s why:

What is Neologism?

Neologism means the coining or use of new words, and I believe it’s one of the challenges faced by IT professionals. Nowadays, we put a lot of time and energy into getting our heads around new terms, words and trends.

Let’s take a couple of examples:

Some time back, it was cloud computing. Nowadays, it’s Big Data. In my mind, “Big Data” has come to mean the following technologies and techniques in different contexts:

[Image: illustration of what gets called “Big Data”, e.g. unstructured data, external data, text, public data]

Note: The above image is just for illustration purposes. It does not comprehensively cover every technology that is now called “Big Data”. Feel free to point it out if you think I missed something important.

And neologism is a challenge because:

1) Generally, it’s a new trend and there is little to no consensus on what it “exactly” means.

2) It means different things in different contexts.

3) Every person can have their own “interpretation” and no one is wrong.

4) It’s a moving target. The definition used today will change in the future, so we always need a “working” definition for these terms.

Now, don’t get me wrong. It’s fun trying to figure out what it all means and trying to gauge whether it matters to me and my organization or not! What do you think? As a person in information technology, do you think that neologism is one of the challenges we face? Consider leaving a reply in the comment section!

Related Articles:

Want to learn about BigData? read Oreilly’s Book “Planning for BigData”

Quote for Big-Data / Data-Science/ Data-Analysis enthusiasts:

Who on earth is creating “Big data”?

Examples to help clarify what’s unstructured data and what’s structured?

Sentiment Analysis in R w/ Twitter data feeds

I followed the instructions on this site to perform sentiment analysis on tweets about Starbucks.

Here are the data visualizations:

1. Sentiment Analysis: Starbucks on Twitter

[Chart: sentiment analysis of Starbucks on Twitter]

2. Comparison cloud:

[Chart: comparison cloud]

That’s about it for this post. Here are some related tutorials:

If you want to install R on a Windows machine, here’s a tutorial: http://parasdoshi.com/2012/11/13/lets-install-r-rstudio-on-windows-machine/

If you want to try out Hadoop on Windows, Hive and the Hive Excel add-in with Twitter data, here’s a tutorial: http://parasdoshi.com/2012/11/16/how-to-load-twitter-data-into-hadoop-on-azure-cluster-and-then-analyze-it-via-hive-add-in-for-excel/

If you want to grab Twitter search data using R and export it to a tab-delimited file, here’s a tutorial: http://parasdoshi.com/2012/11/24/grab-twitter-search-data-using-r-and-export-to-a-tab-delimited-file/

Two ideas to make your social network activities “Searchable”:

Some time back, I wanted to find one of my own social network posts. It was a resource I had shared, and somehow I was not able to “google” it again. I eventually found it, but it took me 15-odd minutes of scrolling through my Twitter feed. It was NOT fun! I thought to myself: there’s got to be a better way, and it would be great to solve it not just for Twitter but for all my social network activities, including LinkedIn, Facebook Pages and Google+. So here are a couple of things that are working for me. I hope they help someone out there too:

Before we begin: when I say “searchable”, I mean searchable by YOU (or another human being), not necessarily by search engines. But it turns out that both my ideas also increase the chances of your social media activities getting indexed! With that, here are the ideas:

1) Syndicate your Social Network Activities (Posts/Images/Updates) to Tumblr/Blogger

I use IFTTT to syndicate my Twitter, Facebook and LinkedIn activities to Blogger.

2) Create a post about your social network activities on your blog:

Here’s an example: Things I shared on Social Media Networks during Oct 19 – Nov 11

Though idea #2’s main goal is to keep my blog readers updated about my social network activities, it also acts as a good way to make my social media posts “searchable”.

And remember I said earlier that the chances of your social network posts getting indexed by search engines increase? That’s because posts on WordPress, Tumblr and Blogger are accessible to Google (unless you choose to block it). So that’s about it for this post. If you like the ideas, please let me know! And if you have other ideas, let me know too. I am always looking for ways to make my social media activities easily searchable for me as well as for anyone else.

I played with the Twitter Firehose for a couple of hours, and here’s how you can too:

First up: what’s the Twitter Firehose?

It’s the real-time stream of all public tweets! I had pointed out in an earlier post that Twitter gets 340 million tweets per day!

[Image: Twitter Firehose, 340 million tweets per day (image courtesy link)]

Why did I want access to the Firehose?

Curiosity.

I had heard it’s expensive. Is it?

For an individual: absolutely! For companies: not if they know how to create business value out of it.

Note the words “couple of hours” in the title. I’ll explain that part later.

How did I get access?

Via DataSift. They had a free trial with a $10 credit, and I tried that. Check them out if you want to play with the Twitter Firehose. It’s fun!

What did I do with it?

Over a period of 2 hours, I collected 15,000 tweets containing the words “Google” OR “Microsoft”.

Total cost for me: $3-4

Note: I added the cost just so that you get a general idea. Look at the pricing page of DataSift for more details.

Are there other Twitter data resellers?

Yes. As of now, it’s DataSift, GNIP and Topsy. Search for “Twitter Certified Data Reseller Products” to find the list. I was able to find a free trial from DataSift, and that’s why I tried DataSift.
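The “Google” OR “Microsoft” condition is a plain keyword match. The DataSift filter itself is not shown here. As a rough local equivalent, this sketch keeps only the lines of a text file that mention either word. It assumes a POSIX shell with grep, one tweet per line, and case-insensitive matching (an assumption; the original filter may have been case-sensitive). The function name filter_tweets and the file name tweets.txt are hypothetical:

```shell
# Hypothetical helper: keep lines mentioning "Google" OR "Microsoft".
# Reads one tweet per line from the file(s) given, or from stdin.
filter_tweets() {
  # -i ignores case, -E enables the | alternation (logical OR)
  grep -i -E 'google|microsoft' "$@"
}
```

For example, filter_tweets tweets.txt > matches.txt would write the matching tweets to a new file.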

If I just want to play with Twitter Data, what are the alternatives?

You can work with their streaming API, which gives about 1% of tweets. For a simple way to pull tweets into R (it uses Twitter search rather than the stream), see: Grab Twitter search data using R and export to a tab delimited file

Conclusion:

In this post, I discussed how you can try the Twitter Firehose. I also pointed you to the alternative of using the streaming API, which gives about 1% of tweets. I hope that helps.

Grab Twitter search data using R and export to a tab delimited file

In this blog post, we’ll see how you can grab Twitter search data using R and then export it to a tab-delimited file. Here are the steps:

1) First up, if you do not have R, you can install it by following the tutorial: Let’s install R Studio and R on windows machine

2) Install the twitteR package if you haven’t already (for example, with install.packages("twitteR")).

3) Look at the following code and modify the output path in the write.table call:

library(twitteR)

tweets <- searchTwitter("#excel", n = 1500)  # search for up to 1500 tweets tagged #excel

tweetdataframe <- do.call("rbind", lapply(tweets, as.data.frame))  # one data frame row per tweet

write.table(tweetdataframe, "c:/users/paras/desktop/tweetsaboutexcel.txt", sep = "\t")  # "\t" makes the file tab delimited

4) You now have a tab-delimited file with about 1500 tweets!

[Screenshot: RStudio code and the resulting tab-delimited file of 1500 tweets]

You can also export the tweets to an Excel spreadsheet, SPSS or SAS. Check this out: Quick-R Exporting Data

Conclusion:

In this blog post, we saw how you can grab 1500 tweets using R and then export them to a tab-delimited file.