Quick Post: Uploading Local Data to Hadoop file system using Hadoop Command Line


This is a quick post to share a command for uploading local data to HDFS using the Hadoop command line.

The command looks like:

> hadoop fs -copyFromLocal input.txt input/SqrtJob/input.txt
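
If you want to script the upload rather than type it, here is a minimal Python sketch that wraps the same command. The function names `copy_from_local_command` and `upload` are made up for this example, and it assumes the `hadoop` launcher is reachable from the process that runs the script. By default it only prints the command (a dry run) and executes nothing.

```python
import subprocess


def copy_from_local_command(local_path, hdfs_path):
    """Build the argument list for `hadoop fs -copyFromLocal`."""
    return ["hadoop", "fs", "-copyFromLocal", local_path, hdfs_path]


def upload(local_path, hdfs_path, dry_run=True):
    """Print the command; execute it only when dry_run is False."""
    command = copy_from_local_command(local_path, hdfs_path)
    print(" ".join(command))
    if dry_run:
        return 0
    # Assumption: `hadoop` resolves to an executable on PATH. On Windows the
    # launcher is usually a .cmd script, so you may need shell=True or the
    # full script name here.
    return subprocess.call(command)


# Same source and destination as the command above (dry run only).
upload("input.txt", "input/SqrtJob/input.txt")
```

Call `upload(..., dry_run=False)` from the Hadoop command line window once the printed command looks right.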


Inner workings of HDFS and MapReduce in a nutshell:


[Figure: HDFS and MapReduce inner workings]


How to load some data to Hadoop on Windows to get started?


In this post, I want to point out that HDInsight (Hadoop on Windows) comes with sample datasets (log files) that you can load with the following steps:

1. Open the Hadoop command line and navigate to c:\Hadoop\GettingStarted

2. Execute the following command:

powershell -ExecutionPolicy unrestricted -F importdata.ps1 w3c

[Screenshot: importing data into the Hadoop on Windows file system]

After the command has run successfully, you can see the sample files in the /w3c/input folder:

[Screenshot: W3C (IIS) log files in Hadoop on Windows]

Conclusion: In this post, we saw how to load some data to Hadoop on Windows file system to get started. Your comments are very welcome.

Official Resource: http://gettingstarted.hadooponazure.com/loadingData.html

Hadoop on Windows: How to Browse the Hadoop Filesystem?


This blog post applies to Microsoft® HDInsight Preview on a Windows machine. In this post, we’ll see how you can browse HDFS (the Hadoop Distributed File System).

1. I am assuming Hadoop Services are working without issues on your machine.

2. Can you see the Hadoop Name Node Status icon on your desktop? Yes? Great! Open it (in a browser).

3. Here’s what you’ll see:

[Screenshot: Hadoop file system browse]

4. Can you see the “Browse the filesystem” link? Click on it. You’ll see:

[Screenshot: Hadoop file system in the Name Node Status page]

5. I’ve used /user/data lately, so let me browse to see what’s inside this directory:

[Screenshot: contents of the /user/data directory]

6. You can also type the location into the text box labeled Goto.

7. If you’re on the command line, you can list the root directory with the command:

hadoop fs -ls /

[Screenshot: listing the file system from the Hadoop command line]

And if you want to browse files inside a particular directory, pass its path instead of /:

[Screenshot: listing a directory from the Hadoop command line]
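
To make the two listings in step 7 concrete, here is a small Python sketch that builds the `hadoop fs -ls` command for any path. The helper name `ls_command` is made up for this example, and /user/data is simply the directory browsed in step 5. The sketch only prints the commands, so you can paste them into the Hadoop command line.

```python
def ls_command(path="/"):
    """Build the argument list for `hadoop fs -ls` on the given HDFS path."""
    return ["hadoop", "fs", "-ls", path]


# The root listing from step 7, then the /user/data directory from step 5.
for directory in ("/", "/user/data"):
    print(" ".join(ls_command(directory)))
```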

Official Resource:

HDFS File System Shell Guide


In this post, we saw how to browse the Hadoop file system via the Hadoop command line and the Hadoop Name Node Status page.


Microsoft® HDInsight Preview for Windows: How to use Sqoop to load data into HDFS from SQL Server?


In this post, we’ll see how to use Sqoop to load data into HDFS from SQL Server.

With that, here are the steps:

1. Make sure you have Microsoft® HDInsight Preview for Windows installed on your machine. Here’s a tutorial: Installing HDInsight (Microsoft’s Hadoop) on Windows 7

2. Make sure that the Cluster is up & running! To check this, I click on the “Microsoft HDInsight Dashboard” or open http://localhost:8085/ on my machine

Did you get any “wait for cluster to start..” message? No? Great! Hopefully, all your services are working perfectly and you are good to go now!

3. Before we begin, decide on three things:

3a. The username and password that Sqoop will use to log in to the SQL Server database. If you create a new username and password, test them via SSMS before you proceed.

3b. The table that you want to load into HDFS.

In my case, it’s this table:

[Screenshot: SQL Server table to be loaded into HDFS]

3c. The target directory in HDFS. In my case, I want it to be /user/data/sqoopstudent1

You can create it with the command: hadoop fs -mkdir /user/data/sqoopstudent1

[To learn how to create a directory, read: How to create a directory in Hadoop File System?]

4. Now Let’s start the Hadoop Command Line (can you see the Icon on the Desktop? Yes? Great! Open that!)

5. Navigate to: c:\Hadoop\sqoop-1.4.2\bin>

*This path may change in future, but navigate to the bin folder under the SQOOP_HOME.

6. Run the dir command to see the files under this directory.

[Screenshot: files under the Sqoop bin directory]

You can also run sqoop help for more information on the command that we are about to run.

[Screenshot: list of commands from sqoop help]

7. Now here’s the command to load data from SQL Server to HDFS:

c:\Hadoop\sqoop-1.4.2\bin>sqoop import --connect "jdbc:sqlserver://localhost;database=UniversityDB;username=sqoop;password=**********" --table student --target-dir /user/data/sqoopstudent1 -m 1

[Screenshot: Sqoop command loading data from SQL Server into HDFS]

8. After successfully running the above command, let’s browse the file in HDFS!

[Screenshot: contents of the imported file in HDFS]
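
The import command in step 7 is long, so here is a hedged Python sketch that assembles it from its parts. The function names `sqlserver_jdbc_url` and `sqoop_import_command` are made up for this example, and `<password>` is a placeholder, not a real value. The sketch only prints the command and does not run Sqoop.

```python
def sqlserver_jdbc_url(host, database, username, password):
    """Build a SQL Server JDBC URL in the same form as the one used in step 7."""
    return "jdbc:sqlserver://{0};database={1};username={2};password={3}".format(
        host, database, username, password
    )


def sqoop_import_command(jdbc_url, table, target_dir, mappers=1):
    """Build the argument list for a single-table `sqoop import`."""
    return [
        "sqoop", "import",
        "--connect", jdbc_url,
        "--table", table,
        "--target-dir", target_dir,
        "-m", str(mappers),  # number of parallel map tasks
    ]


# Values taken from the steps above; replace <password> with your own.
url = sqlserver_jdbc_url("localhost", "UniversityDB", "sqoop", "<password>")
command = sqoop_import_command(url, "student", "/user/data/sqoopstudent1")

# Quote the JDBC URL when pasting into a shell, because it contains semicolons.
print(" ".join('"{0}"'.format(arg) if ";" in arg else arg for arg in command))
```

Keeping the connection string in its own function makes it easier to see which part of the command changes when you point Sqoop at a different server or database.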

That’s about it for this post!


Thanks to Aviad Ezra, who answered my question on this MSDN thread: An error while trying to use Sqoop on HDInsight to import data from SQL server to HDFS


In this post, we saw how to load data into Hadoop from SQL Server using Sqoop (SQL-to-Hadoop).


How to Load Twitter data into Hadoop on Azure cluster and then analyze it via Hive add-in for excel?


In this blog post, we would:

1. Upload Twitter Text Data into Hadoop on Azure cluster

2. Create a Hive Table and load the data uploaded in step 1 to the Hive Table

3. Analyze data in Hive via Excel Add-in

Before we begin, I assume that you have access to Hadoop on Azure, have your sample data ready (don’t have any? Learn how from a blog post), are familiar with the Hadoop ecosystem, and know your way around the Hadoop on Azure dashboard.

Now, here are the steps involved:

Step 1: Upload Twitter Text Data into Hadoop on Azure cluster

1. Have the data to be uploaded ready! I am just going to copy and paste the file from my host machine to the RDP’ed machine. In this case, the machine that I am connecting to is the Hadoop on Azure cluster.

For the purpose of this blog post, I have a text file containing 1,500 tweets:

[Screenshot: Twitter text data to upload to Hadoop on Azure]

2. Open web browser > Go to your cluster in Hadoop on Azure

3. RDP into your Hadoop on Azure cluster

[Screenshot: Remote Desktop into the Hadoop on Azure cluster]

4. Copy and paste the file. It’s a small data file, so this approach works for now.

[Screenshot: uploading Twitter text data to the Hadoop on Azure cluster]

Step 2: Create a Hive Table and load the data uploaded in step 1 to the Hive Table

1. Stay on the machine that you connected to via Remote Desktop (RDP).

2. Open the Hadoop command line (you’ll see an icon on your desktop).

3. Switch to Hive:

[Screenshot: writing Hive commands in Hadoop on Azure]

4. Use the following Hive Commands:


CREATE TABLE TweetSampleTable (
id string,
text string,
favorited string,
replyToSN string,
created string,
truncated string,
replyToSID string,
replyToUID string,
statusSource string,
screenName string
);

LOAD DATA LOCAL INPATH 'C:\apps\dist\examples\data\tweets.txt' OVERWRITE INTO TABLE TweetSampleTable;

Note that for the purpose of this blog post, I’ve chosen string as the data type for all fields. The right choice depends on the data that you have. If I were building a solution, I would spend more time choosing the right data types.

Step 3: Analyze data in Hive via Excel Add-in

1. Switch to the Hadoop on Azure dashboard.

2. Go to the Hive console and run the show tables command to verify that there is a tweetsampletable.

[Screenshot: showing all tables in Hive on Hadoop on Azure]

3. If you haven’t already, download and install the Hive ODBC Driver from the Downloads section of your Hadoop on Azure dashboard.

4. I set up an ODBC connection to Hive by following the instructions here: How To Connect Excel to Hadoop on Azure via HiveODBC (en-US)

5. After that, open Excel. I have 64-bit Excel 2010.

6. Switch to Data Tab > Hive Pane

7. Choose the Hive connection > select Table > Select Columns > And off you go!

You have Hive data in Excel!

[Screenshot: Hive Excel add-in with Hadoop on Azure]

Now go analyze!


In this blog post, we saw how to load Twitter data into a Hadoop on Azure cluster and then analyze it via the Hive add-in for Excel.