In this post, we’ll see how to use Sqoop to load data into HDFS from SQL Server?
With that, here are the steps:
1. You have the Microsoft® HDInsight Preview for Windows Installed on your machine. Here’s a tutorial: Installing HDInsight (Microsoft’s Hadoop) on windows 7
2. Make sure that the Cluster is up & running! To check this, I click on the “Microsoft HDInsight Dashboard” or open http://localhost:8085/ on my machine
Did you get any “wait for cluster to start..” message? No? Great! Hopefully, all your services are working perfectly and you are good to go now!
3. Before we begin, decide on three things:
3a: Username and Password that Sqoop would use to login to the SQL Server database. If you create a new username and pasword, test it via SSMS before you proceed.
3b. select the table that you want to load into HDFS
In my case, it’s this table:
You can create by command: hadoop fs -mkdir /user/data/sqoopstudent1
[to learn about how to create directory, read: How to create a directory in Hadoop File System? ]
4. Now Let’s start the Hadoop Command Line (can you see the Icon on the Desktop? Yes? Great! Open that!)
5. Navigate to: c:Hadoopsqoop-1.4.2bin>
*This path may change in future, but navigate to the bin folder under the SQOOP_HOME.
6. Run dir command to see various files under this directory.
Also you can run sqoop help for more information on the command that we are about to run.
7. Now here’s the command to Load data from SQL Server to HDFS:
c:Hadoopsqoop-1.4.2bin>sqoop import –connect “jdbc:sqlserver://localhost;dat
abase=UniversityDB;username=sqoop;password=**********” –table student –tar
get-dir /user/data/sqoopstudent1 -m 1
8. After successfully running the above command, let’s browse the file in HDFS!
That’s about it for this post!
Thanks Aviad Ezra who answered my question on this MSDN thread: An error while trying to use Sqoop on HDInsight to import data from SQL server to HDFS
In this post, we saw how to load data into Hadoop from SQL Server using Sqoop (SQL Hadoop)
- How to Load Twitter data into Hadoop on Azure cluster and then analyze it via Hive add-in for excel?
- Visualizing MapReduce Algorithm with WordCount Example:
- End to End Demo: Hadoop (HDInsight) + Hive + Excel + Power View + Azure Data Market
- Microsoft® HDInsight Preview for Windows: How to create a directory in Hadoop File System?