Download PPT: Why Big Data Matters?

Standard

Download Link Here:

SQL Saturday 185 (Trinidad): Why Big Data Matters? by Paras Doshi

(if you need the .ppt version of this talk, please contact me via http://parasdoshi.com/contact/)

 

How to start Analyzing Twitter Data w/ R?

Standard

Over the past few weeks, I have posted notes about Analyzing Twitter Data w/ R, listing them here:

1. Install R & RStudio

2. R code to download twitter data

3. Perform Sentiment Analysis on Twitter Data (in R)

How to load some data to Hadoop on Windows to get started?

Standard

In this post, I want to point out that HDInsight (Hadoop on Windows) comes with a sample datasets (log files) that you can load using the command:

1. Hadoop command Line > Navigate to c:HadoopGettingStarted

2. Execute the following command:

powershell -ExecutionPolicy unrestricted –F importdata.ps1 w3c

import data to hadoop on windows file system

After you have successfully executed the command, you can sample files in /w3c/input folder:

w3c log files iis hadoop on windows

Conclusion: In this post, we saw how to load some data to Hadoop on Windows file system to get started. Your comments are very welcome.

Official Resource: http://gettingstarted.hadooponazure.com/loadingData.html

Microsoft® HDInsight Preview for Windows: How to use Sqoop to load data into HDFS from SQL Server?

Standard

In this post, we’ll see how to use Sqoop to load data into HDFS from SQL Server?

With that, here are the steps:

1. You have the Microsoft® HDInsight Preview for Windows Installed on your machine. Here’s a tutorial: Installing HDInsight (Microsoft’s Hadoop) on windows 7

2. Make sure that the Cluster is up & running! To check this, I click on the “Microsoft HDInsight Dashboard” or open http://localhost:8085/ on my machine

Did you get any “wait for cluster to start..” message? No? Great! Hopefully, all your services are working perfectly and you are good to go now!

3. Before we begin, decide on three things:

3a: Username and Password that Sqoop would use to login to the SQL Server database. If you create a new username and pasword, test it via SSMS before you proceed.

3b. select the table that you want to load into HDFS

In my case, it’s this table:

sql table to be loaded into hadoop hdfs from sql server3c: The target directory in HDFS. in my case I want it to be /user/data/sqoopstudent1

You can create by command: hadoop fs -mkdir /user/data/sqoopstudent1

[to learn about how to create directory, read: How to create a directory in Hadoop File System? ]

4. Now Let’s start the Hadoop Command Line (can you see the Icon on the Desktop? Yes? Great! Open that!)

5. Navigate to: c:Hadoopsqoop-1.4.2bin>

*This path may change in future, but navigate to the bin folder under the SQOOP_HOME.

6. Run dir command to see various files under this directory.

sqoop list files under the HOMe directory import export

Also you can run sqoop help for more information on the command that we are about to run.

sqoop list of commands help

7. Now here’s the command to Load data from SQL Server to HDFS:

c:Hadoopsqoop-1.4.2bin>sqoop import –connect “jdbc:sqlserver://localhost;dat
abase=UniversityDB;username=sqoop;password=**********” –table student –tar
get-dir /user/data/sqoopstudent1 -m 1

sqoop command to load data from sql server to hadoop file system

8. After successfully running the above command, let’s browse the file in HDFS!

sqoop see the content of the file

That’s about it for this post!

Thanks

Thanks Aviad Ezra who answered my question on this MSDN thread: An error while trying to use Sqoop on HDInsight to import data from SQL server to HDFS

Conclusion:

In this post, we saw how to load data into Hadoop from SQL Server using Sqoop (SQL Hadoop)

Related Articles:

Neologism is the new challenge for IT professionals, Here’s why:

Standard

What is Neologism?

Neologism means The coining or use of new words – And I believe it’s one of the challenge faced by IT professionals. Nowadays, we put our time & energy trying to get head around “new terms/words/trends”.

Let’s take couple of example(s):

Sometime back, we had cloud computing. Nowadays, its Big Data; In my mind – Big Data has been coined to mean following technologies/techniques under different contexts:

Big Data Unstrucutred External Text Public Data

Note: The above image is just for illustration purpose. It does not comprehensively cover every technology that is now called “Big Data”. Feel free to point it out if you think I missed something important.

And Neologism is challenge because:

1) Generally, it’s a new trend and there is little to no consensus on what does it “Exactly” mean

2) It means different things in different context

3) Every person can have their own “interpretation” and no one is wrong.

4) It’s a moving ball. The definition used today will change in future. So we always need a “working” definition for these terms.

Now, Don’t get me wrong, It’s fun trying to figure out what does it all mean and trying to gauge whether it matters to me and my organization or not! What do you think – as a Person in Information Technology, do you think that Neologism is one of the challenges faced by us? consider leaving a reply in the comment section!

Related Articles:

Want to learn about BigData? read Oreilly’s Book “Planning for BigData”

Quote for Big-Data / Data-Science/ Data-Analysis enthusiasts:

Who on earth is creating “Big data”?

Examples to help clarify what’s unstructured data and what’s structured?

Things I shared on Social Media Networks during Noc 12 – Dec 31 (2012)

Standard

Big Data: The Coming Sensor Data Driven Productivity Revolution http://bit.ly/TQAPsW

Check out some nice getting started tutorials at beyondrelational site: http://bit.ly/RVVHRV

Complexity is your enemy. Any fool can make something complicated. It is hard to make something simple – Richard Branson

— via Paras Doshi – Blog http://on.fb.me/WAQ5ky

The success of companies like Google, Facebook, Amazon, and Netflix, not to mention Wall Street firms and industries from manufacturing to retail and healthcare, is increasingly driven by better tools for extracting meaning from very large quantities of data,” says Tim O’Reilly

— via Paras Doshi – Blog http://on.fb.me/WAQ5ky

Nice collection of about 20+ videos around the topic of “Data Science”: http://bit.ly/WMkZqc

Nice collection of videos by Berkeley school of information: http://bit.ly/Tf1yAD #Information #Data

Just found Facebook’s data team’s page: http://on.fb.me/ToYILO

via V Talk Tech – A Parth Acharya Blog – Nice HeatMap of stocks! http://on.fb.me/SfBbvF

what’s the biggest fear about cloud computing? via Windows Azure http://on.fb.me/VjIiHR

Resource: Presentations from the Sentiment Analysis Symposium http://bit.ly/VtPH3B

If I switched to the newest “holiday” theme on WordPress, this is how it would look: http://on.fb.me/UEuyFr

Nice! Code School now has R programming language! I have been playing with R for a while now and definitely want to learn more – here’s the link to learn R: http://bit.ly/VEAnkZ

Interesting tool from Google to optimize and analyze web page speeds: http://bit.ly/HTubNC

Performed #sentiment #Analysis on #starbucks twitter data using #R ! It was fun! http://on.fb.me/Z3qLo8

In 2002: The Data Warehousing Institute estimates that data quality problems cost U.S. businesses more than $600 billion a year. And of course, over the past 10 years, this number would be bigger. http://bit.ly/TPT9r3

Reading: Business Analytics vs Business Intelligence? http://bit.ly/YUtJwx

Big data is a nickname for the recent increase in largely external and unstructured business and consumer information. How are businesses across industries harnessing traditional enterprise information management functions and systems to translate big data into useful business intelligence? http://www.deloitte.com/view/en_US/us/Services/additional-services/deloitte-analytics-service/217c19e69249b310VgnVCM2000003356f70aRCRD.htm

For business analytics professionals: 12 webcasts on Jan 30th 2013 http://bit.ly/RUFsZ3 #sqlpass #analytics #24hop

Some nice insights about how to build an Internet platform, from the founder of Zipcar: http://bit.ly/Yco6IP

Let’s connect and converse on any of these people networks!

paras doshi blog on facebookparas doshi twitter paras doshi google plus paras doshi linkedin

See what went into building WATSON, an advanced machine learning & natural language processing system powered by Big Data!

Standard

Do you know about Jeopardy! quiz show where a computer named Watson was able to beat world champions? No! Go watch it! Yes? Nice! Isn’t it a feat as grand as the one achieved by Deep blue (chess computer); if not less?

I am always interested in how such advanced computers was built. In case of Watson, It’s fascinating how technologies such as Natural language processing, machine learning & artificial intelligence backed by massive compute & storage power was able to beat two human world champions. And as a person interested in analytic’s and Big Data – I would classify this technology under Big Data and Advanced Data Analytics where computer analyzes lots of data to answer a question asked in a natural language. It also uses advanced machine learning algorithms. To that end, If you’re interested in getting an overview of what went into building WATSON, watch this:

If you’re as amazed as I am, considering sharing what amazed you about this technology via comment section: