Presentation Tip: Change the Font Size of the Windows Command Prompt

I have been researching presentation tips over the past few weeks, and one tip that I read again and again is “Make fonts larger for readability”. To that end, I changed the font size of the command prompt so that when I am presenting, the audience can see what I am typing. So if you have to present something to an audience via the command prompt, this should help:

1. Open the command prompt > right-click the title bar > select Properties

[Image: command prompt change font size]

2. Switch to the Font tab > select the font and the size. You can also change the colors and layout, among other things, here.

[Image: command prompt change font size font color layout]

3. See how it looks:

[Image: font size changed command prompt]

Conclusion:

In this post, we saw how to change the font size of the Windows command prompt.

Your comments are very welcome!

Back to basics: Data Mining and Knowledge Discovery Process

Once in a while, I go back to basics to revisit some of the fundamental technology concepts that I’ve learned over the past few years. Today, I want to revisit the Data Mining and Knowledge Discovery process:

Here are the steps:

1) Raw Data

2) Data preprocessing (cleaning, sampling, transformation, integration, etc.)

3) Modeling (Building a Data Mining Model)

4) Testing the model, a.k.a. assessing the model

5) Knowledge Discovery

Here is the visualization:

[Image: knowledge discovery process data mining]

Additional Note:

In the world of data mining and knowledge discovery, we’re looking for a specific type of intelligence in the data: patterns. This is important because patterns tend to repeat, so if we find patterns in our data, we can predict/forecast that similar things may happen in the future.
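
To make the five steps concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the data is made up, the "model" is just a threshold halfway between the two class means, and the function names (`preprocess`, `split`, `build_model`, `assess`) are hypothetical. A real project would use a proper data mining tool at each step.

```python
def preprocess(raw):
    """Step 2: cleaning. Drop records that have a missing value."""
    return [(x, label) for x, label in raw if x is not None and label is not None]


def split(records, every=3):
    """Step 2: sampling. Hold out every `every`-th record for testing."""
    train = [r for i, r in enumerate(records) if i % every != 0]
    test = [r for i, r in enumerate(records) if i % every == 0]
    return train, test


def build_model(train):
    """Step 3: modeling. The 'model' is a threshold halfway between the class means."""
    pos = [x for x, label in train if label == 1]
    neg = [x for x, label in train if label == 0]
    return (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2


def assess(threshold, test):
    """Step 4: testing. Fraction of held-out records classified correctly."""
    hits = sum((x > threshold) == (label == 1) for x, label in test)
    return hits / len(test)


# Step 1: raw data. Made-up (reading, label) pairs; None marks a missing reading.
RAW = [(1.0, 0), (2.0, 0), (None, 1), (8.0, 1), (9.0, 1),
       (1.5, 0), (7.5, 1), (2.5, 0), (8.5, 1)]

if __name__ == "__main__":
    train, test = split(preprocess(RAW))
    threshold = build_model(train)
    # Step 5: knowledge discovery. The learned pattern, stated in plain words.
    print(f"Pattern: readings above {threshold} tend to be labeled 1")
    print(f"Accuracy on held-out records: {assess(threshold, test)}")
```

The point of holding out some records in `split` is that the model is assessed on data it has not seen, which is what makes a discovered pattern believable as a forecast.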

Conclusion:

In this blog post, we saw the Knowledge Discovery and Data Mining process.

Quick Post: Uploading Local Data to Hadoop file system using Hadoop Command Line

This is a quick post; I just want to share a command to upload local data to HDFS using the Hadoop command line.

The command looks like:

> hadoop fs -copyFromLocal input.txt input/SqrtJob/input.txt
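
If you need to run the same upload from a script, it could be wrapped roughly like this. This is only a sketch: it assumes the `hadoop` executable is on your PATH, and the helper names `build_copy_command` and `upload` are made up for this example.

```python
import shutil
import subprocess


def build_copy_command(local_path, hdfs_path, hadoop="hadoop"):
    """Return the argument list for: hadoop fs -copyFromLocal <local> <hdfs>."""
    return [hadoop, "fs", "-copyFromLocal", local_path, hdfs_path]


def upload(local_path, hdfs_path):
    """Run the copy; raises CalledProcessError if hadoop exits with an error."""
    # Assumption: hadoop is on PATH (shutil.which also finds hadoop.cmd on Windows).
    hadoop = shutil.which("hadoop")
    if hadoop is None:
        raise FileNotFoundError("hadoop executable not found on PATH")
    subprocess.run(build_copy_command(local_path, hdfs_path, hadoop), check=True)


if __name__ == "__main__":
    # Same paths as the command above.
    upload("input.txt", "input/SqrtJob/input.txt")
```

Passing the arguments as a list rather than one string avoids quoting problems when a file name contains spaces.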

Examples of Machine Generated Data from “Big Data” perspective:

I just researched Machine Generated Data in the context of “Big Data”. Here’s the list I compiled:

– Data sent from Satellites

– Temperature sensing devices

– Flood Detection/Sensing devices

– Web logs

– Location data

– Data collected by Toll sensors (context: Road Toll)

– Phone call records

– Financial

And a futuristic one:

Imagine sensors on human bodies that continuously “monitor” health. What if we used them to detect diabetes, cancer, and other diseases in their early phases? Possible? Maybe!

Interesting Fact:

Machines can generate data “faster” than humans. This characteristic makes it interesting to think about how to analyze machine-generated data and, in some cases, how to analyze it in real time or near real time.

Ending Note:

Search for Machine Generated Data and you’ll find much more; it’s worth reading about in the context of Big Data.

Thanks:

http://www.dbms2.com/2010/04/08/machine-generated-data-example/

http://en.wikipedia.org/wiki/Machine-generated_data

http://tdwi.org/articles/2012/10/23/machine-generated-big-data.aspx

Download PPT: Why Big Data Matters?

Download Link Here:

SQL Saturday 185 (Trinidad): Why Big Data Matters? by Paras Doshi

(If you need the .ppt version of this talk, please contact me via http://parasdoshi.com/contact/)

How to Install Microsoft .NET SDK for Hadoop?

There are two main steps:

1. Installing NuGet Package Manager, if you haven’t already.

2. Installing Microsoft .NET SDK for Hadoop

Installing NuGet Package Manager

1) Open Visual Studio

2) Tools menu > Extensions Manager > search the online gallery > NuGet

3) Download and install NuGet:

[Image: Nuget Package Manager Extensions Manager]

4) Restart Visual Studio

Installing Microsoft .NET SDK for Hadoop

1. Tools menu > Library Package Manager > Package Manager Console

2. Install the Map/Reduce, LINQ to Hive, and WebHDFS components by running the install commands in the Package Manager prompt.

Example for Map/Reduce:

install-package Microsoft.Hadoop.MapReduce -pre

[Image: Nuget Microsoft SDK for Hadoop install mapreduce]

Conclusion:

In this post, we saw how to install Microsoft .NET SDK for Hadoop.

Resource:

Continue learning: Programming MapReduce Jobs with HDInsight Server for Windows

Inner workings of HDFS and MapReduce in a nutshell:

HDFS and MapReduce inner workings in a nutshell.

[Image: HDFS MapReduce inner workings]
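
As a rough companion to the picture, the map, shuffle, and reduce flow can be simulated in a few lines of plain Python. This is a toy, single-machine sketch of the classic word count example, not how Hadoop itself is implemented; the function names are made up, and HDFS (which would split the input into blocks across machines) is only hinted at by passing in a list of lines.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one line of input."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle: group all emitted values by key, as happens between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)


def word_count(lines):
    """Run the three phases in order on an in-memory list of lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


if __name__ == "__main__":
    print(word_count(["big data", "Big deal"]))
```

In a real cluster, each mapper runs next to the data block it reads and the reducers run in parallel per key; the sketch keeps only the order of the phases.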

Data visualization: Cost of Hard Drive storage space

Here are the visualizations:

1982 – 2009:

[Image: 1982 2009 storage cost]

2000 – 2008:

[Image: 2000 2008 storage cost]

I grabbed the data from http://www.mkomo.com/cost-per-gigabyte and http://ns1758.ca/winch/winchest.html – thanks!

Conclusion

Storage cost has decreased drastically; mathematically speaking, it has decreased exponentially. No wonder we can store lots of data for a few dollars, and no wonder that the age of Big Data has already arrived!
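
To show what "decreased exponentially" means in practice: if the cost halves every fixed number of years, two data points are enough to estimate that halving time. The sketch below assumes a pure exponential trend, and the numbers in it are made up for illustration; they are not taken from the charts or the linked data sources.

```python
import math


def halving_time(year_a, cost_a, year_b, cost_b):
    """Years it takes the cost to halve, assuming a pure exponential decline."""
    return (year_b - year_a) / math.log2(cost_a / cost_b)


def projected_cost(cost_a, years_elapsed, halving):
    """Cost after `years_elapsed` years, given a halving time in years."""
    return cost_a * 0.5 ** (years_elapsed / halving)


if __name__ == "__main__":
    # Hypothetical figures: $1024 per GB falling to $1 per GB over 20 years.
    h = halving_time(2000, 1024.0, 2020, 1.0)
    print(f"Cost halves roughly every {h} years")
    print(f"Projected cost after 10 years: {projected_cost(1024.0, 10, h)}")
```

On a chart with a logarithmic cost axis, this kind of decline shows up as a straight line, which is a quick way to check whether the real data follows the trend.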

How to start Analyzing Twitter Data w/ R?

Over the past few weeks, I have posted notes about analyzing Twitter data w/ R. I am listing them here:

1. Install R & RStudio

2. R code to download twitter data

3. Perform Sentiment Analysis on Twitter Data (in R)

How to load some data to Hadoop on Windows to get started?

In this post, I want to point out that HDInsight (Hadoop on Windows) comes with sample datasets (log files) that you can load as follows:

1. Hadoop Command Line > navigate to c:\Hadoop\GettingStarted

2. Execute the following command:

powershell -ExecutionPolicy unrestricted -F importdata.ps1 w3c

[Image: import data to hadoop on windows file system]

After you have successfully executed the command, you can see the sample files in the /w3c/input folder:

[Image: w3c log files iis hadoop on windows]

Conclusion: In this post, we saw how to load some data to Hadoop on Windows file system to get started. Your comments are very welcome.

Official Resource: http://gettingstarted.hadooponazure.com/loadingData.html