Get started with Windows Azure: Attend the "Meet Windows Azure" event online

On June 7th 2012, there's an online event called "Meet Windows Azure" where Scott Gu and his Windows Azure team will introduce the Windows Azure platform. You can register here: http://register.meetwindowsazure.com/

If you're planning to attend, there's also a very interesting tweet-up planned called "Social meet up on Twitter for MEET Windows Azure on June 7th". All you have to do is follow #MeetAzure and #WindowsAzure on Twitter and interact. Simple!

There's also an unofficial blog relay: if you write a post, tweet it to @noopman. Here is the blog relay:

Want to learn about BigData? Read O'Reilly's book "Planning for Big Data"

Recently at SQLRally, Rushabh Mehta pointed me to a great BigData resource. It was a great read, and I thought it could help you too. So if you want to get started in the world of BigData, here's a resource that could help:

O'Reilly's book: Planning for Big Data

It's a short book (80 pages) and it's free too. It will help you get acquainted with the following BigData topics*:

1. What is BigData?
2. What is Apache Hadoop?
3. Brief description of companies in BigData space
4. Microsoft’s plan for BigData
5. BigData in cloud
6. Data Market Places (like Azure DataMarket)
7. NoSQL World
8. Visualization
9. Future of BigData.

(*Sourced from the Index of the Book)

That's about it. Get your hands on this short, free book if you are interested in learning about BigData.

Related Article: Examples to help clarify what’s unstructured data and what’s structured?


Examples to help clarify what’s unstructured data and what’s structured?

I have been reading and researching about BigData and BigData in the cloud. One of the concepts that keeps coming up is that "Big Data is about analyzing unstructured data…", so in this blog post I want to show a few examples that will help you differentiate between structured data and unstructured data.

Before we begin, here’s the definition of Unstructured data:

Unstructured Data (or unstructured information) refers to information that either does not have a pre-defined data model and/or does not fit well into relational tables – Wikipedia

Also, I want to point out that data isn't unstructured simply because you cannot fit it into a schema/model; even after you fit it into a model, the model would not help. For example, consider an email body. You can create a column "EMAIL BODY". Now think of the questions that are likely to be asked of that data. Do they get answered? If not, then fitting it into a model and calling it structured does not make sense, does it? (There's a short sketch of this after the examples below.) With that, here are the examples:

1. Word docs, PDFs & text files

Unstructured data

Examples: Books, Articles

2. Audio files

Unstructured data

Example: Call center conversations.

3. Email body

Unstructured data

Example: you don’t need an example here!

4. Videos

Unstructured data

Example: Video footage of criminal interrogation

5. A Data Mart / Data Warehouse

Structured Data

6. XML

Semi-structured data
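
To go back to the email-body point from above, here's a tiny sketch (with made-up data) of why stuffing free text into a column doesn't make it structured: questions about the structured columns are easy to answer, but the interesting questions about the body still need text analysis.

```python
# Made-up example: "sender" and "sent" are structured columns,
# "body" is free text stuffed into a column.
emails = [
    {"sender": "a@example.com", "sent": "2012-05-01",
     "body": "The shipment arrived late and the box was damaged."},
    {"sender": "b@example.com", "sent": "2012-05-02",
     "body": "Great service, my refund was processed quickly!"},
]

# Questions about the structured columns are easy:
print(sum(1 for e in emails if e["sent"].startswith("2012-05")))  # emails sent in May

# But "what are customers unhappy about?" has no column to query.
# Answering it means analyzing the free text in "body" -- which is
# exactly the kind of work BigData / text-analytics tooling takes on.
```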

A couple of applications for your brain cells:

1. Map disease patterns by analyzing medical records (Text)

2. Tuning customer support by analyzing calls (Audio)

A few quotes about unstructured data that I liked:

"80 percent of business-relevant information originates in unstructured form" – Justin Langseth, URL (the Wikipedia article says that even Merrill Lynch cited this)

But someone else had a nice perspective on this 80%:

"but managing it (this 80%) really isn't a significant problem… the innovation isn't in structuring text, it's in applying models to discover and exploit their inherent structure." – Source

My experience with unstructured data (in the context of BigData) and the cloud:

I have been playing with MapReduce on Windows Azure (Project Daytona), Elastic MapReduce (Amazon Web Services) and Google's BigQuery platform. To give you one example, I'll use Microsoft's Project Daytona. Here I uploaded data in unstructured form, as plain text, and the goal was to run "Word Count". It helps you answer questions like: which word has the highest frequency? Which is the least popular word? And you could tweak the algorithm to consider only words longer than four characters (among other constraints). Now, this is what happens when you run the algorithm: the MapReduce framework (an app deployed on Windows Azure, in this case) does the analysis on the unstructured data (text, in this case) and helps you answer the questions you were asking. So I hope that gives you an idea of how it works.
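
To make the map/reduce idea concrete, here's a minimal word-count sketch in plain Python. It only illustrates the pattern, not Project Daytona's actual API; the sample text and the "longer than four characters" filter are just stand-ins for the tweaks mentioned above.

```python
# A minimal word-count sketch mimicking the map and reduce phases that a
# framework like Daytona or Elastic MapReduce runs for you at scale.
from collections import defaultdict
import re

def map_phase(document):
    """Emit (word, 1) pairs for every word longer than four characters."""
    for word in re.findall(r"[a-z]+", document.lower()):
        if len(word) > 4:          # the "length greater than four" tweak
            yield (word, 1)

def reduce_phase(pairs):
    """Sum the counts emitted for each word."""
    counts = defaultdict(int)
    for word, one in pairs:
        counts[word] += one
    return counts

if __name__ == "__main__":
    docs = ["Windows Azure makes MapReduce approachable",
            "MapReduce counts words across unstructured text"]
    all_pairs = (pair for doc in docs for pair in map_phase(doc))
    counts = reduce_phase(all_pairs)
    # Which word has the highest frequency?
    print(max(counts.items(), key=lambda kv: kv[1]))
```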

That’s about it for this post. Do you have an example or application of unstructured data? Please do post it in the comments!

Played with Microsoft Research's "Project Daytona" – MapReduce on Windows Azure

Recently, I played with Project Daytona, which brings MapReduce to Windows Azure.

It seems like a great "Data Analytics as a Service" offering. I tried the k-means and word-count sample applications that come bundled with the project runtime download: http://parasdoshi.visibli.com/share/z14Ty2

The documentation that ships with the project guides you step by step through setting up the environment, but for those who are curious, here is a brief description of how I set it up:

1) Uploaded the sample data-sets to Azure Storage

2) Edited the configuration file (ServiceConfiguration.cscfg) to point to the correct Azure Storage account (a sketch of this file follows the list below)

3) Chose the instance size and the number of instances for the deployment

4) Deployed the binaries to Windows Azure (.cspkg and .cscfg)

5) Ran the Word Count Sample

6) Ran the K-means Sample
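
For the curious, steps 2 and 3 essentially mean editing an XML file shaped roughly like the sketch below. The role name, setting name and the placeholder account/key are assumptions for illustration only; the exact names come from the Daytona package, so check the bundled documentation.

```xml
<?xml version="1.0" encoding="utf-8"?>
<!-- Hypothetical ServiceConfiguration.cscfg sketch: role and setting
     names below are placeholders, not Daytona's actual names. -->
<ServiceConfiguration serviceName="Daytona"
    xmlns="http://schemas.microsoft.com/ServiceHosting/2008/10/ServiceConfiguration">
  <Role name="DaytonaWorkerRole">
    <!-- Step 3: instance count for the deployment -->
    <Instances count="2" />
    <ConfigurationSettings>
      <!-- Step 2: point the deployment at your Azure Storage account -->
      <Setting name="DataConnectionString"
               value="DefaultEndpointsProtocol=https;AccountName=YOUR_ACCOUNT;AccountKey=YOUR_KEY" />
    </ConfigurationSettings>
  </Role>
</ServiceConfiguration>
```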

Conclusion: It was pretty amazing to run MapReduce on Windows Azure. If you are into BigData, MapReduce or data analytics, check out "Project Daytona".

That's about it for this post. And what do you think about Project Daytona – MapReduce on Windows Azure?

One more way to run SQL Server on cloud: SQL server on AWS RDS

Up until April 2012, the only way to run SQL Server in the cloud was "SQL Azure". But recently AWS announced SQL Server on Amazon RDS. Good news? Probably; it's always good to have more than one option. So for those who are new to the world of AWS, here are a few tips before you get hands-on:

1) The way RDS works is that you spin up "DB instances", where you specify the machine size that will "power" your database. Remember that the instance type you choose directly affects your bill.

2) Spend some time understanding the billing structure. Since AWS gives you a lot of options, its billing structure is not simple. Don't get me wrong, I am not saying that having a lot of options is bad; it's just that the billing is not one-dimensional (there are various dimensions that shape your bill). And why should you invest the time? Because in the pay-as-you-go model it directly affects your bill (see the rough cost sketch after this list).

3) Understand costs like the cost to back up the database plus the data-transfer costs.

4) Understand the difference between the "bring your own license" and "license included" (Express, Standard and Web editions only; Enterprise edition is currently not included) models in RDS SQL Server.

5) And unlike SQL Azure, RDS SQL Server charges on a per-hour basis.
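
To see how those dimensions add up, here's a back-of-the-envelope sketch. Every rate below is a made-up placeholder, not an actual AWS price; check the official RDS pricing page for real numbers.

```python
# Rough monthly-cost estimate for an RDS SQL Server instance.
# All rates are hypothetical placeholders for illustration only.
HOURS_PER_MONTH = 730

instance_rate_per_hour = 0.50     # hypothetical rate for the chosen DB instance class
storage_gb = 100
storage_rate_per_gb_month = 0.10  # hypothetical provisioned-storage rate
backup_gb_beyond_free = 20
backup_rate_per_gb_month = 0.10   # hypothetical backup-storage rate
data_transfer_out_gb = 50
transfer_rate_per_gb = 0.12       # hypothetical data-transfer-out rate

monthly_cost = (
    instance_rate_per_hour * HOURS_PER_MONTH
    + storage_gb * storage_rate_per_gb_month
    + backup_gb_beyond_free * backup_rate_per_gb_month
    + data_transfer_out_gb * transfer_rate_per_gb
)
print(f"Estimated monthly bill: ${monthly_cost:.2f}")
```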

Note the date of this post: 15th May 2012. Things change very fast, so readers from the future, please refer to the official documents.

BTW, here are a few blog posts from the web-o-sphere:

1. Expanding the Cloud for Windows Developers

2. First Look – SQL Server on Amazon Web Services RDS

3. Official resource: AWS RDS SQL Server

That’s about it for this post.

How do you reduce the network “latency” between application and SQL Azure?

I was at SQLRally recently (10-11 May 2012), where I happened to have a nice talk about SQL Azure with a fellow attendee. They were considering porting a database (one that supports one of their apps) to Microsoft's cloud service. One of the concerns they had was: "How do we reduce the network latency between SQL Azure and our app?" Since I knew the solution, I shared it with them, and I am sharing it here so others can benefit too.

Now, one of the first questions I asked the attendee was: are you also porting your app to Azure along with the database?

It turns out they were considering hosting the app in the Azure cloud too. Technically that's called a "Code Near" scenario, and in this case the application and the database both *should* reside in the same data center. If you do that, the network latency between your app and the database is minimal.

Now, if your app stays on-premises and you are considering SQL Azure, then select the data-center location that has the minimal network latency between your app and SQL Azure. Technically this is called a "Code Far" scenario. I have written about one of the ways you can measure this; here's the post: Testing latency between client and SQL Azure via client statistics in SSMS
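
If you'd rather measure from code than from SSMS client statistics, here's a rough sketch that times round trips from your app server to the database using pyodbc. The driver name, server, database and credentials are placeholders you'd replace with your own.

```python
# Rough round-trip latency check against a SQL Azure / Azure SQL database.
# Connection details below are placeholders, not a real server.
import time
import pyodbc

conn_str = (
    "Driver={ODBC Driver 17 for SQL Server};"
    "Server=tcp:yourserver.database.windows.net,1433;"
    "Database=yourdb;Uid=youruser;Pwd=yourpassword;Encrypt=yes;"
)

conn = pyodbc.connect(conn_str)
cursor = conn.cursor()

samples = []
for _ in range(10):
    start = time.perf_counter()
    cursor.execute("SELECT 1")   # trivial query, so timing ~= network round trip
    cursor.fetchone()
    samples.append((time.perf_counter() - start) * 1000)

print("avg round trip: %.1f ms, min: %.1f ms"
      % (sum(samples) / len(samples), min(samples)))
```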

That’s about it for this post.

Official Resource: SQL Azure and Data access

Business operation challenges faced by SaaS providers & cloud providers

I just researched a few business operation challenges faced by SaaS (software as a service) providers and cloud providers; I am sharing what I found here:

  1. SaaS and cloud computing companies have wafer-thin margins (a.k.a. lower-margin businesses). According to SaaS update (April 2011 2008, pg 15), their operating margins are just about 3%. You can do a quick search to find past operating margins for the SaaS or cloud provider of your choice.
  2. SaaS and cloud computing companies have to manage their customer churn rate, which means they have to put effort into retaining customers. Churn rate is a measure of the number of individuals moving into or out of a collective over a specific period of time (a tiny worked example follows this list). Managing churn rate is a challenge because retaining customers is difficult and time-consuming, and customer retention matters because, given the pay-as-you-go nature of subscription businesses, customers pay only as long as they continue to use the service. Thus managing customer churn rate is a challenge for SaaS and cloud providers.
  3. Cloud providers and SaaS providers need an upfront investment to build the sales capacity required to grow the business over time. If they invest less, they will not have enough sales capacity and will miss growth opportunities. Thus they have to make a careful trade-off between fast growth and a high cash-burn rate.
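
As promised in point 2, here's a tiny worked example of the churn-rate measure. The numbers are made up purely for illustration.

```python
# Churn rate = customers lost during a period / customers at the start of it.
customers_at_start = 1000
customers_lost = 50           # cancelled during the month (made-up number)

monthly_churn_rate = customers_lost / customers_at_start
print(f"Monthly churn rate: {monthly_churn_rate:.1%}")   # 5.0%

# Even a modest monthly churn compounds over a year, which is why
# retention matters so much in pay-as-you-go subscription businesses:
retained_after_year = customers_at_start * (1 - monthly_churn_rate) ** 12
print(f"Customers retained after 12 months: {retained_after_year:.0f}")
```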