News – Now a Certified Cloud Guy – Completed University of Washington’s cloud certificate requirements!

I recently completed a cloud computing certificate program taught at the University of Washington, so now I am a certified cloud guy. More importantly, it was a great learning experience!

More about the certificate:

The certificate program had three courses, which covered the following topics:

  • cloud computing fundamentals
  • cloud computing models
  • cloud computing case studies
  • cloud computing application building
  • operations in cloud
  • scalable computing
  • MapReduce
  • NoSQL
  • Big Data
  • programming Big Data
  • Database as a Service

Thanks to the University of Washington and the instructors for a great learning experience!

SolidQ Journal: Building Ideal PowerPivot Model for Power View reports

My journal article titled “Building Ideal PowerPivot Model for Power View reports” has been published in SolidQ Journal. In this article, I talk about the reporting properties you can set in a PowerPivot model to enhance the Power View report creation experience for your end users. Here are the five main topics discussed in the article:

– Hide from Client Tools
– ImageURL
– Default Field Set
– Table Behavior
– Calculated columns and calculated measures

In Part 2 of this series, we will discuss the reporting properties in the Tabular Model to help you build an ideal model for Power View reports. Update: Part 2 is published: http://parasdoshi.com/2012/09/25/new-journal-article-published-title-building-an-ideal-tabular-model-for-power-view-reports/

I would also like to thank the SolidQ Journal team and Ruben Lopez, who was the technical reviewer of the article.

And if you have any feedback, please drop a comment or contact me. Thank you!

Top Five reasons to contribute on MSDN or StackOverflow forums

Technical forums are places where you can sense that there’s hope for humanity! No kidding. If you think about it, they are places where people help each other out without expecting anything in return (in almost all cases). That’s a fact. So if you agree that forums are a great place, how about contributing? If you’re not contributing already, here are five reasons that may prompt you to start:

1. Help someone out.

2. Solve a real-world problem

[Thanks Florin Dumitrescu for pointing this out!]

3. Discover great resources on the inter-webs. While answering questions, people drop awesome links in their answers, and invariably they are great free resources.

[Thanks Hardik Pandya for pointing this out!]

4. Learn from other super-smart people and network with them

5. Test (or “Validate”) your technical know-how

And one more:

Earn reputation (Build your Brand, they say)

What do you think? If you are already contributing on technical forums, what motivates you? And if you’re not, what’s stopping you? I believe everyone knows plenty of things that others are interested in. You just need to find a place to share them!

How to Disable password expiration for Windows Server 2008 R2 (domain controller)?

I have already written about how to disable password expiration on Windows Server 2008 R2 when it is NOT a domain controller. You can find that post here: http://parasdoshi.com/2012/04/19/how-to-disable-the-password-expiration-policy-in-windows-server-2008-r2-demo-machine/

Now, if you are looking to disable password expiration on a Windows Server 2008 R2 dev machine that is also a domain controller, follow these steps:

1. If you go to “Local Security Policy”, you’ll see the options, but it will not let you change the setting even if you are logged in as a domain administrator.

[Image: Windows Server 2008 R2, password expiration settings in Local Security Policy]

2. So we need an alternate path to edit the password expiration policy.

Go to Start > Administrative Tools > Group Policy Management

3. Here, click on “Edit” for the Default Domain Policy of the domain of your choice:

[Image: Windows Server 2008 R2, Group Policy Management]

4. Go to Computer Configuration > Policies > Windows Settings > Security Settings > Account Policies > Password Policy

[Image: Windows Server 2008 R2, Group Policy Management Editor]

5. Change the password policy! Setting “Maximum password age” to 0 means passwords never expire.
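If you want to double-check that the change took effect, one option is to read the output of the built-in `net accounts` command. Here is a minimal Python sketch of that idea. It assumes English-language output, and the sample text in it is illustrative only (not captured from my machine); the function names are my own.

```python
import re
import subprocess

# Illustrative sample only: real output has more lines and varies by machine
# and by Windows display language.
SAMPLE_OUTPUT = """\
Minimum password age (days):                          0
Maximum password age (days):                          Unlimited
Minimum password length:                              0
"""


def max_password_age(net_accounts_output):
    """Return the maximum password age as an int, or the raw text
    (for example 'Unlimited'), or None if the line is not found."""
    match = re.search(
        r"^Maximum password age \(days\):\s*(\S+)\s*$",
        net_accounts_output,
        re.MULTILINE,
    )
    if match is None:
        return None
    value = match.group(1)
    return int(value) if value.isdigit() else value


def read_net_accounts():
    """Run `net accounts` (Windows only) and return its text output."""
    result = subprocess.run(
        ["net", "accounts"], capture_output=True, text=True, check=True
    )
    return result.stdout


if __name__ == "__main__":
    # Parse the sample; on the server itself you could pass read_net_accounts().
    print(max_password_age(SAMPLE_OUTPUT))
```

The parsing is kept separate from running the command so that you can try it on any machine; the new value may only show up after Group Policy has refreshed.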

Note that changing your password policy to disable password expiration is a security risk. It’s applicable to your demo machine or your dev machine only. The reason I am documenting it is that I do not want to keep changing the password of the Windows Server on which I have my SharePoint BI dev environment set up. It’s MY dev environment, I am NOT sharing it with other folks, and I do not keep anything sensitive on it, so I can afford to disable the password expiration policy.

That’s about it for this post. Happy Tweaking!

Addressing a few questions a reader had about Google’s BigData offering BigQuery

After reading First Impression: Google’s BigData offering called BigQuery, a reader (Shadab Shah) had a few questions about it, and in this blog post I am going to address those questions:

Q1. Are there any browser-based tools to query data in BigQuery?

A1: They have a browser-based tool called the “BigQuery Browser Tool”, which you can use to query data.

Apart from the browser-based tool, there are other options too:

1) A command-line tool called the “bq command-line tool”. You can find more information here: https://developers.google.com/bigquery/

2) An API. You can “include” big data analytics capabilities in a web app via a RESTful API. (Point #2 content credit: Michael Manoochehri’s comment)

Q2. Where is the data stored?

If I just say “Google’s cloud”, that would not be a complete answer. There’s a complementary service called “Google Cloud SQL”, and I do not want you to confuse the data stored for BigQuery with Google Cloud SQL. There’s a difference between BigQuery and Google Cloud SQL, which you can read about here: https://developers.google.com/bigquery/docs/overview

Having said that, the data is stored on Google’s cloud. If you wish to use BigQuery, you’ll have to upload your data set in CSV format; once you do, it’s stored in Google’s cloud and is ready to be analyzed via BigQuery.

Q3. Where do I find lots of data to play with in BigQuery?

Google has a few sample data sets that you can play with:

[Image: BigQuery sample data]

That’s about it for this post. Thanks to Shadab Shah for the questions; I hope this post is useful.

This is cool: Microsoft project codename “Social Analytics”

Microsoft project codename “Social Analytics” is one nice beta project! Quoting from its site:

“it is aimed at developers who want to integrate social web information into business applications”

But the KEY here is that it allows you to integrate FILTERED social web information into your business applications. Today, you could go ahead, grab Twitter stream data, and embed it in your application, but guess what? In most cases, it’s too much information. Too much information means that it’s very difficult for business folks to take action by analyzing these truckloads of information. And so:

even though we are data-rich – we are information (insight) poor.

My point is that tons of information PRODUCED by [customers, partners, critics, employees...] and GATHERED from [Twitter, Facebook, LinkedIn, blogs...] is NOT useful in its raw form. To take action based on all these data points, what we need is a way to categorize (filter) the data, so that the decision maker sees only the SMALL part of the data set he/she needs for that particular analysis.

Let’s take an example:

A business decision maker wants to see “All Twitter users who have posted positive reviews about Windows 8 Design and User Experience”

How would you solve it?

Think.

Imagine.

Easy? Hard?

Thinking of writing your own sentiment analyzer? Awesome, and good luck!

Anyhow, maybe you already know that it’s not straightforward to answer the above question using raw Twitter data.

But here’s the thing: you could use third-party tools to solve the problem. Don’t get me wrong, I am not asking you to ignore them. But here’s how Microsoft Social Analytics helped me solve the above problem:

[Image: social analysis of Windows 8 design and user experience]

Here’s how I FILTERED the data (using a thing called the Social Analytics Engagement client):

[Image: Microsoft Social Analytics Engagement client, filtering data]

And as you can see, there is more than one way to slice/filter your data to provide the view best suited to a particular analysis assignment.
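To make the “filter, then analyze” idea concrete, here is a small Python sketch. It is NOT the Social Analytics API: the `Post` record, its fields, and the sample data are all hypothetical, and it assumes something upstream has already labeled each post with a topic and a sentiment (which is the hard part).

```python
from dataclasses import dataclass


@dataclass
class Post:
    # Hypothetical record; topic and sentiment are assumed to be pre-labeled.
    author: str
    source: str
    topic: str
    sentiment: str
    text: str


def filter_posts(posts, *, source=None, topic=None, sentiment=None):
    """Keep only the posts matching every filter that was supplied."""
    return [
        p
        for p in posts
        if (source is None or p.source == source)
        and (topic is None or p.topic == topic)
        and (sentiment is None or p.sentiment == sentiment)
    ]


def authors(posts):
    """Return the distinct authors of the given posts, sorted by name."""
    return sorted({p.author for p in posts})


# Made-up sample data for illustration only.
SAMPLE_POSTS = [
    Post("alice", "twitter", "design", "positive", "Love the new start screen"),
    Post("bob", "twitter", "design", "negative", "Where did my start menu go?"),
    Post("carol", "facebook", "design", "positive", "Clean look"),
    Post("dave", "twitter", "performance", "positive", "Boots fast"),
    Post("alice", "twitter", "design", "positive", "Tiles are growing on me"),
]

if __name__ == "__main__":
    matches = filter_posts(
        SAMPLE_POSTS, source="twitter", topic="design", sentiment="positive"
    )
    print(authors(matches))
```

Once the data is labeled, the decision maker’s question becomes a simple combination of filters; each extra filter is just another way to slice the same data set.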

Please note:

1. Currently in beta: only two data sets are available, i.e. Bill Gates and Windows 8.

2. Apart from this nicely designed web-based engagement tool, you can integrate the information into applications using the Social Analytics API.

Conclusion:

Check out Microsoft Codename “Social Analytics”! Today, it’s here: http://www.microsoft.com/en-us/sqlazurelabs/labs/socialanalytics.aspx (you’ll need an invite to try out Social Analytics).

First Impression: Google’s BigData offering called BigQuery

As part of an assignment for the University of Washington’s (UW) cloud class, I played with Google’s BigData offering BigQuery, and I am writing this blog post to share what I think about it. Please note that the views are my own and do not represent those of the instructors and fellow students at UW. Also, I am not a BigData “expert”. Think of me as a student trying to get my head around the various offerings out there, so if you feel otherwise about what I have written, just let me know in the comments section. Anyhow, read along to know what I think of BigQuery:

First up, what is BigQuery?

It’s a platform to analyze your data (lots of it) by running SQL-like queries. And it’s really SQL-like, so if you are from the SQL world like me, you should not face any issues getting up and running quickly by referring to the nicely written documentation.

Another point to consider is that even though it’s SQL-like, you’ll be able to analyze a considerable number of rows in a few seconds. Let me give you an example: I played with a sample (called gsod) which had 115M rows, and in my experiments I was able to get answers to simple computations like max and mean (avg) in less than a couple of seconds, and slightly more complex queries with WHERE, joins, and GROUP BY in around 5-6 seconds. Your results may vary depending on the type of query you run, but the BOTTOM LINE is that it is FAST. That’s good news!
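To give a feel for the kind of queries I mean, here is a rough Python sketch that builds two SQL strings against the gsod sample. Treat the details as assumptions: the table path `bigquery-public-data.samples.gsod` and the column names `mean_temp` and `year` are what I would expect for the public sample today, and the query syntax available back in 2012 was different. The optional `run_query` helper uses the `google-cloud-bigquery` package and needs credentials, so it is not called here.

```python
# Assumption: current public path of the gsod sample table.
GSOD_TABLE = "`bigquery-public-data.samples.gsod`"


def summary_query(column="mean_temp", table=GSOD_TABLE):
    """Simple aggregates (max, average, row count) over one column."""
    return (
        f"SELECT MAX({column}) AS max_value, AVG({column}) AS avg_value, "
        f"COUNT(*) AS row_count FROM {table}"
    )


def yearly_query(column="mean_temp", since_year=2000, table=GSOD_TABLE):
    """A slightly more complex query: filter with WHERE, then GROUP BY year."""
    return (
        f"SELECT year, AVG({column}) AS avg_value FROM {table} "
        f"WHERE year >= {int(since_year)} GROUP BY year ORDER BY year"
    )


def run_query(sql):
    """Optional: run the SQL with the google-cloud-bigquery client.
    Assumes the package is installed and credentials are configured."""
    from google.cloud import bigquery

    client = bigquery.Client()
    return list(client.query(sql).result())


if __name__ == "__main__":
    print(summary_query())
    print(yearly_query(since_year=1990))
```

If you are coming from the SQL world, nothing in these strings should look unfamiliar, which is really the point.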

BigQuery is Fast!

But what bothers me is this: how am I supposed to “UPLOAD” lots of data to Google’s cloud? It takes time, right? I guess that’s an issue with every cloud-based BigData offering. But here’s what I am thinking: if your data is already in a cloud, e.g. Amazon’s or Microsoft’s, does it not make sense to run analytics on Amazon’s or Microsoft’s cloud instead of porting your data to Google’s?

[Sidenote: I like it that Hadoop on Azure allows Amazon S3 data source. Nice move!]

My concern: time spent uploading truckloads of data to Google’s cloud just so that we can use it with BigQuery

And even if you have your data in the GAE datastore, you’ll have to upload your data to BigQuery separately. Source

Zooming out for a moment, I feel the goal of BigQuery was to offer an easy-to-use BigData platform, and I feel that’s what they have delivered:

An easy-to-use + easy-to-set-up “Hadoop+Hive”-like offering.

[Update: Aug 20th 2012: I have been thinking about it more and I realized that BigQuery is more about satisfying real-time Big Data scenarios, whereas Hadoop/Hive/MapReduce is more about batch-oriented analysis, and it’s great if you need to pre-process tons and tons of data]

But this “easiness” means that it is NOT as advanced as a Hadoop installation (or Hadoop on Azure or Amazon’s Elastic MapReduce). Then again, it’s easier and faster to get started with BigQuery. I guess it just depends on what you are trying to achieve, and based on that you’ll have to figure out which is the right tool for your scenario. No generic answer here, sorry!

And BTW, BigQuery supports only CSV. Talk about Variety (one of the V’s of BigData!). Let’s not get into that. I just wanted to point it out because if you’re looking to analyze data sets that cannot be converted to CSV for running SQL-like queries on top of them, then BigQuery is not for you.
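If your data is not in CSV but can be flattened into rows, the conversion itself is usually a small job. Here is a minimal Python sketch that turns JSON-lines records into CSV text using only the standard library. The function name, field names, and sample records are made up for illustration.

```python
import csv
import io
import json


def json_lines_to_csv(lines, fieldnames):
    """Convert an iterable of JSON object strings into CSV text.
    Keys not listed in fieldnames are dropped; missing keys become empty cells."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=fieldnames, extrasaction="ignore", lineterminator="\n"
    )
    writer.writeheader()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines between records
        writer.writerow(json.loads(line))
    return buffer.getvalue()


if __name__ == "__main__":
    # Hypothetical records for illustration only.
    records = [
        '{"station": "A", "temp": 1.5, "extra": true}',
        "",
        '{"station": "B"}',
    ]
    print(json_lines_to_csv(records, ["station", "temp"]))
```

Deeply nested or free-form data is a different story, and that is the case where a CSV-only tool stops being a good fit.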

Conclusion:

Try out BigQuery. It’s easy to get started. It’s powerful if SQL-like queries are all you need to analyze your data. If you are a BigData enthusiast/expert/student, it’ll be a nice exercise to mentally compare other BigData offerings with BigQuery.

If you decide to try BigQuery or have already tried it out, I’d love to hear what you think of it. Please leave a comment!

UPDATE (based on Michael Manoochehri’s comment): I didn’t mean to imply that it is prohibitively expensive to upload data to BigQuery, because I know it’s NOT! Here is the result that Michael Manoochehri shared: As a test I once ingested about 350 Gb of CSV data (split into 10gb raw files, then I gzipped each one into ~1Gb). I ingested the entire batch using the bq command line tool, and had the entire dataset in BigQuery in just a few hours. I agree that it’s not 100% trivial to move 300 Gb of data from a local cluster into Google’s cloud – but it’s not really that difficult.

[Update: Aug 20th 2012: If you are interested in the mechanics behind BigQuery, search for “Google Dremel Whitepaper”. It’s an amazing read]
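The split-then-gzip preparation Michael describes can be sketched in a few lines of Python. This is only my rough take, not his actual script: it splits by row count rather than by file size (a simplification), repeats the header row in every chunk, and the file naming is made up.

```python
import csv
import gzip
import os


def split_and_gzip(src_path, out_dir, rows_per_chunk):
    """Split a CSV file into gzipped chunks of at most rows_per_chunk data rows.
    The header row is repeated in every chunk. Returns the chunk paths."""
    os.makedirs(out_dir, exist_ok=True)
    paths = []
    out = None
    writer = None
    count = 0
    with open(src_path, newline="") as src:
        reader = csv.reader(src)
        header = next(reader, None)
        if header is None:
            return paths  # empty input file, nothing to split
        for row in reader:
            if writer is None or count >= rows_per_chunk:
                if out is not None:
                    out.close()
                # Hypothetical naming scheme: part-00000.csv.gz, part-00001.csv.gz, ...
                path = os.path.join(out_dir, f"part-{len(paths):05d}.csv.gz")
                out = gzip.open(path, "wt", newline="")
                writer = csv.writer(out)
                writer.writerow(header)
                paths.append(path)
                count = 0
            writer.writerow(row)
            count += 1
    if out is not None:
        out.close()
    return paths
```

Each resulting `.csv.gz` file can then be uploaded on its own, so a failed upload only costs you one chunk instead of the whole data set.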