Presented at #sqlpass summit 2015.
[VIDEO] Microsoft’s vision for “Advanced analytics” (presented at #sqlpass summit 2015)
Microsoft announced a cloud-based business intelligence platform called Power BI. As a part of that, the project (in public preview) that was previously called “Data Explorer” will be released as “Power Query”. It’s a great tool that I have used to find, clean and shape data in Excel 2010. Very useful! So one of the first things I checked was whether Excel 2010 can run Power Query or not. Turns out, it can! It works with Excel 2010 Professional Plus. (Please read the system requirements on the official download page for details.)
And of course, I downloaded and installed it on my Excel 2010 Professional Plus. If you haven’t installed Office 2010 SP1 or higher, do that too.
Please note that this change affects some of the posts that I’ve published on this blog. Here’s the list:
1) Exploring, filtering and shaping web-based public data using Data Explorer Excel add-in
2) Web Scraping Tables using Excel add-in Data Explorer preview
3) Unpivoting data using the data explorer preview for Excel 2010/2013
4) Merging/Joining datasets in Excel using Data Explorer add-in
5) Remove Duplicates in Excel Tables using Data Explorer Add-in
That’s about it for this post. Update your “Data Explorer” tab to “Power Query” if you haven’t already! It’s a handy tool and I am glad to see that Power Query (formerly Data Explorer) runs on Excel 2010 Pro Plus!
What is Neologism?
Neologism means the coining or use of new words, and I believe it’s one of the challenges faced by IT professionals. Nowadays, we spend our time and energy trying to get our heads around new terms, words and trends.
Let’s take a couple of examples:
Some time back, we had cloud computing. Nowadays, it’s Big Data. In my mind, Big Data has been coined to mean the following technologies/techniques under different contexts:
Note: The above image is just for illustration purposes. It does not comprehensively cover every technology that is now called “Big Data”. Feel free to point it out if you think I missed something important.
And neologism is a challenge because:
1) Generally, it’s a new trend and there is little to no consensus on what exactly it means.
2) It means different things in different contexts.
3) Every person can have their own “interpretation” and no one is wrong.
4) It’s a moving target. The definition used today will change in the future. So we always need a “working” definition for these terms.
Now, don’t get me wrong: it’s fun trying to figure out what it all means and trying to gauge whether it matters to me and my organization or not! What do you think? As a person in Information Technology, do you think that neologism is one of the challenges we face? Consider leaving a reply in the comment section!
Want to learn about BigData? Read O’Reilly’s book “Planning for BigData”
Quote for Big-Data / Data-Science/ Data-Analysis enthusiasts:
Who on earth is creating “Big data”?
Examples to help clarify what’s unstructured data and what’s structured?
I normally blog about the answers that I give on the MSDN forums. An answer on the MSDN forum is generally brief and to the point, and in the blog post I expand it to cover related areas. Here are the questions for which I chose not to write a blog post, so I am just going to archive them for now:
Join the Azure PASS VC’s session on “Getting Started with Windows Azure” on:
Date: 24th Sep (Monday)
Time: 11 AM Eastern Time; 8 AM Pacific; 8:30 PM India Time; You can download the event calendar from here
Speaker: Brian Prince, Principal Cloud Evangelist Microsoft
Session Abstract: Windows Azure is Microsoft’s cloud platform for quickly building and running scalable applications. We will cover just what the cloud is, as an industry, and what Microsoft is offering. We will see into the data-centers, how they work, and a high-level view of all the components of the platform.
More Details: http://azure.sqlpass.org/
Recently I completed a cloud computing course taught at the University of Washington, so now I am a certified cloud guy. But more importantly, it was a great learning experience!
The program consisted of three courses, which covered the following topics:
Thanks to the University of Washington and the instructors for a great learning experience!
After reading First Impression: Google’s BigData offering called BigQuery, a reader (Shadab Shah) had a few questions about it, and in this blog post I am going to address those questions:
Q1. Any browser-based tools to query data in BigQuery?
A1: They have a browser-based tool, which they call the “BigQuery Browser Tool”, that you can use to query data.
Apart from the browser-based tool, there are other tools too:
1) A command-line tool called the “bq command-line tool”. You can find more information here: https://developers.google.com/bigquery/
2) API: one can “include” big data analytic capabilities in a web app via a RESTful API. (Point #2 content credit: Michael Manoochehri’s comment)
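To give a feel for what “via a RESTful API” could look like from code, here is a minimal Python sketch that only builds the HTTP request and does not send it. Treat the details as assumptions on my part: the endpoint path follows the v2 “jobs.query” layout as I understand it, and the project id, table name and access token are made-up placeholders. Please check the official documentation before relying on any of it.

```python
import json
import urllib.request

# Assumption: v2 "jobs.query" endpoint layout; verify against the official docs.
BIGQUERY_QUERY_URL = (
    "https://www.googleapis.com/bigquery/v2/projects/{project_id}/queries"
)


def build_query_request(project_id, sql, access_token):
    """Build (but do not send) an HTTP POST asking BigQuery to run a query."""
    body = json.dumps({"query": sql}).encode("utf-8")
    return urllib.request.Request(
        BIGQUERY_QUERY_URL.format(project_id=project_id),
        data=body,
        headers={
            "Content-Type": "application/json",
            # A real call needs a valid OAuth 2.0 access token here.
            "Authorization": "Bearer " + access_token,
        },
        method="POST",
    )


# Hypothetical project, table and token, for illustration only.
request = build_query_request(
    "my-project",
    "SELECT COUNT(*) FROM my_dataset.my_table",
    "PLACEHOLDER_TOKEN",
)
```

A web app would send this request (for example with urllib.request.urlopen) and read the JSON response. I left that part out because it needs real credentials.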
Q2. Where is the data stored?
If I just say “Google Cloud”, that would not be a complete answer. There’s a complementary service called “Google Cloud SQL”, and I do not want you to confuse data stored for BigQuery with Google Cloud SQL. There’s a difference between BigQuery and Google Cloud SQL; you can read about it here: https://developers.google.com/bigquery/docs/overview
Having said that, it’s stored on Google’s cloud. If you wish to use BigQuery, you’ll have to upload your data set in CSV format; once you do so, it’s stored in Google’s cloud and is ready to be analyzed via BigQuery.
Q3. Where do I find lots of data to play with in BigQuery?
Google has a few sample data sets that you can play with:
That’s about it for this post. Thanks Shadab Shah for the questions, I hope this post is useful.
As part of an assignment for the University of Washington’s (UW) cloud class, I played with Google’s BigData offering, BigQuery, and I am writing this blog post to share what I think about it. Please note that the views are my own and do not represent those of the instructors and fellow students at UW. Also, I am not a BigData “expert”; think of me as a student trying to get my head around the various offerings out there. So if you feel otherwise about what I have written, just let me know in the comments section. Anyhow, read along to know what I think of BigQuery:
First up, what is BigQuery?
It’s a platform to analyze your data (lots of it) by running SQL-like queries. And it’s really SQL-like, so if you are from the SQL world like me, you should not face any issues in getting up and running in seconds by referring to the nicely written documentation.
Another point to consider here is that even though it’s SQL-like, you’ll be able to analyze a considerable number of rows in a few seconds. Let me give you an example: I played with a sample data set (called gsod) which had 115M rows and, in my experiments, I was able to get answers to simple computations like max, mean, avg, etc. in less than a couple of seconds, and slightly more complex queries with WHERE, joins and GROUP BY in around 5-6 seconds. Your results may vary depending on the type of query you run, but the bottom line is that it is FAST. That’s good news!
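To make “simple computations” versus “slightly more complex queries” concrete, here is a small sketch of the two query shapes. It runs against Python’s built-in sqlite3 as a local stand-in, not against BigQuery, and the table and column names are made up (loosely inspired by a weather data set like gsod). BigQuery’s own SQL dialect may differ in the details.

```python
import sqlite3

# Local stand-in for illustration; this is NOT BigQuery.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE weather (station TEXT, year INTEGER, mean_temp REAL)")
conn.executemany(
    "INSERT INTO weather VALUES (?, ?, ?)",
    [
        ("A", 2010, 50.0),
        ("A", 2011, 54.0),
        ("B", 2010, 70.0),
        ("B", 2011, 74.0),
    ],
)

# Simple computations: aggregates over the whole table.
max_temp, avg_temp = conn.execute(
    "SELECT MAX(mean_temp), AVG(mean_temp) FROM weather"
).fetchone()

# Slightly more complex: filter with WHERE, then aggregate per group.
per_station = conn.execute(
    "SELECT station, AVG(mean_temp) FROM weather "
    "WHERE year >= 2011 GROUP BY station ORDER BY station"
).fetchall()

print(max_temp, avg_temp)  # 74.0 62.0
print(per_station)         # [('A', 54.0), ('B', 74.0)]
```

The point of the sketch is only the shape of the SQL. The timings I quoted above come from running queries of this kind on BigQuery itself.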
BigQuery is Fast!
But what bothers me is this: how am I supposed to “upload” lots of data to Google’s cloud? It takes time, right? I guess that’s an issue with every cloud-based BigData offering. But here’s what I am thinking: if your data is already in the cloud, e.g. Amazon’s or Microsoft’s, does it not make sense to run analytics on Amazon’s or Microsoft’s cloud instead of porting your data to Google’s?
[Sidenote: I like it that Hadoop on Azure allows Amazon S3 as a data source. Nice move!]
My concern: time spent uploading a truckload of data to Google’s cloud just so that we can use it with BigQuery
And even if you have your data in the GAE datastore, you’ll have to upload your data to BigQuery separately. Source
Zooming out for a moment, I feel the goal of BigQuery was to offer an easy-to-use BigData platform, and I feel that’s what they have delivered:
An easy-to-use + easy-to-setup “Hadoop+Hive”-like offering.
[Update: Aug 20th 2012: I have been thinking about it more and I realized that BigQuery is more about satisfying real-time Big Data scenarios, while Hadoop/Hive/MapReduce is more about batch-oriented analysis, which is great if you need to pre-process tons and tons of data.]
But this “easiness” means that it is NOT as advanced as a Hadoop installation (or Hadoop on Azure or Amazon’s Elastic MapReduce). Then again, it’s easier and faster to get started with BigQuery. I guess it just depends on what you are trying to achieve, and based on that you’ll have to figure out which is the right tool for your scenario. No generic answer here, sorry!
And BTW, BigQuery supports only CSV. Talk about Variety (one of the V’s of BigData!). Let’s not get into that. I just wanted to point it out because if you’re looking to analyze data sets that cannot be converted to CSV for running SQL-like queries on top of them, then BigQuery is not for you.
Try out BigQuery. It’s easy to get started. It’s powerful if SQL-like queries are all you need to analyze your data. If you are a BigData enthusiast/expert/student, it’ll be a nice exercise to mentally compare other BigData offerings with BigQuery.
If you decide to try BigQuery or have already tried it out, I’d love to hear what you think of it. Please leave a comment!
UPDATE (based on Michael Manoochehri’s comment): I didn’t imply that it is prohibitively expensive to upload data to BigQuery, because I know it’s NOT! Here is the result that Michael Manoochehri shared: “As a test I once ingested about 350 GB of CSV data (split into 10 GB raw files, then I gzipped each one into ~1 GB). I ingested the entire batch using the bq command line tool, and had the entire dataset in BigQuery in just a few hours. I agree that it’s not 100% trivial to move 300 GB of data from a local cluster into Google’s cloud, but it’s not really that difficult.”
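The “split into smaller files, then gzip each one” preparation step from Michael’s comment can be sketched with nothing but the Python standard library. This is my own rough illustration, not his actual script. It assumes the CSV has a header row (repeated in every chunk) and no line breaks inside quoted fields, and the file names and chunk size are made up. Whether the loader expects a header row at all is something to check in the documentation.

```python
import gzip
import os
import tempfile


def split_and_gzip_csv(src_path, out_dir, rows_per_chunk):
    """Split a CSV into gzipped chunks, repeating the header in each chunk.

    Assumes one record per line (no embedded newlines in quoted fields).
    """
    out_paths = []
    chunk = None
    rows_in_chunk = 0
    with open(src_path, "r", newline="") as src:
        header = src.readline()
        for line in src:
            # Start a new chunk at the beginning or when the current one is full.
            if chunk is None or rows_in_chunk >= rows_per_chunk:
                if chunk is not None:
                    chunk.close()
                path = os.path.join(out_dir, "part-%05d.csv.gz" % len(out_paths))
                chunk = gzip.open(path, "wt", newline="")
                chunk.write(header)
                out_paths.append(path)
                rows_in_chunk = 0
            chunk.write(line)
            rows_in_chunk += 1
    if chunk is not None:
        chunk.close()
    return out_paths


# Tiny demo: 5 data rows split into chunks of 2 rows gives 3 files.
work_dir = tempfile.mkdtemp()
demo_csv = os.path.join(work_dir, "demo.csv")
with open(demo_csv, "w", newline="") as f:
    f.write("id,value\n1,a\n2,b\n3,c\n4,d\n5,e\n")

parts = split_and_gzip_csv(demo_csv, work_dir, rows_per_chunk=2)
print(len(parts))  # 3
```

For real data you would pick the chunk size by bytes rather than rows, and then hand the resulting files to whichever upload tool you use.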
[Update: Aug 20th 2012: If you are interested in the mechanics behind BigQuery, search for “Google Dremel Whitepaper”. It’s an amazing read.]
Following the announcements at the “Meet Windows Azure” event, we now have three options to run SQL Server in the cloud. They are:
1. SQL Azure which is now called Windows Azure SQL Database
2. SQL Server on Windows Azure VM Roles (Nice addition, in my opinion!)
3. SQL Server on Amazon Web Services RDS
And apart from these options, if you fire up a VM in the cloud and decide to run SQL Server on it, that’s also SQL Server in the cloud.
Update 22 June 2012:
Naveen commented about running SQL Server on Amazon EC2.
1. SQL Azure reporting is generally available and backed by SLA
2. You can now run SQL Server on VM roles
3. Azure was rebranded a while back, but a quick reminder: SQL Azure was renamed to Windows Azure SQL Database, so in the “new” portal you’ll see “SQL database” instead of SQL Azure.
I’ll blog about these features as and when I get a chance to play with them.
Read all updates here: Now Available: New Services and Enhancements to Windows Azure
And I updated http://parasdoshi.com/whats-new-in-sql-azure/