Notes from Atlanta SQL BI meeting: “Bus Matrix”

Standard

Bus matrix is a very important (if not the most important) topic for a Data Warehouse Developer/Architect. And it was great to re-visit and grow knowledge around this topic at Atlanta SQL BI meeting presented by Bill Anton (b|t).

Here are some of my notes:

  • Dimensional Modeling is more important than ever specially w/ the rise in self-service BI. it’s important to lay out data in an intuitive way to business users.
  • Dimensional Modeling is great! But at the same time it’s important to communicate this to business users. The solution doesn’t involve teaching ER diagrams to users, simplified designs, offloading mapping data to business processes work to business users. So, what’s the solution? That’s where Bus Matrix comes into picture!
  • Bus Matrix is a 2 dimensional intersection of Facts & Dimensions.
  • Bus matrix can not only help in communication w/ Business users, it can also be useful for IT in project planning and documentation purposes.
  • In the requirement gathering phase, a Bus Matrix could be a 2 dimensional intersection of Subject Areas (Sales, Inventory, etc) by Source Systems w/ priorities shown at the intersection. It’s great for road-mapping a BI/DW project.
  • Business Matrix is created during requirements gathering stage (subject area vs source systems), before development (Facts vs Dimensions) and needs to be kept updated over time. After the solution is live, it could be one of the important end-user documentation.

Also, Here’s a great resource page posted by the Presenter (includes his slides): http://byobi.com/blog/bus-matrix/ – check it out! it has some great resources in there.

Thank you Bill & Meeting organizers!

PASS Business Analytics VC’s Online Event: “Power BI Info Management and Data Stewardship”

Standard

Power BI is an exciting new technology in the business analytics space from Microsoft. I’ve played with its current preview version & attended couple of sessions on Power BI at PASS Summit 2013. Based on my first impression, I noted down Problems that Power BI solves. Note that as of today, it’s in preview & so information around cost is not availale yet but I try to learn and understand as much as I can Today about how Power BI is going to help business users & power users in the future. As a part of that, I’m attending Business Analytics VC’s session on “Power BI Info Management and Data Stewardship” by Matthew Roche & Ofer Ashkenazi on Nov 7th 12 PM EST.

Topic: Power BI Info Management and Data Stewardship

Date & Time: Nov 7th 12 PM EST

Here’s the Link to register: http://bit.ly/PASSBAVC

Topic Abstract:

“Business intelligence tools continue to improve, letting users shorten their time to insight and take that insight to more devices in more places. But this evolution of BI doesn’t change one fundamental fact of information management: You can’t gain insight from data you can’t access.

In this session, Matthew Roche and Ofer Ashkenazi will introduce the role of the data steward and the self-service information management capabilities included in Power Query and Power BI for Office 365, focusing on how Power BI empowers business users to add value to the organization.”

Closing note:

I recently volunteered at Business Analytics VC as VP of Marketing, so it’s in my interest to spread word about the event but I would not spread word about something unless it gets me personally excited about it! 🙂

I hope to see you at the session and for some reason if you can not make it, we usually record sessions & so you can check out the meeting archives section of the PASS BA VC site after the event.

Exploring, filtering and shaping web-based public data using Data Explorer Excel add-in:

Standard

Data Explorer let’s you “Explore” (search) for web-based public data. This is a great way to combine data that you may have in your data-sources with public data sources for data analysis purposes. Sometimes your data might not tell you the reason behind the observed trends, so when that happens – you can try to see if a public data-set might give you the much-needed context. Let me give you an Example before we start hands-on w/ data explorer so that you have better understanding of importance of public datasets. Here’s a sample that I found here. So, Here’s a demo:

An auto company is seeing sales trends of Hybrid cars and SUV’s from the sales data-sources. But what is the reason behind that? company data does not show that. Someone hypothesizes that it might be because of gas prices. So they test out the hypothesis by combining gas prices information available via public data. And turns out gas prices might be the driving force of sales trends! SEE:

if the gas prices increase, then the sale of SUV go down and the sale of Hybrids go up:

data analysis combine data with public datasets

You know that public data can be helpful! So how can you search for public data-sets? Well, You can manually search online, ask someone, browse through public data repositories like azure data market (and other data markets), there’s also a public data search engine! OR you can directly search for them via Data Explorer.

Here are the steps:

1) Excel 2010/2013 > Data Explorer Tab > Online Search > type “Tallest Buildings”

excel public data search data explorer2) I selected one of the data-sets that said “Tallest completed building…. ”

excel data from internet

3) Now let’s do some filtering and shaping. Here are the requirements:

– Hide columns: Image, notes & key

– clean columns that has heights data

– Show only city name in location

OK, let’s get to this one by one!

4) Hiding Columns:

Click on Filter & Shape button from the Query Settings:

excel data shaping cleaning

Select Image Column > Right Click > Hide:

excel hide remove columns

Repeat the steps for notes & key column.

Click on DONE

5) clean column that has heights data.

Click on Filter & Shape to open the query editor

A) let’s rename it. Select column > Right Click > Rename to Height > press ENTER

B) let’s remove the values in brackets. Select Column > right click > split column > By delimiter > At each occurrence of the delimiter > Custom and enter “(” > OK

excel split a columnThis should transform the data like this:

excel data explorer split a column

Hide height.2 and rename the height.1 to height

Click on DONE

6) Let’s just have city names in the location column

click on Filter & shape to load query editor:

A) select location > right click > split column > by delimiter > Custom – Enter: ° in the text box like this:

an excel split by delimiter dataclick on OK

Hide Location.2, Location.3, Location.4 & Location.5

Select Location.1 > Right Click > Split Column > by Number of characters > Number of characters: 2 > Once, as far right as possible > OK

cleaning data in excel shaping filtering

Hide Location.1.2 and rename Location.1.1 to Location

One last thing! making sure that the data type of height is numbers.

Select height > change type > number

Also,

Select floors > change type > number

click on DONE. Here’s our filtered and shaped data!

filter data excel shape clean

7) LET”S VISUALIZE IT!

For the purpose of visualization I copied first 20 rows to a separate excel sheet and created a chart:

z excel data visualization

That’s about it for this post. Here are some related Posts on Data Explorer:
Unpivoting data using the data explorer preview for Excel 2010/2013
Merging/Joining datasets in Excel using Data Explorer add-in
Remove Duplicates in Excel Tables using Data Explorer Add-in
Web Scraping Tables using Excel add-in Data Explorer preview:

Your comments are very welcome!

How conditionally formatting your data in Excel can help you save time in answering business questions?

Standard

Visual analytics is amazing – it helps “data enthusiasts” save time in answering questions using Data. Let’s see one such example. For the purpose of the blog post, I am going to show how to do it in Excel 2010:

Problem:

Here’s the Business Question: What was sales of Tea in North Region in 2012 Q1

Here’s the data:

SALES DATA(2012 Q1)  East West Central North South
Coffee  $  7,348.00  $  7,238.00  $  1,543.00  $  9,837.00  $    1,823.00
Tea  $  9,572.00  $  8,235.00  $  3,057.00  $  8,934.00  $  13,814.00
Herbal Tea  $  5,782.00  $  8,941.00  $  9,235.00  $     392.00  $    1,268.00
Espresso  $  9,012.00  $  2,590.00  $  4,289.00  $  7,848.00  $       340.00

So it’s easy to give out answer using the data: $8934

But let me CHANGE the business question:

WHICH Products in WHAT regions are doing the best?

Now this questions is not as easy as the previous one? WHY? because you’ll have to manually go through each number in a linear fashion to answer the question. Now imagine a bigger data-set. It’ll take even more time.

Solution

What can Excel Power users and Data Enthusiasts do to answer the new business question in an efficient way? Well, let’s see what conditional formatting can do it:

Excel Visual Analytics Conditional formatting

Now with the Data Bars, it’s easier to just glance at the report and see best performing products and regions. For instance, it’s very easy to spot that Tea is performing best in South among all products and region.

So how do you create data bars?

1. Select the data

2. Home > Conditional Formatting > Data Bars

Excel Visual Analytics Conditional formatting 2

3.Done! you’ll see this:

Excel Visual Analytics Conditional formatting

4. You can play with other options here to see what suits the best for your needs. But I just wanted to point out that there is a way for you to highlight the data in a way that helps you save time in answering business questions using data

Conclusion:

Visual analytics is a great way to quickly analyze data. In most cases, Human brain is much faster at interpreting the visual results as oppose to text/numbers – so why not use it to your advantage. And tools like Excel have inbuilt functionality to help you do that!

Business Analytics project is like an iceberg:

Standard

Business persons may not realize that Business Analytics project may involve significant efforts for the under the hood technical tasks like Data Cleaning, Data Integration, Building-a-data-warehouse, creating ETL processes, gathering business requirements among other tasks. And that explains the title of this blog: Business Analytics project is like an iceberg. It’s because, a business person may just see the tool used to visualize data but may not realize the work that went into making it “analytics-ready”. From a project management standpoint – before a project is initiated, the discussion about this different aspects of the project need to communicated to the business stakeholders so that they are in the know of efforts involved in building an analytics solution. And with that, Here’s the summary of this discussion in form of an Image:

iceberg business analytics data cleaning business intelligence data warehouse

Related articles

Data Reporting ≠ Data Analysis

Standard

One of the key thing I’ve learned is importance of differentiating the concepts of “Data Reporting” and “Data Analysis”. So, let’s first see them visually:

data analysis and data reporting

Here’s the logic for putting Data Reporting INSIDE Data Analysis: if you need to do “analysis” then you need reports. But you do not have to necessarily do data analysis if you want to do data reporting.

From a process standpoint, Here’s how you can visualize Data Reporting and Data Analysis:

data analysis and data reporting process

Let’s thing about this for a moment: Why do we need “analysis”?

We need it because TOOLS are really great at generating data reports. But it requires a HUMAN BRAIN to translate those “data points/reports” into “business insights”. This process of seeing the data points and translating them into business insights is core of what is Data Analysis. Here’s how it looks visually:

Data analysis Data Reporting

Note after performing data analysis, we have information like Trends and Insights, Action items or Recommendations, Estimated impact on business that creates business value.

Conclusion:

Data Reporting ≠ Data Analysis

Examples of Machine Generated Data from “Big Data” perspective:

Standard

I just researched about Machine Generated Data from the context of “Big data”, Here’s the list I compiled:

– Data sent from Satellites

– Temperature sensing devices

– Flood Detection/Sensing devices

– web logs

– location data

– Data collected by Toll sensors (context: Road Toll)

– Phone call records

– Financial

And a Futuristic one:

Imagine sensors on human bodies that continuously “monitor” health. How about if we use them to detect diabetes/cancer/other-diseases in their early phases. Possible? May be!

Interesting Fact:

Machine can generate data “faster” than humans. This characteristics makes it interesting to think about to analyze machine generate data and in some cases, how to analyze them in real-time or near real-time

Ending Note:

Search for Machine Generated Data, you’ll be able to find much more, it’s worth reading about from the context of Big Data.

Thanks:

http://www.dbms2.com/2010/04/08/machine-generated-data-example/

http://en.wikipedia.org/wiki/Machine-generated_data

http://tdwi.org/articles/2012/10/23/machine-generated-big-data.aspx