Seven ways Analytics can create value in an Organization:


Analytics can create value in an organization in more than one way, and the value created depends on what the organization is trying to do with analytics. With that, here's the list I compiled from my readings over the past few weeks:

  1. Increases revenue
  2. Creates strategic advantage
  3. Improves decision-making (and increases the speed of decision-making)
  4. Generates innovative insights
  5. Increases productivity
  6. Increases margins
  7. Reduces operational costs

Back to basics: Why do you need OLAP cubes / data warehouses for enterprise business reporting systems?


Once in a while I write about back-to-basics topics to revisit some of the fundamental technology concepts I've learned over the past few years. Today, we'll revisit why we use OLAP and data warehouses for business reporting systems. Let me share some of the most common reasons; I'll also point you to resources that offer other reasons.

Let's start with the first one:

#1: Business reports should not take a lot of time to load.

From a business user's perspective: they don't want to wait for their report to populate with data. Reports should be fast!


But if business users have to wait for data to show up on their reports because of slow query response, that's bad for everyone involved. Business Intelligence solutions cannot permeate an organization if the reports take a lot of time to load:


So, from a technology standpoint: what can you do? And why did the problem arise in the first place?

Let's first see why the problem occurs.

We have a bunch of database tables. To create a report, we have to summarize (aggregate) values across lots of rows (think millions) and join a few tables. It turns out that if you run such queries against a transactional system (a database / OLTP), you get a slow response. In some cases, when the data model, data size, and queries are simple, you can run a query directly to create operational business reports and you won't see any performance issues. But that's not always the case! If the combination of data model, data size, and query requirements is not simple enough for the OLTP database to handle, the database takes a long time to return the data the business reports require, and business users see bad performance. And the issue goes beyond data model + data size + query complexity: a transactional system may be running other tasks in parallel with answering report queries. So there's also the issue of resource contention on the OLTP database.

So that's no good, right? Not only is the OLTP system bad at running the queries that business reports need, it also doesn't dedicate its resources to us!

So let's create a copy of the database and dedicate it to answering the questions that business reports ask. That removes the resource-contention issue, since we now have dedicated resources. And while we're at it, why don't we change the data model so that it best suits the queries needed for business reporting and analysis?

Well, that's exactly what an OLAP database is: a database created for business reporting and analysis. It does some neat things like pre-aggregating some values, PLUS the data model in OLAP is best suited for reporting purposes. (Read more about star schemas, data marts, data warehouses, and ETL if you're curious to learn more.)

OK – so that’s one reason: To improve performance! Now let’s see another one.

#2: Creating business reports over transactional-system (OLTP) data is NOT developer-friendly:

OK, we already covered in the previous section that creating business reports over OLTP can cause performance issues. But there's more to it than just performance. You see, the requirements of business reporting are different from the requirements of transactional systems. So? Well, that means the data model used for OLTP is best suited for transactional workloads and is not an optimal data model for analysis and reporting purposes. For example: hierarchies, drill-down reports, and year-over-year growth, among other things, are handled much more efficiently by OLAP systems. If we used the OLTP database instead, it would take a lot of developer hours to write efficient (and correct!) SQL, mostly stored procedures, to get OLTP to return the data that the business reports need. Also, common business metrics used in reporting can be stored in a cube, so each time a report gets created, you can re-use the business metrics stored in the OLAP cubes.
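For example, here's a minimal sketch of such reusable measures, written in DAX as you'd define them in a Tabular model (the Sales table, Date table, and column names here are hypothetical, not from any specific system):

[code language=&#8221;text&#8221;]
// Defined once in the model, re-used by every report that references them.
Total Amount := SUM ( Sales[Amount] )

// Year-over-year growth: compares the current period with the same period last year.
YoY Growth % :=
DIVIDE (
    [Total Amount] - CALCULATE ( [Total Amount], SAMEPERIODLASTYEAR ( 'Date'[Date] ) ),
    CALCULATE ( [Total Amount], SAMEPERIODLASTYEAR ( 'Date'[Date] ) )
)
[/code]

Define these once, and every report built on the cube/model picks up the same, already-tested logic.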

OK – so an OLAP cube saves time (to create reports).

Not only do OLAP cubes perform better at returning data, they also help us speed up the process of creating reports.

That's great! Let's see one more reason:

#3: Ad-hoc reporting over OLTP systems creates confusion!

This reason is more about why we should have a data mart and a data warehouse. So why does ad-hoc reporting over OLTP systems create confusion among business users?

Imagine creating reports over a LIVE system that's getting updated every second. If different users create ad-hoc (as-needed) reports, everyone sees different results, so it's important to have a common version for everyone. Also imagine everyone combining data from different data sources: if they do it differently, they'll see different data. And not only that, if they create derived (calculated) columns and their formulas differ, they'll see different data too. Do you see a common pattern here? There's no conformity in the data and formulas across the reports that get created. What does that cause? Confusion! So what's needed is what they call in data warehousing a "single version of the truth". OLAP cubes (which get their data from the data warehouse) provide that common, single data source for everyone, and so conformity of data is maintained while creating business reports.
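To make the formula point concrete, here's a minimal sketch (with a hypothetical Sales table and column names): if the margin calculation is defined once as a measure in the model, every report that references [Margin %] uses the same formula, so the numbers agree across reports:

[code language=&#8221;text&#8221;]
// One shared definition of the metric -- the "single version of the truth"
// for how margin percentage is calculated.
Margin % := DIVIDE ( SUM ( Sales[Margin] ), SUM ( Sales[SalesAmount] ) )
[/code]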

Also, while we're at it, one more consideration: reports typically require historical data at an aggregated level. We don't want to store every transaction from the last 10 years in an OLTP database, do we? NO! Right? Instead, the historical data is aggregated based on requirements and stored in data marts (or the data warehouse), which the OLAP cubes later consume; that way, OLTP databases don't have to store a lot of historical data.

OK – that's one more reason OLTP systems are bad at business reporting and analysis, and it's why we need data marts (a data warehouse) and OLAP cubes.

That's about it for this post. Let me point you to related resources:

  1. Just what are cubes anyway? (MSDN)
  2. Why do I need a Cube
  3. Why use a SSAS cube?

And as always, your feedback is most welcome! If I've missed a point and you want to highlight it, please leave a comment!

A great example to show the power of visualizing data: Anscombe's Quartet


Let's look at four datasets that have (nearly) identical statistical properties:
Here's the data:

Here are their statistical properties:

Mean of x in each case: 9 (exact)
Variance of x in each case: 11 (exact)
Mean of y in each case: 7.50 (to 2 decimal places)
Variance of y in each case: 4.122 or 4.127 (to 3 decimal places)
Correlation between x and y in each case: 0.816 (to 3 decimal places)
Linear regression line in each case: y = 3.00 + 0.500x
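(A quick sanity check on how these properties hang together, taking the variance of y as roughly 4.125: the least-squares line is fully determined by the summary statistics. The slope is r × (standard deviation of y / standard deviation of x) = 0.816 × (√4.125 / √11) ≈ 0.500, and the intercept is mean of y − slope × mean of x = 7.50 − 0.500 × 9 = 3.00, which matches y = 3.00 + 0.500x. So any datasets sharing these summary statistics must share the same regression line, no matter how different they look.)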

They look identical – don’t they? BUT let’s visualize the data:

Only by visualizing the data could we understand and appreciate the "difference" between the datasets. Looking at just the statistical properties made them appear "similar". Moral of the story: visualize data! Graph data along with investigating its statistical properties.

Source: Anscombe’s quartet

PowerPivot DAX: Moving Average


A quick blog post sharing a PowerPivot DAX formula to calculate a moving average. Please use this as a building block for your scenario.

Here’s the Data:

DATE AMOUNT
6/1/2013 50
6/2/2013 40
6/3/2013 30
6/4/2013 20
6/5/2013 10
6/6/2013 10
6/7/2013 20
6/8/2013 30
6/9/2013 40
6/10/2013 50

(Usually, the dates would live in a date table. For the purpose of sharing the building block of this formula, I kept it this way to keep things simple.)

Formula (Calculated Measure):

[code language=”text”]
Moving Average Last 3 Days:=CALCULATE(AVERAGE(TableName[Amount]),DATESINPERIOD(TableName[Date],PREVIOUSDAY(TableName[Date]),-3,DAY))
[/code]

Results via a Pivot Table:

Note that the formula is a building block and you'll have to make changes per your requirements and data model. Feel free to leave a comment if you need some assistance from my side. And also consider using the PowerPivot forum to reach out to the community: MSDN Forum – PowerPivot for Excel
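As one example of adapting it: the formula above averages the three days ending the day before the current date (it anchors on PREVIOUSDAY, so the current day is excluded). If you want the window to include the current day instead, a variant along these lines should work (a sketch, using the same hypothetical TableName as above):

[code language=&#8221;text&#8221;]
// 3-day moving average that includes the current day in the window:
// DATESINPERIOD returns the 3 days ending at the last visible date.
Moving Average 3 Days Inclusive:=CALCULATE(AVERAGE(TableName[Amount]),DATESINPERIOD(TableName[Date],LASTDATE(TableName[Date]),-3,DAY))
[/code]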

Exploring, filtering and shaping web-based public data using Data Explorer Excel add-in:


Data Explorer lets you "explore" (search) for web-based public data. This is a great way to combine data you have in your own data sources with public data sources for analysis. Sometimes your data might not tell you the reason behind the observed trends; when that happens, you can check whether a public dataset gives you the much-needed context. Let me give you an example before we start hands-on with Data Explorer, so that you have a better understanding of the importance of public datasets. Here's a sample that I found here. So, here's a demo:

An auto company sees sales trends for hybrid cars and SUVs in its sales data sources. But what is the reason behind them? The company's own data doesn't show that. Someone hypothesizes that it might be gas prices. So they test the hypothesis by combining the sales data with gas-price information available as public data. And it turns out gas prices might be the driving force behind the sales trends! SEE:

If gas prices increase, then sales of SUVs go down and sales of hybrids go up:


Now you know that public data can be helpful! So how can you search for public datasets? Well, you can search online manually, ask someone, or browse public data repositories like the Azure Data Market (and other data markets); there's also a public data search engine! OR you can search for them directly via Data Explorer.

Here are the steps:

1) Excel 2010/2013 > Data Explorer Tab > Online Search > type “Tallest Buildings”

2) I selected one of the datasets that said "Tallest completed building…"


3) Now let’s do some filtering and shaping. Here are the requirements:

– Hide these columns: Image, Notes & Key

– Clean the column that has the heights data

– Show only the city name in the Location column

OK, let’s get to this one by one!

4) Hiding Columns:

Click on the Filter & Shape button in the Query Settings:


Select Image Column > Right Click > Hide:


Repeat the steps for the Notes and Key columns.

Click on DONE

5) Clean the column that has the heights data.

Click on Filter & Shape to open the query editor

A) Let's rename it: Select the column > Right Click > Rename to Height > press ENTER

B) Let's remove the values in brackets: Select the column > Right Click > Split Column > By delimiter > At each occurrence of the delimiter > Custom and enter "(" > OK

This should transform the data like this:


Hide Height.2 and rename Height.1 to Height.

Click on DONE

6) Let's keep just the city names in the Location column.

Click on Filter & Shape to load the query editor:

A) Select Location > Right Click > Split Column > By delimiter > Custom > enter ° in the text box, like this:

Click on OK.

Hide Location.2, Location.3, Location.4 & Location.5

Select Location.1 > Right Click > Split Column > by Number of characters > Number of characters: 2 > Once, as far right as possible > OK


Hide Location.1.2 and rename Location.1.1 to Location

One last thing! Let's make sure the data type of Height is Number.

Select Height > Change Type > Number

Also,

Select Floors > Change Type > Number

Click on DONE. Here's our filtered and shaped data!


7) LET'S VISUALIZE IT!

For the purpose of visualization, I copied the first 20 rows to a separate Excel sheet and created a chart:


That's about it for this post. Here are some related posts on Data Explorer:
Unpivoting data using the data explorer preview for Excel 2010/2013
Merging/Joining datasets in Excel using Data Explorer add-in
Remove Duplicates in Excel Tables using Data Explorer Add-in
Web Scraping Tables using Excel add-in Data Explorer preview:

Your comments are very welcome!

Web Scraping Tables using Excel add-in Data Explorer preview:


In this blog post, we’ll see how you can do some web scraping of HTML data tables that you see on the inter-webs!

Before we begin: if you haven't downloaded and installed the Data Explorer add-in for Excel 2010 & 2013, you can find information about it here: http://office.microsoft.com/en-us/excel/download-data-explorer-for-excel-FX104018616.aspx

First, let's try copy-pasting tables found on websites into Excel without the Data Explorer add-in.

So I found some very interesting tables here: http://powerpivot-info.com/post/16-powerpivot-dax-function-list-with-samples

And here's my copy-pasting effort:

Now, that requires formatting! I don't want to do that, especially if I'm doing it for a few more tables. Let's see a more elegant way of going about web scraping tables using the Data Explorer add-in:

Step 1:

Keep the URL handy.

Now, Open Excel 2010/2013 > switch to Data Explorer tab > click on From Web

Step 2:

Paste the URL that has the tables you need:

excel web scraping html data tables

Step 3:

The dialog box will list all the tables from that HTML page, so select the table that you want.

(Optional) If your table has headers as its first row, make sure to mark them as headers: Right Click a Column > Use First Row As Headers

Step 4:

Click DONE and your Excel sheet will populate itself with the data from the table.

That's about it for the steps!

Notes:

1) The Data Explorer add-in lets you "explore" external open datasets that are out there on the internet.

2) Please make sure that you’re not violating any copyrights before you go about web scraping and sharing your work.

And here are some related Posts on Data Explorer:
Unpivoting data using the data explorer preview for Excel 2010/2013
Merging/Joining datasets in Excel using Data Explorer add-in
Remove Duplicates in Excel Tables using Data Explorer Add-in

That’s about it for this post, your comments are very welcome!

Getting Started: Implementing Dynamic Security with row filters in Tabular Models


In this blog post, I'll help you get started with implementing dynamic security with row filters in Tabular models.

Scenario:

We have users who connect to a Tabular model via Excel for data analysis purposes. One of the analyses they do is countries vs. total margin:


What we want to do is restrict someone from Europe to seeing data only for France, Germany, and the United Kingdom.

Solution:

1) Open Tabular Model in SSDT (SQL Server Data Tools)

2) Toolbar > Model > Roles


3) Role Manager > NEW > change Name to Europe and Permissions to Read

4) Under the Row Filters, for the Geography Table, enter the following code:

=[Country Region Name]="France" || [Country Region Name]="Germany" || [Country Region Name]="United Kingdom"

How do you edit this code for your scenario? Change [Country Region Name] to your column(s) and also change the values.


5) Click OK

6) Now let’s test it!

7) Toolbar > Model > Analyze in Excel

8) Select the role Europe

9) Click OK.

10) From the Pivot Table, select Margin & Countries:


11) As you can see, since the role Europe was selected for testing purposes in step 8, ONLY France, Germany, and UK data is shown in our test! This means our row filters are working perfectly!

I hope this tutorial helps you get started on implementing dynamic security in Tabular models.
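One note on making the security truly dynamic: the row filter above is static (the country list is hard-coded into the role). A common next step, covered in the whitepaper below, is to drive the filter from the connected user's identity. Here's a minimal sketch, assuming a hypothetical Users table in the model that maps login names to the countries each user is allowed to see:

[code language=&#8221;text&#8221;]
// Row filter on the Geography table: a row is visible only if the
// current user's login is mapped to that row's country in the Users table.
=CONTAINS (
    Users,
    Users[LoginName], USERNAME (),
    Users[Country], Geography[Country Region Name]
)
[/code]

With a filter like this, everyone connects through the same role, and what each user sees is determined by their login.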

Resource:

WhitePaper: Securing the Tabular BI Semantic Model