In this blog post, we’ll walk through the steps to parameterize a SQL Server Reporting Services (SSRS) report whose data source is an Analysis Services (SSAS) cube:
Step 1: Let’s say you’ve reached the point in your SSRS report where you’ve configured the data source, datasets, and data fields you want to use. For the purpose of this blog post, I’ll start with an SSRS report that shows Sales vs. country names:
Step 2: Now, let’s say the requirements are such that you want to parameterize the report by a data field in the Analysis Services cube: continents.
Step 3: Switch to Design View, then navigate to the query designer: select your dataset > Right-click > Dataset Properties > Query > Query Designer
Step 4: Drag the field to the filter area. For the purpose of this blog post, I am going to select Continent Name and add it to Filter area.
To add a field to the filter area, there are two options:
#1: Select the field > Right-click > Add to Filter
#2: Select the field > drag it to the filter area with your mouse
Step 5: Once you’ve added your desired field to the filter area, you’ll need to mark it as a parameter.
Chances are you won’t see the Parameter check box for this field because the dialog box is minimized. Either maximize the dialog box or scroll to the right side of the filter area.
Once you see it, check it > click OK.
Step 6: Back in Design View, try previewing the report. You should see the option to select a parameter value before the report is populated with data:
I selected Europe and then clicked on view report:
Step 7: One last thing: let me also point out how you can change the properties of the parameters.
Go back to Design View > in the Report Data pane > expand the Parameters folder > select the parameter > Parameter Properties
I’ll leave you to explore what you can do with parameter properties! And with that, I conclude this blog post. Your comments are very welcome!
Let’s look at four datasets that have identical statistical properties. Here’s the data:
Here are their statistical properties:
– Mean of x in each case: 9 (exact)
– Variance of x in each case: 11 (exact)
– Mean of y in each case: 7.50 (to 2 decimal places)
– Variance of y in each case: 4.122 or 4.127 (to 3 decimal places)
– Correlation between x and y in each case: 0.816 (to 3 decimal places)
– Linear regression line in each case: y = 3.00 + 0.500x
They look identical, don’t they? But let’s visualize the data:
Only by visualizing the data can we understand and appreciate the differences between the datasets; looking at just the statistical properties made them appear similar. Moral of the story: visualize your data! Graph it in addition to investigating its statistical properties.
Data Explorer lets you “explore” (search) web-based public data. This is a great way to combine the data in your own data sources with public data for analysis. Sometimes your data doesn’t tell you the reason behind the trends you observe; when that happens, you can see whether a public dataset gives you the much-needed context. Let me give you an example before we start hands-on with Data Explorer, so that you have a better understanding of the importance of public datasets. Here’s a sample that I found here. So, here’s a demo:
An auto company sees sales trends for hybrid cars and SUVs in its sales data sources. But what is the reason behind them? The company’s data doesn’t show that. Someone hypothesizes that it might be gas prices, so they test the hypothesis by combining gas-price information available via public data. And it turns out gas prices might be the driving force behind the sales trends! See:
If gas prices increase, SUV sales go down and hybrid sales go up:
Now you know that public data can be helpful! So how can you search for public datasets? You can manually search online, ask someone, or browse public data repositories like the Azure Data Market (and other data markets); there’s also a public data search engine. Or you can search for them directly via Data Explorer.
Here are the steps:
1) Excel 2010/2013 > Data Explorer Tab > Online Search > type “Tallest Buildings”
2) I selected one of the data-sets that said “Tallest completed building…. ”
3) Now let’s do some filtering and shaping. Here are the requirements:
– Hide columns: Image, Notes & Key
– Clean the column that has height data
– Show only the city name in Location
OK, let’s get to this one by one!
4) Hiding Columns:
Click on Filter & Shape button from the Query Settings:
Select Image Column > Right Click > Hide:
Repeat the steps for the Notes & Key columns.
Click on DONE
5) Clean the column that has height data.
Click on Filter & Shape to open the query editor
A) Let’s rename it: select the column > Right-click > Rename to “Height” > press ENTER
B) Let’s remove the values in brackets: select the column > Right-click > Split Column > By Delimiter > At each occurrence of the delimiter > Custom, and enter “(” > OK
This should transform the data like this:
Hide Height.2 and rename Height.1 to Height.
Click on DONE
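Outside Data Explorer, that split-on-“(” cleanup amounts to keeping only the text before the first opening bracket. Here’s a Python sketch of the same transform (the sample value is made up for illustration):

```python
# Keep only the part of a height value before the first "(",
# mirroring the Split Column > By Delimiter step above.
def clean_height(value: str) -> str:
    return value.split("(", 1)[0].strip()

print(clean_height("828 m (2,717 ft)"))  # -> 828 m
print(clean_height("452 m"))             # -> 452 m (no bracket: unchanged)
```

Values without a bracket pass through untouched, which matches how the delimiter split behaves when the delimiter is absent.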
6) Let’s keep just the city names in the Location column.
Click on Filter & Shape to load the query editor:
A) Select Location > Right-click > Split Column > By Delimiter > Custom, and enter ° in the text box like this:
In this blog post, we’ll see how you can remove duplicates and clean data in Excel tables using the Data Explorer add-in.
Our Excel Table has following Data:
And we want to remove duplicates to make the data-set look like this:
In real-world datasets, we wouldn’t have a few rows but lots of rows, and deduplicating manually wouldn’t be the wisest option. With that in mind, let’s look for a few-clicks solution that can help us remove duplicates.
If you haven’t already, download the Data Explorer add-in preview, available for Excel 2010 & 2013. It can do a lot more than remove duplicates – it’s a great add-in, and it’ll save you lots of time, especially if your job involves discovering, cleaning, and combining data for analysis. After you’re done installing the add-in, use the steps below to remove duplicates in an Excel column:
1. Open the data in Excel and switch to the Data Explorer tab.
2. For the purpose of the demo, I am assuming that you already have the data in an Excel file. If not, you can connect to other sources via the add-in.
3. Data Explorer add-in > Excel Data > From Table
4. After you’ve clicked on the From Table, a query editor will pop up:
5. Select both columns
(you can select both columns by: select first column > hold down the ctrl key and then click on second column)
6. Right click > Remove Duplicates
7. Click on Done if you see that the duplicates have been removed correctly.
In this blog post, we saw how to remove duplicates and clean data in Excel using the Data Explorer Preview add-in.
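For readers who’d rather script it, the same remove-duplicates step can be sketched in plain Python (a pandas `drop_duplicates()` call would do the equivalent for data frames); the sample rows are made up for illustration:

```python
# Remove duplicate rows while preserving the original order,
# treating a row as a duplicate when all of its columns match.
def remove_duplicates(rows):
    seen = set()
    unique = []
    for row in rows:
        key = tuple(row)
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique

rows = [("John", "Smith"), ("Jane", "Doe"), ("John", "Smith")]
print(remove_duplicates(rows))  # -> [('John', 'Smith'), ('Jane', 'Doe')]
```

Like the add-in, this keeps the first occurrence of each row and drops the later repeats.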
A few things about the data & visualizations that I am going to share:
I downloaded the national-level data, but there’s also state-level data available if you’re interested in drilling down.
The reports that you see were created after I got a chance to “clean” the dataset a bit and built a data model suited to basic reporting on top of it.
For this blog post, I am going to play with the May 2010 & May 2011 data.
With the original dataset, you can drill down to get statistics about a particular job category if you want. For this blog post, I am going to share visualizations that correspond to job categories.
Click on the images to see higher-resolution versions.
With that, Here are some visualizations:
1) Job category vs. mean hourly salary:
2) Job category vs. number of employees:
3) Scatter plot:
X-axis: Number of employees
Y-axis: Wage (Mean Hourly Salary, May 2011)
Size of bubble: Wage (Mean Hourly Salary, May 2011)
*Note: This may not be the best way to build the scatter plot, as I used the same value (Mean Hourly Salary, May 2011) twice – but since I was just playing with it, I went with what I had in the model.
Here’s the visualization:
Some of the things I observed:
1) I belong to an industry (computer and mathematical occupations) that has a relatively higher mean hourly wage.
2) There are only a few people working in “farming, fishing & forestry occupations”, and they do not get paid much.
3) There are lots of people working in “office administrative support occupations” who do not get paid much.
4) Management occupations, legal occupations, and computer & mathematical occupations have relatively higher mean hourly wages.
In this post, I played with the Occupational Employment Statistics datasets and shared some visualizations.
In this blog post, I’ll share a quick demo of how you can use regular expressions in Data Quality Services (DQS) to clean records. For the purpose of the demo, I’ll show how to perform a preliminary cleaning procedure on email records.
Before we begin, just a primer on Regular Expressions:
“A regular expression provides a concise and flexible means to match (specify and recognize) strings of text, such as particular characters, words, or patterns of characters. Common abbreviations for ‘regular expression’ include regex and regexp.” – source: Wikipedia, Regular Expressions
Let’s take an example:
Using regular expressions, you can:
– check whether a text starts with A or a
– check whether a text contains the word “data”
– check whether a text follows a pattern such as email@example.com
…among other things.
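Each of those checks is a one-liner in any regex engine. Here’s a quick Python sketch of the three examples above (the pattern names and sample strings are mine, for illustration):

```python
import re

text = "data quality"

# 1) starts with A or a
print(bool(re.match(r"^[Aa]", text)))      # False
# 2) contains the word "data"
print(bool(re.search(r"\bdata\b", text)))  # True
# 3) follows an email-like pattern such as email@example.com
email_pattern = r"[^@\s]+@[^@\s]+\.[A-Za-z]{2,}"
print(bool(re.fullmatch(email_pattern, "email@example.com")))  # True
print(bool(re.fullmatch(email_pattern, "not-an-email")))       # False
```

The same kind of pattern is what a DQS domain rule evaluates against each record in the domain.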
Now, note the power of regular expressions to check for a pattern like firstname.lastname@example.org – can’t we use that to perform basic cleaning on email records? I think we can! So let’s try it:
For the purpose of this blog post, let me quickly show how you can create a domain in a knowledge base that uses a regular expression to clean email records:
1) Open Data Quality Client > New Knowledge Base > Domain Management > create a domain
2) Switch to Domain Rules Tab
3) Create a new domain rule that uses the regular expression [A-Za-z0-9._-]+@[A-Za-z0-9.-]+.[A-Za-z] to perform basic cleaning on email records:
4) You can use this knowledge base to clean email records in a Data Quality project, but for now let’s test our regular expression > click Test Domain Rule > enter a few records > test them!
Note: This is just a basic demo showing regular expressions in action. For cleaning email records, there are other options, like using third-party datasets or writing a more advanced regular expression. The regex I shared was just meant for a quick demo, but if you want to read more about regular expressions for cleaning emails, I strongly recommend the email regular expressions that I got to know from Erik Eckhardt via a discussion on a Beyond Relational tip that I had posted.
In this blog post, we saw how to do basic cleaning on email records using regular expressions in Data Quality Services.
4) Now, via the DQS client, let’s create a knowledge base!
5) I’ve created the following domains:
And a composite domain, Full Address, which contains the domains Address Line, City, State, and Zip.
6) And for the composite domain Full Address, this is how you configure the reference datasets:
7) After creating the knowledge base, start a new DQS project. Here the cleaning happens at the composite-domain level, and this is a demo of how DQS uses a third-party reference dataset to classify records as Suggested, New, Invalid, Corrected, or Correct:
You can see that DQS corrected a few records and mentions that the reason in this case was Melissa Data’s Address Verify reference dataset:
When you see the price of an item listed as -10 (negative 10), you know it’s not right. It’s BAD data!
When you see a person’s height listed as 120 inches (10 ft!), you know that’s just not true. Again, BAD data!
It’s not uncommon to see such values in REAL datasets. In this post, we’ll see how SQL Server 2012’s Data Quality Services can help us clean unrealistic, a.k.a. invalid, values in our datasets and bring us a step closer to CLEAN data!
You’re an analyst for the professional basketball team the Dallas Mavericks, and you need a player roster with CLEAN data for analysis purposes. Someone recently passed you a player roster – great, you have data!
But some of the data about the players’ heights just doesn’t look right: a player named Dominique Jones has a height of just 5 inches.
That’s just not right. For now, you could clean this dataset manually by looking up the right values – but wouldn’t it be great to have a data-cleaning solution that flags unrealistic values for your review every time you get such a dataset? So the analyst decides to build a solution in Data Quality Services. Here are the steps:
Steps to create a Knowledge Base in DQS and clean the invalid values:
Now, first let’s create the knowledge base:
You decide to create a rule that flags, for your review, any player height less than 5 ft or greater than 7 ft 6 in. Here’s what the knowledge base looks like:
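The domain rule boils down to a simple range check: convert each height to inches and flag anything below 5 ft (60 in) or above 7 ft 6 in (90 in). Here’s a Python sketch of that logic; the roster values below are made up for illustration (only the bad 5-inch value comes from the post):

```python
# Flag heights outside the plausible range for a basketball player:
# below 5 ft (60 in) or above 7 ft 6 in (90 in).
MIN_INCHES = 5 * 12       # 60
MAX_INCHES = 7 * 12 + 6   # 90

def is_invalid(height_inches: int) -> bool:
    return height_inches < MIN_INCHES or height_inches > MAX_INCHES

# Hypothetical roster rows
heights = {"Dominique Jones": 5, "Player A": 84, "Player B": 120}
invalid = {name: h for name, h in heights.items() if is_invalid(h)}
print(invalid)  # -> {'Dominique Jones': 5, 'Player B': 120}
```

DQS does the same thing declaratively: the domain rule marks out-of-range values as invalid and queues them for your review rather than silently dropping them.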
Now, after the knowledge base (KB) has been published, it’s time to use it in a Data Quality project.
Note: This KB can be re-used as a basis for creating KBs for other teams too.
Here’s the result of running the project on our dataset:
Note that it says the Height domain has 3 invalid records. Let’s correct them!
In the next tab, you can see the invalid records:
You can correct these invalid values one by one and then approve them:
After that, in the next step, you can see the results – I have highlighted the records that we cleaned!
And here’s our clean dataset:
Yay! An analyst for a professional basketball team was able to quickly clean a dataset – and he can re-use the knowledge base he built to clean other datasets in the future too. Isn’t that nice?
In this post, we saw how to clean unrealistic or invalid records in a dataset by using domain rules in SQL Server 2012 Data Quality Services.