Achievement Unlocked: Tableau Desktop 8 Qualified Associate!

Standard

To test my Tableau knowledge, I attempted the Tableau product certification and got the “Tableau Desktop 8 Qualified Associate” certificate.

Tableau 8 Qualified associate Certificate paras doshi

 

SQL Server Query Fundamentals: A Simple example of a Query that uses PIVOT:

Standard

Problem:

Convert the following source data into a schema shown below:

SQL SERVER TSQL PIVOTSolution:

Here’s the code that uses PIVOT function to get to the solution, please use this as a starting point.

Note the use of aggregation function avg – this will depend on the requirement. In the example, the Test_value need to be average if more than one tests were performed.

[code language=”SQL”]

— source data
SELECT [Product_ID],[Test_Desc],[Test_Val] FROM [dbo].[Address]
go

— Destination data using PIVOT function
select * from [dbo].[Address]
pivot( avg(test_val) for test_Desc IN (Test1,Test2,Test3,Test4,Test5))
as Tests

[/code]

Back to basics: continuous Vs. Discrete variables and their importance in Data Visualization.

Standard

Take a look at the following chart, do you see any issues with it?

month trend chart line chart string to date

Notice that the month values are shown as “distinct” values instead of shown as a “continuous” values and it misleads the person looking at the chart.  Agree? Great! You already know based on your instincts what continuous and discrete values are, it’s just that we will need to label what you already know.

In the example used above, the “Date & Time” shown as a “Sales Date” is a continuous value since you can’t never say the “Exact” time that the event occurred…1/1/2008 22 hours, 15 minutes, 7 seconds, 5 milliseconds…and it goes on…it’s continuous.

But let’s say you wanted to see Number of Units Sold Vs Product Name. now that’s countable, isn’t it? You can say that we sold 150 units of Product X and 250 units of product Y. In this case, Units sold becomes discrete value.

The chart shown above was treating Sales Date as discrete values and hence causing confusion…let’s fix it since now you the difference between continuous and discrete variables:

Statistics Discrete Continuos Variable Data Visualization

Conclusion:

To develop effective data visualizations, it’s important to understand the data types of your data. In this post, you saw the difference between continuous and discrete variables and their importance in data visualization.

SSIS: Using Data Profiling Task to check the candidate key profile of unknown data source(s)

Standard

As a part of Business Intelligence projects, we spend a significant amount in extracting, transforming and loading data from source systems. So it’s always helpful to know as much as you can about the data sources like NULLS, keys, statistics among other things. One of the things that I like to do if the data is unknown is to make sure that I get the candidate keys correct to make sure the key used can uniquely identify the rows in the data. It’s really helpful if you do this upfront because it would avoid a lot of duplicate value errors in your projects.

So here’s a quick tutorial on how you can check the candidate key profile using data profiling task in SSIS, You need to perform two main tasks:
1. Generate the xml file using the Data profiling task in SSIS
2. View the content of the xml file using the Data Profile Viewer Tool or using the Open Profile Viewer option in the Data Profiling task editor in SSIS.

Here are the steps:
1a. Open SQL Server Data Tools (Visual Studio/BIDS) and the SSIS project type
1b. Bring in Data Profiling Task on Control Flow
1c. Open the Data Profiler Task editor and configure the destination folder that the tasks uses to create the XML file. You can either create a new connection or use an existing one. If you use an existing connection, make sure that you are setting the OverwriteDestination property to True if you want the file to be overwritten at the destination.

1 SSIS Data Profiling Task Data Cleaning Candidate Key

1d. Click on Quick Profile to configure the data source for the data profiler task

2 SSIS Data Profiling Task Data Cleaning Candidate Key

1e. In the quick profile form, you’ll need to select the connection, table/view and also specify what you to need to computer. For candidate key profile, make sure that the candidate key profile box is checked.

3 SSIS Data Profiling Task Data Cleaning Candidate Key

1f. Run the Task and a XML file should be placed at the destination you specified in step 1C.

Now, It’s time to view what profiler captured.

2a. you can open “Data Profile Viewer” by searching for its name in the start button.

4 SSIS Data Profiling Task Data Cleaning Candidate Key

2b. once it opens up, click on open and browse to the xml file generated by the data profiling task.

5 SSIS Data Profiling Task Data Cleaning Candidate Key

2c. once the file opens up, you can the candidate key profiles.

6 SSIS Data Profiling Task Data Cleaning Candidate Key

2d. Alternatively, You can also open the data profile viewer from the “Data Profiling Task” in SSIS. Go to the Editor > Open Profile Viewer:

7 SSIS Data Profiling Task Data Cleaning Candidate Key

Conclusion:
In this post, you saw how to profile data using the Data Profiling Task in SSIS.

PASS Business Analytics VC: 7 Ideas on Encouraging Advanced Analytics by Mark Tabladillo #sqlpass

Standard

Thu, Jul 17, 2014 12:00 PM – 1:00 PM EDT


Abstract:
Many companies are starting or expanding their use of data mining and machine learning. This presentation covers seven practical ideas for encouraging advanced analytics in your organization.

Bio:
Mark Tabladillo is a Microsoft MVP and SAS expert based in Atlanta, GA. His Industrial Engineering doctorate (including applied statistics) is from Georgia Tech. Today, he helps teams become more confident in making actionable business decisions through the use of data mining and analytics. Mark provides training and consulting for companies in the US and around the world. He has spoken at major conferences including Microsoft TechEd, PASS Summit, PASS Business Analytics Conference, Predictive Analytics World, and SAS Global Forum. He tweets @marktabnet and blogs at http://marktab.net.

REGISTER HERE: bit.ly/PASSBAVC071714

hope to see you there!

Paras Doshi
Business Analytics Virtual Chapter’s Co-Leader

Business Intelligence Dashboard project for a Business Leader

Standard

Business Goal:

Design and Develop a Business Leader Dashboard to keep an eye on the health of multiple business units under his leadership.

In other words,
Dashboard should provide an one-stop shop for executives to monitor the health of their business unit(s). Its analogous to a car driver’s dashboard that helps monitor important performance indicators that they need to focus on while driving the car while making sure the driver get alerted for things such as “engine check” and “oil levels”. Dashboards uses state-of-the-art features like Key performance indicators (KPI’s), interactive data visualizations and drill down capability to create an immersive user experience for an executive.

Technical Summary:

– Work with the Business Leader to identify key metrics he needed to see on the dashboard to keep an eye of he health of the business units.

– Work with the IT leaders of each business units to map data available to come up with (consistent) formula to get the metrics needed by business leader

– Develop the Dashboard. (Built iteratively by making sure to have two checkpoint meetings with business leader and working with business analysts to make sure the data is right)

– Develop drill down reports for each metric for each business to see detailed data plus trends.

Mockup

(I can’t write about role of the business leader or the metrics displayed because of non disclosure agreements. so this mockup may look generic but it’s intended to be this way)

Business Leader Dashboard

TSQL – Quick note about numeric data type to solve “Arithmetic overflow error”

Standard

Problem:

You are working on a query where you are trying to convert source data to numeric data type and you get an “Arithmetic overflow error”.

Solution:

Let’s understand this with an example:

Here’s the source data: 132.56000000 and you want to store just 132.56 so write a query that looks like:

cast([source language=”column”][/source] as numeric(3,2)) as destination_column_name

and after you run the query its throws an error “Arithmetic Overflow Error” – so what’s wrong?

The issue is that you incorrectly specified the precision and scale – by writing the query that says numeric(3,2) you are saying I want 3 data places with 2 on the right (after decimal point) which leaves just 1 place for left.

what you need to write is numeric(5,2) – and this will have 2 places on the right and leaves 3 places for left.

so after you run this, it shouldn’t complain about the arithmetic overflow error. you just need to make sure that the precision and scale of the numeric data type is correct.

Conclusion:

In this post, you saw an example of how to correctly use the precision and scale in the numeric data type and that should help you solve the arithmetic overflow errors.