Productivity Tip: Learn to Comment/Uncomment SQL code using shortcuts

Standard

I spend a lot of time writing SQL code, and as a reader of this blog, you might be in the same boat. So any productivity gain here can go a long way. On that note, here's a quick productivity tip: learn to comment/uncomment multiple lines of SQL code using keyboard shortcuts.

[Image: commenting SQL code in SQL Server Management Studio]

If you are using SQL Server Management Studio, it's "Ctrl+K followed by Ctrl+C" for commenting and "Ctrl+K followed by Ctrl+U" for uncommenting.
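
For example, selecting the two lines below and pressing Ctrl+K, Ctrl+C turns them into line comments, and Ctrl+K, Ctrl+U strips the markers back off; the query itself is just an illustrative snippet:

[code language="SQL"]
-- select user_id, purchase_amt
-- from dbo.user_transaction_details;
[/code]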

If you are using some other data management tool, I am sure you can find the equivalent shortcuts in its help section or by googling around.

Either way, these shortcuts go a long way in making you more productive! What is your favorite productivity tip?

SQL: How to get first/last transaction details for each user account?

Standard

Looking at a user's first/last transaction details is a common requirement. So given that you have a "user_transaction_details" table that looks something like:

transaction_id | user_id | transaction_timestamp | purchase_amt

How would you get the first/last transaction details for each user account?

Here’s a design pattern:

[code language="SQL"]

-- join each user's transactions back to their earliest transaction timestamp
select AllTransactions.user_id, AllTransactions.purchase_amt
from user_transaction_details AllTransactions
inner join
(
    select user_id, min(transaction_timestamp) as first_transaction_timestamp
    from user_transaction_details
    group by user_id
) FirstTransaction
on AllTransactions.user_id = FirstTransaction.user_id
and AllTransactions.transaction_timestamp = FirstTransaction.first_transaction_timestamp

[/code]

To get the last transaction details for each user account, replace the min function with max and that should be it.

Note: Depending on your data model, there may be multiple rows tied for the "first/last" transaction (for example, two transactions with the same timestamp), so you would want to spend some time figuring out how to deal with this multiple-row situation, especially if you're planning to further join this data.
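
If you need exactly one row per user even when timestamps tie, one option is to rank the transactions and keep the top row per user. Here's a sketch using ROW_NUMBER; the tie-breaker on transaction_id is my assumption, so use whatever column makes sense in your model:

[code language="SQL"]
select user_id, purchase_amt
from
(
    select user_id, purchase_amt,
           row_number() over (partition by user_id
                              order by transaction_timestamp, transaction_id) as rn
    from user_transaction_details
) Ranked
where rn = 1; -- order by transaction_timestamp desc instead to get the last transaction
[/code]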

Conclusion:
In this post, I shared a design pattern for writing a SQL query that gets the first/last transaction details for each user account.

Question for you:
How would you solve this problem? Share your query in the comments section!

Business Metric #4 of N: “Leads” (marketing)

Standard

Summary:

In this post, we will discuss a common metric used by sales & marketing teams called "leads".

Description:

In simple terms,

Leads = number of individuals (or companies) that have expressed an interest in your goods or services.

Why do we want to measure this?

For a business to grow, it's important that the sales & marketing departments work to make sure that there is growing interest in the company's goods or services. It's important to track this metric to make sure the trend is upward!

Word of caution: this metric on its own can be misleading. It's a good idea to also track "conversion ratios" (the rate at which leads or potential customers become actual customers) to make sure that high-quality leads are being generated.

Where can you get this data?

It varies depending on the channel you use to capture potential customers' information and the technology maturity of the company. I've seen CRM systems used to report "leads" data, and I've also seen manual Excel files used to generate leads reports.

Are there any sub-categories?

Yes, it's usually subdivided into 1) Marketing Qualified Leads and 2) Sales Qualified Leads.

Usually, a Marketing Qualified Lead (MQL) is someone who has shown interest in your product or service but hasn't yet been vetted against your qualification criteria. Out of all MQLs, the leads that meet your criteria and are identified as ready to buy your products or services become your Sales Qualified Leads (SQLs), and the sales department gets ready to engage with these leads to turn them into actual customers.

[Image: marketing funnel narrowing from Marketing Qualified Leads to Sales Qualified Leads]

Conclusion:

In this post, we saw a high-level overview of a business metric used in marketing and sales called "leads". As mentioned earlier, don't report on just "leads": it can mislead marketing & sales executives, since an upward trend in the number of leads doesn't necessarily result in increased sales unless the quality of new leads stays the same or improves. Marketing and sales executives will really appreciate any context (example: conversions) that you can provide with their "leads" report. I hope that helps!

How to create an Average Aggregation in SQL Server Analysis Services?

Standard

Problem:

How do you create a measure that averages a field from a fact table? You can't find an average option in the "Usage" property while trying to create a new measure:

[Image: SSAS new measure dialog showing the Usage property with no average aggregation]

Solution:

Before I show you the solution, I want you to know that this is a Level 100 solution to get you started. Depending on the complexity of your cube, the calculated measure that you are about to create may or may not perform well; if it does not perform well, you might have to dig a little deeper, and here's one blog post to get you started: URL

OK, back to topic! Here are the steps.

SCENARIO: you need the average of Sales Amount.

1. Create a SUM OF SALES AMOUNT measure

Steps: Open cube > Cube Structure > Right-click on Measure Group > New Measure > Usage: "SUM" > Source Table: pick your fact table, in this case let's say it's Fact Sales > Source Column: in this case, let's say it's SALES AMOUNT.

2. Create a COUNT OF SALES measure (important: row count vs. non-empty count; this is not the developer's choice, a business user needs to define it)

Steps: Open cube > Cube Structure > Right-click on Measure Group > New Measure > Usage: count of rows OR count of non-empty values (again, a business user needs to define this, not the developer) > Source Table: pick your fact table, in this case let's say it's Fact Sales > Source Column: in this case, let's say it's SALES AMOUNT.

3. Create a Calculated Measure that equals (SUM OF SALES AMOUNT / COUNT OF SALES)

3a. Switch to Calculations section > create a new calculated member:

[Image: SSAS Calculations tab, creating a new calculated member]

3b. Complete the Name, Format String & Associated Measure Group. For the Expression, use the following as a starting point for your measure:

[code language="SQL"]
IIF([Measures].[COUNT OF SALES] = 0, 0, [Measures].[SUM OF SALES AMOUNT] / [Measures].[COUNT OF SALES])
[/code]
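
If you'd rather define the same measure directly in the MDX script (script view of the Calculations tab), here's a sketch; the member name, format string, and the 'Fact Sales' measure group name are assumptions carried over from the steps above:

[code language="SQL"]
CREATE MEMBER CURRENTCUBE.[Measures].[Average Sales Amount]
AS IIF([Measures].[COUNT OF SALES] = 0,
       0,
       [Measures].[SUM OF SALES AMOUNT] / [Measures].[COUNT OF SALES]),
FORMAT_STRING = "#,##0.00",  -- placeholder format; match your business requirement
VISIBLE = 1,
ASSOCIATED_MEASURE_GROUP = 'Fact Sales';
[/code]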

4. Before you test it, if you don't need the SUM OF SALES AMOUNT and COUNT OF SALES measures, then don't forget to hide them!

Conclusion:

In this post, you saw how to define a measure with average aggregation in SSAS.

SQL Server Query Fundamentals: A Simple Example of a Query That Uses PIVOT

Standard

Problem:

Convert the following source data into the schema shown below:
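
Going by the query below, the source and destination shapes are roughly as follows (the sample row is an illustrative assumption):

Source: Product_ID | Test_Desc | Test_Val (one row per product per test, e.g. 1 | Test1 | 9.50)

Destination: Product_ID | Test1 | Test2 | Test3 | Test4 | Test5 (one row per product, one column per test)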

[Image: source and destination data for the PIVOT example]

Solution:

Here's the code that uses the PIVOT function to get to the solution; please use it as a starting point.

Note the use of the aggregate function avg; which aggregate you need depends on the requirement. In this example, Test_Val needs to be averaged if more than one test was performed.

[code language="SQL"]

-- source data
SELECT [Product_ID], [Test_Desc], [Test_Val] FROM [dbo].[Address]
GO

-- destination data using the PIVOT function
SELECT * FROM [dbo].[Address]
PIVOT ( AVG(Test_Val) FOR Test_Desc IN (Test1, Test2, Test3, Test4, Test5) )
AS Tests

[/code]

SSIS: Using Data Profiling Task to check the candidate key profile of unknown data source(s)

Standard

As part of business intelligence projects, we spend a significant amount of time extracting, transforming and loading data from source systems. So it's always helpful to know as much as you can about the data sources: NULLs, keys and statistics, among other things. One of the things I like to do with unknown data is to verify the candidate keys, to make sure the key used can uniquely identify the rows in the data. It's really helpful to do this upfront because it avoids a lot of duplicate-value errors later in your projects.
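
By the way, for a single table you can also hand-roll a quick uniqueness check in T-SQL before reaching for the profiler. A sketch, where the table and key column names are hypothetical:

[code language="SQL"]
-- rows where the proposed key is NOT unique;
-- zero rows returned means customer_id is a viable candidate key
select customer_id, count(*) as row_count
from dbo.source_customers
group by customer_id
having count(*) > 1;
[/code]

The Data Profiling Task automates this kind of analysis, including across column combinations, which is what the rest of this post covers.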

So here's a quick tutorial on how you can check the candidate key profile using the Data Profiling Task in SSIS. You need to perform two main tasks:
1. Generate the XML file using the Data Profiling Task in SSIS
2. View the content of the XML file using the Data Profile Viewer tool, or using the Open Profile Viewer option in the Data Profiling Task editor in SSIS

Here are the steps:
1a. Open SQL Server Data Tools (Visual Studio/BIDS) and create an SSIS project
1b. Bring a Data Profiling Task onto the Control Flow
1c. Open the Data Profiling Task editor and configure the destination folder that the task uses to create the XML file. You can either create a new connection or use an existing one. If you use an existing connection, make sure to set the OverwriteDestination property to True if you want the file to be overwritten at the destination.

[Screenshot: Data Profiling Task destination settings]

1d. Click on Quick Profile to configure the data source for the Data Profiling Task

[Screenshot: Quick Profile button in the Data Profiling Task editor]

1e. In the Quick Profile form, you'll need to select the connection and the table/view, and also specify what you need to compute. For the candidate key profile, make sure that the Candidate Key Profile box is checked.

[Screenshot: Quick Profile form with Candidate Key Profile checked]

1f. Run the task and an XML file should be placed at the destination you specified in step 1c.

Now it's time to view what the profiler captured.

2a. You can open the "Data Profile Viewer" by searching for its name in the Start menu.

[Screenshot: Data Profile Viewer in the Start menu]

2b. Once it opens, click Open and browse to the XML file generated by the Data Profiling Task.

[Screenshot: opening the generated XML file in Data Profile Viewer]

2c. Once the file opens, you can see the candidate key profiles.

[Screenshot: candidate key profiles in Data Profile Viewer]

2d. Alternatively, you can also open the Data Profile Viewer from the Data Profiling Task in SSIS: go to the editor > Open Profile Viewer.

[Screenshot: Open Profile Viewer button in the Data Profiling Task editor]

Conclusion:
In this post, you saw how to profile data using the Data Profiling Task in SSIS.

PASS Business Analytics VC: 7 Ideas on Encouraging Advanced Analytics by Mark Tabladillo #sqlpass

Standard

Thu, Jul 17, 2014 12:00 PM – 1:00 PM EDT


Abstract:
Many companies are starting or expanding their use of data mining and machine learning. This presentation covers seven practical ideas for encouraging advanced analytics in your organization.

Bio:
Mark Tabladillo is a Microsoft MVP and SAS expert based in Atlanta, GA. His Industrial Engineering doctorate (including applied statistics) is from Georgia Tech. Today, he helps teams become more confident in making actionable business decisions through the use of data mining and analytics. Mark provides training and consulting for companies in the US and around the world. He has spoken at major conferences including Microsoft TechEd, PASS Summit, PASS Business Analytics Conference, Predictive Analytics World, and SAS Global Forum. He tweets @marktabnet and blogs at http://marktab.net.

REGISTER HERE: bit.ly/PASSBAVC071714

Hope to see you there!

Paras Doshi
Business Analytics Virtual Chapter’s Co-Leader

TSQL – Quick note about numeric data type to solve “Arithmetic overflow error”

Standard

Problem:

You are working on a query where you are trying to convert source data to a numeric data type and you get an "Arithmetic overflow error".

Solution:

Let’s understand this with an example:

Here's the source data: 132.56000000, and you want to store just 132.56, so you write a query that looks like:

cast(source_column as numeric(3,2)) as destination_column_name

and after you run the query, it throws the "Arithmetic overflow error". So what's wrong?

The issue is that you incorrectly specified the precision and scale. By writing numeric(3,2) you are saying "I want 3 digits in total (the precision), with 2 of them to the right of the decimal point (the scale)", which leaves just 1 digit for the left side, and 132.56 needs 3.

What you need to write is numeric(5,2): this keeps 2 digits on the right and leaves 3 digits for the left.

So after you run this, it shouldn't complain about the arithmetic overflow error. You just need to make sure that the precision and scale of the numeric data type are correct.
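
Here's a minimal repro you can run as-is (the variable name is just for illustration):

[code language="SQL"]
declare @source_value decimal(18,8) = 132.56000000;

-- fails with "Arithmetic overflow error": numeric(3,2) leaves
-- only 1 digit to the left of the decimal point
-- select cast(@source_value as numeric(3,2));

-- works: numeric(5,2) leaves 3 digits to the left of the decimal point
select cast(@source_value as numeric(5,2)) as destination_column_name; -- returns 132.56
[/code]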

Conclusion:

In this post, you saw an example of how to correctly use precision and scale in the numeric data type, which should help you solve arithmetic overflow errors.

SSIS – How to use Execute SQL Task to assign a value to a variable?

Standard

Problem:

How to use the Execute SQL Task in SSIS to assign a value to a variable?

Solution:

This is a beginner-level post, so I'll show you how you can use the Execute SQL Task to assign a value to a variable. Note that a variable can also be assigned a full result set. With that said, here are the steps:

1. Create the query against the source system

Example (note the column name, it will be handy later!):

[Screenshot: query against the source system with a named column]
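
For instance, a query of this shape works; the table name is hypothetical, and the point is that the column has an alias you can map in step 5:

[code language="SQL"]
-- "UserCount" is the Result Name you'll map to the variable in step 5
select count(*) as UserCount
from dbo.user_transaction_details;
[/code]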

2. Open SSIS Project > Create the variable

Example

[Screenshot: creating the variable in SSIS]

3. Now, drag an Execute SQL Task onto the Control Flow, rename it, go to Edit, and configure the SQL Statement section

[Screenshot: configuring the SQL Statement section]

4. Now, since we want to store a single value in the variable, change the Result Set property to Single row

[Screenshot: Result Set property set to Single row]

5. One last step: go to the Result Set section and map the Result Name (remember the column name from step 1?!) to the Variable Name:

[Screenshot: mapping the Result Name to the Variable Name]

That's it! Related article: How to see value of variable during Run Time?

Conclusion:

In this post, you saw how to use the Execute SQL Task in SQL Server Integration Services to assign a value to a variable.

SQL Server Integration Services: How to solve the "The value violated the integrity constraints for the column" error?

Standard

Problem:

You are working on an SSIS package that loads a table from a source system and you get the error "The value violated the integrity constraints for the column". How do you solve it?

Solution:

One of the things the error message should also tell you is the column name. What you want to do is check the table definition of the destination table for integrity constraints like NOT NULL or PRIMARY KEY. Once you have that information, go back to your source and figure out whether it's trying to add NULL values to a column that has a NOT NULL constraint, or maybe the ETL logic is trying to insert duplicate values into a column that has a primary key constraint.

Also, don't just alter the destination table to accept NULLs or remove the integrity constraint. You want to put logic in your ETL or fix the data quality at the source. You can use T-SQL functions like ISNULL or COALESCE to substitute defaults for NULL values while querying source systems.
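
For example, here's a minimal sketch (table and column names are hypothetical) that substitutes a default so a NOT NULL destination column doesn't reject the row:

[code language="SQL"]
select customer_id,
       isnull(email, 'unknown@example.com') as email -- replace NULLs with a default
from dbo.source_customers;
[/code]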

Conclusion:
In this post, we saw how to solve the “The value violated the integrity constraints for the column” error in SSIS.