How to standardize data using Data Quality Services?

Standard

I was playing with Data Quality Services when I though that it would be nice if I could implement what I had learned and create a quick demo that shows how to standardize data using the technology. So here it goes:

Note: This is not a step by step Guide – I am assuming that you are aware of different pieces of DQS. I am going to post a brief outline of how did I go about implementing a quick demo.

Technology used:

1) SQL Server 2012 Data Quality Services

2) SQL Server Integration services to show the DQS cleansing  component in action.

Problem:

We want to standardize to Gender to have only two values M & F – but now data coming from two files creates data inconsistency issues:

Two files that look like this:

File 1: (Excel)

1 sql server data quality services data source 1 excel file

File 2: (CSV)

2 sql server data quality services data source 2 csv file

Solution:

Let’s see how SQL server 2012’s data quality services would help us solve this problem:

1) I created a Knowledge base with a domain Gender and created domain values that looks like this:

3 data quality services domain management values sql server

2) Tested the Knowledge base quickly by creating a Data Quality Project. It worked!

3) Switched to SSIS and created a Data Flow which uses following components:

4 data quality services sql server integration services

What would be interesting to you to see mapping tab in the DQS cleansing transformation component, so here it is:

5 data quality SSIS data cleaning transformation

5) And let me show you the Target File – note that you can choose to NOT have Gender_Source, Gender_Status & Record Status. The column of interest is Gender_Output

6 standardized output SSIS gender data quality services

That’s about it for this post. I hope this helps!

what are the end – to – end processes for developers to build out a Master Data Services 2012 solution?

Standard

I recently completed an end to end tutorial that taught me basics of Master Data Services 2012. And I thought I would take what I’ve learned and create a diagram for reference that helps developers see end – to – end processes that are involved in building out a SQL Server Master Data services 2012 solution. So here it is:

SQL Server 2012 Master data services steps

Resource: Learn to build an Enterprise Information Management system using Data Quality Services, Master Data Services and SQL Server Integration Services

Standard

[UPDATE 29 Dec 2012: One more whitepaper that’s on the same subject and should be read after the tutorial: Cleanse and Match Master Data by Using EIM ]

Here’s the resource: Tutorial: Enterprise Information Management using SSIS, MDS, and DQS Together

Here’s what you’ll learn:

– Learn to use Data Quality client to create DQS Knowledge Base, cleanse data in an excel data, remove duplicated data from the excel file

DQS client domain values clean data data quality services

– Learn Use MDS add-in for Excel, store the cleansed and matched data in MDS

Master data services add in for excel

– Learn how to use the MDS Web UI:

SQL Server 2012 master data services MDS Web UI

– Learn to Automate the process of receiving input data, cleaning and matching it and storing the master data into MDS via SSIS!

SSIS automate DQS MDS data quality component

That’s about it! check out the Tutorial to learn about all the DQS, MDS & SSIS goodness!

 

SQL Server Data Quality Services: How to test Domain Rules in the DQS client.

Standard

In the domain management activity of Data Quality Services, you can test the domain rules on sample data while creating the Knowledge base . Here are more details:

1. Create a KB > Select “Domain Management as activity” > create a Domain > Go to the Domain Rules Section of a Domain.

2. For the purpose of this blog-post, I am going to create a domain rule named “Email Validation” which requires the Values to End With @ParasDoshi.com

3. Here’s the Domain Rule. Now click on “Run the selected Domain Rule on Test data” icon:

domain management activity of Data Quality Services, you can test the domain rules on sample data while creating the Knowledge base

4. Now add some test data and then click on “Test the domain rule on all items”.

data quality services sql server test the domain rule

Thus, This way you can build and test the Domain Rule while creating the Knowledge Base.

Conclusion:

In this blog-post, we saw how to test the Domain Rule in the Data Quality Services client of SQL Server 2012

How to Delete a Knowledge Base in SQL Server 2012 Data Quality Services?

Standard

A quick post on step – by – step to delete a Knowledge Base (KB) in SQL Server 2012 Data Quality Services (DQS):

1. Open the Data Quality Client

2. Click on “Open Knowledge Base”

SQL Server 2012 Data Quality Services delete a Knowledge Base

3. Select the KB > Right Mouse Click > Delete.

Note that it also allows you to Rename, Open and see properties of a Knowledge Base via this Knowledge Base Management UI.

SQL Server 2012 Data Quality Services knowledge base management

That’s about it for this post. I took me five minutes to figure how to delete a KB and so I thought I would document it if someone is not able to find this option as quick as they might have thought. Happy Data Cleaning!

How I think of what is KnowledgeBase in SQL Server 2012 Data Quality Services

Standard

This Blog Post is meant to share how I think of what is KnowledgeBase in SQL Server 2012 Data Quality Services (DQS) – In my mind, Knowledge Base (KB) captures:

1. WHAT needs cleaning

2. HOW to clean what needs cleaning.

Let’s dive a little deeper, In DQS – a Knowledge Base let’s you do three things: Knowledge Discovery, Domain Management & Matching Policy.

1. Knowledge Discovery:

This activity helps you find “WHAT” needs cleaning. DQS has inbuilt algorithms that helps in analyzing errors, inconsistencies and data quality issues in the sample data-set.

2. Domain management:

This activity helps in defining the rules that will be applied to create “HOW” to clean the data.

3. Matching Policy:

This activity helps in identifying “WHAT” needs to be De-Duplicated (De-Dup) and then it goes about helping create the “HOW” to De-DUP the data.

Conclusion:

In this short blog post, I shared how I think of what is Knowledge Base in SQL Server 2012 Data Quality Services. And here’s the official resource if you want to continue learning: DQS Knowledge Bases and Domains