SQL Server 2012 Data Quality Services Term based Relation’s in action!

Standard

In data quality services, a knowledge base (KB) consists of domains. And domains has: domain rules, term based relations, domain values and reference data. In this post, we’ll see a use case of Term based relations. But before we do that, you can consider reading: Difference between Term based relations and Domain values in SQL server 2012 Data Quality Services

Now, Here’s Term Based Relation’s in action

Scenario:

After data profiling you realize that there are records such as “my Company Inc.” and “my Company Incorporated” – so you set out to automatically find these mismatches in terms inside a value and correct them.

BEFORE cleaned data, the report showed that “my company Inc” revenue is less than that of Google:

Before data cleaning company names and revenues sql server

AFTER cleaned data, the report correctly shows that “my company Inc” revenue is more than that of Google:

After data cleaning company names and revenues sql server

Steps taken to clean data:

(Just an overview, not covered in a step by step fashion)

1. Created the Knowledge Base w/ Two domains Company Names & Revenue

2. Term Based Relations Tab of Company Names domain:

2 term based relations data quality sql server

3. Published the KB > Let’s create a DQS project

4. Mapped the domains:

3. DQS Project Mapping Domain Names

5. DQS cleaned following records:

4 term based relations inc incorporated corrected records

6. Exported the data and created a report out of clean data-set!

Business user is happy :)

conclusion:

In this post, we saw how to correct a word/term within a value in a domain. The example we used was Inc. , Inc and Incorporated . It can be used to correct terms like Jr. and Junior. Sr. and Senior. etc. Things like this are difficult to catch during data entry – But using Term Based Relations, a person who knows the Data can clean it so that it generates correct reports.

After all reports like this are of little to no use, are they? So Let’s NOT create confusing reports anymore.

Please note:

  • The revenue figures shown are just for demo purposes. I pulled up these numbers from Wikipedia. Please don’t make any financial decision based on these reports and if you do, I am not responsible for that.
  • The name “my Company Inc” is a fictional firm. It’s not any firm that I am/was associated with in past, future of present. It’s a fictional name!

And Writing disclaimers like this are NO fun – sucks the joy out of “Thinking out Loud” :)