In this post, I’ll show you how composite domains can help you create cross domain rules in Data Quality Services.
Scenario:
You have a data set of employee name, employee category and yearly salary. you want to validate the value in the yearly salary column based on the employee category. Here are the business rules:
Note: for the purpose of the demo, every number is a dollar.
Now, the rule in the Table can be read as:
If employee category is A then yearly salary should be greater than 100000 and less than 200000.
Note: I have kept it simple for demo purposes.
Now here is our Data-Set before we set out to validate it:
Employee Name | Employee Category | Yearly Salary |
Jon V Yang | A | 127000 |
Eugene L Huang | B | 90000 |
Ruben Torres | C | 83000 |
Christy Zhu | D | 70000 |
Elizabeth Johnson | A | 90000 |
Julio Ruiz | C | 65000 |
Janet G Alvarez | D | 43000 |
Marco Mehta | B | 81000 |
*Names are taken from Adventure works database. The values in the names and salary column are purely fictional.
Solution:
It’s just an overview, It’s not covered in step by step fashion:
1. Create a KB > created three domains: Employee Category, Employee Name and Yearly Salarly
2. created a composite domain:
3. Under Composite Domain (CD) Rules Tab:
I started out with defining the rules for category A:
And I completed w/ specifying business rules for all four categories
4. Published KB
5. Created a New DQS project > Selected the KB created above
6. Selected the data source > Mapped domains
7. I also selected from the list of selected composite domains:
8. After seeing the cleaning statistics, I switched to the invalid tab to see the records that didn’t match the record:
9. So by now, we have identified records that do not match the rules. A data steward can now correct them if he/she wants to or leave them as it is. Notice the Approve/reject check boxes.
Note that: Not only can you update the yearly salary but you can also update the employee category. So if you think that the employee has been wrongly categorized, you can change that.
10. After this, you can export the data-set which has records that match the business rules and the data-set would be ready to be consumed!
Conclusion:
In this post, we saw how to create cross domain rules using composite domains w/ an example of Employee Category and Yearly Salary.
Related articles
- Using DQS: Cleansing complex data using composite domains (Microsoft)
- SQL Server 2012 Data Quality Services Term based Relation’s in action! (parasdoshi.com)
- How to clean address records using third-party reference data-sets in Data Quality Services? (parasdoshi.com)
- Difference between Term based relations and Domain values in SQL server 2012 Data Quality Services (parasdoshi.com)
- How to clean records using Regular Expressions in Data Quality Services? (parasdoshi.com)
- How to standardize data using Data Quality Services? (parasdoshi.com)
- How to detect unrealistic or invalid values using Data Quality Services? (parasdoshi.com)
0 thoughts on “Data Quality Service’s Composite Domains in action!”