I was playing with Data Quality Services when I though that it would be nice if I could implement what I had learned and create a quick demo that shows how to standardize data using the technology. So here it goes:
Note: This is not a step by step Guide – I am assuming that you are aware of different pieces of DQS. I am going to post a brief outline of how did I go about implementing a quick demo.
Technology used:
1) SQL Server 2012 Data Quality Services
2) SQL Server Integration services to show the DQS cleansing component in action.
Problem:
We want to standardize to Gender to have only two values M & F – but now data coming from two files creates data inconsistency issues:
Two files that look like this:
File 1: (Excel)
File 2: (CSV)
Solution:
Let’s see how SQL server 2012’s data quality services would help us solve this problem:
1) I created a Knowledge base with a domain Gender and created domain values that looks like this:
2) Tested the Knowledge base quickly by creating a Data Quality Project. It worked!
3) Switched to SSIS and created a Data Flow which uses following components:
What would be interesting to you to see mapping tab in the DQS cleansing transformation component, so here it is:
5) And let me show you the Target File – note that you can choose to NOT have Gender_Source, Gender_Status & Record Status. The column of interest is Gender_Output
That’s about it for this post. I hope this helps!
Related articles
- How to Delete a Knowledge Base in SQL Server 2012 Data Quality Services? (parasdoshi.com)
- How I think of what is KnowledgeBase in SQL Server 2012 Data Quality Services (parasdoshi.com)
- Guest Blog: How we use Fuzzy Lookup add-in in our company to solve data inconsistency problems: (parasdoshi.com)
- Master Data Services: Analogy to remember what are Models, Entities and Attriutes (parasdoshi.com)
0 thoughts on “How to standardize data using Data Quality Services?”