In this post, we’ll see how to clean address records using third party reference data-sets in Data Quality Services.
You need to perform following steps to do so
Note that they are not step by step procedures, a high level overview of how DQS works to clean address records using 3rd party reference data-sets:
1) Configure Data Quality Services to use reference data sets. Tutorial here: http://msdn.microsoft.com/en-us/library/hh213070.aspx
2) Go to datamarket.azure.com > and I subscribed to “Address check – verify, correct, Geocode US and canadian Addresses Data” ; we’ll see how to use this in next steps.
3) Note that as of now, we can only have refernece data-sets from Azure Data Market. However, the MSDN thread: http://social.msdn.microsoft.com/Forums/hu-HU/sqldataqualityservices/thread/750faef8-dd69-4a71-b0c1-18ca2f93d59d suggests that we’ll have an ability to provide our (private/self-provided) reference data-sets in next service updates. So for now we’ll have to connect to Azure data market for reference data-sets and for the purpose of the Demo, I connected to Melissa Data’s Address Check.
4) Now via DQS client, let’s create a Knowledge Base!
5) I’ve created following domains:
- Address Line
And a composite domain: Full Address which contains domains: Address line, city, state and zip in the composite domains.
6) And for the composite domain Full Address – this is how you configure reference data-sets:
7) After creating the Knowledge Base, start a new DQS project. Here the cleaning happens at the composite domain level and this is a demo of how DQS uses third party reference data-set to classify records as suggested, new, invalid, corrected or correct:
You can see that DQS corrected few records and mentions that the reason in this case was Melissa data’s address verify reference data-set:
That’s about it for this post. I hope it helps.
- How to detect unrealistic or invalid values using Data Quality Services? (parasdoshi.com)
- How to standardize data using Data Quality Services? (parasdoshi.com)
- How I think of what is KnowledgeBase in SQL Server 2012 Data Quality Services (parasdoshi.com)
- How to Delete a Knowledge Base in SQL Server 2012 Data Quality Services? (parasdoshi.com)
0 thoughts on “How to clean address records using third-party reference data-sets in Data Quality Services?”