Australian Address Quality Framework
Quality data within an organization is becoming mandatory for analytics and reporting. An organization with high-quality data can gain better insight into that data and find previously unknown location-specific patterns, which can increase revenue.
The question is how to implement such a solution. An organization needs to make well-informed decisions when implementing data quality solutions. This article outlines an implementation methodology to either improve existing data quality or capture quality data on an ongoing basis. It covers only ADDRESS data quality, but the methodology can be extended to other data points.
The approach below comes from my experience with an Australian-based organization, but it can be extended to organizations anywhere in the world.
Any organization that would like to improve the quality of its Australian address data can follow the steps below.
Step 1:
Identify all the systems within the organization where address data is captured (manually or electronically).
Step 2:
Australia Post data is the most trusted and accurate source for verifying address quality.
Engage with third-party vendors to obtain address-specific reference data, for example vendors whose software is approved under AMAS (Address Matching Approval System).
Raw address file providers:
http://auspost.com.au/business-solutions/raw-address-data.html
Step 3:
Define the match strategy required from this solution.
Examples:
- Find only exact matches.
- Find exact matches, plus relevant matches where an alternate street name exists.
- Find all exact matches and map possible values for incorrect matches, etc.
- Fuzzy matching vs. non-fuzzy matching.
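To make the difference between fuzzy and non-fuzzy matching concrete, here is a minimal Python sketch. It uses the standard library's difflib.SequenceMatcher as a stand-in similarity measure. The function names, the 0.85 threshold, and the sample street names are illustrative assumptions, not values from any vendor product.

```python
from difflib import SequenceMatcher


def normalize(text: str) -> str:
    """Upper-case and collapse whitespace so trivial differences do not block a match."""
    return " ".join(text.upper().split())


def exact_match(source: str, reference: str) -> bool:
    """Non-fuzzy matching: the normalized strings must be identical."""
    return normalize(source) == normalize(reference)


def fuzzy_match(source: str, reference: str, threshold: float = 0.85) -> bool:
    """Fuzzy matching: accept when the similarity ratio reaches the threshold.

    The threshold is an assumed value; tune it against samples reviewed by data stewards.
    """
    ratio = SequenceMatcher(None, normalize(source), normalize(reference)).ratio()
    return ratio >= threshold


if __name__ == "__main__":
    print(exact_match("george  st", "George St"))         # differs only in case and spacing
    print(exact_match("Geroge Street", "George Street"))  # misspelt, so no exact match
    print(fuzzy_match("Geroge Street", "George Street"))  # misspelt, but similar enough
```

A strict strategy would call only exact_match, while a more lenient one would fall back to fuzzy_match and route borderline results to a data steward for approval.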
Step 4:
Define the data testing process (availability of address match data from the third party, availability of current system data, data stewards' time to verify and approve matched data, etc.).
Step 5:
Decide on the software used for matching data (ETL tools vs. self-coded tools, etc.).
Step 6:
Define the data quality process within the organization.
Examples:
1. Who is responsible for certifying correct data?
2. Who will take corrective action when incorrect data is identified?
3. How will the action be taken (using a GUI, a manual process, etc.)?
All these factors feed into the solution design, so take care to answer these questions correctly.
Step 7:
Design a solution that satisfies the decisions made in Steps 3, 4, 5 and 6.
Detailed Match process Hierarchy
1. Standardize the source address to its lowest level. All Australian addresses are made up of the components below. (ETL tools can help lower the development cost. I would prefer IBM DataStage QualityStage.)
• Flat/Unit Type
• Flat/Unit Number
• Floor Level Type
• Floor Level Number
• Building Property Name 1
• Building Property Name 2
• Lot Number
• House Number 1
• House Number 1 Suffix
• House Number 2
• House Number 2 Suffix
• Postal Delivery Number
• Postal Delivery Number Prefix
• Postal Delivery Number Suffix
• Street Name
• Street Type
• Street Suffix
• Postal Delivery Type
• Locality
• State
• Postcode
2. Perform an exact match against the supplier-provided reference data.
3. Perform a fuzzy match (misspelt data, abbreviated data, etc.).
4. Transform unmatched addresses using alternate street names.
5. Repeat steps 2 and 3 with the alternate street names.
6. Match using bordering localities.
7. Match using any specific rules defined by the business (tie-breaking rules).
8. Assign the DPID (Delivery Point Identifier) to each match.
9. Take appropriate action as per Step 3 (match strategy).
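Steps 1 to 5 and 8 of the match hierarchy above could be sketched in Python roughly as follows. This is a simplified illustration and not how QualityStage or any vendor tool works internally. The reference records, DPID values, alternate street table, street type abbreviations, the 0.85 threshold, and all function names are made up for the example, and only six of the address components are compared. Bordering localities and tie-breaking rules (steps 6 and 7) are left out.

```python
from dataclasses import astuple, dataclass, replace
from difflib import SequenceMatcher
from typing import Optional, Tuple


@dataclass(frozen=True)
class Address:
    house_number: str
    street_name: str
    street_type: str
    locality: str
    state: str
    postcode: str


# Hypothetical reference data (address tuple -> DPID); real data comes from the supplier.
REFERENCE = {
    ("10", "GEORGE", "ST", "SYDNEY", "NSW", "2000"): 10000001,
    ("5", "PACIFIC", "HWY", "HORNSBY", "NSW", "2077"): 10000002,
}
# Hypothetical alternate street names: alias -> name used in the reference data.
ALTERNATE_STREETS = {("GREAT NORTHERN", "HWY"): ("PACIFIC", "HWY")}
# Assumed abbreviations used to standardize street types.
STREET_TYPES = {"STREET": "ST", "HIGHWAY": "HWY", "ROAD": "RD"}


def standardize(addr: Address) -> Address:
    """Step 1: upper-case, collapse whitespace, and abbreviate the street type."""
    cleaned = [" ".join(value.upper().split()) for value in astuple(addr)]
    cleaned[2] = STREET_TYPES.get(cleaned[2], cleaned[2])
    return Address(*cleaned)


def exact(addr: Address) -> Optional[int]:
    """Step 2: exact lookup against the reference data."""
    return REFERENCE.get(astuple(addr))


def fuzzy(addr: Address, threshold: float = 0.85) -> Optional[int]:
    """Step 3: tolerate a misspelt street name; every other field must agree exactly."""
    best_dpid, best_ratio = None, threshold
    for ref_key, dpid in REFERENCE.items():
        ref = Address(*ref_key)
        if replace(ref, street_name=addr.street_name) != addr:
            continue  # some field other than the street name differs
        ratio = SequenceMatcher(None, addr.street_name, ref.street_name).ratio()
        if ratio >= best_ratio:
            best_dpid, best_ratio = dpid, ratio
    return best_dpid


def with_alternate_street(addr: Address) -> Optional[Address]:
    """Step 4: swap in the reference street name when an alias is known."""
    alt = ALTERNATE_STREETS.get((addr.street_name, addr.street_type))
    if alt is None:
        return None
    return replace(addr, street_name=alt[0], street_type=alt[1])


def match(raw: Address) -> Tuple[Optional[int], str]:
    """Run the cascade; return (DPID or None, name of the stage that matched)."""
    addr = standardize(raw)
    # Step 5: the alternate-street candidate goes through the same exact and fuzzy stages.
    for label, candidate in (("", addr), ("alternate-", with_alternate_street(addr))):
        if candidate is None:
            continue
        for stage, matcher in (("exact", exact), ("fuzzy", fuzzy)):
            dpid = matcher(candidate)
            if dpid is not None:
                return dpid, label + stage  # Step 8: the DPID is assigned to the match
    return None, "unmatched"


if __name__ == "__main__":
    print(match(Address("10", "george", "Street", "sydney", "nsw", "2000")))
    print(match(Address("10", "Georg", "St", "Sydney", "NSW", "2000")))
    print(match(Address("5", "Great Northern", "Highway", "Hornsby", "NSW", "2077")))
```

Returning the stage name alongside the DPID lets the match strategy from Step 3 decide what happens next, for example accepting exact matches automatically and sending fuzzy or unmatched records to a data steward.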
For consultation services, you can contact me at cevvavijay@gmail.com