DataStage has provide many powerful plugin tools for data inergration and loading purpose into datawarehouse. In this blog I am giving a brief overview of checksum stage and present you the do's and dont's of this stage.
Checksum Stage : This stage is used to calcuate the unique checksum of a given string or given set of columns. This stage can be used to generate a unique identifier for a composite key or can be used for detection of changes between source and target set of records during CDC process while loading to datawarehouse.
Do's:
1) Ensure the column name specfication between the source and target at the same. The order of column need not be the same , but the spelling on the column should be the same.
As the checksum stage will auto align the columns in alphabetical ascending order it is very important that the spelling of the column names be the same between source and target.
2) Ensure the column lenght and scale are same between the source and target to generate the same unique identifier.
3) Ensure the column exclude or include to be same between both the source and taget columns.
4) For better debugging purpose always use a option Buffer Output Column Name to store the buffer value for identifying the columns and data used to build the cheksum.
Dont:
If using checksum stage in a job with run time column propagation , it is advised not use the column exlucusion as the maintainability of the jobs will be difficult for future changes. Inclusion of column is a best approach for better maintainability in such scenario. Vice versa for non run time column propagation columns.
Avoid performing a checksum on a existing checksum field as this will not produce the same result always.
Let me know your view and toughts abt this post ..
Checksum Stage : This stage is used to calcuate the unique checksum of a given string or given set of columns. This stage can be used to generate a unique identifier for a composite key or can be used for detection of changes between source and target set of records during CDC process while loading to datawarehouse.
Do's:
1) Ensure the column name specfication between the source and target at the same. The order of column need not be the same , but the spelling on the column should be the same.
As the checksum stage will auto align the columns in alphabetical ascending order it is very important that the spelling of the column names be the same between source and target.
2) Ensure the column lenght and scale are same between the source and target to generate the same unique identifier.
3) Ensure the column exclude or include to be same between both the source and taget columns.
4) For better debugging purpose always use a option Buffer Output Column Name to store the buffer value for identifying the columns and data used to build the cheksum.
Dont:
If using checksum stage in a job with run time column propagation , it is advised not use the column exlucusion as the maintainability of the jobs will be difficult for future changes. Inclusion of column is a best approach for better maintainability in such scenario. Vice versa for non run time column propagation columns.
Avoid performing a checksum on a existing checksum field as this will not produce the same result always.
Let me know your view and toughts abt this post ..
Thanks for posting it in a simple language and would appreciate if you could add few examples.
ReplyDeleteThank you for going through the article and providing a feed back. Work in progress to generate few examples.
Delete