From an ETL architecture design point of view, gathering
requirements is the key aspect of a successful and sustainable design
for a longer period of time.
Apart from desiging the Dataware House database model , it is quite very important to focus on the below set of requirements up front in-order to overcome the re-work during testing or after implementation phases.
Below are the categories which are descirbed as the key decision makers for any ETL design.
-> Business Needs.
1) what is the information content that business user needs ?
2) How the buiness user would like to drill down and drill across the data.
3) List of KPI's ( Facts across dimensions ) that business would like to report.
-> Compliance
This form an important aspect when design ETL for Financial or Telecommunication sector doe to external regulatory requirements.
1) List of data points subjected to Legal and Compliance. ( ex: Credit Card number , Client information etc) 2) Data transformation steps for which you must maintain the “chain of custody” showing and proving
that final reports were derived from the original data delivered from your data sources.
3) Security related aspects for data.
4) Archival mechanism and expected usable life time of those archives.
--> Data Quality
1) List all data Sources and identify those data elements whose quality is known to be unacceptable.
2) list whether an agreement has been reached with the source systems to correct the data before extraction.
3) Consolidate all the data elements which are discovered during data profiling as they should be continously monitored during ETL process.
--> Security
1) List down all the security aspects from Security Manager.
2) List down all the complaince aspects from compalinance team, as most of them will overlap with Security aspects.
3) Brainstrom with senior management to idetify the list of security aspects the ETL design should cary with it.
4) Most important accessiablity of the data avaliable in the warehouse has to taken care. ( For ex: BI tools accessibility to data , Backup accessibility to user's etc)
--> Data Integration
1) List down all the data source as per business needs querstionaire.
2) Understand the meaning of each data elements from the transcation system owners.
3) Understand the business process to inter-relate common dimensions across various sources, to help business user's to drill-across. ( Conformed Dimensions ).
4) understand the meaning of KPI's and derivation mechanism accors various sources.
5) List down methods to integrate data from each of the sources using Bus matrix representation.
6) Prepate a meta data mapping using Bux Matrix for each of the source to its conformed dimenstions / facts.
--> Data Latency
This is most curcial aspect of designing a ETL design as business needs data on time to make necessary descions.
1) Understand the SLA of data delivery from Datawareshouse to business user's.
2) Understand the avaliability of data from the sources.Decision of batch / Micro-batch or Stream oriented to be finalized.
3) Estimate the data processing time based on the avaliable infrastucture / Complexity of transformation / Network capacity & Avalibility etc ..
4) Recommend the optimum infrastucture capability to achieve SLA.
5) Get an agreement with source system on the SLA of data avaliability for processing. ( Inculding transfer over network ).
--> BI Delivery Interfaces
I would feel that ETL team, working closely with the modeling team, must take responsibility for the content and structure of the data that makes the BI applications simple and fast
1) List all fact and dimension tables that will be directly exposed to your BI tools. This should come directly from the dimensional model specification.
2) List all OLAP cubes and special database structures required by BI tools.
3) List all known indexes and aggregations you have agreed to build to support BI performance.
Reference : The datawarehouse ToolKit , 3rd edition ( Kimball )
Apart from desiging the Dataware House database model , it is quite very important to focus on the below set of requirements up front in-order to overcome the re-work during testing or after implementation phases.
Below are the categories which are descirbed as the key decision makers for any ETL design.
-> Business Needs.
1) what is the information content that business user needs ?
2) How the buiness user would like to drill down and drill across the data.
3) List of KPI's ( Facts across dimensions ) that business would like to report.
-> Compliance
This form an important aspect when design ETL for Financial or Telecommunication sector doe to external regulatory requirements.
1) List of data points subjected to Legal and Compliance. ( ex: Credit Card number , Client information etc) 2) Data transformation steps for which you must maintain the “chain of custody” showing and proving
that final reports were derived from the original data delivered from your data sources.
3) Security related aspects for data.
4) Archival mechanism and expected usable life time of those archives.
--> Data Quality
1) List all data Sources and identify those data elements whose quality is known to be unacceptable.
2) list whether an agreement has been reached with the source systems to correct the data before extraction.
3) Consolidate all the data elements which are discovered during data profiling as they should be continously monitored during ETL process.
--> Security
1) List down all the security aspects from Security Manager.
2) List down all the complaince aspects from compalinance team, as most of them will overlap with Security aspects.
3) Brainstrom with senior management to idetify the list of security aspects the ETL design should cary with it.
4) Most important accessiablity of the data avaliable in the warehouse has to taken care. ( For ex: BI tools accessibility to data , Backup accessibility to user's etc)
--> Data Integration
1) List down all the data source as per business needs querstionaire.
2) Understand the meaning of each data elements from the transcation system owners.
3) Understand the business process to inter-relate common dimensions across various sources, to help business user's to drill-across. ( Conformed Dimensions ).
4) understand the meaning of KPI's and derivation mechanism accors various sources.
5) List down methods to integrate data from each of the sources using Bus matrix representation.
6) Prepate a meta data mapping using Bux Matrix for each of the source to its conformed dimenstions / facts.
--> Data Latency
This is most curcial aspect of designing a ETL design as business needs data on time to make necessary descions.
1) Understand the SLA of data delivery from Datawareshouse to business user's.
2) Understand the avaliability of data from the sources.Decision of batch / Micro-batch or Stream oriented to be finalized.
3) Estimate the data processing time based on the avaliable infrastucture / Complexity of transformation / Network capacity & Avalibility etc ..
4) Recommend the optimum infrastucture capability to achieve SLA.
5) Get an agreement with source system on the SLA of data avaliability for processing. ( Inculding transfer over network ).
--> BI Delivery Interfaces
I would feel that ETL team, working closely with the modeling team, must take responsibility for the content and structure of the data that makes the BI applications simple and fast
1) List all fact and dimension tables that will be directly exposed to your BI tools. This should come directly from the dimensional model specification.
2) List all OLAP cubes and special database structures required by BI tools.
3) List all known indexes and aggregations you have agreed to build to support BI performance.
Reference : The datawarehouse ToolKit , 3rd edition ( Kimball )
Thank you for sharing such a nice and interesting blog with us. We are providing the best services click on below links to visit our website.
ReplyDeleteOracle Fusion HCM Training
Workday Training
Okta Training
Palo Alto Training
Adobe Analytics Training