Tuesday, January 27, 2015

Distributed Transaction Stage -- A Handy Tool for Real-Time Data Processing

This post is basically for people who are interested in processing messages into data processing systems in real time by using the capabilities of IBM DataStage.

DataStage versions earlier than 8.5 do not guarantee delivery of data from a source system to a target system. Each instance of a stage manages its own transaction, independently of the other stages, and no control signals are sent between stages to coordinate how transactions are started and committed. However, some scenarios require the data to remain in the source system until it has been written to the target system (for example, messages read from MQ or JMS).
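To make the problem concrete, here is a minimal Python sketch (not DataStage code) of two stages that commit independently. The in-memory queue, the target list and `uncoordinated_run` are all hypothetical: the source removes a message as soon as it is read, so when the target write for that message fails, nothing puts it back and it is lost.

```python
# Hypothetical model: each "stage" commits on its own, with no coordination.
from collections import deque


class TargetWriteError(Exception):
    """Raised by the hypothetical target when a write fails."""


def uncoordinated_run(source: deque, target: list, fail_on: str) -> None:
    """The source commits (removes) each message before the target writes it."""
    while source:
        msg = source.popleft()          # source-side commit: the message is gone
        try:
            if msg == fail_on:
                raise TargetWriteError(msg)
            target.append(msg)          # target-side commit
        except TargetWriteError:
            pass                        # nothing returns the message to the source


source = deque(["m1", "m2", "m3"])
target: list = []
uncoordinated_run(source, target, fail_on="m2")
print(list(source), target)  # [] ['m1', 'm3'] -> m2 was lost
```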

From version 8.5 onward, the Distributed Transaction stage provides this delivery guarantee between source and target.

Here is the transaction flow for your reference:

When the job is run, the following actions take place:
1. A local transaction is started with the target connector.
2. The target is updated.
3. If the updates are successful, the local transaction is committed on reaching end-of-wave.
4. If the local transaction is successfully committed:
a. A WebSphere MQ transaction is started.
b. The messages about the current transaction are deleted from the source or work queue.
c. The WebSphere MQ transaction is committed.
5. If actions in Step 3 or Step 4 are not successful, and a reject queue is defined:
a. A WebSphere MQ transaction is started.
b. Messages about the current transaction are moved from the source or work queue to the reject queue.
c. The WebSphere MQ transaction is committed.
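The steps above can be modelled with a small Python sketch. It is only an illustration under stated assumptions, not how the stage is implemented: the work queue, reject queue and target are in-memory stand-ins, and `commit_mq`, `process_wave` and `write_row` are hypothetical helpers. The key point it shows is ordering: messages leave the work queue only after the target commit succeeds, and without a reject queue a failed wave simply stays on the work queue.

```python
from collections import deque
from typing import Callable, Deque, List, Optional


def commit_mq(work: Deque[str], wave: List[str],
              reject: Optional[Deque[str]] = None) -> None:
    """Hypothetical MQ transaction: remove the wave's messages from the work
    queue and, when a reject queue is given, add them to it."""
    remaining = deque(m for m in work if m not in wave)
    if reject is not None:
        reject.extend(wave)
    work.clear()
    work.extend(remaining)


def process_wave(wave: List[str], work: Deque[str], target: List[str],
                 write: Callable[[List[str], str], None],
                 reject: Optional[Deque[str]] = None) -> bool:
    staged: List[str] = []        # Step 1: local transaction on the target
    try:
        for msg in wave:          # Step 2: update the target
            write(staged, msg)
        target.extend(staged)     # Step 3: commit at end-of-wave
    except Exception:
        # Local transaction rolled back: staged rows are discarded.
        if reject is not None:    # Step 5: move the wave to the reject queue
            commit_mq(work, wave, reject)
        return False              # no reject queue: messages stay on the work queue
    commit_mq(work, wave)         # Step 4: delete the wave from the work queue
    return True


def write_row(staged: List[str], msg: str) -> None:
    """Hypothetical target write that fails on a message called 'bad'."""
    if msg == "bad":
        raise ValueError("target rejected row")
    staged.append(msg)


work = deque(["m1", "m2", "bad", "m3"])
target: List[str] = []
reject: Deque[str] = deque()
process_wave(["m1", "m2"], work, target, write_row, reject)   # True
process_wave(["bad", "m3"], work, target, write_row, reject)  # False
print(list(work), target, list(reject))  # [] ['m1', 'm2'] ['bad', 'm3']
```

Note that "m3" ends up on the reject queue even though it was valid, because the whole wave is treated as one unit. The sketch also does not model a failure inside Step 4 itself.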


Cons:

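1) Write access to the MQ queues (either the source or the work queues) is required, because processed messages are deleted from them or moved to the reject queue.
2) Message processing is slowed by the response time of the MQ server (especially network latency), since every wave needs an additional MQ transaction after the target commit.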

Environment Variables for consideration

CC_DTS_COMMIT_ON_EOF: specifies whether the Distributed Transaction stage commits at the end of data.

CC_TRUNCATE_NSTRING_WITH_NULL: truncates string data that contains the null character 0x00.

CC_USE_EXTERNAL_SCHEMA_ON_MISMATCH: uses an external schema rather than the design schema when the two schemas do not match.
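As a rough illustration of what CC_TRUNCATE_NSTRING_WITH_NULL does to a value, here is a Python sketch. It assumes (the post does not say this explicitly) that the string is cut at the first 0x00 character and the rest is dropped. The variable itself is set in the DataStage environment; the hypothetical `truncate_at_null` helper only mimics the effect on the data.

```python
def truncate_at_null(value: str) -> str:
    """Keep only the characters before the first NUL (0x00) character.
    Assumption: this mirrors the truncation the environment variable enables."""
    # split with maxsplit=1 returns the whole string when no NUL is present.
    return value.split("\x00", 1)[0]


print(truncate_at_null("ABC\x00DEF"))  # ABC
print(truncate_at_null("no nulls"))    # no nulls
```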
 
