Monday, September 1, 2014

DataStage Database Connector Stages



As a general guideline, new projects should prefer Connector stages, falling back to the existing Enterprise equivalents only in specific cases where those stages have a performance edge.

a.     ODBC Connector:
The ODBC Connector has the following features:

1.     Source, target and lookup context:
The ODBC Connector can be used, like the typical ODBC stage, for extracting (source context), for loading (target context), and for lookups (lookup context). The lookup context is similar to the source context, with the difference that the SELECT SQL statements are parameterized. The parameter values are provided dynamically on the input link by the other stages in the job.
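To make the lookup context concrete, here is a minimal sketch using the pyodbc driver; the DSN, table, and column names are invented for illustration.

import pyodbc

conn = pyodbc.connect("DSN=warehouse")   # placeholder DSN
cursor = conn.cursor()

# Rows arriving on the input link supply the parameter values for the
# parameterized SELECT, one lookup per incoming key.
incoming_keys = ["C100", "C200"]
for customer_id in incoming_keys:
    cursor.execute(
        "SELECT name, credit_limit FROM customers WHERE customer_id = ?",
        customer_id)
    match = cursor.fetchone()            # the looked-up row, or None if no match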

2.     Reject links:
The ODBC Connector supports a special type of link called reject link. The Connector can be configured to direct the data that it cannot process to the reject link, from which it can be sent to any stage in the job.
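Conceptually, reject-link handling looks like the following sketch: rows the database refuses are captured together with their error and routed onward instead of failing the job. The DSN and table are invented for illustration.

import pyodbc

conn = pyodbc.connect("DSN=warehouse")   # placeholder DSN
cursor = conn.cursor()

rejected = []                            # stands in for the reject link
for row in [(1, 100.0), (2, None)]:      # second row fails if amount is NOT NULL
    try:
        cursor.execute("INSERT INTO orders (id, amount) VALUES (?, ?)", row)
    except pyodbc.Error as exc:
        rejected.append((row, str(exc))) # reject link carries the row plus error info
conn.commit()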

3.     Passing LOBs by reference:
The Connector offers the option of passing LOBs by reference, rather than extracting the data and passing it inline through the job flow. When configured to pass LOB values by reference, the ODBC Connector assembles a special block of data, called a locator or reference, and passes that into the job dataflow. Another Connector is placed at the end of the job dataflow. When the LOB locator arrives at that Connector stage, the Connector framework initiates the retrieval of the actual ODBC data represented by the reference and provides the data to the target Connector so that it can be loaded into the represented resource. This makes it possible to move LOB data measured in megabytes or gigabytes from one data resource to another without moving the actual data through the job. The drawback is that data passed this way cannot be altered by the intermediate stages in the job.
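The locator mechanism can be pictured as a small token that stands in for the LOB bytes; something like this hypothetical structure travels through the job instead of the data itself.

from dataclasses import dataclass

@dataclass(frozen=True)
class LobLocator:
    """Hypothetical stand-in for the Connector's LOB locator block."""
    connection_ref: str   # how the target stage reaches the source data store
    table: str            # where the LOB lives
    column: str
    row_key: str          # identifies the row holding the LOB

# Intermediate stages pass the locator through untouched; only the target
# Connector asks the framework to resolve it into the actual LOB bytes.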

4.     Arrays:
The Connector supports array insert operations in target context. It buffers the specified number of input rows before inserting them into the database in a single operation, which improves performance when inserting large numbers of rows.
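An array insert maps naturally onto a batched execute; a minimal pyodbc sketch, with the DSN, table, and batch contents invented:

import pyodbc

conn = pyodbc.connect("DSN=warehouse")       # placeholder DSN
cursor = conn.cursor()
cursor.fast_executemany = True               # send true parameter arrays where the driver allows

batch = [(1, "alpha"), (2, "beta"), (3, "gamma")]   # rows buffered up to the array size
cursor.executemany("INSERT INTO items (id, label) VALUES (?, ?)", batch)
conn.commit()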

5.     SQL Builder:
The Connector provides the SQL Builder tool for designing SQL statements.

6.     Pre/post run statements:
The Connector provides a mechanism for executing SQL statements that are typically used to initialize database objects (for example, to create a new table, or to truncate an existing table before inserting data into it).
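In driver terms, pre/post run statements bracket the data flow, roughly like this sketch (DSN and table names invented):

import pyodbc

conn = pyodbc.connect("DSN=warehouse")                   # placeholder DSN
cursor = conn.cursor()

cursor.execute("TRUNCATE TABLE staging_orders")          # pre-run statement
cursor.executemany(
    "INSERT INTO staging_orders (id) VALUES (?)",
    [(1,), (2,), (3,)])                                  # the job's data flow
cursor.execute(
    "UPDATE load_audit SET loaded_at = CURRENT_TIMESTAMP")  # post-run statement
conn.commit()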

7.     Metadata import:
The Connector supports metadata operations, such as the discovery of database objects and describing these objects.
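The same kind of discovery is exposed through the ODBC catalog functions, which pyodbc wraps directly; a short sketch, assuming a catalogued placeholder DSN:

import pyodbc

conn = pyodbc.connect("DSN=warehouse")            # placeholder DSN
cursor = conn.cursor()

for table in cursor.tables(tableType="TABLE"):    # discover database objects
    print(table.table_schem, table.table_name)

for col in cursor.columns(table="customers"):     # describe one object
    print(col.column_name, col.type_name, col.column_size)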



b.    WebSphere MQ Connector:
WebSphere MQ Connector provides access to message queues in the WebSphere MQ enterprise messaging system. It supports WebSphere MQ versions 5.3 and 6.0, and WebSphere Message Broker 6.0 for the publish/subscribe mode of work. It provides the following types of support:
1.     MQ client and MQ server connection modes. The choice can be made dynamically through a special connection property in the Connector’s stage editor (a minimal client-mode sketch follows this list).
2.     Filtering of messages based on various combinations of message header fields. Complex filtering conditions might be specified.
3.     Synchronous (request/reply) messaging. This scenario is configured by defining both input link (for request messages) and output link (for reply messages) for the Connector’s stage.
4.     Publish/Subscribe mode of work. Both the WebSphere MQ broker (with MQRFH command messages) and WebSphere Message Broker (with MQRFH2 command messages) are supported. The Connector stage can be configured to run as a publisher or as a subscriber. Dynamic registration and deregistration of publisher/subscriber is supported.
5.     MQ dynamic queues, name lists, transmission queues and shared cluster queues for remote queue messaging.
6.     Designated error queues and standard reject links.
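As a rough illustration of the client connection mode and simple put/get messaging, here is a minimal sketch using the open-source pymqi library; the queue manager, channel, host, and queue names are placeholders, not the Connector's actual configuration.

import pymqi

# Client-mode connection: queue manager name, channel, and host(port)
# are placeholders for a real MQ setup.
qmgr = pymqi.connect("QM1", "DEV.APP.SVRCONN", "mqhost(1414)")

request_q = pymqi.Queue(qmgr, "REQUEST.QUEUE")
request_q.put(b"order-12345")          # request message
request_q.close()

reply_q = pymqi.Queue(qmgr, "REPLY.QUEUE")
reply = reply_q.get()                  # raises MQMIError 2033 if no reply is waiting
reply_q.close()
qmgr.disconnect()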

c.     Teradata Connector:
The Teradata Connector includes the following features similar to the ODBC Connector:
Source, target and lookup context
Reject links
Passing LOBs by reference
Arrays
SQL Builder
Pre/post run statements
Metadata import
The Teradata Connector uses the CLIv2 API for immediate operations (SELECT, INSERT, UPDATE, DELETE) and the Parallel Transporter Direct API (formerly TEL-API) for bulk load and bulk extract operations. Parallel bulk load is supported through the LOAD, UPDATE, and STREAM operators in Parallel Transporter, corresponding to the functionality provided by the FastLoad, MultiLoad, and TPump Teradata utilities, respectively. When the UPDATE operator is used, it supports the option of deleting rows of data (the MultiLoad delete task). Parallel bulk export is supported through the EXPORT operator in Parallel Transporter, corresponding to the functionality provided by the FastExport Teradata utility. The Connector persists the bulk-load progress state and provides sophisticated support for restarting failed bulk-load operations. The Connector uses a designated database table for synchronization of distributed Connector instances in a parallel bulk load. Limited support for stored procedures and macros is also available.
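CLIv2 and Parallel Transporter are C APIs, but the flavor of the immediate operations can be sketched with the open-source teradatasql DBAPI driver as a stand-in; host, credentials, and table names below are placeholders.

import teradatasql

# Stand-in for "immediate" operations (the real Connector uses CLIv2):
# parameterized DML executed directly against the database.
with teradatasql.connect(host="tdhost", user="dbc", password="***") as conn:
    cursor = conn.cursor()
    cursor.execute("SELECT item_id, qty FROM sales WHERE region = ?", ["WEST"])
    rows = cursor.fetchall()
    cursor.execute("UPDATE sales SET qty = qty + 1 WHERE item_id = ?", [42])
    conn.commit()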
a.     Teradata Connector advantages:
The following list details the Teradata Connector advantages:
Parallel MultiLoad capability including MPP configurations
Parallel immediate lookups and writes
Array support for better performance of immediate writes
Reject link support for lookups and writes on DS Enterprise Edition
Reject link support for bulk loads
Cursor lookups (lookups that return more than one row)
Restart capability for parallel bulk loads
MultiLoad delete task support
Support for BLOB and CLOB data types
Reject link support for missing UPDATE or DELETE rows
Error message and row count feedback for immediate lookups/writes.

b.    Parallel synchronization table:
The following list details the Parallel Synchronization Table properties:
Used for coordination of player processes in parallel mode
Now optional; the connector runs sequentially if it is not specified
Can be used for logging of execution statistics
Connector stage can use its own sync table or share it
Primary key is SyncID, PartitionNo, StartTime
Each player updates its own row, so there is no lock contention
Management properties for dropping the table and deleting rows.
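The documentation does not publish the exact DDL, but based on the properties above the synchronization table plausibly has a shape like the following (illustrative only; the connector creates and manages its own table):

# Hypothetical DDL for the parallel synchronization table.
SYNC_TABLE_DDL = """
CREATE TABLE load_sync (
    SyncID      VARCHAR(64)  NOT NULL,   -- identifies one bulk-load run
    PartitionNo INTEGER      NOT NULL,   -- one row per player process
    StartTime   TIMESTAMP(0) NOT NULL,
    RowsLoaded  BIGINT,                  -- optional execution statistics
    PRIMARY KEY (SyncID, PartitionNo, StartTime)
)
"""
# Each player updates only its own (SyncID, PartitionNo, StartTime) row,
# which is why there is no lock contention between players.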

c.      Comparison with existing Teradata stages:
The following list details comparisons with Teradata stages:
Limited support for stored procedures and macros, but the Stored Procedure plug-in is still better suited for them.
No more utilities, named pipes, control scripts, or report files.
Interface with the Parallel Transporter is through a direct call-level API.
Error messages are reported in the DataStage Director log.
MultiLoad plug-in jobs that use advanced custom script features cannot be migrated to use the Teradata Connector.
Number of players is determined by the PX Engine config file.
The following table lists the advantages and disadvantages of the Parallel Transporter operators.
Operator: Export (equivalent utility: FastExport)
Advantages: Fastest export method.
Disadvantages: Uses a utility slot; no single-AMP SELECTs.

Operator: Load (equivalent utility: FastLoad)
Advantages: Fastest load method.
Disadvantages: Uses a utility slot; INSERT only; locks the table; no views; no secondary indexes.

Operator: Update (equivalent utility: MultiLoad)
Advantages: INSERT, UPDATE, DELETE; views; non-unique secondary indexes.
Disadvantages: Uses a utility slot; locks the table; no unique secondary indexes; table inaccessible on abort.

Operator: Stream (equivalent utility: TPump)
Advantages: INSERT, UPDATE, DELETE; views; secondary indexes; no utility slot; no table lock.
Disadvantages: Slower than the UPDATE operator.

d.    DB2 Connector
The DB2 Connector includes the following features similar to the ODBC Connector:
Source, target and lookup context
Reject links
Passing LOBs by reference
Arrays
SQL Builder
Pre/post run statements
Metadata import
Support for DB2 V9.1

The Connector is based on the CLI client interface. It can connect to any database cataloged on the DB2 client. The DB2 client must be collocated with the Connector, but the actual database might be local or remote to the Connector.
There are separate sets of connection properties for the job setup phase (conductor) and the execution phase (player nodes), so the same database might be cataloged differently on the conductor and player nodes.

The Connector provides support for the following tasks:
1.     Specifying the DB2 instance dynamically (through connection properties), which overrides the default environment settings (the DB2INSTANCE environment variable).

2.     XML data type in DB2 V9.1.
3.     DB2 DPF. A job with a DB2 Connector target stage might be configured to assign one execution player node to each DB2 partition and to write data to the partitioned database in parallel, providing a dramatic performance improvement over sending all the data to one partition node and forcing DB2 to redirect it to the corresponding partitions.
4.     DB2 bulk load functionality. The invocation of bulk load is done through the CLI interface. Parallel bulk load is also supported for DB2 with DPF.
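To give a feel for a CLI-driven bulk load, here is a sketch using the ibm_db driver to call the ADMIN_CMD stored procedure; the connection string, file path, and table are placeholders, and the real Connector drives the load through the CLI load API rather than through ADMIN_CMD.

import ibm_db

# One way to trigger DB2 LOAD programmatically: call the ADMIN_CMD stored
# procedure over a client connection. All names here are placeholders.
conn = ibm_db.connect(
    "DATABASE=sample;HOSTNAME=db2host;PORT=50000;PROTOCOL=TCPIP;"
    "UID=loader;PWD=***", "", "")
ibm_db.exec_immediate(
    conn,
    "CALL SYSPROC.ADMIN_CMD('LOAD FROM /data/orders.del OF DEL "
    "INSERT INTO orders')")
ibm_db.close(conn)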
New features:
In terms of functionality, the DB2 Connector offers more capabilities than all of the existing stages. Specifically, it provides support for the following elements:

1.     XML data type
2.     LOBs, with data passed either inline or by a reference mechanism. The latter allows LOB data of any size to be moved from a source to a target.
3.     Client-server access to the DB2 server. This overcomes a limitation of the EE stage, which can only be used in homogeneous environments where DataStage and DB2 are on identical (or the same) servers.
4.     More options to control bulk load than the EE operator provides.
5.     Design time capabilities, such as metadata import to the common model, and enumeration of server-side information.
Using rejects with user-defined SQL
Using rejects with user-defined SQL (UserSQL) is described in this section.


The following list describes the use of UserSQL without the reject link:
All statements in the UserSQL property are executed either for all of the input records in the current batch (as specified by the Array size property) or for none. In other words, the outcome of earlier statements does not control the number of records passed to the statements that follow.
If FailOnError=Yes, the first statement that fails causes the job to fail, and the current transaction (as defined by the Record Count property) is rolled back. No more statements from the UserSQL property are executed after that. For example, if there are three statements in the property and the second one fails, the first one has already executed and the third one is not executed. None of the statements has its work committed, because of the error.
If FailOnError=No, all statements still receive all the records, but any statement errors are ignored and the remaining statements continue to be executed. For example, if there are three statements in the UserSQL property and the second one fails, all three are executed and any successful rows are committed. The failed rows are ignored.

The following list describes the use of UserSQL with the reject link:
All statements in the UserSQL property are executed either for all of the input records in the current batch (as specified by the Array size property) or for none. In other words, the outcome of earlier statements does not control the number of records passed to the statements that follow.
All the rows in each batch (as specified by the Array size property) are either successfully consumed by all statements in the UserSQL property, or are rejected as a whole. This is important for preserving the integrity of records processed by multiple statements that are expected to be atomic and committed in the same transaction. In other words, the connector eliminates the possibility of having only some of the statements successfully consume a set of rows.
If any of the rows in any statement in the UserSQL property are not consumed successfully, the processing of the current batch is aborted and the whole batch of records is sent to the reject link. The statements that follow the failed statement are not executed with the current batch. The processing resumes with the next batch.
To preserve the consistency of records in each batch, the connector forces a commit after every successful batch and forces a rollback after every failed batch. This means the connector overrides the transactional behavior specified by the Record Count property.
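Put as pseudocode, the batch semantics with a reject link amount to something like the following sketch; the function and names are generic DBAPI stand-ins, not the connector's actual internals.

def run_user_sql(conn, statements, batches, reject_link):
    """Sketch of UserSQL batch semantics with a reject link."""
    for batch in batches:
        cursor = conn.cursor()
        try:
            for sql in statements:        # every statement sees the whole batch
                cursor.executemany(sql, batch)
            conn.commit()                 # forced commit after a successful batch
        except Exception:
            conn.rollback()               # forced rollback on any failure
            reject_link.extend(batch)     # the whole batch goes to the reject link
            # processing then resumes with the next batch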
e.     Oracle Connector
The Oracle Connector includes these features similar to the ODBC Connector:
Source, target and lookup context
Reject links
Passing LOBs by reference
Arrays
SQL Builder
Pre/post run statements
Metadata import
In addition, it supports bulk loads and Oracle partitioning.
The Connector works with Oracle versions 10g and 11g. It supports connecting to an Oracle database through Oracle Full Client or Oracle Instant Client.
The following list details improvements made to the Oracle Connector:

·         Distributed transactions:
Support for guaranteed delivery of transactions arriving in the form of MQ messages. On success, the messages are processed by the job and the data is written to the target Oracle database. On failure, the messages are rolled back to the queue.
To use the Distributed Transaction stage, you need MQ 6.0 and Oracle 10g R2.

·         Built-in Oracle scalar data types are supported, including BLOB, CLOB, NCLOB, BFILE, LONG, and LONG RAW data types.

·         XMLType columns and object tables are supported.
·         PL/SQL anonymous blocks with bind parameters are supported (see the sketch after this list).
·         A rich set of options for configuring reject links is supported. Reject links are also supported in bulk load mode.

·         Pre- and post-SQL operations are supported:
SQL statement, multiple SQL statements or PL/SQL anonymous block might be specified.
The statements might be configured to run at the job or at the node level.
The statements might be specified in the stage UI or in external files.

·         Rich metadata import functionality:
Table selection based on table type (table, view, IOT, materialized view, external table, synonym), the table owner or the table name pattern.
Supports importing PK, FK, and index information.

·         Table action:
Performing Create, Replace, or Truncate table actions in the job is supported before writing data to the table.
Input link column definitions are automatically used to define the target table columns.

·         Fast read, write, and bulk load operations:
Performance improvement over Oracle Enterprise stage.
Rich set of options for configuring parallel read and write operations.
Provides control over the Transparent Application Failover (TAF) mechanism in environments such as Oracle RAC.
Includes storing TAF notifications in the job log to inform the user about the failover progress.
Oracle or OS authentication is supported for the connector at job runtime.
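As referenced above, a PL/SQL anonymous block with bind parameters can be sketched with the cx_Oracle driver; the connect string, table, and block body are placeholders.

import cx_Oracle

conn = cx_Oracle.connect("scott", "***", "dbhost/orclpdb")   # placeholder DSN
cursor = conn.cursor()
out_total = cursor.var(int)          # bind variable that receives the OUT value
cursor.execute(
    """
    BEGIN
      SELECT COUNT(*) INTO :total FROM orders WHERE region = :region;
    END;""",
    total=out_total,
    region="WEST")
print(out_total.getvalue())          # value bound by the anonymous block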
f.      Essbase Connector
The Essbase Connector supports extraction, delta extraction, and loading of data to and from Essbase databases. It performs hierarchical-to-relational mapping of cube data and supports parallel reads and writes.
It is implemented in C++ using a third-party interface library from a partner, and uses DMDI to allow selection from cube data.
