As a general guideline, new projects should give preference to Connector stages, and take advantage of existing Enterprise equivalents in specific cases where these have an edge in terms of performance.
a. ODBC Connector:
The ODBC Connector has the following features:
1. Source, target and lookup context:
The ODBC Connector can be used much like the typical ODBC stage: for extracting data (source context), for loading data (target context), and for lookups (lookup context). The lookup context is similar to the source context, with the difference that the SELECT SQL statements are parameterized. The parameter values are provided dynamically on the input link by the other stages in the job.
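As a minimal sketch of the parameterized-lookup idea, the following uses the pyodbc driver; the DSN, table, and column names are hypothetical, and in a real job the parameter values would arrive on the input link rather than from a Python loop:

```python
# Parameterized lookup SELECT, in the spirit of the lookup context above.
# The DSN and table/column names are assumptions for illustration only.
import pyodbc

conn = pyodbc.connect("DSN=warehouse;UID=etl;PWD=secret")
cur = conn.cursor()

# Each incoming key plays the role of a parameter value supplied on the
# input link by another stage in the job.
for customer_id in (1001, 1002, 1003):
    cur.execute(
        "SELECT name, credit_limit FROM customers WHERE customer_id = ?",
        customer_id,
    )
    print(customer_id, cur.fetchone())
```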
2. Reject links:
The ODBC Connector supports a special type of link called a reject link. The Connector can be configured to direct data that it cannot process to the reject link, from which it can be sent to any stage in the job.
3. Passing LOBs by reference:
The Connector offers the option of passing LOBs by reference, rather than extracting the data and passing it inline through the job flow. When configured to pass LOB values by reference, the ODBC Connector assembles a special block of data, called a locator or reference, and passes it into the job dataflow. Another Connector is placed at the end of the job dataflow. When the LOB locator arrives at that Connector stage, the Connector framework initiates the retrieval of the actual ODBC data represented by the reference and provides the data to the target Connector so that it can be loaded into the represented resource. This makes it possible to move LOB data measured in megabytes or gigabytes from one data resource to another without moving the actual data through the job. The drawback is that data passed this way cannot be altered by the intermediate stages in the job.
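The following is a conceptual model of that mechanism, not the Connector framework's actual API: the locator carries only enough information to fetch the value later, so a small token flows through the job instead of the LOB itself. All names here are hypothetical:

```python
# Conceptual sketch of a LOB locator: a small reference that travels
# through the job; the actual value is fetched only at the target.
from dataclasses import dataclass

@dataclass(frozen=True)
class LobLocator:
    dsn: str          # where the LOB lives
    table: str
    key_column: str
    key_value: object
    lob_column: str

def resolve(locator: LobLocator):
    """Retrieve the actual LOB only when the target stage needs it."""
    import pyodbc
    conn = pyodbc.connect(f"DSN={locator.dsn}")
    cur = conn.cursor()
    cur.execute(
        f"SELECT {locator.lob_column} FROM {locator.table} "
        f"WHERE {locator.key_column} = ?",
        locator.key_value,
    )
    return cur.fetchone()[0]

# Upstream, only this small locator moves through the dataflow:
ref = LobLocator("warehouse", "documents", "doc_id", 42, "body")
# Downstream, the target connector resolves it at load time:
# payload = resolve(ref)
```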
4. Arrays:
The Connector supports array insert operations in target context. The Connector buffers the specified number of input rows and then inserts them into the database in a single operation, which gives better performance when inserting large numbers of rows.
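A sketch of the array-insert idea using pyodbc's batched execution follows; the DSN and table are assumptions:

```python
# Buffer rows and send them to the database in one batched operation,
# analogous to the Connector's array insert in target context.
import pyodbc

conn = pyodbc.connect("DSN=warehouse;UID=etl;PWD=secret")
cur = conn.cursor()
cur.fast_executemany = True  # ship the buffered rows as a single batch

rows = [(i, f"name-{i}") for i in range(10_000)]
cur.executemany(
    "INSERT INTO customers (customer_id, name) VALUES (?, ?)", rows
)
conn.commit()
```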
5. SQL Builder:
Connector users design their SQL statements with the SQL Builder tool.
6. Pre/post run statements:
The Connector provides a mechanism for executing SQL statements that are typically used to initialize database objects (for example, to create a new table or truncate an existing table before inserting data into it).
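A sketch of a pre-run statement wrapped around a load, again assuming pyodbc and hypothetical object names (the post-run statement's syntax varies by database):

```python
# Pre-run statement initializes the target table; a post-run statement
# can clean up or refresh statistics afterwards.
import pyodbc

conn = pyodbc.connect("DSN=warehouse;UID=etl;PWD=secret")
cur = conn.cursor()

cur.execute("TRUNCATE TABLE staging_orders")        # pre-run statement
cur.executemany(
    "INSERT INTO staging_orders (order_id, amount) VALUES (?, ?)",
    [(1, 9.99), (2, 24.50)],
)
cur.execute("UPDATE STATISTICS staging_orders")     # post-run (syntax varies)
conn.commit()
```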
7. Metadata import:
The Connector supports metadata operations, such as discovering database objects and describing those objects.
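The ODBC catalog functions that pyodbc exposes mirror this kind of discovery; the DSN and table name below are assumptions:

```python
# Enumerate tables, then describe the columns of one of them, using the
# standard ODBC catalog functions.
import pyodbc

conn = pyodbc.connect("DSN=warehouse")
cur = conn.cursor()

for t in cur.tables(tableType="TABLE"):
    print(t.table_schem, t.table_name)

for col in cur.columns(table="customers"):
    print(col.column_name, col.type_name, col.column_size)
```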
b. WebSphere MQ Connector:
The WebSphere MQ Connector provides access to message queues in the WebSphere MQ enterprise messaging system. It supports WebSphere MQ versions 5.3 and 6.0, and WebSphere Message Broker 6.0 for the publish/subscribe mode of work. It provides the following types of support:
1. MQ client and MQ server connection modes. The choice can be made
dynamically through the special connection property in the Connector’s stage
editor.
2. Filtering of messages based on various combinations of message header
fields. Complex filtering conditions might be specified.
3. Synchronous (request/reply) messaging. This scenario is configured by defining both an input link (for request messages) and an output link (for reply messages) for the Connector’s stage; a minimal client-side sketch follows this list.
4. Publish/Subscribe mode of work. Both the WebSphere MQ broker (with MQRFH
command messages) and WebSphere Message Broker (with MQRFH2 command messages)
are supported. The Connector stage can be configured to run as a publisher or
as a subscriber. Dynamic registration and deregistration of
publisher/subscriber is supported.
5. MQ dynamic queues, name lists, transmission queues and shared cluster
queues for remote queue messaging.
6. Designated error queues and standard reject links.
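The request/reply pattern from item 3 can be sketched with the pymqi client library standing in for the Connector's MQ access; the queue manager, channel, and queue names are hypothetical, and in DataStage this wiring is done through the stage's links rather than code:

```python
# Minimal synchronous request/reply over WebSphere MQ using pymqi.
import pymqi

qmgr = pymqi.connect('QM1', 'SVRCONN.CHANNEL', 'mqhost(1414)')
request_q = pymqi.Queue(qmgr, 'REQUEST.QUEUE')
reply_q = pymqi.Queue(qmgr, 'REPLY.QUEUE')

# Tag the request with the reply-to queue so the responder knows where
# to send the answer.
md = pymqi.MD()
md.ReplyToQ = b'REPLY.QUEUE'
request_q.put(b'order-status 4711', md)

# Wait for the reply correlated with our request's message id.
reply_md = pymqi.MD()
reply_md.CorrelId = md.MsgId
gmo = pymqi.GMO()
gmo.Options = pymqi.CMQC.MQGMO_WAIT
gmo.WaitInterval = 5000  # milliseconds
print(reply_q.get(None, reply_md, gmo))

request_q.close()
reply_q.close()
qmgr.disconnect()
```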
c. Teradata Connector:
The Teradata Connector includes the following features, similar to the ODBC Connector:
Source, target and lookup context
Reject links
Passing LOBs by reference
Arrays
SQL Builder
Pre/post run statements
Metadata import
The Teradata Connector uses the CLIv2 API for immediate operations (SELECT, INSERT, UPDATE, DELETE) and the Parallel Transporter Direct API (formerly TEL-API) for bulk load and bulk extract operations. Parallel bulk load is supported through the LOAD, UPDATE, and STREAM operators in Parallel Transporter, corresponding to the functionality provided by the FastLoad, MultiLoad, and TPump Teradata utilities, respectively. When the UPDATE operator is used, it supports the option of deleting rows of data (MultiLoad delete task). Parallel bulk export is supported through the EXPORT operator in Parallel Transporter, corresponding to the functionality provided by the FastExport Teradata utility. The Connector persists the bulk-load progress state and provides sophisticated support for restarting failed bulk-load operations. The Connector uses a designated database table to synchronize distributed Connector instances during a parallel bulk load. Limited support for stored procedures and macros is also available.
a. Teradata Connector advantages:
The following list details the Teradata Connector advantages:
Parallel MultiLoad capability including MPP configurations
Parallel immediate lookups and writes
Array support for better performance of immediate writes
Reject link support for lookups and writes on DS Enterprise Edition
Reject link support for bulk loads
Cursor lookups (lookups that return more than one row)
Restart capability for parallel bulk loads
MultiLoad delete task support
Support for BLOB and CLOB data types
Reject link support for missing UPDATE or DELETE rows
Error message and row count feedback for immediate lookups/writes.
b. Parallel synchronization table:
The following list details the Parallel Synchronization Table properties:
Used for coordination of player processes in parallel mode
Now optional; the connector runs sequentially if it is not specified
Can be used for logging of execution statistics
A Connector stage can use its own sync table or share it
Primary key is SyncID, PartitionNo, StartTime
Each player updates its own row, so there is no lock contention
Management properties for dropping the table and deleting rows.
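A hypothetical illustration of such a table follows. Only the primary-key columns (SyncID, PartitionNo, StartTime) come from the list above; the status and row-count columns, and the use of the teradatasql driver, are assumptions:

```python
# Sketch of a parallel-synchronization table and a per-player update.
import teradatasql

ddl = """
CREATE TABLE sync_table (
    SyncID      VARCHAR(64)  NOT NULL,  -- identifies the bulk-load run
    PartitionNo INTEGER      NOT NULL,  -- one row per player process
    StartTime   TIMESTAMP(0) NOT NULL,
    Status      VARCHAR(16),            -- assumed: per-player progress state
    RowsLoaded  BIGINT,                 -- assumed: execution statistics
    PRIMARY KEY (SyncID, PartitionNo, StartTime)
)
"""

with teradatasql.connect(host="tdhost", user="dbc", password="dbc") as con:
    with con.cursor() as cur:
        cur.execute(ddl)
        # Each player touches only its own row, avoiding lock contention.
        cur.execute(
            "UPDATE sync_table SET Status = ?, RowsLoaded = ? "
            "WHERE SyncID = ? AND PartitionNo = ?",
            ["RUNNING", 0, "job42", 3],
        )
```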
c. Comparison with existing Teradata stages:
The following list details comparisons with Teradata stages:
Limited support for stored procedures and macros, but the Stored Procedure plug-in is still better suited for it.
No more utilities, named pipes, control scripts, or report files.
Interface with the Parallel Transporter is through a direct call-level API.
Error messages are reported in the DataStage Director log.
MultiLoad plug-in jobs that use advanced custom script features cannot be migrated to use the Teradata Connector.
Number of players is determined by the PX Engine config file.
The following table lists the advantages and disadvantages of the Parallel Transporter operators.

Operator | Equivalent Utility | Advantages | Disadvantages
-------- | ------------------ | ---------- | -------------
Export | FastExport | Fastest export method. | Uses a utility slot; no single-AMP SELECTs.
Load | FastLoad | Fastest load method. | Uses a utility slot; INSERT only; locks the table; no views; no secondary indexes.
Update | MultiLoad | INSERT, UPDATE, DELETE; views; non-unique secondary indexes. | Uses a utility slot; locks the table; no unique secondary indexes; table inaccessible on abort.
Stream | TPump | INSERT, UPDATE, DELETE; views; secondary indexes; no utility slot; no table lock. | Slower than the UPDATE operator.
d. DB2 Connector:
The DB2 Connector includes the following features, similar to the ODBC Connector:
Source, target and lookup context
Reject links
Passing LOBs by reference
Arrays
SQL Builder
Pre/post run statements
Metadata import
Support for DB2 V9.1
The Connector is based on the CLI client interface. It can connect to any database cataloged on the DB2 client. The DB2 client must be collocated with the Connector, but the actual database might be local or remote to the Connector. The Connector keeps separate sets of connection properties for the job setup phase (conductor) and the execution phase (player nodes), so the same database might be cataloged differently on the conductor and player nodes.
The Connector
provides support for the following tasks:
1. Specifying the DB2 instance dynamically (through connection properties), which overrides the default environment settings (the DB2INSTANCE environment variable); see the sketch after this list.
2. The XML data type in DB2 V9.1.
3. DB2 DPF. A job with a DB2 Connector target stage might be configured to assign one execution player node to each DB2 partition and to write data to the partitioned database in parallel, providing a dramatic performance improvement over sending all the data to the same partition node and forcing DB2 to redirect it to the corresponding partitions.
4. DB2 bulk load functionality. The bulk load is invoked through the CLI interface. Parallel bulk load is also supported for DB2 with DPF.
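As a sketch of item 1, connection properties can carry everything needed to reach the instance, so environment defaults never apply; this uses the ibm_db driver, and the host, database, and credentials are hypothetical:

```python
# Reaching a DB2 database purely through connection properties, without
# relying on the DB2INSTANCE environment variable.
import ibm_db

conn = ibm_db.connect(
    "DATABASE=sample;HOSTNAME=db2host;PORT=50000;PROTOCOL=TCPIP;"
    "UID=etl;PWD=secret;",
    "",
    "",
)
stmt = ibm_db.exec_immediate(conn, "SELECT COUNT(*) FROM syscat.tables")
print(ibm_db.fetch_tuple(stmt)[0])
ibm_db.close(conn)
```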
New features
In terms of functionality, the DB2 Connector offers more capabilities than all of the existing stages. Specifically, it provides support for the following elements:
1. The XML data type.
2. LOBs, with data passed either inline or by a reference mechanism. The latter allows LOB data of any size to be moved from a source to a target.
3. Client-server access to the DB2 server. This overcomes a limitation of the EE stage, which can only be used in homogeneous environments where DataStage and DB2 are on identical (or the same) servers.
4. More options to control bulk load than the EE operator.
5. Design-time capabilities, such as metadata import to the common model and enumeration of server-side information.
Using rejects with user-defined SQL
This section describes using rejects with user-defined SQL (UserSQL).
The following list describes the use of UserSQL
without the reject link:
All statements in the UserSQL property are passed either all of the input records in the current batch (as specified by the Array size property) or none. In other words, events in previous statements do not control the number of records passed to the statements that follow.
If FailOnError=Yes, the first statement that fails causes the job to fail, and the current transaction (as scoped by the Record Count property) is rolled back. No more statements from the UserSQL property are executed after that. For example, if there are three statements in the property and the second one fails, the first one has already executed and the third one is not executed. None of the statements has its work committed, because of the error.
If FailOnError=No, all statements are still passed all of the records, but any statement errors are ignored and the remaining statements continue to be executed. For example, if there are three statements in the UserSQL property and the second one fails, all three are executed and any successful rows are committed. The failed rows are ignored.
The following list describes the use of
UserSQL with the reject link:
All statements in the UserSQL property are passed either all of the input records in the current batch (as specified by the Array size property) or none. In other words, events in previous statements do not control the number of records passed to the statements that follow.
All the rows in each batch (as specified by the Array size property) are either successfully consumed by all statements in the UserSQL property, or are rejected as a whole. This is important for preserving the integrity of records processed by multiple statements that are expected to be atomic and committed in the same transaction. In other words, the connector eliminates the possibility of only some of the statements successfully consuming a set of rows.
If any of the rows in any statement in the UserSQL property are not consumed successfully, the processing of the current batch is aborted and the whole batch of records is sent to the reject link. The statements that follow the failed statement are not executed with the current batch. Processing resumes with the next batch.
To preserve the consistency of records in each batch, the connector forces a commit after every successful batch and forces a rollback after every failed batch. This means the connector overrides the transactional behavior specified by the Record Count property. A short sketch of these semantics follows.
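The following is an illustrative model of the batch semantics just described, not Connector code: every statement sees the whole batch, and a failure anywhere rolls the transaction back and sends the whole batch to the reject link.

```python
# Model of per-batch all-or-nothing processing with a reject link.
# `conn` is any DB-API connection; `reject_link` is modeled as a list.
def process_batch(conn, statements, batch, reject_link):
    cur = conn.cursor()
    try:
        for sql in statements:        # every statement gets the full batch
            cur.executemany(sql, batch)
    except Exception:
        conn.rollback()               # failed batch: undo all statements
        reject_link.extend(batch)     # reject the batch as a whole
    else:
        conn.commit()                 # successful batch: forced commit
```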
e. Oracle Connector:
The Oracle Connector includes the following features, similar to the ODBC Connector:
Source, target and lookup context
Reject links
Passing LOBs by reference
Arrays
SQL Builder
Pre/post run statements
Metadata import
In addition, it supports bulk loads and Oracle partitioning. The Connector works with Oracle versions 10g and 11g, and supports connecting to an Oracle database through the Oracle Full Client or the Oracle Instant Client.
The following list details improvements made to the Oracle Connector:
· Distributed transactions:
Support for guaranteed delivery of transactions arriving in the form of MQ messages. On success, the messages are processed by the job and the data is written to the target Oracle database. On failure, the messages are rolled back to the queue.
To use the Distributed Transaction stage, you need MQ 6.0 and Oracle 10g R2.
· Built-in Oracle scalar data types are supported, including the BLOB, CLOB, NCLOB, BFILE, LONG, and LONG RAW data types.
· XMLType columns and object tables are supported.
· PL/SQL anonymous blocks with bind parameters are supported (see the sketch after this list).
· Support for a rich set of options for configuring reject links. Reject links are also supported in bulk load mode.
· Pre- and post-SQL operations are supported:
A single SQL statement, multiple SQL statements, or a PL/SQL anonymous block might be specified.
The statements might be configured to run at the job level or at the node level.
The statements might be specified in the stage UI or in external files.
· Rich metadata import functionality:
Table selection based on the table type (table, view, IOT, materialized view, external table, synonym), the table owner, or a table name pattern.
Supports importing primary key, foreign key, and index information.
· Table action:
Create, Replace, or Truncate table actions can be performed in the job before writing data to the table.
Input link column definitions are automatically used to define the target table columns.
· Fast read, write, and bulk load operations:
Performance improvement over the Oracle Enterprise stage.
Rich set of options for configuring parallel read and write operations.
Provides control over the Transparent Application Failover (TAF) mechanism in environments such as Oracle RAC, including storing TAF notifications in the job log to inform the user about the failover progress.
· Oracle or OS authentication is supported for the connector at job runtime.
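As a minimal sketch of a PL/SQL anonymous block with bind parameters, the following uses the cx_Oracle driver as a stand-in for the Connector's PL/SQL support; the connection details, table, and block body are hypothetical:

```python
# Execute a PL/SQL anonymous block with input and output bind variables.
import cx_Oracle

conn = cx_Oracle.connect("scott", "tiger", "dbhost/orclpdb1")
cur = conn.cursor()

total = cur.var(cx_Oracle.NUMBER)  # output bind variable
cur.execute(
    """
    BEGIN
        SELECT COUNT(*) INTO :total FROM employees WHERE dept_id = :dept;
    END;
    """,
    total=total,
    dept=10,
)
print(total.getvalue())
conn.close()
```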
f. Essbase Connector:
The Essbase Connector supports the extraction, delta extraction, and loading of data to and from Essbase databases. It performs hierarchical-to-relational mapping of cube data, and supports parallel reads and writes. It is implemented in C++ using a third-party interface library from a partner, and uses DMDI to allow selection from cube data.