This connection focuses on applying changes from Kafka topics, VSAM, and IMS to the target Db2 databases, including both Db2 for z/OS and Db2 for IBM i. The runtime engine is responsible for consuming messages from these sources and applying the changes to the respective Db2 database. It handles the transformation of data from the source format (Kafka, VSAM, or IMS) to the schema of the target Db2, ensuring that updates and inserts are accurately executed.
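The engine's internals are not exposed, but the core idea of applying a captured change to the target can be sketched. The following is an illustrative Python sketch only, not the product's actual implementation; the table, key, and column names are hypothetical, and the change record is assumed to carry a row operation plus the relevant row image.

```python
def change_to_sql(table: str, op: str, key: str, row: dict) -> tuple:
    """Translate one change record into a parameterized Db2 SQL statement.

    op  -- the row operation: "insert", "update", or "delete"
    row -- the after image for insert/update, the before image for delete
    """
    if op == "insert":
        cols = ", ".join(row)
        marks = ", ".join("?" for _ in row)
        return f"INSERT INTO {table} ({cols}) VALUES ({marks})", list(row.values())
    if op == "update":
        sets = ", ".join(f"{c} = ?" for c in row if c != key)
        params = [v for c, v in row.items() if c != key] + [row[key]]
        return f"UPDATE {table} SET {sets} WHERE {key} = ?", params
    if op == "delete":
        return f"DELETE FROM {table} WHERE {key} = ?", [row[key]]
    raise ValueError(f"unknown row operation: {op}")

# Example (hypothetical table and columns):
sql, params = change_to_sql("APP.CUSTOMER", "update", "ID", {"ID": 7, "NAME": "NEW"})
```

Parameter markers (`?`) keep the statement safe to execute through any Db2 client driver regardless of the column values being replicated.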
To create a replication pipeline for the following combinations:
- Kafka to Db2 for z/OS
- VSAM, IMS, and Kafka to Db2 for IBM i

The instructions below guide you through configuring connections, specifying source and target data, defining replication fields, and reviewing the summary.
General
In the General section of the Create Mainframe Pipeline wizard, you are required to provide essential details for setting up your mainframe replication pipeline:
- Pipeline Name: Specify a unique name for the mainframe replication pipeline; a recognizable name makes it easier to find and manage later.
- Description: Specify a description of the mainframe replication pipeline.
Connections
In the Connections section of the Create Mainframe Pipeline wizard, you will configure and review details related to the source and target data connections for the mainframe replication pipeline.
- Once you select the source connection, you can view details such as:
  - Catalog Summary: Specifies the status of discovery or any action to be taken against the dataset.
  - Cataloged: Specifies the number of discovered datasets for the source data connection.
  - Last Completed: Specifies the date and time when the last discovery completed.

  Note: If no source connections are configured, you must configure them on the configuration page.
- Once you select the target connection, you can view details such as:
  - Catalog Summary: Specifies the status of discovery or any action to be taken against the dataset.
  - Cataloged: Specifies the number of discovered datasets for the target data connection.
  - Last Completed: Specifies the date and time when the last discovery completed.

  Note: If no target connections are configured, you must configure them on the configuration page.
Once the connections are selected, click Next to proceed.
Source data
The Source Data page provides comprehensive details about the mainframe replication pipeline, helping you manage and interact with the data sources effectively. Below is the list of fields available on this page:
- Pipeline: Specifies the name of the mainframe pipeline.
- Source Connection: Specifies the connection configuration used to access the data source.
- Type: Specifies the type of the source data source.
- Catalog summary: Specifies the status of the data catalog for the source connection. It summarizes whether the cataloging process is complete, in progress, or has encountered issues.
- Cataloged: Shows the total count of items that have been cataloged within the data source. It may also indicate any actions needed to complete the cataloging process or address issues with the dataset.
- Last Completed: Records the date and time when the most recent data discovery process was completed. This timestamp helps track the freshness of the data and when it was last updated.
- Catalog: A clickable option that initiates the cataloging process for datasets within the source connection. It allows users to start or re-start the process of cataloging data for easier management and access.
- Search: Provides a search functionality to filter the list of datasets based on names that start with specified characters.
- Topic: Lists the columns available in the dataset.
- Last updated: Displays the date and time when the discovery table for the source data was last updated.
Replication Fields
The data sources you select are designed to capture changes to your source data, such as inserts, updates, and deletes. For effective data replication, the replication mechanism must accurately identify and process the information in these changes to apply them to the target data. This involves selecting the appropriate producer for the source messages and ensuring that each field in the source data is correctly mapped to the corresponding replication fields.
Different mechanisms for capturing and replicating these changes include tools like Precisely Replication, Debezium, or custom solutions, depending on the data source and the replication requirements.
- Precisely Replication: A system that uses proprietary connectors to capture and replicate changes in real-time or batch processes.
- Debezium: An open-source platform that streams changes from database logs to Kafka topics. For additional details, refer to the relevant documentation.
- Custom change capture: Refers to tailored solutions that track changes using methods such as database triggers or custom logs, addressing specific needs or constraints.
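For Debezium sources, the change event envelope published to Kafka carries the row operation, before image, and after image directly. The sketch below shows how such an envelope maps onto the replication fields discussed here; the event payload is a simplified, hypothetical example, though the envelope field names (`op`, `before`, `after`, `source`, `ts_ms`) follow Debezium's documented format.

```python
# A minimal Debezium-style change event (simplified, hypothetical payload).
event = {
    "op": "u",                                  # c=create, u=update, d=delete, r=read
    "before": {"ID": 7, "NAME": "OLD"},         # before image
    "after":  {"ID": 7, "NAME": "NEW"},         # after image
    "ts_ms": 1700000000000,                     # event timestamp (epoch millis)
    "source": {"db": "STORE", "schema": "APP", "table": "CUSTOMER"},
}

OPS = {"c": "insert", "u": "update", "d": "delete", "r": "read"}

def extract_fields(event: dict) -> dict:
    """Map a Debezium envelope onto the replication fields described above."""
    src = event.get("source", {})
    return {
        "row_operation": OPS[event["op"]],
        "before_image": event.get("before"),
        "after_image": event.get("after"),
        "database_name": src.get("db"),
        "schema_name": src.get("schema"),
        "table_name": src.get("table"),
    }
```

In the wizard, this mapping is what you confirm when assigning each source message field to its replication counterpart.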
Verifying the correct assignment of fields is essential for maintaining data consistency and integrity throughout the replication process. Accurate mapping of source message fields to their replication counterparts ensures that changes are applied correctly to the target system, preserving the integrity and synchronization of the data.
| Precisely Replication | Debezium | Custom |
|---|---|---|
| Row operation: Specifies the type of operation performed on individual rows during data replication. | Row operation: Specifies the type of operation performed on individual rows during data replication. | Row operation: Specifies the type of operation performed on individual rows during data replication. |
| Row timestamp: Returns the date and time at which the iSeries update operation occurred, presented in GMT and including fractions of a second. It follows the format YYYYMMDDHHMMSSffffff (year, month, day, hour, minute, second, fractional seconds). | Row timestamp: Returns the date and time at which the iSeries update operation occurred, presented in GMT and including fractions of a second. It follows the format YYYYMMDDHHMMSSffffff (year, month, day, hour, minute, second, fractional seconds). | Row timestamp: Returns the date and time at which the iSeries update operation occurred, presented in GMT and including fractions of a second. It follows the format YYYYMMDDHHMMSSffffff (year, month, day, hour, minute, second, fractional seconds). |
| Dataset name: Specifies the name of the dataset for accurate data synchronization between mainframe and target systems. | Table name: Specifies the name of the table. | Dataset name: Specifies the name of the dataset for accurate data synchronization between mainframe and target systems. |
| DBMS type: Specifies the category of the Database Management System (DBMS) used in the replication pipeline, ensuring compatibility and synchronization between systems. | Schema name: Specifies the name of the schema. | Transaction ID: Specifies a 16-digit, unique identifier of the transaction associated with the column being referenced in the source dataset. |
| Server name: Specifies the name of the server hosting the mainframe system. | Database name: Specifies the name of the database. | Transaction row sequence: Specifies an integer value indicating the position of the row containing the source column in the chronological sequence of row manipulations within the transaction. |
| Transaction ID: Specifies a 16-digit, unique identifier of the transaction associated with the column being referenced in the source dataset. | DBMS type: Specifies the category of the Database Management System (DBMS) used in the replication pipeline, ensuring compatibility and synchronization between systems. | Transaction timestamp: Specifies the behavior of the transaction timestamp within a mainframe pipeline. For log-based Change Selectors, it returns the transaction commit timestamp in GMT, including fractions of a second. For trigger-based Change Selectors, it returns the last update time for replication. |
| Transaction row sequence: Specifies an integer value indicating the position of the row containing the source column in the chronological sequence of row manipulations within the transaction. | Transaction ID: Specifies a 16-digit, unique identifier of the transaction associated with the column being referenced in the source dataset. | Before image: Refers to the state of the data before any modifications or updates occur. It contains the original values of the data. |
| Transaction timestamp: Specifies the behavior of the transaction timestamp within a mainframe pipeline. For log-based Change Selectors, it returns the transaction commit timestamp in GMT, including fractions of a second. For trigger-based Change Selectors, it returns the last update time for replication. | Transaction row sequence: Specifies an integer value indicating the position of the row containing the source column in the chronological sequence of row manipulations within the transaction. | After image: Refers to the state of the data after a transaction or update operation is completed. It contains the updated or modified values of the data. |
| Transaction username: Specifies the username associated with the transaction containing the recorded row operation. | Transaction timestamp: Specifies the behavior of the transaction timestamp within a mainframe pipeline. For log-based Change Selectors, it returns the transaction commit timestamp in GMT, including fractions of a second. For trigger-based Change Selectors, it returns the last update time for replication. | DBMS type: Specifies the category of the Database Management System (DBMS) used in the replication pipeline, ensuring compatibility and synchronization between systems. |
| Before image: Refers to the state of the data before any modifications or updates occur. It contains the original values of the data. | Transaction username: Specifies the username associated with the transaction containing the recorded row operation. | Server name: Specifies the name of the server hosting the mainframe system. |
| After image: Refers to the state of the data after a transaction or update operation is completed. It contains the updated or modified values of the data. | Before image: Refers to the state of the data before any modifications or updates occur. It contains the original values of the data. | Transaction username: Specifies the username associated with the transaction containing the recorded row operation. This is an optional field. |
| Table name: Specifies the name of the table. This is an optional field. | After image: Refers to the state of the data after a transaction or update operation is completed. It contains the updated or modified values of the data. | Table name: Specifies the name of the table. This is an optional field. |
| Schema name: Specifies the name of the schema. This is an optional field. | Dataset name: Specifies the name of the dataset for accurate data synchronization between mainframe and target systems. This is an optional field. | Schema name: Specifies the name of the schema. This is an optional field. |
| Database name: Specifies the name of the database. This is an optional field. | Server name: Specifies the name of the server hosting the mainframe system. This is an optional field. | Database name: Specifies the name of the database. This is an optional field. |
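The row timestamp's YYYYMMDDHHMMSSffffff layout is fixed-width, so it can be parsed with standard datetime tooling. A minimal Python sketch, assuming the value always carries the full six fractional digits and is in GMT as described above:

```python
from datetime import datetime, timezone

def parse_row_timestamp(ts: str) -> datetime:
    """Parse a YYYYMMDDHHMMSSffffff row timestamp (GMT) into a datetime."""
    # %f consumes the six-digit fractional-seconds field.
    return datetime.strptime(ts, "%Y%m%d%H%M%S%f").replace(tzinfo=timezone.utc)

# Example: 2024-01-15 12:30:45.123456 GMT (hypothetical value)
dt = parse_row_timestamp("20240115123045123456")
```

Because the format has no separators, each field is recognized purely by its fixed width, which is why the fractional part must be present for the parse to line up.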
Target Data
The Target Data page provides essential details about the data destination and management within the mainframe replication pipeline. Below is the list of fields available on this page:
- Search: Allows users to filter the list of schemas and datasets by entering characters or keywords. Only items that start with the specified characters will be displayed, making it easier to locate specific entries in a large list.
- Schema: Displays the different schema structures available within the data connection.
- Datasets Selected: Indicates the number of datasets that have been chosen for processing under each schema. This helps in tracking which datasets are selected for various operations.
- Dataset: Lists all the datasets that are available through the current source connection. This field provides a view of all data containers that can be accessed.
- Last updated: Specifies the date and time when the discovery table for the target data was last updated.
Mapping
In the mapping process, the source dataset names specified in the source system messages must be accurately mapped to the target datasets you have selected. Initially, since the source dataset names are not known until replication starts, they are defaulted to match the target names. It is essential to verify the accuracy of these source dataset names to ensure that changes are correctly applied to the appropriate target datasets. Proper verification and mapping are important for maintaining data integrity and consistency throughout the replication process.
- Change Mapping: Opens a dialog window for defining schema and dataset mappings, allowing adjustments to how data is mapped between source and target.
- x out of y rows selected: Displays how many rows (x) of the total (y) are currently selected for mapping changes.
- Specify a pattern: Allows you to create a schema by combining source field tokens with manual text inputs to define the data structure.
- Specify value: Appears when the dropdown field is set to Specify, enabling you to define specific values for the mapping.
- Apply: Saves and applies the mapping changes made for the selected rows.
- Close: Closes the dialog window without saving any changes made during the session.
- Reset to Default: Reverts the selected columns or mappings back to their default settings.
- Source: Lists the available schemas and datasets from the source side of the replication.
- Target: Lists the available schemas and datasets on the target side of the replication.
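The pattern mechanism described above combines source field tokens with literal text. A small sketch of how such token substitution could work; the `{schema}`/`{dataset}` token syntax here is hypothetical and used only for illustration, as the actual tokens are defined in the Change Mapping dialog:

```python
def apply_pattern(pattern: str, source: dict) -> str:
    """Expand source-field tokens embedded in a mapping pattern."""
    # str.format substitutes each {token} with the matching source field.
    return pattern.format(**source)

# Hypothetical pattern: literal prefix "PROD_" plus two source tokens.
target = apply_pattern("PROD_{schema}.{dataset}",
                       {"schema": "SALES", "dataset": "ORDERS"})
# target == "PROD_SALES.ORDERS"
```

Mixing literal text with tokens this way lets one pattern rename an entire set of source datasets consistently on the target side.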
Summary
The Summary section provides an overview of the key actions related to configuring and managing the mainframe replication pipeline:
- Stage Configuration Changes: Allows you to commit the configuration changes made to the mainframe replication pipeline. It saves the modifications, preparing them for review or further action without applying them immediately.
- Make Configuration Changes Active: After staging the changes, this option deploys and activates them, making the new configurations operational within the replication pipeline.
- Start Replication Pipeline: Initiates the replication pipeline using the newly applied configuration changes. It starts the data replication process as per the updated settings.
- Finish: Clicking Finish finalizes the configuration process. It saves all changes and executes the specified actions, including staging, activating configurations, and starting the pipeline.
  - In-Progress Feedback: After you click Finish, the wizard provides real-time feedback on the progress of the actions.
  - Successful Completion: If all actions succeed, the wizard closes and the updated pipeline configuration appears in the pipeline list.
  - Error Handling: If any issues occur during the process, the wizard remains open and displays an error message.