After you have defined the source from which you want to extract data, you must specify the target database, file system, data warehouse, or Kafka stream to which your data is transferred for further processing or archiving (copy pipelines), or for replicating captured data changes (replication pipelines).
- Pipelines for replication projects are configured to copy and/or replicate data changes to a single Kafka or Snowflake target.
- For Kafka targets, you define Apache Kafka properties such as row properties, the message format type (CSV or JSON), and large object (LOB) handling, and you specify whether to write to a new topic that you create on the Kafka server or to an existing topic.
- A Kafka target acts as a producer, publishing messages to a Kafka topic. Topics are the categories in which the Kafka cluster stores streams of records (see the consumer sketch after this list).
- For Snowflake targets, you select a warehouse, database, and schema with the option to create a target table if it does not exist. You also specify the batch criteria for applying records to the target.
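To see what a replication pipeline delivers, you can attach any standard consumer to the target topic. The following is a minimal sketch using the confluent-kafka Python client; the broker address, topic name, and consumer group are illustrative assumptions, not values the product prescribes.

```python
# Minimal sketch: read replicated change records from a Kafka target topic.
# The broker address and topic name below are assumptions for illustration.
import json

from confluent_kafka import Consumer

consumer = Consumer({
    "bootstrap.servers": "localhost:9092",   # assumed broker address
    "group.id": "cdc-inspector",             # assumed consumer group
    "auto.offset.reset": "earliest",
})
consumer.subscribe(["orders_cdc"])           # assumed topic name

try:
    while True:
        msg = consumer.poll(timeout=1.0)
        if msg is None:
            continue
        if msg.error():
            print(f"Consumer error: {msg.error()}")
            continue
        # With the JSON message format, each record is a JSON document.
        record = json.loads(msg.value())
        print(record)
finally:
    consumer.close()
```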
To add a target connection, follow these steps:
- On the main navigation menu, select .
- Select the project that you want to use to create your continuous replication pipeline.
- Click +Create Pipeline to open the Create Pipeline wizard.
- In the second connection step, click Add Target. The New Replication Connection dialog opens.
- For the replication connection name, enter a unique name for the data connection. Note: If you enter a name that already exists in the repository, you cannot save the connection. The name cannot exceed 200 characters. For replication data connections, spaces are not supported in the name, and hyphens ( - ) are the only special characters supported.
- For Description, enter a brief description of the data connection.
- For the connection type, click the drop-down list and select the type of connection that you are adding. If you select a DBMS, specify the associated data access method.
- Specify the properties for your selected connection.
- Click Test to verify that your credentials are valid for the properties you entered. Note: If the Test button is unavailable, make sure you have entered values in all required fields. (A standalone reachability check is sketched after these steps.)
- Click Save to add the new connection.
- Click Next.
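If the Test button reports a failure for a Kafka connection, it can help to verify the endpoint independently of the product. The snippet below is a minimal, product-independent sketch that checks whether a Kafka broker is reachable using the confluent-kafka AdminClient; the broker address is an assumption.

```python
# Minimal reachability check for a Kafka broker, independent of the product.
from confluent_kafka.admin import AdminClient

admin = AdminClient({"bootstrap.servers": "localhost:9092"})  # assumed address

try:
    # list_topics contacts the broker; a timeout means it is unreachable.
    metadata = admin.list_topics(timeout=10)
    print(f"Broker reachable; {len(metadata.topics)} topics visible.")
except Exception as exc:
    print(f"Broker not reachable: {exc}")
```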
Note: When using SAP as the source connection and Kafka as the target connection, consider the following:
- Message format values (the image options are illustrated after this note):
- Avro: 'After Image Only'
- CSV: 'After Image Only'
- JSON: 'Before And After Images in Same Record', 'After Image Only'
- SAP Drivers Path: Ensure that the SAP drivers are placed in the connect-cdc directory.
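To make the image options concrete, the records below sketch what an after-image-only message and a before-and-after message might look like in JSON. The column names and envelope fields are illustrative assumptions; the actual payload layout depends on your configuration.

```python
# Illustrative record shapes only; field names are assumptions, not a product schema.

# "After Image Only": the message carries just the row state after the change.
after_image_only = {
    "op": "UPDATE",
    "after": {"order_id": 42, "status": "SHIPPED"},
}

# "Before And After Images in Same Record": one message carries both states,
# which lets a consumer compute the delta for an update.
before_and_after = {
    "op": "UPDATE",
    "before": {"order_id": 42, "status": "PENDING"},
    "after": {"order_id": 42, "status": "SHIPPED"},
}
```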
Snowflake as target data connection
When selecting Snowflake as a target connection for a continuous replication pipeline, configure the following fields:
- Storage to use for staging: Specifies the cloud storage location used to stage files before loading data into Snowflake tables.
Table 1. Configuration based on storage type

Amazon S3
- Method: Specifies the authentication method. Possible values are Key and Session token.
  - When selecting Key as the authentication method, the following fields must be filled:
    - AWS access key: Specifies the access key required for authentication.
    - AWS secret key: Specifies the secret key required for authentication.
  - When selecting Session token as the authentication method, the following fields must be filled:
    - Session token: Specifies the session token required for authentication.
    - AWS access key: Specifies the access key required for authentication.
    - AWS secret key: Specifies the secret key required for authentication.
- Note: Replication operations will be interrupted if the session token expires. The maximum validity period for the session token is 36 hours, although the default duration is often shorter. To prevent disruptions in replication, renew the session token before it expires (one way to request a fresh token is sketched after this table).

Microsoft Azure Storage
- Method: Specifies the authentication method. Possible values are Azure SAS token and Azure Shared Key.
  - When selecting Azure SAS token as the authentication method, the following field must be filled:
    - SAS token: Specifies the SAS (Shared Access Signature) token required for authentication.
  - When selecting Azure Shared Key as the authentication method, the following fields must be filled:
    - Account name: Specifies the account name associated with the Azure storage account.
    - Access token: Specifies the token used to authenticate and authorize access to secured storage.
- Note: Replication operations will be interrupted if the SAS token expires. The maximum validity period for the SAS token is 7 days, although the default duration is often shorter. To prevent disruptions in replication, renew the SAS token before it expires.

Google Cloud Storage
- Service account JSON: Specifies the Google service account value, which is the unique identifier used to authenticate and authorize access to Google Cloud Platform (GCP) services and resources.
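Because an expiring session token interrupts replication, some teams script the renewal. The sketch below shows one illustrative way to request temporary AWS credentials with boto3's STS client; the 36-hour duration matches the maximum noted above, and entering the resulting values into the replication connection before expiry remains a separate manual or scripted step outside this product.

```python
# Illustrative sketch: request temporary AWS credentials for S3 staging.
# Updating the replication connection with these values is a separate step.
import boto3

sts = boto3.client("sts")
response = sts.get_session_token(
    DurationSeconds=36 * 3600,  # 36 hours, the maximum noted above
)
creds = response["Credentials"]
print("AWS access key:", creds["AccessKeyId"])
print("AWS secret key:", creds["SecretAccessKey"])
print("Session token: ", creds["SessionToken"])
print("Expires at:    ", creds["Expiration"])
```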
- Directory: Refers to the directory in the cloud storage where the data is temporarily staged.
- Storage integration: Stores an identity and access management (IAM) entity for external cloud storage and allows settings for permitted or restricted locations, such as Amazon S3. Cloud provider administrators assign permissions for these locations, enabling seamless operations without requiring credentials for stage creation or data loading/unloading tasks. Note: This field is mandatory regardless of the storage mechanism being used.
- Staging schema: Specifies the schema in which the staging tables should be created. This is a mandatory field.
- Duration: Allows you to select the duration limit for batch apply.
- Specify batch apply duration: Specifies the duration after which the changes are applied to Snowflake. This setting determines how frequently changes are synchronized and applied to the Snowflake database. The default value is 60 seconds (see the sketch after this list).
- Batch apply duration format: Allows you to select the time format. This field is enabled when Duration is set to Specify.
- Records: Allows you to select the size limit for batch apply operations.
- Specify the number of records: Specifies the number of records after which the changes are applied to Snowflake.
- Place value: Allows you to select the place value for the record count.
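As described above, changes are buffered and applied to Snowflake when either the batch duration elapses or the record count is reached. The loop below is a conceptual sketch of that flush policy, not the product's implementation; the 60-second duration reflects the default noted above, and the 10,000-record threshold is an assumption.

```python
# Conceptual sketch of batch apply: flush when either the duration limit
# or the record-count limit is reached, whichever comes first.
import time

BATCH_DURATION_SECONDS = 60      # "Specify batch apply duration" (default 60s)
BATCH_RECORD_LIMIT = 10_000      # "Specify the number of records" (assumed)

buffer = []
last_flush = time.monotonic()

def apply_to_snowflake(records):
    """Placeholder for bulk-applying buffered change records to the target."""
    print(f"Applying {len(records)} change records to Snowflake")

def on_change_record(record):
    """Buffer a captured change and flush when either limit is reached."""
    global last_flush
    buffer.append(record)
    duration_reached = time.monotonic() - last_flush >= BATCH_DURATION_SECONDS
    if len(buffer) >= BATCH_RECORD_LIMIT or duration_reached:
        apply_to_snowflake(buffer)
        buffer.clear()
        last_flush = time.monotonic()
```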