Estimate the number of pipelines/schemas per integration use case:
- Continuous Replication: the real-time or near-real-time synchronization of data changes (inserts, updates, deletes) from a source system to a target. It is commonly used where up-to-date data is critical, such as replicating from DB2, SQL Server, or Oracle to platforms like BigQuery, Snowflake, or Kafka. This use case typically involves multiple pods, such as `connect-cdc`, `connect-hub`, and `cloud-applier`, which remain active continuously to monitor and apply data changes. Because of this sustained activity, the resource requirements for each pipeline are modest but must remain consistent to ensure reliable, low-latency replication.
- Mainframe Replication: the transfer of data from legacy mainframe systems (e.g., IBM DB2z) to modern platforms for analytics or integration. It supports high-throughput replication of mainframe data structures and formats into platforms like Kafka or cloud data warehouses. This use case generally requires fewer pods, with the `sqdata-management` pod being the primary component. While fewer in number, these pods often require more CPU and memory per pipeline due to the complexity and size of the data being handled.
- Fast Load: high-speed, bulk data loading, typically used during initial data migration or full dataset refreshes. Unlike continuous replication, it does not track incremental changes but instead focuses on loading large volumes of data quickly from sources like DB2 into cloud targets such as BigQuery. This use case relies heavily on the `connect-cdc` and `connect-hub` pods, but with significantly higher CPU and memory requirements per pipeline to sustain large data throughput in a short time frame.
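As a rough illustration of the sizing exercise above, the per-use-case pod sets can be folded into a small estimator. The pod names come from the list above, but the per-pipeline CPU and memory figures below are placeholder assumptions for illustration only; substitute measured values for your own workload.

```python
# Hypothetical sizing sketch: pods and per-pipeline resource profiles per
# use case. CPU/memory figures are placeholder assumptions, not vendor
# recommendations.
USE_CASE_PROFILES = {
    "continuous_replication": {
        "pods": ["connect-cdc", "connect-hub", "cloud-applier"],
        "cpu": 1.0,      # modest but sustained (assumed)
        "mem_gib": 2.0,  # assumed
    },
    "mainframe_replication": {
        "pods": ["sqdata-management"],
        "cpu": 2.0,      # fewer pods, heavier per pipeline (assumed)
        "mem_gib": 4.0,  # assumed
    },
    "fast_load": {
        "pods": ["connect-cdc", "connect-hub"],
        "cpu": 4.0,      # short bursts of high throughput (assumed)
        "mem_gib": 8.0,  # assumed
    },
}

def estimate_resources(pipeline_counts):
    """Sum pod count, CPU cores, and memory across the requested pipelines."""
    total = {"pods": 0, "cpu": 0.0, "mem_gib": 0.0}
    for use_case, n in pipeline_counts.items():
        profile = USE_CASE_PROFILES[use_case]
        total["pods"] += len(profile["pods"]) * n
        total["cpu"] += profile["cpu"] * n
        total["mem_gib"] += profile["mem_gib"] * n
    return total

# Example: five continuous-replication pipelines plus one fast-load pipeline.
print(estimate_resources({"continuous_replication": 5, "fast_load": 1}))
# → {'pods': 17, 'cpu': 9.0, 'mem_gib': 18.0}
```

The per-use-case totals can then be compared against the capacity of the target cluster when deciding how many pipelines to run concurrently.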