To manage quota constraints effectively, configure instance pools with a defined Max Capacity. When multiple pipelines share the same pipeline engine, their jobs are executed in batches against the pool; pipelines running on different pipeline engines are not affected by one another's batching. This configuration ensures efficient resource utilization and controlled execution of jobs.
Suggested Implementation
- Create an Instance Pool: Define the maximum capacity as the number of instances allocated for the workload.
- Onboard Data Sources: Jobs execute in batches: some jobs run concurrently while the others wait in the queue, and as running jobs finish, the next queued jobs start processing.
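The pool-creation step above can be sketched as a request body for the Databricks Instance Pools REST API (`POST /api/2.0/instance-pools/create`). The pool name, node type, and the commented-out workspace host and token are placeholder assumptions, not values from this document:

```python
import json

def build_pool_payload(name, node_type_id, max_capacity, min_idle=0):
    """Build the JSON body for creating an instance pool with a capped size."""
    return {
        "instance_pool_name": name,
        "node_type_id": node_type_id,    # e.g. a 4-core node type (assumption)
        "min_idle_instances": min_idle,  # 0 avoids paying for idle instances
        "max_capacity": max_capacity,    # hard cap on instances in the pool
    }

payload = build_pool_payload("quota-capped-pool", "Standard_DS3_v2", max_capacity=24)
print(json.dumps(payload, indent=2))

# The payload would then be sent with, for example:
#   requests.post(f"{host}/api/2.0/instance-pools/create",
#                 headers={"Authorization": f"Bearer {token}"},
#                 json=payload)
```

Keeping `min_idle_instances` at 0 means the pool only holds instances while jobs are running, so the Max Capacity cap is the only sizing knob that matters for quota.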
Example Configurations
The columns below cover the Databricks core quota, instance pool, pipeline engine, and capacity settings:

| Quota Limit (Cores) | Instance Pool Cores | Instance Pool Memory | Cluster Type | Min Nodes | Max Nodes | Concurrent Jobs | Max Capacity (Instances) |
|---|---|---|---|---|---|---|---|
| 100 | 4 | 32 GB | Single Node | 0 | 1 | 12 | 24 |
| 50 | 4 | 32 GB | Auto Scale | 2 | 4 | 2 | 10 |
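A quick way to sanity-check a configuration like the rows above is to confirm that both the worst-case concurrent job footprint and a fully allocated pool stay under the core quota. A minimal sketch (the helper names are illustrative, not a Databricks API):

```python
def worst_case_cores(concurrent_jobs, max_nodes_per_job, cores_per_node):
    """Peak cores if every concurrently running job scales to its max node count."""
    return concurrent_jobs * max_nodes_per_job * cores_per_node

def pool_cores(max_capacity, cores_per_node):
    """Cores consumed if the instance pool is fully allocated."""
    return max_capacity * cores_per_node

# Row 1: single-node clusters, 12 concurrent jobs, pool capped at 24 instances
assert worst_case_cores(12, 1, 4) <= 100   # 12 * 1 * 4 = 48 cores
assert pool_cores(24, 4) <= 100            # 24 * 4 = 96 cores

# Row 2: autoscaling 2-4 nodes, 2 concurrent jobs, pool capped at 10 instances
assert worst_case_cores(2, 4, 4) <= 50     # 2 * 4 * 4 = 32 cores
assert pool_cores(10, 4) <= 50             # 10 * 4 = 40 cores
```

Both example rows pass: even at full autoscale and full pool allocation, core usage stays below the quota limit.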
Benefits
- Efficient Resource Utilization: Prevents exceeding core usage limits by controlling job concurrency.
- Streamlined Job Execution: Jobs are processed in batches, minimizing the risk of quota violations.
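The batching behaviour described above can be sketched locally with a bounded worker pool: at most `CONCURRENT_JOBS` jobs run at once, the rest queue, and queued jobs start as running ones finish. This is a simulation of the scheduling pattern, not Databricks code; the job count and cap are arbitrary:

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor

CONCURRENT_JOBS = 3   # hypothetical cap, mirrors the "Concurrent Jobs" setting
active = 0
peak = 0
lock = threading.Lock()

def run_job(job_id):
    """Simulated pipeline job; records how many jobs overlap at once."""
    global active, peak
    with lock:
        active += 1
        peak = max(peak, active)
    time.sleep(0.05)  # stand-in for real pipeline work
    with lock:
        active -= 1
    return job_id

# The executor admits at most CONCURRENT_JOBS jobs simultaneously;
# the remaining jobs wait in its queue until a slot frees up.
with ThreadPoolExecutor(max_workers=CONCURRENT_JOBS) as pool:
    results = list(pool.map(run_job, range(10)))

print(f"completed {len(results)} jobs, peak concurrency = {peak}")
```

Because the worker count never exceeds the cap, peak concurrency stays at or below `CONCURRENT_JOBS`, which is exactly how the batching keeps core usage inside the quota.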