Set up a pipeline engine on Databricks

Follow these instructions to create or edit a pipeline engine to run jobs on a Databricks instance pool.

To run Data Quality pipelines on a Databricks instance pool, you must first create the instance pool in your Databricks environment. For details about instance pools, see the related topics links to the Databricks documentation at the end of this procedure.

Note:
  • Currently, only Databricks Runtime (DBR) versions 13, 14, and 15 are supported.
  • When you create a Databricks instance pool to run Data Quality pipeline jobs, make sure that you select a preloaded Databricks runtime version. For more information, see Manage Databricks clusters. A scripted example of creating such a pool follows this note.
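If you prefer to script pool creation instead of using the Databricks UI, the following is a minimal sketch against the Databricks Instance Pools REST API (POST /api/2.0/instance-pools/create). The workspace URL, token variable, pool name, node type, and capacity values are placeholders for your own environment; the preloaded runtime version must be one of the supported DBR versions noted above.

    import os
    import requests

    # Placeholder values: substitute your own workspace URL and node type.
    WORKSPACE_URL = "https://<your-workspace>.cloud.databricks.com"
    TOKEN = os.environ["DATABRICKS_TOKEN"]  # personal access token

    payload = {
        "instance_pool_name": "dq-pipeline-pool",      # example name
        "node_type_id": "i3.xlarge",                   # example AWS node type
        "min_idle_instances": 1,
        "max_capacity": 10,
        "idle_instance_autotermination_minutes": 30,
        # Preload a supported runtime (DBR 13, 14, or 15) so pool nodes
        # do not wait for an image download when a job starts.
        "preloaded_spark_versions": ["14.3.x-scala2.12"],
    }

    resp = requests.post(
        f"{WORKSPACE_URL}/api/2.0/instance-pools/create",
        headers={"Authorization": f"Bearer {TOKEN}"},
        json=payload,
        timeout=60,
    )
    resp.raise_for_status()
    print("Created instance pool:", resp.json()["instance_pool_id"])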
  1. On the main navigation menu, click Configuration.
  2. Click the Pipeline Engines tab.
  3. Open the pipeline engine settings panel to create a new pipeline engine or edit an existing pipeline engine.
    Create a new pipeline engine: Click the Create Pipeline Engine button to open the Create New Pipeline Engine settings panel.
    Edit an existing pipeline engine: Find the pipeline engine you want to edit, and on the shortcut menu, click Edit to open the Edit Pipeline Engine settings panel.
  4. In the Name box, enter a meaningful name for the pipeline engine.
  5. In the Connection box, choose the Databricks connection that you want to use for job processing.
    Selecting a Databricks connection displays the Job processing instance pool option if it is not already visible.
  6. In the Job processing instance pool box, choose the Databricks instance pool on which to run jobs.
  7. In the Fileshare path box, enter the Databricks File System (DBFS) path to a folder for storing Data Quality artifacts such as logs.
  8. To use object storage (for example, Amazon S3) for the processing or staging location, enter a DBFS mount path such as dbfs:/mnt/<mount-name>. For more information about setting up DBFS mounts, see Mounting cloud object storage on Databricks. A notebook sketch that covers steps 7 and 8 appears after these steps.
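As a companion to steps 7 and 8, the following is a minimal notebook sketch, assuming a hypothetical S3 bucket (my-dq-staging-bucket), mount name (dq-staging), and artifacts folder; it mounts the bucket and creates the folder you would enter in the Fileshare path box. Note that dbutils is available only inside a Databricks notebook, and mount authentication (for example, an instance profile or access keys) is covered in the Databricks mount documentation.

    MOUNT_POINT = "/mnt/dq-staging"  # example mount name

    # Mount the bucket only if it is not already mounted; the mounted
    # storage is then reachable as dbfs:/mnt/dq-staging (step 8).
    if not any(m.mountPoint == MOUNT_POINT for m in dbutils.fs.mounts()):
        dbutils.fs.mount(
            source="s3a://my-dq-staging-bucket",  # example bucket
            mount_point=MOUNT_POINT,
        )

    # Create the folder to enter in the Fileshare path box (step 7).
    dbutils.fs.mkdirs("dbfs:/dq/artifacts")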
After you complete this procedure, you can create pipeline runtime configurations that use the pipeline engine to run Data Quality jobs on the Databricks instance pool.