Precisely Agent

Data Integrity Suite

Copyright © 2000–2025 Precisely

The Agent type pipeline engine enables you to execute Data Quality pipelines on-premises. This is particularly useful when you work with local databases that cannot be reached from cloud environments. It also lets you run pipelines within a JDBC Agent-enabled private environment.

Note: To run a data quality pipeline on-premises using an Oracle connection, contact Precisely to set up the Agent on your virtual machine and to download the new version of the Oracle driver.
Tip: To run data quality pipelines on-premises, ensure that you have set up your Agent and configured a Pipeline Engine using that Agent.
  • Pipeline engine name: Enter a meaningful name for the pipeline engine.
  • Type: Specifies Precisely Agent as the datasource type for this pipeline engine.
  • Agent: Select the agent from the drop-down. Only registered agents are listed, regardless of their current status; agents that have been created but not yet registered do not appear in the list.
  • Driver memory: Specifies the amount of memory used by the driver. The default value is 1 GB. The value must be an integer greater than or equal to 1.
  • Number of executors: Specifies the number of executors used by the pipeline engine. Each executor is a separate unit of work that can run in parallel; more executors allow more concurrent tasks, which can improve performance. The default value is 1. The value must be an integer greater than or equal to 1.
  • Executor memory: Specifies the amount of memory allocated to each executor. This memory stores data and intermediate results during pipeline execution. The default value is 3 GB. The value must be an integer: in megabytes, between 128 MB and 20000 MB; in gigabytes, between 1 GB and 20 GB.
  • Executor cores: Specifies the number of CPU cores allocated to each executor. This determines how many tasks each executor can run concurrently. The default value is 3. The value must be an integer greater than or equal to 1.
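As a rough illustration of how these settings combine, the following sketch (a hypothetical helper, not part of the Data Integrity Suite) computes the total memory and cores an Agent host would need for a given configuration, using the defaults above:

```python
# Illustrative sizing calculation for an Agent pipeline engine.
# Defaults mirror the field defaults above: driver memory 1 GB,
# 1 executor, 3 GB and 3 cores per executor.

def agent_footprint(driver_gb=1, executors=1, executor_gb=3, executor_cores=3):
    """Return (total memory in GB, total cores) the Agent host must supply."""
    total_memory_gb = driver_gb + executors * executor_gb
    total_cores = executors * executor_cores
    return total_memory_gb, total_cores

# With the defaults, the host needs roughly 4 GB of memory and 3 cores.
memory_gb, cores = agent_footprint()
print(memory_gb, cores)  # 4 3
```

Raising Number of executors multiplies both the memory and core requirements, so size the host accordingly before increasing parallelism.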

Suggested Spark properties to reduce disk usage

Consider setting the following Spark properties to help minimize disk requirements during on-premises Agent profile executions:
  • spark.shuffle.compress: Set to true
  • spark.shuffle.spill.compress: Set to true
  • spark.io.compression.codec: Use zstd
  • spark.io.compression.zstd.level: Set to 3

Enabling these properties ensures that data written to disk during Spark jobs is compressed, reducing overall disk usage.
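In the Suite these properties are entered as key-value pairs under Advanced settings; shown below in a spark-defaults.conf-style layout purely for reference (the file itself is not part of the Agent setup):

```
spark.shuffle.compress            true
spark.shuffle.spill.compress      true
spark.io.compression.codec        zstd
spark.io.compression.zstd.level   3
```

zstd at level 3 generally balances compression ratio against CPU cost; higher levels compress further but slow shuffle writes.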

Advanced settings

  • Spark properties: This option allows you to specify Spark properties that determine the compute, memory, and disk resources allocated to Agent batch workloads.
  • Click Add Property to add a key-value pair.
    • Key: Specifies the Spark property, such as spark.driver.cores.
    • Value: Specifies a valid value for the property, such as 8. For more information, see Spark Properties in the Apache Spark documentation.
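A minimal sketch of the kind of sanity check you might apply to such key-value pairs before saving them (hypothetical validation logic; the Suite performs its own validation):

```python
# Hypothetical client-side check for Spark property key-value pairs
# (illustrative only, not the product's validation).

def is_valid_spark_property(key: str, value: str) -> bool:
    """Accept keys in the spark.* namespace with a non-empty value."""
    return key.startswith("spark.") and bool(value.strip())

print(is_valid_spark_property("spark.driver.cores", "8"))  # True
print(is_valid_spark_property("driver.cores", "8"))        # False
```

A check like this catches simple typos (a key outside the spark.* namespace, or an empty value) before the property reaches the engine.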