Configure quality pipeline - Precisely Data Integrity Suite

Copyright 2000-2026

On the main navigation menu, click Quality > Pipelines to view existing pipelines in table format. For each pipeline, the table shows the pipeline name, the number of records and columns in the sample data used to build the pipeline, the number of steps in the pipeline, the last user to modify the pipeline, and the date and time of the last modification.

When you edit a pipeline, the pipeline editor initially displays two panels. The upper panel shows the steps in the pipeline; you can add, remove, or edit steps there. The lower panel displays sample data in table format: each column represents a data field and each row represents a sample record. The top row shows the name, semantic type, and field type of each data field.

When you select a step in the upper panel, the editor highlights the columns affected by that step. Highlighted columns show the data as it appears after the step is complete. While a step is selected, you can select the Transformation Preview checkbox to display sample values in the highlighted columns both before and after the transformation.

You can add a new step anywhere in a pipeline: at the beginning or end, before or after an existing step, or between steps. When you add or edit a step, the pipeline editor expands a settings panel from the right side of the window.

After you configure transformation options, click Preview to view the result of the transformation before you save the settings.

When you are satisfied with the results of a transformation, save it to include it in the pipeline. Semantic type detection runs on new columns as a step adds them. A copied column normally receives the same semantic type as the original column. If the analysis detects a semantic type for a new column, the type is displayed beneath the column name.
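As a mental model of how steps, insertion points, and the before/after preview relate, the sketch below treats a pipeline as an ordered list of steps applied to sample rows in sequence. It is illustrative only: `Step`, `Pipeline`, `insert_step`, and `preview` are hypothetical names, not the Data Integrity Suite API, and the product's internal execution model may differ.

```python
from dataclasses import dataclass, field
from typing import Callable

Row = dict[str, str]  # a sample record: column name -> value


@dataclass
class Step:
    """Hypothetical transformation step: a name, the columns it affects, and a per-row function."""
    name: str
    columns: list[str]
    apply: Callable[[Row], Row]  # must return a new row rather than mutate its input


@dataclass
class Pipeline:
    steps: list[Step] = field(default_factory=list)

    def insert_step(self, index: int, step: Step) -> None:
        # 0 = beginning, len(self.steps) = end, any other index = between two existing steps
        self.steps.insert(index, step)

    def run_through(self, rows: list[Row], count: int) -> list[Row]:
        """Apply the first `count` steps, in order, to copies of the sample rows."""
        out = [dict(r) for r in rows]
        for step in self.steps[:count]:
            out = [step.apply(r) for r in out]
        return out

    def preview(self, rows: list[Row], index: int) -> list[tuple[Row, Row]]:
        """Before/after values of the columns affected by the step at `index`."""
        step = self.steps[index]
        before = self.run_through(rows, index)      # state entering the step
        after = self.run_through(rows, index + 1)   # state after the step completes
        return [
            ({c: b.get(c) for c in step.columns}, {c: a.get(c) for c in step.columns})
            for b, a in zip(before, after)
        ]


sample = [{"name": " ada "}, {"name": "grace"}]
pipeline = Pipeline()
pipeline.insert_step(0, Step("Uppercase", ["name"], lambda r: {**r, "name": r["name"].upper()}))
pipeline.insert_step(0, Step("Trim", ["name"], lambda r: {**r, "name": r["name"].strip()}))  # now first
print([s.name for s in pipeline.steps])  # ['Trim', 'Uppercase']
print(pipeline.preview(sample, 1))
# [({'name': 'ada'}, {'name': 'ADA'}), ({'name': 'grace'}, {'name': 'GRACE'})]
```

The preview for a step is computed from the output of all earlier steps, which is why inserting a step near the beginning can change what later steps receive.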

Create quality pipeline

Use this procedure to create a pipeline from the Pipelines list page.

To create a Data Quality pipeline:

  1. On the main navigation menu, click Quality > Pipelines.
  2. Click + Create Pipeline.
  3. Select the dataset that you want to use to create the pipeline.
  4. Select Generate Sample, or upload your own sample data.
    • The sample retrieves the specified number of records sequentially, starting with the first record in the dataset. You can specify between 1 and 2000 rows. The default is 100. You can increase this value to capture more of the variability in the source data.
    • The Sample Preview and Pipeline Grid section shows at most 500 records, so a sample of 500 or more records displays 500 of them.
  5. Select the fields that you want to include in your sample and click Generate Sample.
  6. Click Create Pipeline.
The data quality pipeline is created and opens with the dataset fields you selected. On this page you can view sample data as you add transformation steps to the pipeline. The pipeline page consists of three panes. The upper pane shows the transformation steps in the pipeline. The table beneath it shows the state of the data before or after a transformation step. A third pane expands from the side when you add or edit a transformation step and exposes the options that you can edit for the selected transform.

You can edit the pipeline name by clicking the default name. Add and configure transformation steps as needed.
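The sample-size rules in the procedure above (sequential retrieval from the first record, 1 to 2000 rows, default 100, at most 500 rows in the preview grid) can be summarized in a short sketch. The function names are hypothetical; this only restates the documented limits and is not how the product implements sampling.

```python
from itertools import islice

DEFAULT_SAMPLE_SIZE = 100                 # documented default
MIN_SAMPLE_SIZE, MAX_SAMPLE_SIZE = 1, 2000  # documented allowed range
PREVIEW_ROW_LIMIT = 500                   # rows shown in Sample Preview and Pipeline Grid


def generate_sample(records, size: int = DEFAULT_SAMPLE_SIZE) -> list:
    """Take `size` records in order, starting with the first record (no random sampling)."""
    if not MIN_SAMPLE_SIZE <= size <= MAX_SAMPLE_SIZE:
        raise ValueError(f"sample size must be between {MIN_SAMPLE_SIZE} and {MAX_SAMPLE_SIZE}")
    return list(islice(records, size))


def preview_rows(sample: list) -> list:
    """Rows displayed in the preview grid: never more than PREVIEW_ROW_LIMIT."""
    return sample[:PREVIEW_ROW_LIMIT]


dataset = range(10_000)  # stand-in for a dataset's records
sample = generate_sample(dataset, 2000)
print(len(sample), len(preview_rows(sample)))  # 2000 500
print(len(generate_sample(dataset)))           # 100
```

Because retrieval is sequential, a sample reflects only the start of the dataset; raising the sample size is the documented way to see more variability.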

Add input

After you create the pipeline, you can add more fields to it from another dataset. To do this, select Add Input.

Warning:
  • You can only choose the dataset that belongs to the same Agent as the primary input.
  • If you add an incompatible input to the pipeline, you may not be able to configure it in the run configuration, and an error states that incompatible inputs have been added to the pipeline. The error also appears among the data validation errors.
The Add Input interface lists available datasets, including curated datasets from various sources, and provides filtering options. You can filter datasets by specific properties to quickly find the datasets you need for analysis and reporting. For more information, see View and filter datasets.
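The Agent rule in the warning above amounts to a simple check: every added input must come from the same Agent as the primary input. The sketch below assumes each dataset carries an Agent identifier; `Dataset`, `agent`, and `incompatible_inputs` are hypothetical names used only for illustration, and the error text merely paraphrases the documented message.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Dataset:
    name: str
    agent: str  # hypothetical: identifier of the Agent that hosts the dataset


def incompatible_inputs(primary: Dataset, added: list[Dataset]) -> list[str]:
    """Names of added inputs whose Agent differs from the primary input's Agent."""
    return [d.name for d in added if d.agent != primary.agent]


primary = Dataset("customers", agent="agent-east")
extra = [Dataset("orders", "agent-east"), Dataset("returns", "agent-west")]
bad = incompatible_inputs(primary, extra)
if bad:
    print(f"Incompatible inputs added to the pipeline: {', '.join(bad)}")
```

Checking before a run is cheaper than discovering the mismatch when configuring the run configuration.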

Duplicate quality pipeline

You can duplicate an existing pipeline to use it for a new pipeline. A duplicate pipeline initially consists of the same dataset and transformation steps as the original pipeline. Configurations are not copied.

  1. On the main navigation menu, click Quality > Pipelines.
  2. In the Name column, find the pipeline you want to duplicate. In the Search keyword box, you can enter a keyword to filter pipelines shown in the table.
  3. In the Name column, click the ellipsis.
  4. Click the Duplicate Pipeline command. This opens the Duplicate Pipeline dialog box.
  5. In the New pipeline name box, enter a name for the duplicated pipeline.
  6. Click the Duplicate button.
After you complete this procedure, the duplicated pipeline appears on the Pipelines page.
You can now edit settings for the duplicated pipeline.
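The duplicate rule above (same dataset, same transformation steps, configurations not copied) can be sketched as follows. The model is hypothetical: it assumes "configurations" means run configurations, and it copies steps deeply so that editing the duplicate does not change the original.

```python
import copy
from dataclasses import dataclass, field


@dataclass
class QualityPipeline:
    """Hypothetical model containing only the parts the duplicate rule mentions."""
    name: str
    dataset: str
    steps: list[dict] = field(default_factory=list)
    run_configurations: list[str] = field(default_factory=list)  # assumption: what "configurations" means


def duplicate_pipeline(original: QualityPipeline, new_name: str) -> QualityPipeline:
    # Same dataset, an independent copy of the steps, and no configurations.
    return QualityPipeline(
        name=new_name,
        dataset=original.dataset,
        steps=copy.deepcopy(original.steps),
    )


original = QualityPipeline(
    name="customer_cleanup",
    dataset="customers",
    steps=[{"type": "trim", "column": "name"}],
    run_configurations=["nightly_run"],
)
dup = duplicate_pipeline(original, "customer_cleanup_v2")
dup.steps.append({"type": "uppercase", "column": "name"})  # editing the copy only
print(len(original.steps), len(dup.steps), dup.run_configurations)  # 1 2 []
```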

Edit quality pipeline

Edit a pipeline to view, add, or edit transforms that it applies to uploaded data.

  1. On the main navigation menu, click Quality > Pipelines.
  2. In the Name column, find the pipeline that you want to edit. In the Search box, you can match all or part of a pipeline title to filter pipelines shown in the table.
  3. In the Name column, click the ellipsis, then click Edit.
Completing this procedure opens the pipeline page, where you can view sample data from the associated dataset as you add transformation steps to the pipeline. The pipeline page consists of three panes. The upper pane shows the transformation steps in the pipeline. The table beneath it shows the state of the data before or after a transformation step. A third pane expands from the side when you add or edit a transformation step and exposes the options that you can edit for the selected transform.

Rename quality pipeline

You can rename a pipeline on the Pipelines page.

  1. On the main navigation menu, click Quality > Pipelines.
  2. In the Name column, find the pipeline that you want to rename. In the Search keyword box, you can enter a keyword to filter pipelines shown in the table.
  3. In the Name column, click the ellipsis, then click Rename.
  4. In the edit box that appears in the Name column, type a new name, then press Enter. A name must start with a letter. It can contain letters, numbers, dashes, and underscores.
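To check a name before typing it in, the naming rule in the rename procedure above can be expressed as a regular expression. This sketch reads the rule as an exhaustive list of allowed characters and assumes ASCII letters; the documentation does not say whether non-ASCII letters are accepted or whether a length limit applies.

```python
import re

# Starts with a letter; the remaining characters are letters, digits, dashes, or underscores.
# Assumption: "letters" means ASCII letters, and there is no length limit.
PIPELINE_NAME = re.compile(r"[A-Za-z][A-Za-z0-9_-]*")


def is_valid_pipeline_name(name: str) -> bool:
    return PIPELINE_NAME.fullmatch(name) is not None


for candidate in ["customer_cleanup-v2", "2024_cleanup", "address cleanup", "a"]:
    print(candidate, is_valid_pipeline_name(candidate))
```

`fullmatch` is used so that a valid prefix followed by a space or other disallowed character is rejected.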

Delete quality pipeline

You can delete a pipeline from the Pipelines page.

  1. On the main navigation menu, click Quality > Pipelines.
  2. Find the pipeline that you want to delete in the pipelines table. In the Search keyword box, you can enter a keyword to filter pipelines shown in the table.
  3. In the Name column, click the ellipsis, then click Delete.
  4. Click Yes to confirm the deletion.

Filter rows that match data in pipeline

While editing a pipeline, you can filter the sample data table so that it shows only rows that contain specified data.

  1. Above the sample data table, in the Column/Data box, choose Data.
  2. In the adjacent Search Field Name box, enter a string that matches data you are looking for in the table.
    As you type, rows that do not contain matching data are hidden.
When you finish typing the search string, only those rows with data that matches the search string are visible in the table.
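The Data filter described above behaves like a substring match across all fields of each row, re-applied as the search string changes. The sketch below is a hypothetical illustration: the documentation does not say whether matching is case-sensitive (this sketch assumes it is not) or how empty values are treated (this sketch skips them).

```python
def filter_rows(rows: list[dict], search: str) -> list[dict]:
    """Keep rows where any non-empty field value contains `search`.

    Assumption: matching is case-insensitive; the product may differ.
    """
    needle = search.casefold()
    return [
        row for row in rows
        if any(value is not None and needle in str(value).casefold() for value in row.values())
    ]


rows = [
    {"name": "Ada Lovelace", "city": "London"},
    {"name": "Grace Hopper", "city": "New York"},
    {"name": "Alan Turing", "city": "London"},
]
print([r["name"] for r in filter_rows(rows, "lond")])  # ['Ada Lovelace', 'Alan Turing']
```

An empty search string matches every row, which corresponds to the full table being visible before you start typing.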

Quality pipeline settings

Tabs on the Data Quality Pipeline Settings dialog box let you view and edit the entities and run configurations for a pipeline. The dialog box opens when you click the Settings button on the pipeline page.
The Dataset entities pane lists entities that are defined in the pipeline. You can click an entity in this pane to view and edit settings.
  • Entity type: Shows the type of entity.
  • Entity name: Specifies the entity name that appears above columns on the pipeline page. You can edit this name.
  • Map fields for this entity: You can change fields mapped to the entity field types.
The Select a Run Configuration pane shows run configurations that have been defined for this pipeline. Depending on permissions assigned to your role, you can click a run configuration in this pane to view or edit settings for the run configuration.
  • Name: Specifies the name for the run configuration.
  • Connection: Specifies the connection to the source data.
  • Source dataset: This specifies the dataset on which to run the pipeline. By default, the dataset used to build the pipeline shows here. Click the Browse button to choose a dataset in a connected database. Input fields for the pipeline must match input fields in the source dataset schema.
  • Target options:
    • Append to target dataset: Output is appended to the target dataset. Use this option when the schema of the pipeline output matches the target dataset schema. The pipeline engine verifies that the pipeline output schema and the target dataset schema match before it starts writing output to the target dataset.
    • Overwrite to target dataset: Output overwrites the target dataset. Choose one of two options from the drop-down list.
      • Truncate Dataset: Empties the dataset, then writes data to the same dataset. This choice deletes all data but preserves the dataset definition. You can use this option when the schema of the pipeline output matches the target dataset schema. The pipeline engine verifies that the pipeline output schema and the target dataset schema match before it commences output to the target dataset.
      • Drop Dataset: Drops the target dataset, then creates a new dataset and writes data to it. This choice recreates the dataset definition on the host server. The pipeline output does not have to match the target dataset schema, and the pipeline engine performs no verification.
  • Target dataset: This specifies the dataset in which to store the output results from the pipeline. Click the Browse button to choose a dataset in a connected database.
  • Pipeline engine: Specifies the pipeline engine that runs pipeline jobs. Select a pipeline engine from the list, or click Add to create a new one.
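The schema rules in the run configuration settings above can be summarized in code: pipeline input fields must match the source dataset schema, Append and Truncate Dataset require the output schema to match the target schema, and Drop Dataset skips verification. In this sketch a schema is a hypothetical mapping of field name to field type, field order is ignored, and "match" for the source is read as "every pipeline input field exists in the source with the same type"; the product's exact comparison rules are not documented here.

```python
from enum import Enum

Schema = dict[str, str]  # hypothetical: field name -> field type


class TargetOption(Enum):
    APPEND = "append"
    TRUNCATE = "overwrite/truncate"
    DROP = "overwrite/drop"


def check_source(pipeline_inputs: Schema, source: Schema) -> None:
    """Every pipeline input field must exist in the source schema with the same type."""
    mismatched = [f for f, t in pipeline_inputs.items() if source.get(f) != t]
    if mismatched:
        raise ValueError(f"source dataset does not match pipeline inputs: {mismatched}")


def check_target(option: TargetOption, output: Schema, target: Schema) -> None:
    """Append and Truncate verify schemas before writing; Drop recreates the dataset, so no check."""
    if option is TargetOption.DROP:
        return
    if output != target:
        raise ValueError(f"{option.value}: pipeline output schema does not match target dataset schema")


inputs = {"name": "string", "zip": "string"}
source = {"name": "string", "zip": "string", "id": "integer"}
output = {"name": "string", "zip": "string", "name_clean": "string"}
target = {"name": "string", "zip": "string"}

check_source(inputs, source)                     # passes
check_target(TargetOption.DROP, output, target)  # passes: no verification for Drop Dataset
try:
    check_target(TargetOption.APPEND, output, target)
except ValueError as err:
    print(err)  # append: pipeline output schema does not match target dataset schema
```

In this example a step added a `name_clean` column, so only Drop Dataset can write the output without first changing the target schema.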