About pipelines

A Data Quality pipeline automates the movement and transformation of data. A pipeline ingests data from a dataset and performs a series of steps that transform the data.

Steps in a pipeline may include tasks such as data standardization, de-duplication, cleaning, validation, and reformatting. Each step uses the output of the steps that precede it; strung together, the steps perform a sequence of operations that transform source data into the desired quality and format. The pipeline then outputs the clean data to a data sink such as the source dataset, a new dataset, or a file. Data Quality pipelines can ensure the accuracy, consistency, uniqueness, integrity, and validity of data before you load it into its final destination.
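
Conceptually, each step is a function whose input is the previous step's output. The following Python sketch illustrates that chaining; the step functions and names here are illustrative stand-ins, not actual Data Quality steps:

    # A minimal sketch of pipeline chaining: each step receives the
    # records produced by the step before it. Step names are illustrative.

    def standardize(records):
        # Trim whitespace and normalize casing in every field.
        return [{k: v.strip().lower() for k, v in r.items()} for r in records]

    def deduplicate(records):
        # Keep only the first occurrence of each identical record.
        seen, unique = set(), []
        for record in records:
            key = tuple(sorted(record.items()))
            if key not in seen:
                seen.add(key)
                unique.append(record)
        return unique

    def run_pipeline(records, steps):
        # Apply each step in order; the output of one feeds the next.
        for step in steps:
            records = step(records)
        return records

    source = [{"name": " Ada Lovelace "}, {"name": "ada lovelace"}]
    clean = run_pipeline(source, [standardize, deduplicate])
    # clean == [{"name": "ada lovelace"}]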

Pipelines list page

The table on this page lists your pipelines. From here you can create a new pipeline, or edit, delete, rename, or duplicate an existing one.

This page is displayed when you go to the Quality > Pipelines tab in the main navigation menu.

  • Search: Type any part of a pipeline name to only show pipelines with matching names.
  • Create Pipeline: Click this button to create a new pipeline.
  • Name: Shows the name of a pipeline. Click the ellipsis to Edit, Delete, Rename, or Duplicate a pipeline. You can also choose Run configurations to create, edit, or run a configuration.
  • Dataset: The dataset processed by the pipeline. Click a dataset name to edit it.
  • Status: Indicates whether a pipeline is error-free or is invalid and cannot run.
  • Modified By: Displays the name of the user who last updated the pipeline.
  • Last Modified: Displays the date when the pipeline was last updated.

Pipeline data sample profiling

The Data Quality pipeline page provides profiling information for sample data. The profiling feature characterizes the data in each field and checks for anomalies that may require cleanup before the pipeline delivers production data.

The system checks individual fields to confirm that their contents agree with their base and semantic types. For example, if the semantic type is Telephone Number, then alphabetic entries represent a problem. Similarly, if the semantic type is Email Address, then a missing domain name (such as @gmail.com) or consecutive dots (John..Doe@gmail.com) may represent a problem.
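
As a rough illustration, checks like these can be expressed as per-field predicates keyed to a semantic type. The sketch below is a simplified approximation, not the suite's actual validation logic:

    import re

    def phone_anomaly(value):
        # A telephone number containing alphabetic characters is suspect.
        return bool(re.search(r"[A-Za-z]", value))

    def email_anomaly(value):
        # Flag a missing domain (no "@") or consecutive dots.
        return "@" not in value or ".." in value

    print(phone_anomaly("555-ACME"))             # True: alpha characters
    print(email_anomaly("John.Doe-gmail.com"))   # True: no domain
    print(email_anomaly("John..Doe@gmail.com"))  # True: consecutive dots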

Pipeline suggestions

Data Quality pipelines offer suggestions to add steps based on columns in the sample dataset.

As you create or edit a pipeline, you can expand the Suggestions panel to view recommendations for columns and entities in the sample dataset. Suggestions for columns are based on their semantic types. Suggestions for an entity (delimited by the entity bar above the column headings) are based on the semantic types of the columns in the entity. By default, the Suggestions panel displays up to 10 suggestions for a pipeline.

Here are some examples of the suggestions you might encounter:

  • If the data includes an address entity, the system will suggest that you add the Verify Address & Geocoding step.
  • If Full Name or Company Name is a column, the system will suggest that you add the Parse Name step.
  • If Email or Mobile Phone is a column, the system will suggest that you add the Parse Email or Parse Phone Number step.
  • If a column has the First Name semantic type, the system will suggest that you add first name standardization in the Standardize Field step.

Each suggestion identifies a recommended step and the column to which it can be applied. If there are no suggestions for the selected column or entity, the suggestions shown are not grouped or categorized.
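
One way to picture how these suggestions are produced is as a lookup from detected semantic types to recommended steps. The following Python sketch is a hypothetical simplification for illustration, not the suite's implementation:

    # Hypothetical mapping from a column's semantic type to a suggested step.
    SUGGESTION_RULES = {
        "Full Name": "Parse Name",
        "Company Name": "Parse Name",
        "Email": "Parse Email",
        "Mobile Phone": "Parse Phone Number",
        "First Name": "Standardize Field (first name standardization)",
    }

    def suggest_steps(columns, limit=10):
        # columns maps column names to their detected semantic types;
        # the panel displays up to 10 suggestions by default.
        return [
            (name, SUGGESTION_RULES[semantic_type])
            for name, semantic_type in columns.items()
            if semantic_type in SUGGESTION_RULES
        ][:limit]

    print(suggest_steps({"contact": "Email", "phone": "Mobile Phone"}))
    # [('contact', 'Parse Email'), ('phone', 'Parse Phone Number')]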