Configure transformation steps

Discover seamless data transformation and execution in our Data Integrity Suite's data quality pipeline. Learn how to effortlessly enhance data accuracy, rectify errors, and automate processes for reliable insights.

Add transformation steps

Transforms are configured to form transformation steps in a pipeline. These steps change the format, structure, or values of data supplied by a dataset. A pipeline may perform any number of transforms, in the order in which the transformation steps are arranged.

  1. On the main navigation menu, click Quality > Pipelines.
  2. In the Name column, find the pipeline that you want to edit.
    In the Search box, you can enter all or part of a pipeline name to filter pipelines shown in the table.
    Tip: In the Dataset column, you can scroll down to find the dataset for a pipeline.
  3. In the Name column, click the ellipsis, then click Edit.
    Alternatively, you can click the pipeline name to edit the pipeline.
  4. Choose a method to add a step to a pipeline.

    Append a step to a new or existing pipeline: Click the Add Step button located after the last step in any pipeline.

    Insert step before an existing step: Hover over the step, click the shortcut menu button, and click Add Step > Before.

    Insert step after an existing step: Hover the pointer over the step, click the shortcut menu button, and click Add Step > After.

    Insert a step between two existing steps: Hover the pointer over the connection between two steps, and click the add step button that appears on the connecting line.

    Add a step to the start or end of an existing pipeline: Hover the pointer over a step in the pipeline, click the shortcut menu button, then click either Add Step > To Start or Add Step > To End.
    Note: These commands do not appear on the first or last step in a pipeline.
  5. In the Add Step dialog box, click the transformation that you want to add to the pipeline.
    Completing this step displays options for the selected transformation.
  6. In the Step name box, you can edit the name to describe this transformation step.
  7. Configure the remaining options for the transformation.
    Options other than Step name depend on the selected transformation. For more information about specific transformations, see Data Quality pipeline steps reference.
  8. Optional: Click the Preview button to view affected column values before and after the transformation.
    Optionally, you can select the Transformation Preview check box to view sample data in the affected columns before and after the operation.
  9. Click Save.
    This adds the transform to the selected location in the pipeline.
  10. Optional: You can drag a transformation step left or right to change the order in which it is performed.
    Alternatively, you can click the vertical ellipsis on the top-right and select Reorder > Move To > Start, Right, Left, End.

View errors in the Data Quality pipeline

After you click the Confirm button to save changes, the affected steps show the error and warning symbols in the confirmation dialog box. A red box shows the total number of errors and warnings in the pipeline.

To view errors:

  1. Click a warning or error symbol on a step to view the list of errors that you need to fix for the step.
  2. To view errors and warnings for all steps in a pipeline, click the error and warning count box to expand the Errors and Warnings list panel.
    This panel shows errors and warnings for every step in the pipeline. This feature is especially useful for a complex pipeline with many steps.
  3. Scroll the list and click any step label on this panel to expand and edit settings for a step.
When a step displays the error symbol, you cannot view the outcome of that step or of any subsequent steps. If you select the Transformation Preview check box to view the affected columns, they no longer show any sample data. When a step displays the warning symbol, you can view the outcome of the step, but only if it is not preceded by any steps that display the error symbol.

View data errors

As with errors and warnings, a box on the top right corner of the pipeline panel shows the total number of data errors in the sample dataset.
  1. Click the box to expand the Issues panel.
    The Issues panel shows the total number of issues, and in the Data Errors section it identifies and shows the count for each type of error.
  2. For each type of error, use the directional control buttons to move the focus between affected fields.

Group transformation steps

You can group steps in a pipeline, add steps to a group, and create groups within groups.

While cleansing complex data with a large number of steps involving various functions, it can be challenging to organize your data quality pipeline. Grouping steps logically can help you efficiently manage the pipeline. You can add a group from Add Step in a data quality pipeline.

You can simplify your pipeline by grouping steps based on their functions. This provides better control and a more focused view of your pipeline, allowing you to easily navigate through the steps.

For example, if you have a dataset containing customer information such as names, contact information, and addresses, and you want to clean this data using multiple steps, you may use parsing steps to clean customer information and general steps to amend certain fields. Then, you may want to use addressing steps to gain addressing insights and add location-based insights. Finally, you can use an enrich step to enhance your data. With this approach, a significant number of steps is involved in curating your dataset, each serving a different function based on the desired result. To simplify your pipeline, you can group steps with similar functions into a single group. These groups provide clarity about the transformations used in the pipeline and help you manage your transformations more efficiently, eliminating the need to scroll horizontally back and forth to navigate through pipeline steps.

You can also create groups with single or multiple conditions. For a conditional group, the steps added to the group will be applied only to the fields that satisfy the group's condition.

For instance, suppose you are familiar with your dataset and know which fields require transformations. When you create a group with a specific condition, the steps added to that group are applied exclusively to the fields that meet the group's condition. For example, you can add multiple parsing steps to a group with a condition set as Name EQUAL Literal John; all parsing steps in the group are then applied only to fields with the name John.
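
To illustrate (a hypothetical example that assumes your dataset contains a Name field), the same group condition expressed in the advanced Expression Builder syntax described later in this topic would be Name = 'John'; every step in the group then runs only where that expression evaluates to true.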

You can also select multiple steps and perform actions on a collection of steps rather than performing actions on individual steps one by one.

With a group, you can perform the following actions:

  • Create multiple groups.
  • Select steps and move them to a group or add them to a new group.
  • Update or delete groups.
Considerations while working with grouping:
  • A group must contain at least one step.
  • You can double-click to open a group and drill down into a focused state.
  • You can use drag-and-drop action to reorder steps or groups.
  • The output data of the previous step or group is used as the input data for the next step or group.
  • The output preview shown for a group is the preview of the last step in the group.
  • You cannot group both a step and a group together into another group.
  • You cannot group steps that are located on the left and right sides of a group.
Note: When you are working inside a group, you must use the last operator in that group to configure the step and investigate results.

Create a new group

  1. On the pipeline or within a group, hover over the step and select the check box.
  2. On the top-right of the canvas, select Group.
  3. Optional: To add a group condition, right-click the group and click Edit.

    When a group condition is set, the transformation steps within that group apply only to fields that meet the condition.

  4. Click Save.
Add steps to the group.

Add steps to a group

  1. Hover over the step within the group and click the Add Step icon.
  2. Select the step that you want to add.
The step is added to the selected group.
Configure the added step.

Select multiple steps or groups

You can select multiple steps or groups to perform a set of actions.

  1. Point to the step or group, and select the check box on the top-left corner of the item.
  2. On the top-right of the page, you can perform these actions:
    Click Group to group all the selected items together.

    Click the vertical ellipsis to access available actions.

Move steps or groups to a group

Point to the step or group that you want to move and select the check box on the top-left corner. You can select multiple steps or groups at a time.
  • On the top-right, click the vertical ellipsis to access available actions.
  • Use drag-and-drop actions to move steps or groups into another group. Ensure the target group is expanded where you intend to place the selected items.

The items are moved to the selected group.

Group existing steps or groups in a pipeline

  1. Point to the steps or groups that you want to group together and select the checkbox on the top-left corner.
  2. On the top-right of the canvas, select Group.
A new group is created with the selected items.

Edit a group

  1. Select the group that you want to edit.
  2. On the top-right of the canvas, click Edit.
A pane opens on the right, allowing you to edit the group.

Delete a group

Warning: The delete group action cannot be undone.
  1. Select the group that you want to delete.
  2. On the top-right, click the vertical ellipsis and select Delete.
The selected group is deleted.

Other available options are:

Depending on your selection (a step or a group), these actions are displayed when you click the vertical ellipsis at the top-right.

  • Add step: Adds a new step.
  • Reorder: Moves the selected items according to your selection.
  • Select All: Selects all visible items in the pipeline.
  • Expand/Collapse Group: Expands or collapses a group to show or hide the steps within it.
  • Ungroup: Breaks the group down into its constituent steps.

Apply conditions to transformation step or a group

Apply transformations to targeted fields in the dataset by using conditions within a step or a group.

Set conditions on a transformation step or a group in a data quality pipeline. This is an ideal way to apply a transformation only to the fields that satisfy the condition. Make sure you are familiar with your dataset so that you can target the fields that you want to transform. Use conditional transformation to obtain output according to the applied conditions. The output table includes the transformed fields as well as the fields that have no conditions applied.

For example, suppose you are using Parse Email and you want to parse emails only for users from a specific state. In this case, you can apply a condition to the Parse Email step: the State column, the Equal operator, and the literal value New York. You can either enter a literal value or select a field. The result is that emails are parsed only for users whose state is New York.
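
Expressed in the advanced Expression Builder syntax covered later in this topic (a sketch that assumes your dataset contains State and Email fields), the condition for the Parse Email example above would be State = 'New York'. You could narrow it further to records that actually contain an email address with State = 'New York' AND Email IS NOT NULL.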

You can also apply conditions to a group of transformation steps. In this case, it will output the result for each transformation based on the condition applied to the group.

The available conditional evaluation methods are basic and advanced.
  • Basic conditions are used when you want to perform straightforward data evaluation based on simple conditions. While basic conditions are efficient for simple tasks, more complex scenarios that involve multiple conditions, logical operations, or advanced data transformations may require the use of advanced conditions or custom queries.
  • Advanced conditions are used in more complex data evaluation scenarios when basic conditions are insufficient to meet your specific data transformation needs. These conditions provide greater flexibility and capability to work with complex data.

The choice between basic and advanced conditions depends on the specific requirements of your data.

Considerations for applying conditions to a transformation step or group:
  • You can generate conditional output for all transformation steps except these: Copy Field, Filter Field, Rename Field, and Filter Row.
  • If a group of transformation steps contains one or more steps that do not support conditions, the conditional output for the group is still based on the conditions applied to the group.
  • If you have conditions on both individual steps and a group of steps, the group condition intersects with each individual step condition using the AND operator. For example, if the group has the condition Age = '30' and a step within the group has the condition FIRSTNAME=ucase('AMY'), the overall condition returns results only for Age = '30' AND FIRSTNAME=ucase('AMY').
  • If you switch from an advanced condition to a basic condition, your advanced condition is lost.

To generate conditional transformation output:

  1. Create a new pipeline or open an existing one.
  2. Open the transformation step or the group to which you want to apply the condition.
  3. On the right pane, select the Condition tab.
  4. Select the Basic or Advanced condition.
  5. Select the condition for basic evaluation or type the condition expression for advanced evaluation.
  6. Click Save to apply your changes.

Basic condition

A basic condition is used when you want to perform data evaluation based on simple criteria.

You can provide conditions in two ways:
  • Select the combination of <field>, <operator>, <literal> and type a value that should match the condition.
  • Select the combination of <field>, <operator>, <field> and select a field that should match the condition. In this case, you can compare the selected field with a field of the same data type. You can use this for all available data types.

The available operators depend on the data type of the selected field.

For example, the condition Firstname Equal Joe, where Firstname is the field, Equal is the operator, and Joe is the literal value, applies the transformation to fields that have a Firstname value equal to Joe.
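
For a field-to-field comparison (a hypothetical example that assumes BillingCity and ShippingCity are string fields in your dataset), you could configure BillingCity Equal ShippingCity so that the transformation applies only where the two values match.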

Numeric and temporal field operators

Use numeric operators to evaluate values in Integer, Long, Float, Date, DateTime, and Time fields.

Operator | Binary/Unary | Expression is true when
Equal | Binary | Field value is equal to a literal or field value.
Not equal | Binary | Field value is not equal to a literal or field value.
Null | Unary | Field value is null.
Not null | Unary | Field value is not null.
Greater than | Binary | Field value is greater than a literal or field value.
Greater than or equal | Binary | Field value is greater than or equal to a literal or field value.
Less than | Binary | Field value is less than a literal or field value.
Less than or equal | Binary | Field value is less than or equal to a literal or field value.
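
For example (a hypothetical illustration that assumes your dataset contains a numeric Age field), selecting Age, Greater than, and the literal value 30 applies the transformation only where the age exceeds 30.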

String field operators

Use string operators to evaluate values in textual fields.

Operator | Binary/Unary | Expression is true when
Equal | Binary | Field string value matches a literal or field value.
Not equal | Binary | Field string value does not match a literal or field value.
In | Binary | Field string value matches any one of two or more literal values. (Type and press Enter to add each literal value to the box.)
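
For example (a hypothetical illustration that assumes a City field exists in your dataset), selecting City, In, and the literal values New York and Los Angeles applies the transformation only to fields whose city matches one of those two values.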

Logical field operators

Use logical operators to check values in fields that represent either true or false conditions.
Operator | Binary/Unary | Expression is true when
Equal | Binary | Field value is equal either to a literal True or False value or to the value in another Boolean field.
Null | Unary | Field value is null.
Not null | Unary | Field value is not null.
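
For example (a hypothetical illustration that assumes a Boolean IsActive field exists in your dataset), selecting IsActive, Equal, and the literal value True applies the transformation only where the field is true.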

Advanced condition

An advanced condition is used for complex evaluation operations. You type the expression using the available operators and functions. The advanced condition is also called the Expression Builder because you can build your own complex expressions. The syntax is color-coded for readability as you type an expression. An error is flagged when you mistype or misuse an operator or a function.
Note:
  • When using advanced conditions, you must provide both left and right operands along with the operator.
  • Field names are case sensitive in advanced conditions. Incorrect field name casing does not flag an error in the expression, and the condition is not applied.

    For example, ucase(FirstName)=true is applied when your dataset has a FirstName field. ucase(FIRSTNAME)=true is not applied if your dataset does not have a FIRSTNAME field, and no error is flagged in the expression.

Operators and their description

  • = (equals): Selects records where a field's value is equal to a specified value. For example, Age = '30' selects records where the age is 30.
  • != (not equal to): Selects records where a field's value is not equal to a specified value.
  • > (greater than): Selects records where a field's value is greater than a specified value. For example, Price > '50' selects records with prices higher than 50.
  • < (less than): Selects records where a field's value is less than a specified value.
  • >= (greater than or equal to): Selects records where a field's value is greater than or equal to a specified value.
  • <= (less than or equal to): Selects records where a field's value is less than or equal to a specified value.
  • IN: Selects records where a field's value matches any value in a specified list. For example, City IN ['New York', 'Los Angeles'] selects records with cities matching either "New York" or "Los Angeles".
  • IS NULL: Selects records where the specified field has a null or undefined value. For example, to find all records where the "Phone Number" field is not filled in, use Phone Number IS NULL.
  • IS NOT NULL: Selects records where the specified field has a defined (non-null) value. For example, to find all records where the "Email" field is filled in, use Email IS NOT NULL.
  • AND: Logical operator that combines multiple conditions and returns results that satisfy all of them. For example, Age > '25' AND Salary > '50000'.
  • OR: Logical operator that combines conditions and returns results that satisfy at least one of them. For example, Category = 'Electronics' OR Price < '100'.
  • !: Logical operator that negates a condition, returning results that do not meet the specified criteria. For example, State != 'New York' selects records where the state is not New York.
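
As a composite illustration (a hypothetical sketch that assumes Age and City fields exist in your dataset), the expression Age >= '25' AND City IN ['New York', 'Los Angeles'] combines a numeric comparison with a list match, selecting records for customers aged 25 or older in either city.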

Functions and their description

  • ucase: Selects a string with uppercase (capital) letters. Use it to standardize the case of characters in a text field. Example: FIRSTNAME=ucase('AMY')
  • lcase: Selects a string with lowercase (small) letters. Use it to standardize the case of characters in a text field. Example: LASTNAME=lcase('stamp')
  • tcase: Selects a string in title case, where the first letter of each word is uppercase. Use it to update the text in a string to title case. Example: tcase('Mrs Mary Smith')=Name
  • trim: Removes leading and trailing spaces from a string. Use it to clean up data, especially user-entered text that may contain unnecessary spaces. Example: trim('routley')=LastName
  • ltrim: Removes leading (left-side) spaces from a string. Use it to ensure consistent formatting by eliminating leading spaces. Example: ltrim('AMY')=FirstName
  • rtrim: Removes trailing (right-side) spaces from a string. Use it to standardize data by eliminating trailing spaces that might affect comparisons or visual representation. Example: rtrim('stamp')=LastName
  • lpack: Left-packs a string, removing leading spaces and shifting the remaining characters to the left. Use it to align text to the left while eliminating unnecessary leading spaces. Example: lpack('AMY')=FirstName
  • rpack: Right-packs a string, removing trailing spaces and shifting the remaining characters to the right. Use it to align text to the right for consistent formatting. Example: rpack('stamp')=LastName
  • mask: Applies a mask to a string, replacing characters with placeholders that indicate their data types; in the generated pattern, A represents an alphabetic character and N represents a numeric character. Use it to verify the data types present in a value. Example: mask(Country)='AAAANN'
  • substr: Extracts a substring from a given string based on a specified starting position and length. Use it to extract specific portions of text from a longer string. You can build the expression in two ways: substr(string data, int beginIndex) and substr(string data, int beginIndex, int endIndex). Examples: FIRSTNAME=substr('TAmy',2) and FIRSTNAME=substr('TAmyT',2,5)
  • concat: Joins two or more strings without a space. Use it to combine information from multiple fields or sources into a single string. Example: concat('AMY',' ')=FirstName
  • regexp: Applies a regular expression pattern to a string for pattern matching. Use it for complex pattern recognition and extraction in textual data. Example: regexp(Email,'@.')='@g'
  • replace: Replaces occurrences of a specified substring or pattern in a string with another substring. Use it to clean and update data by replacing specific elements with desired values. Example: replace('Mircosoft','rco','cro')=Company
  • split: Splits a string into an array of substrings based on a specified separator character. Use it to separate text data into individual components for analysis or processing. Example: split('Sweden:BasicCountry',':',0)=Country
  • length: Selects a string having the specified length (number of characters). Use it to determine the size of a string, often for validation. Example: length(CollectionNumber) > 1
  • to_number: Selects a numeric string having a specified numerical value. Use it to transform string representations of numbers into numeric formats for mathematical operations. Example: to_number('100')=CustomerId
  • to_int: Selects a double value according to the specified integer value. Use it to maintain data consistency and facilitate numerical calculations. Example: to_int(22.53)=Currency
  • occstr: Returns the position of the first occurrence of a specified substring in a string. Use it to locate specific substrings within text data. Example: occstr(Email,'.','L')=3
  • now: Selects the current date and time. Use it to provide a dynamic timestamp for real-time data capture or time-sensitive operations. Example: now()=Totaltime
  • isalpha: Selects a string containing only alphabetic characters. Use it to verify that a text field consists solely of letters. Example: isalpha('test')
  • isnumeric: Selects a string containing only numeric characters. Use it to validate numeric data integrity in text fields. Example: isnumeric('100')!=CustomerId
  • isnull: Selects fields with no value. Use it to check fields for missing or unknown data. Example: isnull(NAME)
  • isempty: Selects an empty string. Use it to check that text fields are not null or empty. Example: isempty('')!=PostalCode
  • contains: Selects a string containing a specific substring. Use it to search for a particular set of characters within a larger string. Example: contains('@', Email) != 0
  • to_date: Selects the specified ISO format date (YYYY-MM-DD). Use it to transform date representations in string format for date-related operations. Example: to_date('1973-06-05') = Date
  • to_time: Selects the specified ISO format time (HH:MM:SS). Use it to transform time representations in string format for time-related calculations. Example: to_time('09:04:43')=Time
  • to_datetime: Selects the specified ISO format datetime (YYYY-MM-DDTHH:MM:SS). Use it to integrate date and time information from string formats into datetime values for comprehensive temporal analysis. Example: to_datetime('1973-06-05T09:04:43')=DateTime
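
As a composite illustration (a hypothetical sketch that assumes a PostalCode string field exists in your dataset), the expression length(PostalCode) > 4 AND isnumeric(PostalCode) applies the transformation only to postal codes that are longer than four characters and contain only digits.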

After applying the condition, click Preview to review the results. If the results meet your expectations, click Save to preserve the conditions for the transformation step. The output will then be generated based on your selected transformation step and conditions.

AI Assist

When setting up a condition for a Data Quality transformation step, you can use AI Assist to create pass conditions. To use AI Assist in the pipeline, follow these steps:
  1. Configure any transformation step.
  2. After the properties are configured, navigate to Condition and click the AI Assist icon.
  3. In the dialog box, enter the rule in simple text. For example: Date Created and Date Modified should not differ by more than 3650 days.
  4. Click Generate to produce the pass condition.
  5. Review the output; you can then apply it to the pipeline condition.

Add multiple inputs to quality pipeline

Adding multiple inputs to a Data Quality pipeline allows you to include multiple datasets from different sources or with different structures. By merging these inputs, you can process and combine the data as needed.

Add multiple inputs while building your Data Quality pipeline. You can add each one as a simple or complex input: a simple input uses data directly from a database connection, while a complex input requires additional steps to process data before it is ready for merging. Multiple inputs are merged as either unions or joins. An output step is added for each input, and you set up runtime configurations for each output.

For example, consider a business that operates an e-commerce platform. They collect customer data from various sources, including online registration forms, purchase histories, and customer service interactions. By using multiple inputs, they can combine information from these diverse sources to enrich customer profiles. Merging this data can provide a comprehensive dataset that includes all crucial customer information present in each dataset.
Note: You cannot add multiple connections from different connection types. For instance, it is not possible to merge Databricks tables with Snowflake tables and run them on a Snowflake connection.

To add multiple inputs:

  1. Create a Data Quality pipeline.
  2. On the top-right, click Add Input.
  3. Select the dataset that you want to use to create the pipeline.
  4. Select Generate Sample, or upload sample data.
  5. Enter a value for the Number of rows to include in the sample.
    The sample will sequentially retrieve the number of records specified here, starting with the first record in the dataset. You can specify between 1 and 2000 rows. The default value is 100. You may choose to increase this value to capture additional variability from the source data.
  6. Select the fields that you want to include in your sample and click Generate Sample.
  7. Click Select Dataset.
The selected dataset is added to the pipeline.
Continue performing transformations on the data. If you have multiple datasets, you may need to combine them by appending or joining their outputs. Appending means adding rows from one dataset to another, while joining means merging datasets based on common fields. This step ensures that all relevant data is consolidated into a single output dataset.

Select columns in pipeline step

You can use the mouse or keyboard to select columns for a pipeline step in several ways. When you edit settings for a transform step, use any of these methods to add columns to the Columns box.

  • In the transformation settings, click to expand the Columns list, and select the check box next to each column that you want to transform. To clear a selection, clear the check box next to a column.
  • In the sample data table, click a column heading. To clear a selection, click the selected column heading.
  • In the Search box above the sample data table, choose Column, then type a column name in the adjacent Search box. As you type, you can choose from matching columns displayed in a drop-down list.
    To clear the selection on a column, use the same procedure to specify a column name, then click the Delete button or use the keyboard to clear the contents of the Search box.

Rename step

This procedure describes how to quickly rename a step in a pipeline without editing the step.

  1. In the quality pipeline, point to the step that you want to rename.
  2. Click the step name.
  3. Edit the name and press Enter.

Preview step

You can quickly preview the outcome of any step in a pipeline without opening the step settings.

  1. On the main navigation menu, click Quality > Pipelines.
  2. In the Name column, find the pipeline that you want to edit.
    In the Search box, you can enter all or part of a pipeline name to filter the pipelines shown in the table.
  3. In the Name column, click the ellipsis, then click Edit.
    Alternatively, you can click the pipeline name.
  4. On the pipeline pane, click the step that you want to preview.
  5. Turn on Step Preview.
  6. Select any step in the pipeline to view the outcome of that step.
    Turning on the Step Preview toggle previews the output fields after any step in the pipeline. Click Original Dataset to view the input schema for the pipeline. Click the last step in the pipeline to view the output schema for the pipeline.

Edit transformation step

Edit the transform settings for a step in a pipeline to change its behavior. You can also reorder or delete transformation steps in a pipeline.

  1. On the main navigation menu, click Quality > Pipelines.
  2. In the Name column, find the pipeline that you want to edit.
    In the Search keyword box, you can enter any portion of a pipeline name to only show matching names.
  3. Click the ellipsis in the Name column, then click Edit.
    Alternatively, you can click the pipeline name.
  4. In the pipeline steps, point to the step that you want to edit, click the vertical ellipsis on the top-right, and then click Edit.
    Alternatively, you can double-click a pipeline step to edit the transform properties.
    A preview table shows sample data in the selected columns after completion of the transformation step.
  5. Edit the name or other settings for the transform step.
    For more information about transform settings, see Data Quality pipeline steps reference.
  6. Optional: Click the Preview button to view affected column values before and after the transformation.
    You can select the Transformation Preview check box and click Preview to view changes to columns acted upon by the operation.
  7. Click the Save button to save your changes.

You can drag a transformation step to the left or to the right to change the order in which it is performed. Alternatively, you can click the vertical ellipsis on the top-right and select Reorder > Move To > Start, Right, Left, End.

Delete transformation step

  1. On the main navigation menu, click Quality > Pipelines.
  2. In the Name column, find the pipeline that you want to edit.
    In the Search keyword box, you can enter any portion of a pipeline name to only show matching names.
  3. In the Name column, click the ellipsis, then click Edit.
  4. On the transformation steps panel, click the transformation step that you want to delete.
  5. Click the vertical ellipsis on the top-right, then click Delete.
  6. Click Yes to confirm the deletion.