The Cleanse Data step enables you to apply multiple string-cleansing operations within a single transformation step in your Data Quality pipeline. This consolidated approach allows you to efficiently clean, standardize, and prepare string fields for downstream processes.
By combining cleansing operations like Trim String, Convert Case, and Replace Values in one step, you can reduce complexity, avoid repetitive configuration, and ensure consistent data handling.
Why use the Cleanse Data step?
The Cleanse Data step supports the following operations:
| Operation | Description | Example |
|---|---|---|
| Replace Values | Replace specific values or substrings with alternatives or blanks. | "N/A" → "" or "Ltd." → "Limited" |
| Convert Case | Standardize text by converting to upper case, lower case, or title case. | "jOHN doe" → "John Doe" |
| Trim String | Remove unwanted characters from the beginning, end, or both ends of a string. | " John Doe " → "John Doe" |
| Get Substring | Retrieve a segment of a string starting from a given position and spanning a defined length. | "ABCD1234" → "1234" |
| Pad String | Add characters to the start or end of a string field until it reaches a defined length. | "123" → "00123" |
| Replace Between | Substitute the segment of a string located between two designated characters. | "Serial(987)" → "Serial(***)" |
| Replace by Position | Substitute the portion of a string defined by starting and ending positions. | "XYZ0012345" → "XYZ0000005" |
| Cleanup Whitespaces | Remove unwanted spaces from text fields. | "Hello World" → "HelloWorld" |
With the Cleanse Data step, you can:
- Add multiple actions of each operation type within a single step.
- Apply these actions across one or more fields.
- Preview the combined impact of all configured actions.
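To make the table's examples concrete, the following is a minimal Python sketch of what several of the operations do to a single string value. It is an illustration only, not the product's implementation: every function name and parameter here is hypothetical, positions are assumed to be 1-based and inclusive, and Cleanup Whitespaces is left out because its exact behavior is not specified here.

```python
# Hypothetical helpers that mimic the table's examples; not a product API.

def replace_values(value: str, old: str, new: str) -> str:
    """Replace every occurrence of `old` with `new`."""
    return value.replace(old, new)

def convert_case(value: str, mode: str) -> str:
    """Convert to upper, lower, or title case."""
    if mode == "upper":
        return value.upper()
    if mode == "lower":
        return value.lower()
    if mode == "title":
        return value.title()
    raise ValueError(f"unknown case mode: {mode}")

def trim_string(value: str, chars=None, side: str = "both") -> str:
    """Strip `chars` (whitespace by default) from the start, end, or both."""
    if side == "start":
        return value.lstrip(chars)
    if side == "end":
        return value.rstrip(chars)
    return value.strip(chars)

def get_substring(value: str, start: int, length: int) -> str:
    """Return `length` characters beginning at 1-based position `start` (assumed)."""
    return value[start - 1:start - 1 + length]

def pad_string(value: str, width: int, fill: str = "0", side: str = "start") -> str:
    """Pad with `fill` at the start or end until the value is `width` long."""
    return value.rjust(width, fill) if side == "start" else value.ljust(width, fill)

def replace_between(value: str, open_char: str, close_char: str, replacement: str) -> str:
    """Replace the text between the first `open_char` and the next `close_char`."""
    start = value.find(open_char)
    end = value.find(close_char, start + 1)
    if start == -1 or end == -1:
        return value  # delimiters not found: leave the value unchanged
    return value[:start + 1] + replacement + value[end:]

def replace_by_position(value: str, start: int, end: int, replacement: str) -> str:
    """Replace characters from 1-based `start` through `end`, inclusive (assumed)."""
    return value[:start - 1] + replacement + value[end:]

print(convert_case("jOHN doe", "title"))                   # John Doe
print(trim_string(" John Doe "))                           # John Doe
print(get_substring("ABCD1234", 5, 4))                     # 1234
print(pad_string("123", 5))                                # 00123
print(replace_between("Serial(987)", "(", ")", "***"))     # Serial(***)
print(replace_by_position("XYZ0012345", 6, 9, "0000"))     # XYZ0000005
```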
When to use Cleanse Data?
- Preparing unstructured or messy string fields.
- Standardizing values before validation or matching.
- Removing inconsistencies caused by case sensitivity or special characters.
- Replacing placeholder terms like "N/A", "Unknown", or similar values.
Add the Cleanse Data step
- In the pipeline canvas, click + Add Step.
- Select Cleanse Data from the list.
- The Cleanse Data step is added and opens in the configuration panel.
Adding and configuring actions
- In the Cleanse Data panel, click + Add Action.
- Choose an operation from the dropdown.
- A configuration panel appears based on the selected operation.
Managing actions
After you apply actions, they appear in the Actions list in the Cleanse Data panel.
- Edit an action by clicking the pencil icon.
- Delete an action using the trash icon.
- Add more actions at any time using + Add Action.
Actions are executed in the order they are listed.
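Because actions run in listed order, the same two actions can produce different results depending on which comes first. The Python sketch below illustrates this under stated assumptions: `apply_actions`, `trim`, and `blank_na` are hypothetical names, and the replacement is assumed to match only when the whole value equals "N/A".

```python
from typing import Callable, List

Action = Callable[[str], str]

def apply_actions(value: str, actions: List[Action]) -> str:
    """Apply each action to the value in list order."""
    for action in actions:
        value = action(value)
    return value

def trim(value: str) -> str:
    """Remove leading and trailing whitespace."""
    return value.strip()

def blank_na(value: str) -> str:
    """Replace the placeholder with a blank (assumed whole-value match)."""
    return "" if value == "N/A" else value

raw = "  N/A "

# Trim first: the placeholder is exposed, so the replacement matches.
trim_first = apply_actions(raw, [trim, blank_na])

# Replace first: the padded value does not match, so "N/A" survives the trim.
replace_first = apply_actions(raw, [blank_na, trim])

print(repr(trim_first))     # ''
print(repr(replace_first))  # 'N/A'
```

If a replacement seems to have no effect, checking whether an earlier or later action changes the value it is meant to match is a reasonable first step.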
Previewing changes
The Cleanse Data step provides a preview of how your data will appear after all configured actions are applied. This preview helps you confirm that all cleansing operations produce the intended results before publishing the pipeline.
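The idea behind the preview can be sketched as a before/after comparison on a small sample. This Python example is only an illustration of that idea, not how the product builds its preview; the `preview` function and the sample values are hypothetical.

```python
from typing import Callable, List, Tuple

def preview(sample: List[str], actions: List[Callable[[str], str]]) -> List[Tuple[str, str, bool]]:
    """Return (original, cleaned, changed) for each sample value."""
    rows = []
    for original in sample:
        cleaned = original
        for action in actions:  # apply actions in list order
            cleaned = action(cleaned)
        rows.append((original, cleaned, original != cleaned))
    return rows

# Hypothetical sample values and a two-action configuration: trim, then title case.
sample = ["  acme ltd. ", "Globex", "N/A"]
rows = preview(sample, [str.strip, str.title])

for original, cleaned, changed in rows:
    marker = "changed" if changed else "same"
    print(f"{original!r} -> {cleaned!r} ({marker})")
# '  acme ltd. ' -> 'Acme Ltd.' (changed)
# 'Globex' -> 'Globex' (same)
# 'N/A' -> 'N/A' (same)
```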
Best practices
- Use descriptive action names to clarify each action’s purpose.
- Group related operations together, for example, trim and convert case actions on the same field.
- Avoid overlapping rules (e.g., trimming a value you later replace).
- Test complex replacements with a small preview set before applying broadly.
- Preview changes before saving to ensure accuracy.