Use Split to branch a Data Quality pipeline and generate multiple outputs.
When you build a Data Quality pipeline and add transformation steps, you can split the pipeline. A split creates multiple branches in the pipeline, directing input records from a single source dataset to multiple outputs. Each branch can have its own transformation steps, tailored to specific data needs. A split is usually added at the end of a pipeline, where it acts as a broadcaster: the input data is passed to every branch. Define the output configuration for each branch separately, and map each output to a dataset table in the run configuration settings.
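The broadcaster behavior can be sketched in Python. This is a hypothetical model for illustration only, not the product's API: records are plain dictionaries, a branch is a named list of step functions, and `split`, `drop_missing_email`, and `mask_email` are invented names. The sketch shows that every branch receives all input records and produces its own, independent output.

```python
from typing import Callable, Dict, List

# Hypothetical model: a record is a plain dict, and a branch is an ordered
# list of transformation steps that each take and return a list of records.
Record = Dict[str, object]
Step = Callable[[List[Record]], List[Record]]


def split(records: List[Record], branches: Dict[str, List[Step]]) -> Dict[str, List[Record]]:
    """Broadcast every input record to each branch and run that branch's steps.

    Returns one output dataset per branch, keyed by branch name.
    """
    outputs: Dict[str, List[Record]] = {}
    for name, steps in branches.items():
        # Each branch starts from a shallow copy of all input records, so
        # one branch's steps do not change what another branch receives.
        data = [dict(r) for r in records]
        for step in steps:
            data = step(data)
        outputs[name] = data
    return outputs


def drop_missing_email(rows: List[Record]) -> List[Record]:
    # Keep only records that have an email address.
    return [r for r in rows if r["email"] is not None]


def mask_email(rows: List[Record]) -> List[Record]:
    # Replace non-empty email addresses with a placeholder.
    return [{**r, "email": "***" if r["email"] else r["email"]} for r in rows]


records = [{"id": 1, "email": "a@example.com"}, {"id": 2, "email": None}]
outputs = split(records, {"clean": [drop_missing_email], "masked": [mask_email]})
```

Here `outputs["clean"]` holds only record 1, while `outputs["masked"]` still holds both records, because each branch received the full input rather than a share of it.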
It's important to understand how splitting differs from grouping and conditional transformation. Grouping and conditional transformations work on a single dataset: they evaluate records against the conditions you set, and the result is still one dataset. Splitting, in contrast, divides the data flow into multiple distinct datasets, each of which can be processed independently.
For example, a Data Quality pipeline often evaluates data against business rules. Records that fail evaluation usually need to be handled separately from records that pass. With multiple outputs, you can route the failed records to their own output for additional processing or handling.
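The contrast between a conditional transformation and a split can be illustrated with a small Python sketch. It is a hypothetical model, not the product's implementation: `conditional_flag`, `split_pass_fail`, and the `has_country` business rule are invented for illustration. The conditional style returns one dataset with a status column, while the split style returns two separate datasets, one for passing records and one for failing records.

```python
from typing import Callable, Dict, List, Tuple

Record = Dict[str, object]
Rule = Callable[[Record], bool]


def conditional_flag(records: List[Record], rule: Rule) -> List[Record]:
    """Conditional-style handling: one dataset out, with a status column added."""
    return [{**r, "status": "pass" if rule(r) else "fail"} for r in records]


def split_pass_fail(records: List[Record], rule: Rule) -> Tuple[List[Record], List[Record]]:
    """Split-style handling: two separate datasets, one per branch.

    Both branches see all input records; each keeps only the records it is
    responsible for, so passing and failing records land in different outputs.
    """
    passed = [r for r in records if rule(r)]
    failed = [r for r in records if not rule(r)]
    return passed, failed


# Hypothetical business rule: a customer record must have a non-empty country code.
def has_country(record: Record) -> bool:
    return bool(record.get("country"))


customers = [
    {"id": 1, "country": "DE"},
    {"id": 2, "country": ""},
    {"id": 3, "country": "US"},
]

flagged = conditional_flag(customers, has_country)
passed, failed = split_pass_fail(customers, has_country)
```

The failed list can then receive its own steps (for example, a correction or notification step) without affecting the records that passed.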
Keep the following rules and behaviors in mind when you work with splits:
- The name of each branch within the split must be unique.
- There must be at least one branch within a split.
- Groups cannot contain branches.
- Splits cannot be reordered unless they are merged.
- Use the collapse/expand icon on a branch or group to minimize or expand it as needed.
- You can reorder steps within a branch or move steps to a different branch by using drag-and-drop.
- Each new split has two branches by default.
- Deleting a branch from the pipeline also deletes its output setting from the run configuration settings.
- If you add a new branch with the same output name tag as a deleted branch, you must set up the output for that branch again.
- Each branch within the split is checked for errors.
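Several of the rules above can be expressed as a validation check. The following Python sketch is a hypothetical model, not the product's internal pipeline format: the `Step`, `Branch`, `Split`, and `Group` classes, the default branch names `"Branch 1"` and `"Branch 2"`, and the `validate` function are all invented for illustration. It assumes that "groups cannot contain branches" means a split may not be placed inside a group, and it checks every branch, including branches of nested splits.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class Step:
    name: str


@dataclass
class Branch:
    name: str
    steps: List["Node"] = field(default_factory=list)


@dataclass
class Split:
    # A new split starts with two branches by default (names are hypothetical).
    branches: List[Branch] = field(
        default_factory=lambda: [Branch("Branch 1"), Branch("Branch 2")]
    )


@dataclass
class Group:
    steps: List["Node"] = field(default_factory=list)


Node = Union[Step, Split, Group]


def validate(nodes: List[Node], inside_group: bool = False) -> List[str]:
    """Return a list of rule violations found in the pipeline nodes."""
    errors: List[str] = []
    for node in nodes:
        if isinstance(node, Split):
            if inside_group:
                errors.append("Groups cannot contain branches.")
            if not node.branches:
                errors.append("A split must have at least one branch.")
            names = [b.name for b in node.branches]
            for dup in sorted({n for n in names if names.count(n) > 1}):
                errors.append(f"Duplicate branch name: {dup!r}")
            # Every branch is checked, including branches of nested splits.
            for branch in node.branches:
                errors.extend(validate(branch.steps, inside_group))
        elif isinstance(node, Group):
            errors.extend(validate(node.steps, inside_group=True))
    return errors


pipeline = [
    Step("Load customers"),
    Split([Branch("valid"), Branch("valid")]),
    Group([Split()]),
]
errors = validate(pipeline)
```

For this pipeline, `errors` reports the duplicate branch name `"valid"` and the split placed inside a group.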
To split a Data Quality pipeline:
- Create a Data Quality pipeline.
- Add transformation steps to the pipeline.
- While building the pipeline, select Add Step on the canvas where you want to branch.
- In the Add Step dialog, select Split. Note: The split operator can only be applied to the open end of a branch, not to any point within the branch.
- Provide names for your branches.
- Optional: Select Add Branch to add more branches within the split.
- Select Save. The split and its branches are added to the Data Quality pipeline.
Continue adding transformation steps as needed; you can also add more splits to the pipeline. Finally, add an Output step to each branch, and then set up the run configuration for each branch.
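The relationship between branch outputs and the run configuration can be sketched as a simple mapping. This Python model is hypothetical: the `RunConfiguration` class, its methods, and the table names such as `DQ.CUSTOMERS_VALID` are invented for illustration and are not the product's API. It mirrors the documented behavior that deleting a branch removes its output setting, so a new branch that reuses the same output name must be mapped again.

```python
from typing import Dict, List


class RunConfiguration:
    """Hypothetical model of run configuration output settings.

    Maps each branch's output name to a target dataset table.
    """

    def __init__(self) -> None:
        self.output_tables: Dict[str, str] = {}

    def set_output(self, output_name: str, table: str) -> None:
        # Map a branch's output to a dataset table.
        self.output_tables[output_name] = table

    def remove_branch(self, output_name: str) -> None:
        # Deleting a branch also deletes its output setting.
        self.output_tables.pop(output_name, None)

    def unmapped(self, output_names: List[str]) -> List[str]:
        # Return branch outputs that still need a target table.
        return [name for name in output_names if name not in self.output_tables]


config = RunConfiguration()
config.set_output("passed_records", "DQ.CUSTOMERS_VALID")
config.set_output("failed_records", "DQ.CUSTOMERS_REJECTED")

# Delete the "failed_records" branch, then add a new branch that reuses the name.
config.remove_branch("failed_records")
missing = config.unmapped(["passed_records", "failed_records"])
```

After the branch is deleted and re-added, `missing` contains `"failed_records"`, signaling that its output must be set up again before the pipeline runs.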