The Union step appends the rows of two or more input datasets into a single dataset, using the columns that you map between the inputs. Rows that are common to the inputs appear only once in the result.
For example, consider a retail company that manages customer and supplier information. They store these details in separate datasets: one for customer information and another for supplier information. By mapping common fields such as ContactName, Address, City, PostalCode, and Country from both datasets, the company can create a unified view that combines customer and supplier information. They can easily see which contacts are common between customers and suppliers or streamline communication by having all contact information in one place.
| CustomerID | CustomerName | ContactName | Address | City | PostalCode | Country |
|---|---|---|---|---|---|---|
| 1 | Alfreds Futterkiste | Maria Anders | Obere Str. 57 | Berlin | 12209 | Germany |
| 2 | Ana Trujillo Emparedados y helados | Ana Trujillo | Avda. de la Constitución 2222 | México D.F. | 05021 | Mexico |
| 3 | Antonio Moreno Taquería | Antonio Moreno | Mataderos 2312 | México D.F. | 05023 | Mexico |
| SupplierID | SupplierName | ContactName | Address | City | PostalCode | Country |
|---|---|---|---|---|---|---|
| 1 | Exotic Liquid | Charlotte Cooper | 49 Gilbert St. | London | EC1 4SD | UK |
| 2 | New Orleans Cajun Delights | Maria Anders | Obere Str. 57 | Berlin | 12209 | Germany |
| 3 | Grandma Kelly's Homestead | Regina Murphy | 707 Oxford Rd. | Ann Arbor | 48104 | USA |
The resulting table appends the customer and supplier information. Maria Anders appears in both inputs, so her row is included only once.
| ContactName | Address | City | PostalCode | Country |
|---|---|---|---|---|
| Maria Anders | Obere Str. 57 | Berlin | 12209 | Germany |
| Ana Trujillo | Avda. de la Constitución 2222 | México D.F. | 05021 | Mexico |
| Antonio Moreno | Mataderos 2312 | México D.F. | 05023 | Mexico |
| Charlotte Cooper | 49 Gilbert St. | London | EC1 4SD | UK |
| Regina Murphy | 707 Oxford Rd. | Ann Arbor | 48104 | USA |
In this combined dataset, each row represents either a customer or a supplier's contact information. This unified view enables the company to manage and analyze their business relationships more effectively.
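The behavior in this example can be sketched in a few lines of Python. This is only an illustration of the described semantics (keep the mapped columns, append the inputs, keep common rows once), not the product's implementation. The function name `union_rows` is hypothetical, and the data is a subset of the rows shown in the tables above.

```python
# Columns mapped between the two inputs (from the example above).
MAPPED = ["ContactName", "Address", "City", "PostalCode", "Country"]

# A subset of the customer and supplier rows from the example tables.
customers = [
    {"CustomerID": 1, "CustomerName": "Alfreds Futterkiste",
     "ContactName": "Maria Anders", "Address": "Obere Str. 57",
     "City": "Berlin", "PostalCode": "12209", "Country": "Germany"},
    {"CustomerID": 3, "CustomerName": "Antonio Moreno Taquería",
     "ContactName": "Antonio Moreno", "Address": "Mataderos 2312",
     "City": "México D.F.", "PostalCode": "05023", "Country": "Mexico"},
]
suppliers = [
    {"SupplierID": 1, "SupplierName": "Exotic Liquid",
     "ContactName": "Charlotte Cooper", "Address": "49 Gilbert St.",
     "City": "London", "PostalCode": "EC1 4SD", "Country": "UK"},
    {"SupplierID": 2, "SupplierName": "New Orleans Cajun Delights",
     "ContactName": "Maria Anders", "Address": "Obere Str. 57",
     "City": "Berlin", "PostalCode": "12209", "Country": "Germany"},
]


def union_rows(inputs, mapped):
    """Append rows from every input, keeping only the mapped columns.

    A row whose mapped values were already seen is skipped, so rows
    common to several inputs appear once. (Hypothetical helper.)
    """
    seen = set()
    result = []
    for rows in inputs:
        for row in rows:
            key = tuple(row[column] for column in mapped)
            if key not in seen:
                seen.add(key)
                result.append(dict(zip(mapped, key)))
    return result


combined = union_rows([customers, suppliers], MAPPED)
for row in combined:
    print(row["ContactName"], "|", row["City"], "|", row["Country"])
```

Because Maria Anders has identical values in every mapped column in both inputs, the sketch emits her row once. Unmapped columns such as CustomerID and SupplierName are dropped from the result.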
To perform the union of multiple inputs:
- While building the pipeline, select Add Step on the canvas where you want to branch.
- In the Add Step dialog, select Union. Note: The union operator can only be applied to the open end of a branch, not to any point within the branch.
- Configure the step properties.
- Click Preview to review the result.
- After you have reviewed the preview result, click Save to add the step to the pipeline.
Add an output step to each open input and set up the run configuration to generate output.
Step properties
- Input Dataset 1: Specifies the input dataset used as a reference for the union.
- Add Input: Adds an input dataset for the union.
  - This flow tab: Lists all open inputs that can be used in the union. Select the dataset that you want to add as input to the union.
- Map Fields: Specifies the input fields that are mapped to the dataset used in the union. Fields are auto-mapped by default.
Output configuration
Select the output fields that you want to include in the appended dataset.
Output: After you select Map Fields, choose the output fields that you want to include in the appended dataset.
Input 1: Lists fields from the first input dataset. Use the dropdown to select the field that you want to map.
Input 2: Lists fields from the second input dataset. Use the dropdown to select the field that you want to map.
You must map each field from the first input to the corresponding field from the second input.
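The field mapping can be pictured as a lookup from each Input 1 field to its counterpart in Input 2. The following Python sketch is a rough illustration under stated assumptions: the function `union_with_mapping`, the Input 2 field names (`Contact`, `Town`, `Zip`), and the choice to name output fields after Input 1 are all hypothetical and are not taken from the product.

```python
def union_with_mapping(input1, input2, field_map, output_fields):
    """Append input2 to input1 using an Input 1 -> Input 2 field mapping.

    field_map maps each Input 1 field name to the Input 2 field that
    holds the same kind of value. Output fields use the Input 1 names
    (an assumption made for this sketch).
    """
    unmapped = [f for f in output_fields if f not in field_map]
    if unmapped:
        raise ValueError(f"Unmapped output fields: {unmapped}")

    rows = [tuple(r[f] for f in output_fields) for r in input1]
    rows += [tuple(r[field_map[f]] for f in output_fields) for r in input2]

    # dict.fromkeys keeps the first occurrence of each row, in order.
    unique_rows = list(dict.fromkeys(rows))
    return [dict(zip(output_fields, r)) for r in unique_rows]


# Hypothetical inputs whose field names differ between datasets.
input1 = [
    {"ContactName": "Maria Anders", "City": "Berlin", "PostalCode": "12209"},
]
input2 = [
    {"Contact": "Charlotte Cooper", "Town": "London", "Zip": "EC1 4SD"},
    {"Contact": "Maria Anders", "Town": "Berlin", "Zip": "12209"},
]
field_map = {"ContactName": "Contact", "City": "Town", "PostalCode": "Zip"}

# Only ContactName and City are selected as output fields.
result = union_with_mapping(input1, input2, field_map, ["ContactName", "City"])
print(result)
```

In this sketch, an output field that has no mapping raises an error, which mirrors the requirement that fields from the first input be mapped to fields from the second input.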