Join

Data Integrity Suite

Product
Spatial_Analytics
Data_Integration
Data_Enrichment
Data_Governance
Precisely_Data_Integrity_Suite
geo_addressing_1
Data_Observability
Data_Quality
dis_core_foundation
Services
Spatial Analytics
Data Integration
Data Enrichment
Data Governance
Geo Addressing
Data Observability
Data Quality
Core Foundation
ft:title
Data Integrity Suite
ft:locale
en-US
PublicationType
pt_product_guide
copyrightfirst
2000
copyrightlast
2025
Type: Structure step

Joining multiple inputs in a Data Quality pipeline allows you to relate different datasets to enrich records or compile a complete set of records from various sources.

Join operations in a pipeline enable the merging of records from multiple inputs based on specified keys, creating a single, coherent output view. This feature supports Inner, Left, Right, and Full joins, allowing users to define the nature of the relationship between the tables.

For example, a retail company may want to join customer transaction data with customer support records to provide a holistic view of customer interactions and purchasing behaviors. The transaction dataset contains purchase details, while the support dataset contains customer inquiries and issues. By joining these datasets on a common customer ID field, the company can analyze the data to improve customer service and tailor marketing strategies.

Suppose we have two datasets: one containing transaction data and the other containing customer support records.

Table 1. Transaction dataset
Customer ID Transaction ID Product Purchased Purchase Amount
001 1001 Product A $50
002 1002 Product B $30
003 1003 Product C $70
004 1004 Product A $40
005 1005 Product D $60
Table 2. Support Dataset
Customer ID Support Ticket ID Issue Resolution
001 5001 Shipping Delayed Refunded
002 5002 Missing Parts Replaced
003 5003 Product Quality Replacement
004 5004 Wrong Item Refunded
005 5005 Product damaged Exchange

To provide a holistic view of customer interactions and purchasing behaviors, we can join these datasets on the common field "Customer ID". After joining, we might get a combined dataset like this:

Table 3. Combined dataset
Customer ID Transaction ID Product Purchased Purchase Amount Support Ticket ID Issue Resolution
001 1001 Product A $50 5001 Shipping Delayed Refunded
002 1002 Product B $30 5002 Missing Parts Replaced
003 1003 Product C $70 5003 Product Quality Replacement
004 1004 Product A $40 5004 Wrong Item Refunded
005 1005 Product D $60 5005 Product damaged Exchange

In this combined dataset, each row represents a customer's transaction along with any associated support tickets, providing a holistic view of both their purchasing behaviors and interactions with customer support.

To append multiple inputs.

  1. While building the pipeline, select Add Step on the canvas where you want to branch.
  2. In the Add Step dialog, select Join.
    Note: The join operator can only be applied to the open end of a branch, not to any point within the branch.
  3. Choose the datasets to join by selecting the left and right inputs in the properties panel.
  4. Select the datasets from either the branches in the current pipeline or from cataloged datasets.
  5. Define the join keys by mapping corresponding fields from both inputs.
    • Select the fields to be used as join keys from the dropdown lists.
    • Join keys are the fields based on which the datasets will be merged.
  6. Choose the type of join—Inner, Left, Right, or Full—according to the desired output.
  7. Click Configure Output to define the output view based on the input columns and the type of join.
    Tip: You can include additional fields such as Record Count, Source Identifier, and Source Record Counter as part of the output.
  8. Preview the join result by clicking Preview.
    Note: It displays all columns from both the left and right input tables. The columns are shown in the order they appear, starting with the join key columns from both the left and right fields.
  9. If the preview is satisfactory, click Save to add the join step to the pipeline.

The datasets are joined according to the specified keys and join type, resulting in a unified dataset for further processing. Review the joined data and proceed with additional transformations or analyses as required.

Step properties

Left Input: The primary dataset to be joined. Select the dataset that will act as the main reference for the join operation.

Right Input: The secondary dataset to be joined. Choose the dataset that will be combined with the primary dataset based on the join keys.

The fields used to match records from both datasets. Specify which fields should be used to align records from the left and right inputs.

  • Left field: Specifies the key field in the left input dataset. This field will be matched against the corresponding key in the right dataset.
  • Right field: Specifies the key field in the right input dataset. This field will be matched against the corresponding key in the left dataset.

Add Join Key: Adds additional fields to append. Use this to specify more fields that should be used to determine matches between the two datasets.

Join type: Specifies the type of join operation to be performed, which determines how records are matched and combined.
  • Inner: Keeps only records with matching keys in both datasets, resulting in a dataset containing only the intersection of both inputs.
  • Left: Keeps all records from the left dataset and matching records from the right dataset, filling in with nulls where there is no match.
  • Right: Keeps all records from the right dataset and matching records from the left dataset, filling in with nulls where there is no match.
  • Full: Keeps all records from both datasets, filling in with nulls where there are no matches, effectively creating a union of the datasets.
  • All fields: Includes all fields from both datasets in the output, providing a comprehensive view of combined data.
  • Left input: Includes only fields from the left input dataset in the output, focusing on the primary datasets structure.
  • Right input: Includes only fields from the right input dataset in the output, focusing on the secondary datasets structure.
    Note: If a field name exists in both tables, the system automatically adds a _Right suffix to the field name from the right table (this is case-insensitive). Users can update or add new field names in the Rename column. To configure output columns, simply select the check-box next to the desired field names.

Select the output fields that you want to include in joined dataset