Consolidate matches

Data Integrity Suite

Product
Spatial_Analytics
Data_Integration
Data_Enrichment
Data_Governance
Precisely_Data_Integrity_Suite
geo_addressing_1
Data_Observability
Data_Quality
dis_core_foundation
Services
Spatial Analytics
Data Integration
Data Enrichment
Data Governance
Geo Addressing
Data Observability
Data Quality
Core Foundation
ft:title
Data Integrity Suite
ft:locale
en-US
PublicationType
pt_product_guide
copyrightfirst
2000
copyrightlast
2025
Type: Matching and Consolidation step

The Consolidate Matches step is used to consolidate inconsistent duplicate records by either creating a new golden record or replacing them with consistent records.

Enterprises have multiple versions of the same business records from different sources. These records are inconsistent and therefore unreliable. It is required to identify these duplicate records for a given business field (customer/organization/address) and then combine these duplicates to obtain the most reliable record. The accuracy of the resultant record is determined by the confidence level, which depends on these factors.

  • Quality of source: refers to the reliability and trustworthiness of the data source itself.
  • Chronology/recency: pertains to the time when it was collected or last updated.
  • Frequency/plurality: frequent updates ensure that the data is up-to-date and reflects the current state of affairs.
  • Completeness: refers to the extent to which all relevant information is present in the dataset.
  • Validity: refers to whether the data accurately represents the real-world entities or events it intends to describe.
  • Business rules: specific rules or guidelines that dictate how data should be handled and managed.

The Consolidate Matches step provides two approaches—best of breed and commonize.

Approach

Best of breedThe best of breed approach uses rules to create a golden record.

CommonizeThe Commonize approach determines which fields from a collection of records to copy to the corresponding fields of all records in the collection.

Save

Click this button to close settings and save changes to the transformation settings.

Preview

Click this button to preview the results of the transformation settings.

Cancel

Click this button to close settings for this transformations without saving any changes.

Best of breed approach

The best of breed approach uses rules to create a golden record.

It consolidates duplicate records by selecting the best data in a duplicate record collection and creating a new consolidated record using the best data. You can define multiple conditions and actions for each rule as needed. When processing completes, the best of breed record is retained by the system.

The Best of Breed rule ensures data accuracy and reliability. It protects data-driven processes, eliminates inaccuracies, and aids in decision-making. Data professionals implement the rule and evaluate records based on predefined criteria. It applies in data warehouses, CRM systems, and more. The rule compares records and retains the highest-quality entry as the survivor record also called as the single source of truth record.

Group by

Specifies the column in your dataset that you want to use as the basis for grouping. For example, you might have a dataset of sales data with columns like product, region, and sales_amount. If you want to apply the best of breed rule to find the highest sales amount for each product, you would group the data by the product field.

Sorts data within the group based on the selected Group by column. The order of records impacts the selection of the golden record when you use the first record as the starter record. The default sort may not yield the expected order, and it may select a starter record that is unhelpful, potentially impacting your data decisions. When you select Sort within group, it arranges the fields in the selected Group by column according to your specified sorting order.

Keep original records

Allows you to retain your input records along with the best of breed record. Not selecting the checkbox will remove other records and retain only the best of breed record.

Starter Record

The record that serves as the initial or reference point for defining how consolidation rules should be applied to other similar records. You can choose the starter record based on certain criteria, such as its quality, accuracy, or completeness.

First record

The very first data entry you choose as a starting point to establish guidelines for data consolidation. When using the first record as the starter record, you are essentially selecting one record as a reference and then defining consolidation rules based on the attributes or characteristics of that particular record.
Remember: While using the first record as the starter record, it's important to ensure that the first record truly represents accurate and high-quality data. It's also essential to consider scenarios where the first record might be an outlier or not fully representative of the dataset.

Starter record conditions

Specifies conditions that are used to determine which record should be selected as the starter record for defining consolidation rules. These conditions help identify the initial reference point from which consolidation rules are established to determine the golden record.

Consolidation Rules

Specifies the rules that help select the most accurate, relevant, or highest-quality data from multiple sources. Consolidation rules help determine the best or most reliable record from the data that has inconsistencies, duplicates, or conflicting information.

Add Consolidation Rules

Allows you to add multiple conditions and actions that you want to define as the best of breed rule. You can add more than one consolidation rules. You can also edit or delete an existing rule if needed.

Add consolidation rules

Conditions

Specifies single or multiple conditions to set rules. You can add group of conditions for a more complex rule. The operators are shown based on the column type selected.

Add Condition

Lets you add conditions to the condition group.
Note: If you have added more than one condition, you can choose to Match all conditions or Match any condition.

Add Group Condition

Lets you add a new condition group.
Note: If you have added more than one group condition, Satisfy any groups is always applicable.

Actions

Specifies actions for your provided conditions or group of conditions.

Add Action

Lets you add actions for your defined conditions. You can choose to copy the data from selected field or copy from string that you provide, to the field that you want.

Copy From Field: copies the selected field value to the best of breed record.

Copy From String: copies a constant value to the best of breed record.

Apply

Saves the provided conditions and actions.

Note for "Copy from Field": For numerical values, you can select the checkbox Sum Numeric Values. For example, if there are three duplicate records within the group and they have these values in the Deposits field: 100.00, 20.00, 5.00 then the resulting best of breed record's Deposit's field will be 125.00, representing the sum of all three values.

Output field

A new output column—CollectionRecordType is added to your dataset output. Depending on your choice to retain original records, the output will either preserve the original rows alongside the best-of-breed record, or the input rows will be replaced by the best-of-breed record. The output column may include the following rows:
  • Primary: This represents the selected template record within the collection of records.
  • Secondary: This denotes a record that is not the chosen template within the collection of records.
  • BestOfBreed: This signifies the newly created best-of-breed record within the collection of records.
    Note: The Primary and Secondary values are generated only when you define Starter record conditions.

The output field is created based on the action defined for a particular condition or group condition. You can use Copy From Field under Actions to copy the content of a specific column to a new column using the consolidation rules.

Commonize approach

The Commonize approach determines which fields from a collection of records to copy to the corresponding fields of all records in the collection.

You have to specify rules that records must satisfy to copy the field data to the other records in the collection. When processing completes, all records in the collection are retained.

The Commonize rule ensures standardization and unification of data from diverse sources. It aims to create a consistent and homogeneous dataset, preventing discrepancies. It applies to data integration, data consolidation, and data cleansing. Having consistent data, enhances data quality, and enables better decision-making processes, fostering efficiency and accuracy in their data-centric operations.

Level 1

Specifies the first pass for applying grouping and consolidation rules to the dataset.

Add Level

Allows you to add more levels that can be used to apply different grouping and consolidation rules for each level. The result of level 1 is used as the input for level 2, and so on. You can add multiple levels based on the duplicate or inconsistent fields present in your dataset.

Group by

Specifies the column that you want to use to group your resultant record. This is usually the column by which you group or categorize data items that share common characteristics. This grouping allows you to apply commonizing rules consistently within each group, ensuring that data is standardized in a coherent manner.

Sorts data within the group based on the selected Group by column. The order of records impacts the selection of the commonize record. The default sort may not yield the expected order, and it may select a template record that is unhelpful, potentially impacting your data decisions. When you select Sort within group, it arranges the fields in the selected Group by column according to your specified sorting order.

Consolidation Rules

Specifies the rules that help address issues such as inconsistent naming conventions, formats, or values across different datasets. Consolidation rules ensure that data is consistent and aligned based on predefined standards.

Add Consolidation Rules

Allows you to add multiple conditions and actions that you want to define as commonizing rule. You can add more than one consolidation rules. You can also edit or delete an existing rule if needed.

Add consolidation rules

Conditions

Specifies single or multiple conditions to set rules. You can add group of conditions for a more complex rule. The operators are shown based on the column type selected.

Add Condition

Lets you add conditions to the condition group.
Note: If you have added more than one condition, you can choose to Match all conditions or Match any condition.

Add Group Condition

Lets you add a new condition group.
Note: If you have added more than one group condition, Satisfy any groups is always applicable.

Actions

Specifies actions for your provided conditions or group of conditions.

Add Action

Lets you add actions for your defined conditions. You can choose to copy the data from selected field or copy from string that you provide, to the field that you want.

Copy From Field: copies a value from the selected field to the other records in the group.

Copy From String: copies a constant value to the other records in the group.

Apply

Saves the provided conditions and actions.

Output field

The output field is created based on the action defined for a particular condition or group condition. You can use Copy From Field under Actions to copy the content of a specific column to a new column using the consolidation rules.

Consolidation rule operators

Use these operators to define conditions within a condition group for the consolidation rules.

Numeric field operators

Use numeric operators to evaluate values in Integer, Long, Float, Date, DateTime, and Time fields.

Operator Expression true when
Equal Field value is equal to a literal or field value.
Not equal Field value is not equal to a literal or field value.
Highest Field value is the highest than any other field values.
Lowest Field value is the lowest than any other field values.
Greater than Field value is greater than a literal or field value.
Less than Field value is less than a literal or field value.
Greater than or equal Field value is greater than or equal to a literal or field value.
Less than or equal Field value is less than or equal to a literal or field value.
Empty Field value contains no value.
Not Empty Field value contains any value.

String field operators

Use string operators to evaluate values in String fields.

Operator Expression true when
Equal Field string value matches a literal or field value.
Not equal Field string value does not match a literal or field value.
Contains Field string value contains a literal or field value.
Empty Field value contains no value.
Not empty Field value contains any value.
Longest Field string value has the longest (in bytes) value in the field.
Shortest Field string value has the shortest (in bytes) value in the field.
Most Common Field string value that occurs most frequently in this field among the records in the group. If two or more values are most common, no action is taken.