Match and group

Data Integrity Suite

Type: Matching and Consolidation step

This step first matches and then groups dataset records based on entity values.

The Match and Group step identifies records that are related to each other in some way. For example, if you are trying to eliminate redundant information from your customer data, you may want to identify duplicate records for the same customer. If you are trying to eliminate duplicate marketing mailings to the same address, you may want to identify records of customers who live in the same household.

You can match based on entities, field values, or a combination of entities and fields. An entity is a collection of fields that uniquely characterizes or identifies an object as a person, address, contact, or business.

A matching scenario defines the collection of entities or fields used to match and group records. When there is no entity for a set of criteria, you can select fields that distinguish relevant characteristics to match records. Fields and entities within a scenario are combined with a logical AND: a record must match on every field or entity in the scenario to be grouped with other records. Multiple scenarios are combined with a logical OR: a record is included in a group if it matches under any one of the scenarios.
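
The AND/OR combination described above can be illustrated with a minimal Python sketch. This is not the product's implementation: the scenario names, the field names, and the use of exact comparison on normalized text are all assumptions made for illustration, and the step's actual match rules may compare values differently.

```python
from typing import Dict, List, Optional

# Hypothetical scenarios: each name maps to the fields that must ALL agree.
SCENARIOS: Dict[str, List[str]] = {
    "same_person": ["first_name", "last_name", "email"],
    "same_household": ["last_name", "address"],
}


def _norm(value) -> str:
    """Normalize a value for comparison (assumed rule: trim and ignore case)."""
    return str(value or "").strip().casefold()


def matches_scenario(a: dict, b: dict, fields: List[str]) -> bool:
    """Logical AND: every field must be non-empty and equal in both records."""
    return all(
        _norm(a.get(f)) != "" and _norm(a.get(f)) == _norm(b.get(f))
        for f in fields
    )


def first_matching_scenario(
    a: dict, b: dict, scenarios: Dict[str, List[str]] = SCENARIOS
) -> Optional[str]:
    """Logical OR: return the first scenario that matches, or None."""
    for name, fields in scenarios.items():
        if matches_scenario(a, b, fields):
            return name
    return None


john = {"first_name": "John", "last_name": "Smith",
        "email": "john@example.com", "address": "1 Main St"}
jane = {"first_name": "Jane", "last_name": "smith",
        "email": "jane@example.com", "address": "1 main st "}

# The two records differ on first name and email, so "same_person" fails,
# but they agree on last name and address, so "same_household" matches.
print(first_matching_scenario(john, jane))
```

In this sketch, adding a field to a scenario makes that scenario stricter, while adding a scenario makes grouping more permissive.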

Note: When you select the Match and Group step in the pipeline, the inspection view evaluates the top 500 records. However, when you select the Preview button while configuring the step, the step preview result evaluates the top 2000 records.

Step name: Defines the name for a step. Provide a meaningful name so that anyone who edits steps in a pipeline will be able to identify the purpose of a step.

Approach

  • Entity based: You define the criteria that match and group records.
  • Automated: The Match and Group step uses artificial intelligence and machine learning to match and group records based on name and location of people or businesses.
  • Custom: You define match scenarios for non-entities. This approach lets you create custom match rules and use them to handle a wider variety of match scenarios.

Output Fields

  • GroupId: Each suspect record is given a GroupId. The candidates for that suspect are given the same GroupId. For example, if John Smith is a suspect record and its candidate records are John Smith and Jon Smith, then all three records would have the same GroupId.
  • MatchKey: Records that have the same match key are placed into a match group. Records with the same match key are considered potential duplicates. Records with different match keys are not considered duplicates.
  • RecordType: Identifies the type of match record in a collection. The possible values are:
    • Suspect: A record that other records are compared against to determine whether they are duplicates. Each collection has one and only one suspect record.
    • Duplicate: A record that is a duplicate of the suspect record.
    • Unique: A record that has no duplicates.
  • IsMatched: This value is set to true when the match score is greater than or equal to the Master Threshold setting.
  • MatchScore: This is the probability that a record matches other records, where 0 represents a non-match, and 100 represents a full match. A value falling between 0 and 100 shows the match confidence level.
  • MatchScenario: Shows the name of the scenario that defines a Duplicate record. This is empty for Suspect or Unique records.
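
To show how the output fields relate to one another, the following Python sketch labels one suspect record and its scored candidates. It is an illustration under stated assumptions, not the step's actual logic: the function `label_group`, the default threshold of 80, and the input scores are hypothetical. The text above does not say what IsMatched and MatchScore contain for Suspect or Unique records, so the sketch leaves them as `None`.

```python
from typing import List, Tuple


def label_group(
    group_id: int,
    suspect: dict,
    scored: List[Tuple[dict, float, str]],
    threshold: float = 80.0,  # stands in for the Master Threshold (assumed value)
) -> List[dict]:
    """Label a suspect and its candidates given as (record, score, scenario)."""
    # A candidate is matched when its score is >= the threshold.
    matched = [(rec, score, name) for rec, score, name in scored
               if score >= threshold]

    rows = [{
        **suspect,
        "GroupId": group_id,
        # With no duplicates, the record is Unique rather than Suspect.
        "RecordType": "Suspect" if matched else "Unique",
        "IsMatched": None,    # not specified for this record type in the text
        "MatchScore": None,   # not specified for this record type in the text
        "MatchScenario": "",  # empty for Suspect or Unique records
    }]
    for rec, score, name in matched:
        rows.append({
            **rec,
            "GroupId": group_id,  # duplicates share the suspect's GroupId
            "RecordType": "Duplicate",
            "IsMatched": True,
            "MatchScore": score,
            "MatchScenario": name,
        })
    return rows


rows = label_group(
    1,
    {"name": "John Smith"},
    [
        ({"name": "John Smith"}, 100.0, "same_person"),
        ({"name": "Jon Smith"}, 92.0, "same_person"),
        ({"name": "Joan Smyth"}, 40.0, "same_person"),
    ],
)
for row in rows:
    print(row["GroupId"], row["RecordType"], row["name"], row["MatchScore"])
```

As in the GroupId example above, the suspect John Smith and its two matched candidates end up with the same GroupId, while the low-scoring candidate is left out of the group.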