This step first matches and then groups dataset records based on entity values.
The Match and Group step identifies records that are related to each other in some way. For example, if you are trying to eliminate redundant information from your customer data, you may want to identify duplicate records for the same customer. If you are trying to eliminate duplicate marketing mailings to the same address, you may want to identify records of customers that live in the same household.
You can match based on entities, field values, or a combination of entities and fields. An entity is a collection of fields that uniquely characterizes or identifies an object as a person, address, contact, or business.
A matching scenario defines the collection of entities or fields used to match and group records. When there is no entity for a set of criteria, you can select fields that distinguish relevant characteristics to match records. Fields and entities within a scenario are combined with a logical AND: a record must match on every field or entity in the scenario to be grouped with other records. Multiple scenarios are combined with a logical OR: a record is included in a group if it satisfies any one of the scenarios.
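The AND/OR combination described above can be sketched in a few lines of Python. This is an illustration of the logic only, not the step's implementation: the scenario names, field names, and the case-insensitive exact comparison are all assumptions made for the example, and the actual step may compare values quite differently.

```python
from typing import Dict, List, Optional

Record = Dict[str, str]


def _norm(value: str) -> str:
    # Assumed comparison rule for this sketch: ignore case and outer whitespace.
    return value.strip().lower()


def matches_scenario(a: Record, b: Record, fields: List[str]) -> bool:
    """Logical AND: every field in the scenario must be present and equal."""
    return all(
        f in a and f in b and _norm(a[f]) == _norm(b[f])
        for f in fields
    )


def first_matching_scenario(
    a: Record, b: Record, scenarios: Dict[str, List[str]]
) -> Optional[str]:
    """Logical OR: return the first scenario the pair satisfies, else None."""
    for name, fields in scenarios.items():
        if matches_scenario(a, b, fields):
            return name
    return None


# Hypothetical scenarios and records, for illustration only.
scenarios = {
    "same_person": ["name", "email"],
    "same_household": ["address", "postcode"],
}
john = {"name": "John Smith", "email": "js@example.com",
        "address": "1 Main St", "postcode": "12345"}
jane = {"name": "Jane Smith", "email": "jane@example.com",
        "address": "1 main st", "postcode": "12345"}
other = {"name": "John Smith", "email": "other@example.com",
         "address": "9 Oak Ave", "postcode": "99999"}

household_match = first_matching_scenario(john, jane, scenarios)
no_match = first_matching_scenario(john, other, scenarios)
```

Here `john` and `jane` are grouped through the household scenario even though the person scenario fails, while `john` and `other` share a name but differ on every other field, so no scenario is satisfied.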
Step name: Defines the name for a step. Provide a meaningful name so that anyone who edits steps in a pipeline will be able to identify the purpose of a step.
Approach
- Entity based: You define the criteria that match and group records.
- Automated: The Match and Group step uses artificial intelligence and machine learning to match and group records based on name and location of people or businesses.
- Custom: You define match scenarios for data that is not covered by an entity. Custom match rules let you handle a wider variety of match scenarios.
Output Fields
- GroupId: Each suspect record is given a GroupId. The candidates for that suspect are given the same GroupId. For example, if John Smith is a suspect record and its candidate records are John Smith and Jon Smith, then all three records would have the same GroupId.
- MatchKey: Records that have the same match key are placed into a match group. Records with the same match key are considered potential duplicates. Records with different match keys are not considered duplicates.
- RecordType: Identifies the type of match record in a collection. The possible values are:
  - Suspect: A record that other records are compared against to determine whether they are duplicates of it. Each collection has one and only one suspect record.
  - Duplicate: A record that is a duplicate of the suspect record.
  - Unique: A record that has no duplicates.
- IsMatched: This value is set to true when the match score is greater than or equal to the Master Threshold setting.
- MatchScore: The match confidence level for a record, on a scale where 0 represents a non-match and 100 represents a full match. Values between 0 and 100 indicate partial confidence.
- MatchScenario: Shows the name of the scenario that defines a Duplicate record. This is empty for Suspect or Unique records.
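To show how these output fields relate to one another, here is a minimal sketch that labels one collection from candidate scores that have already been computed. It is not the step's implementation. The function name, the `RecordId` key, and the input tuple shape are hypothetical. The text above does not say what `MatchScore` and `IsMatched` hold for a Suspect or Unique record, or whether a Unique record receives a `GroupId`, so the sketch sets the first two to `None` and assigns a `GroupId` as explicit assumptions.

```python
from typing import Dict, List, Tuple

# Hypothetical input shape: (record_id, match_score from 0 to 100, scenario_name).
ScoredCandidate = Tuple[str, int, str]


def label_collection(
    group_id: int,
    suspect_id: str,
    candidates: List[ScoredCandidate],
    master_threshold: int,
) -> List[Dict[str, object]]:
    """Return output rows for one suspect and the candidates that match it."""
    # A candidate counts as a duplicate when its score reaches the threshold.
    duplicates = [c for c in candidates if c[1] >= master_threshold]

    rows: List[Dict[str, object]] = [{
        "RecordId": suspect_id,
        "GroupId": group_id,
        # With no duplicates, the record stands alone and is labeled Unique.
        "RecordType": "Suspect" if duplicates else "Unique",
        "IsMatched": None,    # assumption: not specified for this record type
        "MatchScore": None,   # assumption: not specified for this record type
        "MatchScenario": "",  # empty for Suspect and Unique records
    }]
    for record_id, score, scenario in duplicates:
        rows.append({
            "RecordId": record_id,
            "GroupId": group_id,  # same GroupId as the suspect
            "RecordType": "Duplicate",
            "IsMatched": score >= master_threshold,
            "MatchScore": score,
            "MatchScenario": scenario,
        })
    return rows


# Hypothetical scores and threshold, for illustration only.
rows = label_collection(
    1,
    "r1",
    [("r2", 100, "same_person"), ("r3", 92, "same_person"), ("r4", 40, "same_person")],
    master_threshold=80,
)
lonely = label_collection(2, "r9", [], master_threshold=80)
```

In this example `r4` scores below the threshold, so it is left out of the collection for `r1`; how the step treats such a record afterwards is outside what this sketch covers.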