Entity-based approach

Data Integrity Suite

The entity-based approach in data quality software uses match keys and matching scenarios to identify and group duplicate records efficiently. You configure algorithms, select fields, and set thresholds to tune the matching process and protect data integrity.

Match Key: A match key applies algorithms to selected fields so that duplicate records can be identified faster. The Match and Group step compares the match key value of the saved record with the match key values of existing records. Records with the same key are considered potential duplicates and are placed in the same match group. Records in different match groups are not compared and are not considered duplicates. An Auto Generated match key is created automatically when you add the Match and Group step to a pipeline. Click the Edit Match Keys button to create or edit match keys that apply specific algorithms to selected fields. For more information about match key options, see Match Key options for Entity Based approach.
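The grouping behavior described above can be sketched in a few lines of Python. This is only an illustration, not the product's implementation: the key recipe (first three letters of the surname plus the postal code), the field names, and the functions `match_key` and `group_by_match_key` are all hypothetical.

```python
from collections import defaultdict


def match_key(record):
    """Build a hypothetical match key: first three letters of the
    surname plus the postal code (an assumed recipe, for illustration)."""
    surname = record.get("last_name", "").strip().upper()[:3]
    postal = record.get("postal_code", "").replace(" ", "").upper()
    return f"{surname}|{postal}"


def group_by_match_key(records):
    """Put records that share a match key into the same match group."""
    groups = defaultdict(list)
    for record in records:
        groups[match_key(record)].append(record)
    return dict(groups)


records = [
    {"id": 1, "last_name": "Smith", "postal_code": "12180"},
    {"id": 2, "last_name": "Smithe", "postal_code": "12180"},
    {"id": 3, "last_name": "Jones", "postal_code": "12180"},
]

groups = group_by_match_key(records)
for key, members in groups.items():
    print(key, [r["id"] for r in members])
```

Records 1 and 2 share a key and land in one match group, so only they are candidates for detailed comparison. Record 3 falls in a different group and is never compared with them, which is where the speed gain comes from.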

Matching Scenario:

  • Select fields and entities to match: Select a field or entity to match and group records by its values. You can select multiple fields and entities. Click the Matching Options button to configure matching options for an entity and its fields. For an entity, you can configure the matching method, how missing data is handled, and the scoring method. For an entity field, you can configure missing data handling, the threshold score, the scoring method, and algorithm settings. For more information, see Matching options.
  • Master threshold: Specifies the similarity threshold used to determine matching records. This value, between 0 and 100 percent, is the minimum score a record must reach to be considered a suspect match with other records. Adjust this value to balance false positives against false negatives: a higher value is more likely to result in false negatives (true duplicates missed), while a lower value is more likely to result in false positives (distinct records grouped together). The default value is 75.
  • Add match scenario: Click the Add Match Scenario button to add a match scenario to the step.
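
The effect of the master threshold can be illustrated with a minimal Python sketch. It makes several assumptions that are not taken from the product: Python's `difflib.SequenceMatcher` stands in for the configured matching algorithm, the record score is a plain average of field scores, fields with missing data are skipped, and a score equal to the threshold counts as a match. The function names are hypothetical.

```python
from difflib import SequenceMatcher

MASTER_THRESHOLD = 75  # default value stated in the text


def field_score(a, b):
    """Similarity of two field values on a 0-100 scale.
    SequenceMatcher is a stand-in for the product's algorithms."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio() * 100


def record_score(rec_a, rec_b, fields):
    """Average the field scores; fields missing on either side are
    skipped (one assumed way of handling missing data)."""
    scores = [
        field_score(rec_a[f], rec_b[f])
        for f in fields
        if rec_a.get(f) and rec_b.get(f)
    ]
    return sum(scores) / len(scores) if scores else 0.0


def is_suspect_match(rec_a, rec_b, fields, threshold=MASTER_THRESHOLD):
    """A pair is a suspect match when its score reaches the threshold."""
    return record_score(rec_a, rec_b, fields) >= threshold


fields = ["first_name", "last_name"]
a = {"first_name": "John", "last_name": "Smith"}
b = {"first_name": "John", "last_name": "Smithe"}
c = {"first_name": "Mary", "last_name": "Jones"}

print(is_suspect_match(a, b, fields))                # default threshold
print(is_suspect_match(a, b, fields, threshold=96))  # stricter threshold
print(is_suspect_match(a, c, fields))
```

With the default threshold the near-identical pair `a` and `b` is flagged as a suspect match. Raising the threshold to 96 rejects the same pair, which shows how a higher value trades false positives for false negatives.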