The entity-based approach in data quality software utilizes match keys and matching scenarios to identify and group duplicate records efficiently. This method involves configuring algorithms, selecting fields, and setting thresholds to optimize the matching process and ensure data integrity.
Match Key: A match key applies algorithms to selected fields to improve speed and performance for identifying duplicate records. The Match and Group step compares the match key values of the saved record with existing records. Records with the same key are considered potential duplicates and put in the same match group. Records in different match groups are not considered duplicates. An Auto Generated match key is created automatically when you add the Match and Group step to a pipeline. You click the Edit Match Keys button to create or edit match keys that apply specific algorithms to selected fields. For more information about match key options, see Match Key options for Entity Based approach.
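The grouping behavior described above can be sketched in a few lines of Python. This is only an illustration, not the product's implementation: the key recipe (first three letters of the last name plus the postal code), the field names, and the functions `make_match_key` and `group_by_match_key` are all hypothetical.

```python
from collections import defaultdict


def make_match_key(record):
    """Build a hypothetical match key from two selected fields."""
    # Assumption: keep only letters of the last name, lowercase, first three characters.
    last = "".join(ch for ch in record["last_name"].lower() if ch.isalpha())
    # Assumption: the postal code is used as-is, with spaces removed.
    postal = record["postal_code"].replace(" ", "")
    return f"{last[:3]}|{postal}"


def group_by_match_key(records):
    """Place records that share a match key in the same match group."""
    groups = defaultdict(list)
    for record in records:
        groups[make_match_key(record)].append(record)
    return dict(groups)


records = [
    {"id": 1, "last_name": "Johnson", "postal_code": "60614"},
    {"id": 2, "last_name": "Johnsen", "postal_code": "60614"},
    {"id": 3, "last_name": "Smith", "postal_code": "60614"},
]
groups = group_by_match_key(records)
```

In this sketch, records 1 and 2 share a key and land in one match group as potential duplicates, while record 3 falls into a different group and is never compared against them. That is where the speed benefit comes from: detailed comparison is only needed within a group.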
Matching Scenario:
- Select fields and entities to match: Select one or more fields or entities; records are matched and grouped by the values of the selected fields or entities. Click the Matching Options button to configure matching options for an entity and its fields. For an entity, you can configure the matching method, how missing data is handled, and the scoring method. For an entity field, you can configure how missing data is handled, the threshold score, the scoring method, and algorithm settings. For more information, see Matching options.
- Master threshold: Specifies the similarity threshold used to determine matching records. This value, between 0 and 100 percent, is the minimum score a record must reach to be considered a suspect match with other records. Adjust this value to balance false positives against false negatives: a higher value is more likely to result in false negatives, while a lower value is more likely to result in false positives. The default value is 75.
- Add match scenario: Click the Add Match Scenario button to add a match scenario to the step.
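
As a rough illustration of how a master threshold separates suspect matches from non-matches, the sketch below scores two records field by field and compares the average to the threshold. The scoring here is an assumption made for the example: it uses Python's standard `difflib.SequenceMatcher` and a simple average, which is not necessarily how the product computes scores. The field names and the functions `field_score`, `record_score`, and `is_suspect_match` are hypothetical, and treating the threshold as inclusive is also an assumption.

```python
from difflib import SequenceMatcher

MASTER_THRESHOLD = 75  # the documented default value


def field_score(a, b):
    """Similarity of two field values on a 0-100 scale (illustrative only)."""
    return 100.0 * SequenceMatcher(None, a.lower(), b.lower()).ratio()


def record_score(rec_a, rec_b, fields):
    """Assumption: the record score is the plain average of the field scores."""
    scores = [field_score(rec_a[f], rec_b[f]) for f in fields]
    return sum(scores) / len(scores)


def is_suspect_match(rec_a, rec_b, fields, threshold=MASTER_THRESHOLD):
    """A pair is a suspect match when its score reaches the threshold."""
    return record_score(rec_a, rec_b, fields) >= threshold


fields = ["name", "city"]
a = {"name": "Jon Smith", "city": "Chicago"}
b = {"name": "John Smith", "city": "Chicago"}
c = {"name": "Maria Lopez", "city": "Denver"}

near_duplicate = is_suspect_match(a, b, fields)
unrelated = is_suspect_match(a, c, fields)
```

Raising `threshold` toward 100 makes the check stricter, so true duplicates with small spelling differences may be missed (false negatives). Lowering it admits more loosely similar pairs (false positives).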