Calculation of quality scores - Precisely Data Integrity Suite


Quality scores in the Data Integrity Suite are calculated using a hierarchical aggregation methodology that combines individual rule scores across fields and datasets. This topic explains how scores are computed at each level and how multiple rules and fields contribute to the overall quality assessment.

Scoring hierarchy overview

Quality scores are calculated in a hierarchical structure that flows from individual rules up to the datasource level. Understanding this hierarchy is essential for interpreting quality metrics across your data assets.

The scoring hierarchy follows this order:

  1. Rule score: The base level score generated when a single rule is run against a field or dataset.
  2. Field score: An aggregate of all rule scores applied to a specific field.
  3. Dataset score: An aggregate of all field scores within the dataset.
  4. Datasource score: An aggregate of all dataset scores within a datasource.
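
Because every level is an equal-weight average, the whole hierarchy reduces to nested averaging. A minimal Python sketch, assuming a hypothetical nested data layout (none of these names come from the Suite's API):

```python
def average(scores):
    """Equal-weight mean of a list of percentage scores."""
    return sum(scores) / len(scores)

# Hypothetical layout: datasource -> dataset -> field -> list of rule scores
datasource = {
    "customers": {"email": [98, 96], "name": [94]},
    "orders": {"order_id": [99]},
}

field_scores = {ds: {f: average(rules) for f, rules in fields.items()}
                for ds, fields in datasource.items()}
dataset_scores = {ds: average(list(fs.values())) for ds, fs in field_scores.items()}
datasource_score = average(list(dataset_scores.values()))
```

Each aggregation step is simply the mean of the level below it, which is why a single low-scoring rule propagates upward in diluted form.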

Quality rule score calculation

A quality rule score is the foundational metric generated when a data quality rule is run. The score represents the percentage of records or data elements that satisfy the rule's pass condition.

Rule score formula:

Rule score = (Number of records passing Rule / Total records evaluated) × 100

Example:

If a completeness rule checks for non-null values in a customer email field and finds that 950 out of 1000 records contain a non-null value, the rule score is calculated as:

Rule score = (950 / 1000) × 100 = 95%

Rule scores range from 0% to 100%, where 100% indicates that all evaluated records satisfy the rule condition.

Note: When certain functions are used during rule evaluation (such as Not Equal, Less Than, or Less Than or Equal), null records from the database are included in the score calculation. To exclude null values from evaluation, add a secondary condition using the <is not null> operator.
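
The null-handling behavior described in the note can be sketched as follows. This is a minimal illustration only; evaluate_rule and its parameters are hypothetical names, not part of the Suite's API:

```python
def evaluate_rule(values, predicate, exclude_nulls=False):
    """Rule score = (passing records / records evaluated) * 100.

    With exclude_nulls=False, nulls stay in the evaluated set and count as
    failures (the behavior described for Not Equal, Less Than, and similar
    functions). exclude_nulls=True mimics adding an <is not null>
    secondary condition: nulls are removed before evaluation.
    """
    if exclude_nulls:
        values = [v for v in values if v is not None]
    if not values:
        return None  # nothing evaluated, so no score
    passing = sum(1 for v in values if v is not None and predicate(v))
    return passing / len(values) * 100

emails = ["a@x.com", None, "b@x.com", "bad"]
has_at = lambda v: "@" in v
# nulls counted as failures: 2 of 4 pass -> 50.0
# nulls excluded: 2 of 3 pass -> about 66.67
```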

Multiple rules applied to a single field

When multiple quality rules are configured for the same field, the field score is calculated by aggregating all associated rule scores. This provides a comprehensive quality assessment for that field across multiple dimensions or criteria.

Field score formula (Multiple rules):

Field score = Average of all rule scores applied to the field

Example:

Consider a customer name field with three quality rules:

  • Rule 1 (Completeness): Checks for non-null values = 98% score
  • Rule 2 (Validity): Checks for valid name format = 96% score
  • Rule 3 (Consistency): Checks for consistent capitalization = 94% score

The field score is calculated as:

Field score = (98 + 96 + 94) / 3 = 96%

This field score of 96% represents the overall quality of the customer name field across all three quality dimensions.

Tip: All rules are weighted equally in the field score calculation. If you need to prioritize certain rules over others, consider the relative importance of each dimension when interpreting the results.
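
The equal-weight aggregation above is a plain arithmetic mean; a one-line sketch using the worked example (the function name is hypothetical):

```python
def field_score(rule_scores):
    """Equal-weight average of all rule scores applied to one field."""
    return sum(rule_scores) / len(rule_scores)

# Worked example: completeness 98, validity 96, consistency 94 -> 96.0
name_score = field_score([98, 96, 94])
```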

Multiple fields within a dataset

When a dataset contains multiple fields, each with its own quality score, the dataset score is calculated by aggregating the scores of all fields within that dataset. This provides an overall quality assessment for the entire dataset.

Tip: The Data Integrity Suite calculates the aggregate dataset score from only those fields that have an associated quality score; fields without a quality score are excluded from the aggregate.

Dataset score formula (Multiple fields):

Dataset score = Average of all field scores in the dataset

Example:

Consider a customer dataset with four fields:

  • Customer ID field score = 99%
  • Customer Name field score = 96%
  • Email field score = 92%
  • Phone field score = 88%

The dataset score is calculated as:

Dataset score = (99 + 96 + 92 + 88) / 4 = 93.75%

This dataset score of 93.75% represents the overall quality of the customer dataset across all fields.

Tip: All fields are weighted equally in the dataset score calculation. Fields with lower scores will proportionally impact the overall dataset quality assessment.
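
The dataset-level aggregation is the same equal-weight mean, applied one level up to field scores. A minimal sketch with the worked figures (hypothetical helper, not the Suite's API):

```python
def dataset_score(field_scores):
    """Equal-weight average of the field scores in a dataset."""
    return sum(field_scores) / len(field_scores)

# Customer ID 99, Customer Name 96, Email 92, Phone 88 -> 93.75
customer_dataset = dataset_score([99, 96, 92, 88])
```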

Datasource score calculation

The datasource score represents the overall quality of all datasets and schemas within a datasource. It is calculated by aggregating the scores of all datasets contained in that datasource.

Datasource score formula:

Datasource score = Average of all dataset scores in the datasource

Example:

Consider a Snowflake datasource containing three datasets:

  • Customer dataset score = 93.75%
  • Orders dataset score = 91.5%
  • Products dataset score = 95.2%

The datasource score is calculated as:

Datasource score = (93.75 + 91.5 + 95.2) / 3 = 93.48%

This datasource score of 93.48% represents the overall quality of all data within the Snowflake datasource.
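
The same averaging applies once more at the top level; a brief sketch using the figures above (hypothetical names), with the result rounded for display:

```python
def datasource_score(dataset_scores):
    """Equal-weight average of the dataset scores in a datasource."""
    return sum(dataset_scores) / len(dataset_scores)

# (93.75 + 91.5 + 95.2) / 3, rounded to two decimals
snowflake_score = round(datasource_score([93.75, 91.5, 95.2]), 2)
```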

Complex scoring scenarios

In real-world implementations, quality scoring often involves complex scenarios with multiple rules, fields, and dimensions. This section provides guidance on how scores are calculated in these situations.

Scenario 1: Multiple rules and multiple dimensions on a single field

When a field has multiple rules assigned to different dimensions:

  1. Each rule generates its own score based on rule evaluation.
  2. Rules are grouped by their assigned dimension.
  3. Dimension scores are calculated as the average of rules within that dimension.
  4. The field score is calculated as the average of all dimension scores for that field.

Example:

Customer Email field with rules across two dimensions:

  • Completeness dimension:
    • Rule: Email not null = 98%
    • Completeness Score = 98%
  • Validity dimension:
    • Rule: Valid email format = 96%
    • Rule: Email domain exists = 94%
    • Validity Score = (96 + 94) / 2 = 95%

Field Score = (98 + 95) / 2 = 96.5%
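
The four steps above can be sketched as a two-stage average: group rule scores by dimension, average within each group, then average the group results. This is an illustrative sketch; the function and its input shape are assumptions, not the Suite's API:

```python
from collections import defaultdict

def field_score_by_dimension(rules):
    """rules: list of (dimension, score) pairs.

    Averages scores within each dimension, then averages the
    dimension scores to produce the field score.
    """
    by_dim = defaultdict(list)
    for dimension, score in rules:
        by_dim[dimension].append(score)
    dim_scores = {d: sum(s) / len(s) for d, s in by_dim.items()}
    return sum(dim_scores.values()) / len(dim_scores), dim_scores

# Worked example: one completeness rule, two validity rules
email_rules = [("completeness", 98), ("validity", 96), ("validity", 94)]
email_field_score, email_dims = field_score_by_dimension(email_rules)
```

Note that the two validity rules first collapse to a single dimension score (95%), so they jointly carry the same weight as the lone completeness rule.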

Scenario 2: Dataset with mixed field coverage

When not all fields in a dataset have rules assigned:

  1. Only fields with assigned rules contribute to the dataset score calculation.
  2. Fields without rules are not included in the average.
  3. The dataset score reflects the quality of evaluated fields only.

Example:

Customer dataset with 10 fields, but only 6 have quality rules:

  • Field 1 score = 99%
  • Field 2 score = 96%
  • Field 3 score = 92%
  • Field 4 score = 88%
  • Field 5 score = 94%
  • Field 6 score = 91%
  • Fields 7-10 = No rules assigned (not included in calculation)

Dataset Score = (99 + 96 + 92 + 88 + 94 + 91) / 6 = 93.33%

Important: To ensure comprehensive quality assessment, consider assigning rules to all critical fields in your datasets. Fields without rules will not contribute to the overall quality score.
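
The mixed-coverage rule can be sketched by representing unscored fields as None and filtering them out before averaging (hypothetical helper, not the Suite's API):

```python
def dataset_score_with_coverage(fields):
    """fields: mapping of field name -> score, or None when no rules
    are assigned. Unscored fields are excluded from the average."""
    scored = [s for s in fields.values() if s is not None]
    if not scored:
        return None  # no field has a score, so no dataset score
    return sum(scored) / len(scored)

# Worked example: 6 scored fields, 4 fields with no rules assigned
customer = {"f1": 99, "f2": 96, "f3": 92, "f4": 88, "f5": 94, "f6": 91,
            "f7": None, "f8": None, "f9": None, "f10": None}
```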

Score updates and refresh timing

Quality scores are updated when rules are run. The timing of score updates depends on how rules are configured:

  • Manual runs: Scores update immediately when you manually run a rule from the Rules page or rule details page.
  • Scheduled runs: Scores update according to the schedule configured for the rule. By default, rules follow the discovery schedule.
  • Cascading updates: When a rule score updates, all dependent scores (field, dimension, dataset, and datasource) are automatically recalculated.

The timestamp displayed on the Scoring page indicates when the most recent rule evaluation occurred for that asset.

Interpreting quality scores

Quality scores provide a quantitative measure of data quality, but interpretation depends on your organization's data quality standards and requirements.

General Guidelines:

  • 90-100%: Excellent quality. Data is suitable for critical business operations and analytics.
  • 80-89%: Good quality. Data is generally suitable for most operations but may require attention in specific areas.
  • 70-79%: Fair quality. Data has notable quality issues that should be addressed before use in critical operations.
  • Below 70%: Poor quality. Data requires significant remediation before use in production systems.

Tip: These guidelines are recommendations only. Your organization should establish its own quality thresholds based on business requirements and risk tolerance.
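
The guideline bands above can be expressed as a simple threshold lookup. The cutoffs below are the illustrative ones from this topic, not fixed product behavior; substitute your organization's own thresholds:

```python
def quality_band(score):
    """Map a quality score (0-100) to the guideline bands above."""
    if score >= 90:
        return "Excellent"
    if score >= 80:
        return "Good"
    if score >= 70:
        return "Fair"
    return "Poor"
```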

Best practices for quality scoring

To maximize the effectiveness of quality scoring in your organization, follow these best practices:

  • Comprehensive rule coverage: Assign quality rules to all critical fields and datasets to ensure complete quality assessment.
  • Regular rule runs: Schedule rules to run regularly to keep quality scores current and relevant.
  • Threshold monitoring: Establish quality score thresholds for your organization and monitor assets that fall below acceptable levels.
  • Trend analysis: Track quality score changes over time to identify improving or deteriorating data quality trends.