A data quality rule defines the conditions or criteria that data must meet to be considered valid, accurate, and reliable. There are two types of rules:
- Default rules: Default rules are automatically created in the workspace and are triggered on all assets and fields.
- Custom rules: Custom rules are created by users in the workspace by configuring their own conditions.
Default rules are triggered on all assets of a datasource when the Run Quality Rules and Calculate Scores toggle is turned on in the Insights section while setting up a new datasource. The rules evaluate the assets on aspects of the data such as distribution, completeness, and validity, and generate a Quality Score that indicates the quality of the data. The purpose of rules is to assess the completeness, consistency, accuracy, and timeliness of data. By applying predefined rules to datasets, organizations can systematically identify and address data issues, errors, and inconsistencies.
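The exact scoring logic used by the Data Integrity Suite is not described here. As a minimal sketch, assuming a rule is a per-value pass condition, a rule score is the percentage of records that pass, and the Quality Score is an unweighted average of rule scores, rule evaluation could look like the following. The `Rule` class, the function names, and the averaging step are all hypothetical.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List

# Hypothetical record shape: one row of an asset, keyed by field name.
Record = Dict[str, Any]


@dataclass
class Rule:
    """Hypothetical rule: a name, a quality dimension, and a per-value pass condition."""
    name: str
    dimension: str
    passes: Callable[[Any], bool]


def rule_score(rule: Rule, records: List[Record], field: str) -> float:
    """Return the percentage of records whose `field` value satisfies the rule."""
    if not records:
        return 0.0
    passed = sum(1 for record in records if rule.passes(record.get(field)))
    return 100.0 * passed / len(records)


def quality_score(rules: List[Rule], records: List[Record], field: str) -> float:
    """Assumed aggregation: the unweighted mean of the individual rule scores."""
    scores = [rule_score(rule, records, field) for rule in rules]
    return sum(scores) / len(scores) if scores else 0.0


# Example: a completeness rule and a validity rule applied to an "email" field.
completeness = Rule("Email is populated", "Completeness",
                    lambda v: v is not None and v != "")
validity = Rule("Email contains @", "Validity",
                lambda v: isinstance(v, str) and "@" in v)

records = [{"email": "a@x.com"}, {"email": None}, {"email": "bad"}, {"email": "c@y.org"}]
print(rule_score(completeness, records, "email"))  # 75.0
print(rule_score(validity, records, "email"))      # 50.0
print(quality_score([completeness, validity], records, "email"))  # 62.5
```

Averaging is only one possible aggregation; a real product may weight dimensions differently or score each field separately.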
Profiling in the Data Integrity Suite includes categories that provide insights into data characteristics. Each category displays relevant metrics, including counts and percentages of valid, invalid, and null entries, along with visual representations such as bar charts for data distribution and value counts. Profiling statistics are generated for datasets and fields when the Profile Datasets toggle is turned on within the Insights section while establishing a new datasource.
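To illustrate the kind of per-field metrics that profiling reports, here is a minimal Python sketch that counts valid, invalid, and null entries, converts them to percentages, and tallies value counts of the sort a bar chart would display. The `profile_field` function and the idea of passing in a validity check are assumptions for illustration, not the Suite's implementation.

```python
from collections import Counter
from typing import Any, Callable, Dict, List


def profile_field(values: List[Any], is_valid: Callable[[Any], bool]) -> Dict[str, Any]:
    """Hypothetical profiler: counts and percentages of valid, invalid, and null entries."""
    total = len(values)
    nulls = sum(1 for v in values if v is None)
    valid = sum(1 for v in values if v is not None and is_valid(v))
    invalid = total - nulls - valid  # non-null values that fail the validity check

    def pct(n: int) -> float:
        return round(100.0 * n / total, 2) if total else 0.0

    return {
        "count": total,
        "null": nulls, "null_pct": pct(nulls),
        "valid": valid, "valid_pct": pct(valid),
        "invalid": invalid, "invalid_pct": pct(invalid),
        # Frequency of each non-null value, most common first (input for a bar chart).
        "value_counts": Counter(v for v in values if v is not None).most_common(),
    }


states = ["NY", "CA", None, "NY", "ZZ"]
profile = profile_field(states, lambda v: v in {"NY", "CA", "TX"})
print(profile["valid_pct"], profile["invalid_pct"], profile["null_pct"])  # 60.0 20.0 20.0
print(profile["value_counts"])  # [('NY', 2), ('CA', 1), ('ZZ', 1)]
```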
To learn more about Insights, refer to View insights and schedule.
Rules tab
The Rules tab lists all the rules in the Data Integrity Suite and displays the following information for each rule:
- Rule: Displays the name of the rule. Select the rule to view detailed information about it, which includes:
  - Rule Scores: Displays the associated fields, the rule score for each field, and the date when the rule was last executed. A rule score is a quantitative measure of data quality based on the rule criteria.
  - Rule Definition: Provides the rule's description, dimension, scoring bands, total number of associated target fields, pass conditions, result type, and associated expression.
- Dimension: Shows the rule's dimension. Each dimension represents a particular quality criterion that the data must meet, and rules are formulated to evaluate whether the data reaches the desired level of quality in that dimension.
- Targets: Shows the number of targeted fields.
- Run Status: Displays the status of rule execution.
- Last Evaluated: Displays the timestamp when the rule was last evaluated.
- Scheduling: Displays whether scheduling is enabled or disabled for the rule.
- Description: Shows a detailed description of the rule.
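The details listed under Rule Definition can be pictured as a simple record. The Python sketch below is illustrative only: the `RuleDefinition` class, the SQL-style expression string, and the Good/Fair/Poor thresholds are hypothetical and do not reflect the Suite's actual data model or default scoring bands. It shows how scoring bands could map a numeric rule score to a label, and how the Targets count relates to the list of associated fields.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class RuleDefinition:
    """Hypothetical model of the details listed under Rule Definition."""
    description: str
    dimension: str
    expression: str        # pass condition, stored as text for illustration only
    targets: List[str]     # associated target fields
    # Hypothetical scoring bands as (minimum score, label), highest threshold first.
    scoring_bands: List[Tuple[float, str]] = field(
        default_factory=lambda: [(90.0, "Good"), (70.0, "Fair"), (0.0, "Poor")]
    )

    def band_for(self, score: float) -> str:
        """Return the label of the first band whose minimum the score reaches."""
        for minimum, label in self.scoring_bands:
            if score >= minimum:
                return label
        return self.scoring_bands[-1][1]  # below every threshold: lowest band


rule = RuleDefinition(
    description="Email must be populated",
    dimension="Completeness",
    expression="email IS NOT NULL",
    targets=["customers.email", "contacts.email"],
)
print(len(rule.targets))    # 2 (what the Targets column counts)
print(rule.band_for(95.0))  # Good
print(rule.band_for(75.0))  # Fair
print(rule.band_for(40.0))  # Poor
```

Ordering the bands from the highest threshold down keeps the lookup a single linear scan that returns the first match.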