Data drift statistics - Precisely Data Integrity Suite

The Data Drift alert is generated based on the statistics configured during Observer creation. Each statistic can be enabled or disabled depending on your requirements.

All Data Type

These statistics evaluate the data based on the Completeness and Unique value count parameters, and apply to all data types.

Completeness

Completeness refers to the degree to which a table or column contains all the expected or required values for a given dataset. It is expressed as a percentage. For example, if a dataset is expected to have 1000 rows of data, a complete table would have all 1000 rows present, with no missing or null values. Similarly, if a column is expected to have a particular type of data, such as numerical values, then a complete column must have all values of the expected type.
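As an illustration only, the following Python sketch computes completeness as the percentage of non-missing values in a column. It assumes that a missing value is represented by None; the function name completeness and the sample data are hypothetical, and the sketch is not the calculation used internally by the product.

```python
def completeness(values):
    """Return the percentage of entries that are not missing (None)."""
    if not values:
        return 0.0
    present = sum(1 for value in values if value is not None)
    return present * 100 / len(values)


# Hypothetical column with 10 expected values, 2 of which are missing.
ages = [35, 29, None, 33, 35, 25, None, 33, 25, 20]
pct = completeness(ages)
print(f"Completeness: {pct:.1f}%")  # Completeness: 80.0%
```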

Unique value count

Unique Value Count is also referred to as Cardinality. Cardinality is the number of unique (distinct) elements in a set.

For example, consider the following set of data:

Name      Age
Mark      35
John      29
Ashley    39
Jonas     33
Mark      35
John      25
Mark      20
James     33
Ashley    25
Emma      20

The unique values for "Name" column are: Mark, John, Ashley, Jonas, James, Emma

The cardinality of "Name" is 6

The unique values for "Age" column are: 35, 29, 39, 33, 25, 20

The cardinality of "Age" is 6
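The unique value count for the sample table can be checked with a short Python sketch. It uses a set to discard duplicates, so the size of the set is the cardinality. The variable names are illustrative assumptions.

```python
# Sample table from the example above, as (name, age) rows.
rows = [
    ("Mark", 35), ("John", 29), ("Ashley", 39), ("Jonas", 33), ("Mark", 35),
    ("John", 25), ("Mark", 20), ("James", 33), ("Ashley", 25), ("Emma", 20),
]

# A set keeps one copy of each value, so its size is the cardinality.
unique_names = {name for name, _ in rows}
unique_ages = {age for _, age in rows}

name_cardinality = len(unique_names)
age_cardinality = len(unique_ages)
print(name_cardinality, age_cardinality)  # 6 6
```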

Numeric Data

The numeric data statistics can be configured to generate alerts when there is a change in a numerical parameter, such as the minimum, maximum, mean, or standard deviation.

Minimum
Minimum is referred to as the Minimum Value in numeric data. It is the lowest value in the dataset.

For example, the set SALARY={1215, 2000, 5263, 1126, 3687} contains the lowest value "1126". The minimum value of the set SALARY is 1126.

Maximum
Maximum is referred to as the Maximum Value in numeric data. It is the highest value in the dataset.

For example, the set SALARY={1215, 2000, 5263, 1126, 3687} contains the highest value "5263". The maximum value of the set SALARY is 5263.

Mean

Mean is the average of the elements present in the numeric data.

Mean = (Sum of all the elements in the set) / (Number of elements)

For example, the set SALARY={1215, 2000, 5263, 1126, 3687} contains 5 elements. The mean can be calculated as (1215+2000+5263+1126+3687)/5 = 13291/5, which equals 2658.2.

Therefore, the mean of the set SALARY is 2658.2.
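The minimum, maximum, and mean of the SALARY example can be reproduced with Python's built-in functions. This is a minimal sketch for checking the arithmetic, and the variable names are assumptions.

```python
salary = [1215, 2000, 5263, 1126, 3687]

minimum = min(salary)              # lowest value in the set
maximum = max(salary)              # highest value in the set
mean = sum(salary) / len(salary)   # sum of elements / number of elements

print(minimum, maximum, mean)  # 1126 5263 2658.2
```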

Standard deviation
Standard Deviation (SD) is a measure of how much the data points vary from their mean value. It is calculated as the square root of the variance, where the variance is the average of the squared deviations of each data point from the mean.

A higher SD means that the data points are widely spread out from the mean, whereas a lower SD means that the data points are close to the mean.
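As a hedged illustration, the sketch below computes the standard deviation of the SALARY set using Python's statistics module. This page does not state whether the population formula (divide by n) or the sample formula (divide by n - 1) is used, so both are shown; which one matches the product's output is an assumption to verify.

```python
import math
import statistics

salary = [1215, 2000, 5263, 1126, 3687]
mean = statistics.mean(salary)

# Variance written out by hand: average of squared deviations from the mean.
population_variance = sum((x - mean) ** 2 for x in salary) / len(salary)
manual_sd = math.sqrt(population_variance)

# The same result from the standard library, plus the sample variant.
population_sd = statistics.pstdev(salary)  # divides by n
sample_sd = statistics.stdev(salary)       # divides by n - 1

print(round(population_sd, 2), round(sample_sd, 2))
```

For any dataset with more than one distinct value, the sample SD is slightly larger than the population SD because it divides by a smaller number.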

Textual Data

The textual data statistics can be configured to generate an alert whenever there is a change in a textual parameter, such as the minimum length or maximum length of the textual data.

Minimum length

Minimum length is the length of the shortest text value in the textual dataset. For example, in the set CITY={Sao Paulo, Mexico, Tokyo, Shanghai, Cairo, Mumbai}, the shortest values are "Tokyo" and "Cairo", each with 5 characters. The minimum length of the set CITY is 5.

Maximum length

Maximum length is the length of the longest text value in the textual dataset. For example, in the set CITY={Sao Paulo, Mexico, Tokyo, Shanghai, Cairo, Mumbai}, the longest value is "Sao Paulo", with 9 characters including the space. The maximum length of the set CITY is 9.
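The CITY example can be checked with the following Python sketch. It assumes that length is counted in characters and that spaces are included, which is consistent with "Sao Paulo" having a length of 9. The variable names are illustrative.

```python
city = ["Sao Paulo", "Mexico", "Tokyo", "Shanghai", "Cairo", "Mumbai"]

lengths = [len(name) for name in city]  # character count, spaces included
min_length = min(lengths)
max_length = max(lengths)

# Values that have the minimum and maximum length.
shortest = [name for name in city if len(name) == min_length]
longest = [name for name in city if len(name) == max_length]

print(min_length, shortest)  # 5 ['Tokyo', 'Cairo']
print(max_length, longest)   # 9 ['Sao Paulo']
```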

Detail drift

The Detail Drift rule can be set to measure how often each element occurs within the set of elements in the selected datasets.

Distribution of value count
The distribution of value count is also referred to as Cardinality Detail. Cardinality detail is a measure of the number of times an element occurs in the set of elements. As a percentage, the cardinality detail is calculated as:
(Number of rows in which the element occurs / Total number of rows) * 100

On the Alerts page, in the Summary tab, you can choose to view the distribution in percentage or in number of occurrences. The Show by percentage checkbox is selected by default and shows the distribution in percentage.

For example, consider the following set of data:

Name      Age
Mark      35
John      29
Ashley    39
Jonas     33
Mark      35
John      25
Mark      20
James     33
Ashley    25
Emma      20

The cardinality detail of "Mark" = (3/10)*100 which equals 30 percent

The cardinality detail of "25" = (2/10)*100 which equals 20 percent
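The two views of the distribution, number of occurrences and percentage, can be sketched in Python with collections.Counter. The helper name distribution is hypothetical, and the sketch only illustrates the formula; it is not the product's implementation.

```python
from collections import Counter

names = ["Mark", "John", "Ashley", "Jonas", "Mark",
         "John", "Mark", "James", "Ashley", "Emma"]
ages = [35, 29, 39, 33, 35, 25, 20, 33, 25, 20]


def distribution(values):
    """Return (occurrence counts, percentage of rows) for each distinct value."""
    counts = Counter(values)
    total = len(values)
    percentages = {value: count * 100 / total for value, count in counts.items()}
    return counts, percentages


name_counts, name_pct = distribution(names)
age_counts, age_pct = distribution(ages)

print(name_counts["Mark"], name_pct["Mark"])  # 3 30.0
print(age_counts[25], age_pct[25])            # 2 20.0
```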