The Data Drift alert is generated based on the statistics configured during Observer creation. You can enable or disable individual statistics depending on your requirements.
This statistic evaluates data of all types using two parameters: Completeness and Unique Value Count.
Completeness refers to the degree to which a table or column contains all the expected or required values for a given dataset. It is expressed as a percentage. For example, if a dataset is expected to have 1000 rows of data, a complete table has all 1000 rows present, with no missing or null values. Similarly, if a column is expected to hold a particular type of data, such as numerical values, then every value in a complete column must be of that type.
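The following minimal sketch shows one way to compute a completeness percentage for a single column. It assumes that a value is missing when it is `None` and that, if an expected type is given, values of any other type also count as incomplete. The function name `completeness_percent` and its `expected_type` parameter are hypothetical and do not describe how the product computes this statistic internally.

```python
from typing import Optional, Sequence


def completeness_percent(values: Sequence[object], expected_type: Optional[tuple] = None) -> float:
    """Percentage of entries in a column that are present (and, optionally, of the expected type).

    Hypothetical helper: None is treated as missing; if expected_type is given,
    values that are not instances of it are also treated as incomplete.
    """
    if not values:
        return 0.0  # assumption: an empty column is reported as 0% complete

    def is_complete(v: object) -> bool:
        if v is None:
            return False
        return expected_type is None or isinstance(v, expected_type)

    present = sum(1 for v in values if is_complete(v))
    return present * 100 / len(values)


# Hypothetical Age column: two nulls and one non-numeric entry out of 10 rows.
ages = [35, 29, None, 33, "n/a", 25, None, 33, 25, 20]
print(completeness_percent(ages))                # 80.0 (8 of 10 values present)
print(completeness_percent(ages, (int, float)))  # 70.0 ("n/a" is not numeric)
```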
Unique Value Count is also referred to as cardinality. Cardinality is the number of distinct (unique) elements in a set. For example, consider the following dataset:
| Name | Age |
|---|---|
| Mark | 35 |
| John | 29 |
| Ashley | 39 |
| Jonas | 33 |
| Mark | 35 |
| John | 25 |
| Mark | 20 |
| James | 33 |
| Ashley | 25 |
| Emma | 20 |
The unique values in the "Name" column are: Mark, John, Ashley, Jonas, James, Emma
The cardinality of "Name" is 6
The unique values in the "Age" column are: 35, 29, 39, 33, 25, 20
The cardinality of "Age" is 6
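To check the cardinality figures, the sketch below counts the distinct values in each column of the example table. It puts the values into a Python `set` and takes the size of that set. The `cardinality` helper is a hypothetical name used only for this illustration.

```python
# The example table, as (Name, Age) rows.
rows = [
    ("Mark", 35), ("John", 29), ("Ashley", 39), ("Jonas", 33), ("Mark", 35),
    ("John", 25), ("Mark", 20), ("James", 33), ("Ashley", 25), ("Emma", 20),
]


def cardinality(column):
    """Number of distinct values in a column (hypothetical helper)."""
    return len(set(column))


names = [name for name, _ in rows]
ages = [age for _, age in rows]

print(sorted(set(ages)))  # [20, 25, 29, 33, 35, 39]
print(cardinality(names))  # 6
print(cardinality(ages))   # 6
```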
Numeric Data
The numeric data statistic can be configured to generate alerts when there is a change in numerical parameters such as the minimum, maximum, mean, and standard deviation.
Minimum is the smallest value in the numeric data. For example, the lowest value in the set SALARY={1215, 2000, 5263, 1126, 3687} is 1126, so the minimum value of the set SALARY is 1126.
Maximum is the largest value in the numeric data. For example, the highest value in the set SALARY={1215, 2000, 5263, 1126, 3687} is 5263, so the maximum value of the set SALARY is 5263.
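The sketch below computes the minimum and maximum of the SALARY set. It also shows, in simplified form, what "a change in the numerical parameters" could mean: the statistics from a baseline batch are compared with those from a new batch. The helpers `min_max` and `changed_stats` and the extra value `900` are hypothetical. They illustrate the idea and are not the product's alerting logic.

```python
from typing import Dict, Sequence, Set


def min_max(values: Sequence[float]) -> Dict[str, float]:
    """Minimum and maximum of a non-empty numeric column (hypothetical helper)."""
    if not values:
        raise ValueError("min/max are undefined for an empty column")
    return {"min": min(values), "max": max(values)}


def changed_stats(baseline: Dict[str, float], current: Dict[str, float]) -> Set[str]:
    """Names of statistics whose value differs between two runs (hypothetical)."""
    return {name for name in baseline if baseline[name] != current.get(name)}


SALARY = [1215, 2000, 5263, 1126, 3687]
baseline = min_max(SALARY)
print(baseline)  # {'min': 1126, 'max': 5263}

# Hypothetical new batch that adds a lower salary: only the minimum changes.
current = min_max(SALARY + [900])
print(changed_stats(baseline, current))  # {'min'}
```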
Mean is the average of the elements present in the numeric data.
Mean = (sum of all the elements in the set) / (number of elements)
For example, the set SALARY={1215, 2000, 5263, 1126, 3687} contains 5 elements. The mean is calculated as (1215 + 2000 + 5263 + 1126 + 3687) / 5 = 13291 / 5, which equals 2658.2.
Therefore, the mean of the set SALARY is 2658.2
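The following sketch implements the mean formula directly: it adds up the elements and divides by their count. The `mean` function name is chosen for this illustration. Python's standard `statistics.mean` gives the same result.

```python
from typing import Sequence


def mean(values: Sequence[float]) -> float:
    """Sum of all elements divided by the number of elements."""
    if not values:
        raise ValueError("mean is undefined for an empty column")
    return sum(values) / len(values)


SALARY = [1215, 2000, 5263, 1126, 3687]
print(sum(SALARY))   # 13291
print(mean(SALARY))  # 2658.2
```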
Standard deviation (SD) measures how far the data points are spread out from the mean. A higher SD means the data points are widely spread out from the mean, whereas a lower SD means the data points are close to the mean.
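This document does not say whether the product uses the population SD (divide by n) or the sample SD (divide by n - 1). The sketch below therefore supports both and applies them to the SALARY set. The `std_dev` function and its `sample` flag are hypothetical names. The calculation follows the usual definition: square each element's deviation from the mean, average those squares, then take the square root.

```python
import math
from typing import Sequence


def std_dev(values: Sequence[float], sample: bool = False) -> float:
    """Standard deviation of a numeric column.

    sample=False: population SD, dividing the squared deviations by n.
    sample=True:  sample SD, dividing by n - 1 (assumption: either may apply).
    """
    n = len(values)
    if n == 0 or (sample and n < 2):
        raise ValueError("not enough values to compute a standard deviation")
    mu = sum(values) / n
    squared_dev = sum((x - mu) ** 2 for x in values)
    return math.sqrt(squared_dev / (n - 1 if sample else n))


SALARY = [1215, 2000, 5263, 1126, 3687]
print(round(std_dev(SALARY), 2))               # 1594.18 (population)
print(round(std_dev(SALARY, sample=True), 2))  # 1782.35 (sample)
```

For the same data, the sample SD is always slightly larger than the population SD, so alert thresholds should be interpreted using whichever variant the product actually reports.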
Textual Data
This statistic can be configured to generate an alert whenever there is a change in textual parameters, such as the minimum or maximum length of the textual data.
Minimum length is the length of the shortest text value in the textual dataset. For example, in the set CITY={Sao Paulo, Mexico, Tokyo, Shanghai, Cairo, Mumbai}, the shortest values are "Tokyo" and "Cairo", with 5 characters each. The minimum length of the set CITY is 5.
Maximum length is the length of the longest text value in the textual dataset. For example, in the set CITY={Sao Paulo, Mexico, Tokyo, Shanghai, Cairo, Mumbai}, the longest value is "Sao Paulo", with 9 characters including the space. The maximum length of the set CITY is 9.
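The sketch below computes the minimum and maximum text length for the CITY set and lists the values that have the shortest length. It assumes that length means the number of characters, with spaces included. That is why "Sao Paulo" counts as 9. The `length_stats` helper is a hypothetical name.

```python
from typing import Dict, Sequence


def length_stats(texts: Sequence[str]) -> Dict[str, int]:
    """Minimum and maximum character length of a text column (spaces counted)."""
    if not texts:
        raise ValueError("length statistics are undefined for an empty column")
    lengths = [len(t) for t in texts]
    return {"min_length": min(lengths), "max_length": max(lengths)}


CITY = ["Sao Paulo", "Mexico", "Tokyo", "Shanghai", "Cairo", "Mumbai"]
stats = length_stats(CITY)
print(stats)  # {'min_length': 5, 'max_length': 9}

# Values that share the minimum length.
print([c for c in CITY if len(c) == stats["min_length"]])  # ['Tokyo', 'Cairo']
```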
Detail Drift
The Detail Drift rule can be set to measure how many times each element occurs within the set of elements in the selected datasets.
On the Alerts page, in the Summary tab, you can choose to view the distribution in percentage or in number of occurrences. The Show by percentage checkbox is selected by default and shows the distribution in percentage.
For example, consider the following dataset:
| Name | Age |
|---|---|
| Mark | 35 |
| John | 29 |
| Ashley | 39 |
| Jonas | 33 |
| Mark | 35 |
| John | 25 |
| Mark | 20 |
| James | 33 |
| Ashley | 25 |
| Emma | 20 |
"Mark" appears in 3 of the 10 rows, so the cardinality detail of "Mark" = (3/10)*100, which equals 30 percent.
"25" appears in 2 of the 10 rows, so the cardinality detail of "25" = (2/10)*100, which equals 20 percent.
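The sketch below reproduces the detail calculation with Python's `collections.Counter`. It returns either raw occurrence counts or percentages of the total row count. The `show_by_percentage` argument is a hypothetical parameter that loosely mirrors the Show by percentage checkbox on the Summary tab. It is not part of any documented product API.

```python
from collections import Counter
from typing import Dict, Hashable, Sequence


def distribution(column: Sequence[Hashable], show_by_percentage: bool = True) -> Dict[Hashable, float]:
    """Occurrences of each distinct value, as counts or as a percentage of all rows."""
    counts = Counter(column)
    if not show_by_percentage:
        return dict(counts)
    total = len(column)
    return {value: count * 100 / total for value, count in counts.items()}


names = ["Mark", "John", "Ashley", "Jonas", "Mark", "John", "Mark", "James", "Ashley", "Emma"]
ages = [35, 29, 39, 33, 35, 25, 20, 33, 25, 20]

print(distribution(names)["Mark"])                            # 30.0
print(distribution(ages)[25])                                 # 20.0
print(distribution(names, show_by_percentage=False)["Mark"])  # 3
```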