You can track a set of statistics for numeric and textual data, such as cardinality (number of distinct values), minimum and maximum values, minimum and maximum length, number of empty or blank values, new values found, and more. Data drift configuration generates alerts for unexpected changes in the selected statistics. An alert is generated when the value of a selected statistic increases or decreases beyond its expected value.
Configure a confidence-based alert
- If the lower and higher percentages of the range overlap, alerts are not generated for lower values. Alerts are generated only for values beyond the higher value.
- If the range overlaps at 100%, it reverts to the default range.
For example, if you set the range as 80% to 80%, alerts are not generated below 80%. Alerts are generated only above 80%. If you set the range as 100% to 100%, the range reverts to the default range of 60% to 80%.
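The overlap rules above can be sketched in a few lines of Python. This is only an illustration of the behavior described here, not the product's implementation: the function name `normalize_confidence_range` and the returned flag are hypothetical, and the sketch assumes that lower-side alerts are enabled again once the range reverts to the default.

```python
# Default range stated in the text above.
DEFAULT_RANGE = (60, 80)


def normalize_confidence_range(lower, higher):
    """Return ((lower, higher), lower_side_alerts_enabled).

    Hypothetical helper that mirrors the documented overlap rules.
    """
    # A range that overlaps at 100% reverts to the default range.
    if lower == 100 and higher == 100:
        return DEFAULT_RANGE, True  # assumption: default range alerts on both sides
    # Any other overlapping range alerts only above the higher value.
    if lower == higher:
        return (lower, higher), False
    # A normal range alerts on both sides.
    return (lower, higher), True


print(normalize_confidence_range(80, 80))
print(normalize_confidence_range(100, 100))
```

The first call models the 80% to 80% example, where only values above 80% raise an alert. The second call models the 100% to 100% case, which falls back to 60% to 80%.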
Use these steps as you create an Observer to configure a data drift rule with a confidence-based alert. You can also edit existing Observer rules.
- After you select the data assets to observe on the Create Observer page, click Next.
- In the Data Drift row, click the gear icon in the Configure column.
- Select the statistics for the data types you want to observe.
All Data Types:
- Completeness
- Unique value count
Numeric Data:
- Minimum
- Maximum
- Mean
- Standard deviation
Textual Data:
- Minimum length
- Maximum length
Detail Drift:
- Distribution of value count
- Select the confidence level for generating an alert:
- Drag the lower value slider and higher value slider to set a range of percentages, or
- Enter the lower and higher value in the text box to set the range of percentages.
- Use significant alert tuning to fine-tune your alerts and reduce noise.
- Click Save.
Define range to generate alerts
- The value is provided as a percentage, except for Textual Data statistics, where it is provided as a number.
- For the Distribution of value count, the provided value is the percentage difference. It is not calculated as the percentage of a percentage. For example, if the minimum change is set as 3% for the distribution of value count, and the last historical value is 12%, then the alert is generated only when the current value is greater than or equal to 15% (3% plus 12%) or less than or equal to 9% (12% minus 3%).
- If the historical values are perfectly linear, then the expected value is used to calculate the significance threshold. For example, if the values were 10, 20, 30, 40, and 50, then the expected value is 60. If the significance threshold is set as 10%, no alert is raised when the new value lies between 54 (60 minus 10% of 60) and 66 (60 plus 10% of 60).
- If all historical values are zero, then significance thresholds are not applied.
- If all historical values are 100% for completeness, then significance thresholds are not applied.
- If values are historically increasing, and then the current one decreases, then significance thresholds are not applied. Similarly, this is true if the historical values were decreasing, and then the current one increases. This is not applicable to the distribution of value count.
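The rules in this list can be expressed as a short Python sketch. It is an illustration under stated assumptions, not the product's algorithm: all function names are hypothetical, the band boundaries (54 and 66 in the example) are assumed to be inclusive, and "historically increasing" is read as strictly increasing across the whole history.

```python
from typing import Optional, Sequence


def distribution_change_is_significant(last: float, current: float,
                                       min_change: float) -> bool:
    """Compare in percentage points, not as a percentage of a percentage."""
    return abs(current - last) >= min_change


def expected_linear_value(history: Sequence[float]) -> Optional[float]:
    """Return the next value if the history is perfectly linear, else None."""
    if len(history) < 2:
        return None
    step = history[1] - history[0]
    if any(b - a != step for a, b in zip(history, history[1:])):
        return None
    return history[-1] + step


def within_significance_band(expected: float, current: float,
                             threshold_pct: float) -> bool:
    """True when no alert is raised (boundaries assumed inclusive)."""
    tolerance = abs(expected) * threshold_pct / 100
    return expected - tolerance <= current <= expected + tolerance


def thresholds_apply(history: Sequence[float], current: float, *,
                     completeness: bool = False,
                     distribution: bool = False) -> bool:
    """Return False for the cases where thresholds are not applied."""
    # All historical values are zero.
    if all(v == 0 for v in history):
        return False
    # All historical values are 100% for completeness.
    if completeness and all(v == 100 for v in history):
        return False
    # Trend reversal; this rule does not cover distribution of value count.
    if not distribution and len(history) >= 2:
        pairs = list(zip(history, history[1:]))
        increasing = all(a < b for a, b in pairs)
        decreasing = all(a > b for a, b in pairs)
        if increasing and current < history[-1]:
            return False
        if decreasing and current > history[-1]:
            return False
    return True


expected = expected_linear_value([10, 20, 30, 40, 50])
print(expected, within_significance_band(expected, 55, 10))
print(distribution_change_is_significant(12, 15, 3))
```

With the history 10, 20, 30, 40, 50 and a 10% threshold, a new value of 55 stays inside the 54 to 66 band, so no alert is raised. For the distribution of value count, a move from 12% to 15% meets the 3-point minimum change and counts as significant.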
To set significance tuning for alerts:
- While configuring confidence-based alerts for data drift, type the value in the text box for the corresponding statistic.
- Uncheck the statistics that are not required.
- Click Save.