Basic profiling captures key metrics and statistics about your data when you set up a new datasource, giving you a baseline understanding of the dataset.
When basic profiling is enabled, the profiling details page displays the following sections:
- Summary: The summary section displays the following details:
  - Effective date: The date of the latest set of profiling information received for the asset.
  - Sample row count: The total number of rows in the dataset that were profiled.
  - Base type: The type of data in the column.
- Quality section: The quality bar indicates the overall quality and reliability of the profiled data. It displays the following metrics:
  - Valid: Displays the count and percentage of values that are valid.
  - Distinct: Indicates the number of unique valid values. During the onboarding flow, a maximum of 12,000 distinct values is captured for profiling. This cap keeps the profile focused on the most relevant data points.
  - Invalid or Outliers: Displays the combined count of invalid values and outliers.
  - Null or Blank: Displays the combined count of null and blank values.
- Statistics section: Several statistics are provided through the profiling details. The availability of specific statistical values depends partly on the data type. Click the diamond icon next to a metric to view its historical values over a period of time.
  - Minimum length: The length of the shortest string value in the column.
  - Maximum length: The length of the longest string value in the column.
  - Mean: The arithmetic average of the numeric values in the column.
  - Standard deviation: A measure of how much the data points vary from their mean. It is calculated as the square root of the variance, where the variance is the average of the squared deviations of each data point from the mean.
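
To make the quality and statistics metrics above concrete, here is a minimal Python sketch of how such values could be computed for a single column. It is an illustration only, not the product's implementation. The helper names (`is_missing`, `quality_counts`, `length_stats`, `numeric_stats`) and the validity rule passed in are hypothetical. The sketch also assumes that null means `None`, that blank means an empty or whitespace-only string, and that the standard deviation is the population form. Outlier detection and the 12,000 distinct-value cap are not modeled.

```python
from statistics import fmean, pstdev


def is_missing(value):
    # Assumption: None counts as null, whitespace-only strings count as blank.
    return value is None or (isinstance(value, str) and value.strip() == "")


def quality_counts(values, is_valid):
    """Count valid, distinct, invalid, and null-or-blank values in one column."""
    present = [v for v in values if not is_missing(v)]
    valid = [v for v in present if is_valid(v)]
    total = len(values)
    return {
        "valid": len(valid),
        "valid_pct": 100.0 * len(valid) / total if total else 0.0,
        "distinct": len(set(valid)),  # unique valid values only
        "invalid": len(present) - len(valid),
        "null_or_blank": total - len(present),
    }


def length_stats(strings):
    """Minimum and maximum string length, skipping null and blank values."""
    lengths = [len(s) for s in strings if not is_missing(s)]
    return {"min_length": min(lengths), "max_length": max(lengths)}


def numeric_stats(numbers):
    """Mean and population standard deviation (square root of the variance)."""
    return {"mean": fmean(numbers), "std_dev": pstdev(numbers)}


# Hypothetical sample column: two missing entries and one invalid entry.
codes = ["A1", "B22", None, "", "A1", "bad!"]
print(quality_counts(codes, is_valid=str.isalnum))
print(length_stats(codes))
print(numeric_stats([2, 4, 4, 4, 5, 5, 7, 9]))
```

If the profiler uses the sample standard deviation instead, `statistics.stdev` would be the corresponding call. Note that `length_stats` raises an error on a column with no non-blank strings, which mirrors the point that some statistics depend on the data type and content.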