Profiling details are available at the field level, so you can examine the metrics specific to each column. Each field's profile gives a detailed overview of its profiling attributes and helps you understand the structure, usage, and quality of the data.
To access a detailed view of the profiling metrics for the associated field:
- From the main navigation menu, click Catalog and navigate to the Fields tab.
- Click the field name to open the details page.
- Navigate to the Profile tab to view the profiling details. The Profile tab displays the following details:
- Summary: This section displays the following details:
- Effective date: Date of the latest set of profiling information received for the asset.
- Total row count: Total number of rows in the entire data set.
- Sample row count: Number of rows that were profiled, also shown as a percentage of the total row count.
- Base type: Type of data.
- Semantic type (Limited Availability): Displays the detected semantic type. It is not displayed if no semantic type is detected in the data. Click the refresh icon next to the semantic type to re-run semantic type detection. After a re-run, hover over the semantic type to see the timestamp of the latest execution. A semantic type job also appears in the Assessments tab under Quality jobs to show its progress.
- Quality: The quality bar indicates the overall quality and reliability of the profiled data. It displays the following details:
- Valid: Displays the numeric count and percentage of values that are valid.
- Distinct: Indicates the number of unique valid values.
Note: During the onboarding flow, a maximum of 12,000 distinct values is captured for profiling. This limit streamlines data analysis and keeps the focus on the most relevant and meaningful data points.
- Invalid/Outliers: Displays the sum of the number of invalid values and the number of outliers.
- Null/Blank: Displays the sum of the number of null values and the number of blank values.
- Statistics: Several statistics are provided via the profiling
details. The availability of specific statistical values partly depends
on the data type. Click the diamond icon next to the associated metrics
to view the historical values over a certain period.
- Null count: Displays the number of null values in the column.
- Blank count: Displays the number of blank values in the column.
- Minimum value: The lowest value in the data set. Shown for numeric data.
- Maximum value: The highest value in the data set. Shown for numeric data.
- Min length: Length of the shortest string in the column.
- Max length: Length of the longest string in the column.
- Mean: The average of the values in numeric data.
- Standard deviation: Standard deviation (SD) measures how much the data points vary from the mean. It is calculated as the square root of the variance, which is based on each data point's deviation from the mean.
- Leading whitespace: Shows the number of values having leading white space characters in the selected sample data column.
- Trailing whitespace: Shows the number of values having trailing white space characters in the selected sample data column.
- Validation regular expression: Displays the regular expression used to check whether the sample values are valid.
- Histogram: The frequency bar chart displays the frequency of occurrence of each value in the sample data. Hover over a bar in the chart to view the frequency and value represented by the bar.
- Frequency analysis: The frequency analysis chart displays the frequency distribution of data values for any type of column. It shows how often each data value repeats.
- Percentile: A percentile indicates where an observation falls relative to the other observations. This chart is presented exclusively for fields that contain numerical data. For example, if a score falls in the 30th percentile, 30 percent of all the recorded scores are lower. In a percentile chart, the X-axis represents the actual data, and the Y-axis divides the data into 100 parts, representing percentiles from 0 to 100. This is the same for every percentile chart. When you hover over a point, the chart displays the exact percentile for that data. For example, in a dataset of the publication years of books, if 2008 is at the 75th percentile, 75% of the books were published before 2008, and only 25% were published in or after 2008.
- Top values, bottom values, invalid/outliers and shapes: These
categories function similarly, displaying only when data is available.
Each category is represented as a bar chart that shows both the value
and its count. Adjacent to the count, a percentage is shown. This
percentage is calculated by dividing the value count by the total number
of samples.
- Top values: These are derived from topK and include the count of each in cardinalityDetail.
- Bottom values: These originate from bottomK and also include the count of each in cardinalityDetail.
- Invalid/Outlier values: Both the values and their counts are detailed in outlierDetail.
- Shapes: Both the values and their counts are detailed in shapesDetail.
Note: Top and bottom sample sets preserve the order in which they
are received. However, numerical values are an exception; they are sorted by their
key value. This means top values are sorted in descending order, while bottom values
are sorted in ascending order.
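The percentage calculation and the numeric sort order described above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not the product's implementation: the function names `value_percentages` and `order_numeric` and the sample data are hypothetical, and rounding to one decimal place is an assumption.

```python
from collections import Counter


def value_percentages(sample):
    """Return (value, count, percent) rows, most frequent value first."""
    total = len(sample)
    if total == 0:
        return []
    counts = Counter(sample)
    # Percentage = value count divided by the total number of samples.
    return [(value, count, round(100 * count / total, 1))
            for value, count in counts.most_common()]


def order_numeric(entries, top=True):
    """Sort numeric (value, count) pairs by their key value.

    Top values are sorted in descending order, bottom values in ascending order.
    """
    return sorted(entries, key=lambda entry: entry[0], reverse=top)


# Hypothetical sample data, for illustration only.
sample = ["US", "US", "DE", "US", "FR", "DE", "US", "FR", "US", "DE"]
for value, count, percent in value_percentages(sample):
    print(f"{value}: {count} ({percent}%)")

numeric_entries = [(3, 10), (7, 4), (5, 6)]
print(order_numeric(numeric_entries, top=True))   # top values: descending
print(order_numeric(numeric_entries, top=False))  # bottom values: ascending
```

In this sample, "US" appears 5 times out of 10 values, so its bar would show a count of 5 and a percentage of 50%.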
Tip: Click the diamond icon next to the associated
metrics to view the historical values over a certain period of time.
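To make the statistics listed earlier more concrete, the following Python sketch computes comparable values for a small column. It is an illustration under stated assumptions, not how the product calculates its metrics: the function names are hypothetical, "blank" is assumed to mean an empty or whitespace-only string, lengths and whitespace counts are assumed to ignore blank values, and the population form of the standard deviation is assumed.

```python
import statistics


def profile_strings(values):
    """Compute string statistics for a column given as a list (None = null)."""
    null_count = sum(1 for v in values if v is None)
    strings = [v for v in values if v is not None]
    # Assumption: a blank value is an empty or whitespace-only string.
    blank_count = sum(1 for s in strings if s.strip() == "")
    non_blank = [s for s in strings if s.strip() != ""]
    lengths = [len(s) for s in non_blank]
    return {
        "null_count": null_count,
        "blank_count": blank_count,
        "min_length": min(lengths) if lengths else None,
        "max_length": max(lengths) if lengths else None,
        "leading_whitespace": sum(1 for s in non_blank if s != s.lstrip()),
        "trailing_whitespace": sum(1 for s in non_blank if s != s.rstrip()),
    }


def profile_numbers(values):
    """Compute numeric statistics for a non-empty list of numbers."""
    return {
        "minimum": min(values),
        "maximum": max(values),
        "mean": statistics.mean(values),
        # Assumption: population standard deviation (square root of variance).
        "standard_deviation": statistics.pstdev(values),
    }


def percentile_rank(values, x):
    """Percentage of values that are strictly lower than x."""
    return 100 * sum(1 for v in values if v < x) / len(values)


# Hypothetical sample data, for illustration only.
names = ["alice", " bob", "carol ", "", None, "  dave  "]
scores = [2, 4, 4, 4, 5, 5, 7, 9]
years = [1995, 2001, 2004, 2008]

print(profile_strings(names))
print(profile_numbers(scores))
print(percentile_rank(years, 2008))
```

With the hypothetical `years` list, three of the four books were published before 2008, so 2008 sits at the 75th percentile, matching the percentile example given for the chart.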
To access the profiling details for the field in the side panel view:
- From the main menu, select Catalog and navigate to the Fields tab.
- Click on the card associated with the field to open its details in a side panel.
- Navigate to the Profile tab.
- The Profile tab in the side panel view displays the following details:
- Summary
- Quality
- Statistics
- Click open in new tab to view the profile statistics in detail.
Tip: You can use this view to get a high-level
overview of the profiling details associated with the field.
Limited Availability: This feature
is currently available only in select workspaces and might be subject to change
before general availability.