Profiling details are available at the dataset level, so you can review both dataset-level profile metrics and column-level metrics from the dataset's details page in the Catalog. This gives you an overview of the dataset's structure, usage, and quality.
To view detailed profiling metrics for a dataset:
- From the main navigation menu, click Catalog and navigate to the Datasets tab.
- Click the dataset name to open its details page.
- Select the Profile tab. It displays the following details:
  - Table Summary: This section displays the following details:
    - Effective date: The timestamp of the latest profile run.
    - Total row count: The total number of rows considered for profiling.
    - Sample row count: The number of rows in the generated sample.
    - To view previous profile runs, click the arrows. You can switch back to the latest profile run at any time.
    - To re-run basic or advanced profiling for the dataset, click the refresh icon and choose the profiling type. Tip: You must have the Operate Rules and Scores permission to update the profiling.
    - To view profiling details for a specific time period, click the scheduling icon. The available periods begin from the date on which the field was first profiled.
  - Asset details: This table shows the profiling information for each field in the dataset. The field-level details you see depend on whether basic or advanced profiling is enabled for the data source; some of the values listed below are available only when advanced profiling is turned on.
    - Field: The name of the profiled field.
    - Data type: The data type of the field.
    - Last Profiling: The most recent date on which the field was profiled.
    - Popularity: The number of times the field is used in database queries. This count includes both manual queries run on your database instance and queries run during Profile/Observer runs.
    - Null count: The number of rows in which the field value is null.
    - Blank count: The number of rows in which the field value is blank.
    - Distinct values: The number of distinct values in the field. Each value is counted once, whether it occurs in a single record or in several.
    - Semantic type (Limited Availability): The detected semantic type. Nothing is displayed if no semantic type is detected in the data. To re-run semantic type detection, click the refresh icon next to the semantic type. After a re-run, hover over the semantic type to see the timestamp of the latest execution. A semantic type job also appears under Quality jobs in the Assessments tab so you can track its progress.
    - Minimum value: The lowest numerical value or the earliest date in the field.
    - Maximum value: The highest numerical value or the latest date in the field.
    - Minimum length: The length of the shortest text value in the field. For example, in the set CITY={Sao Paulo, Mexico, Tokyo, Shanghai, Cairo, Mumbai}, the shortest values are "Cairo" and "Tokyo", so the minimum length of CITY is 5.
    - Maximum length: The length of the longest text value in the field. For example, in the set CITY={Sao Paulo, Mexico, Tokyo, Shanghai, Cairo, Mumbai}, the longest value is "Sao Paulo" (9 characters, including the space), so the maximum length of CITY is 9.
  - Use the search bar to locate specific entries in the table.
  - Click a field name to view that field's profiling details in a side panel.
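The field-level definitions above can be made concrete with a short sketch. This Python is an illustration only, not the product's implementation: it assumes `None` marks a null, that a blank is an empty or whitespace-only string, and that distinct values and text lengths are computed over the remaining non-blank values. The names `profile_text_field`, `value_range`, and `TextFieldProfile` are hypothetical.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional, Sequence, Tuple, TypeVar

T = TypeVar("T")


@dataclass
class TextFieldProfile:
    null_count: int
    blank_count: int
    distinct_values: int
    min_length: Optional[int]
    max_length: Optional[int]


def profile_text_field(values: Sequence[Optional[str]]) -> TextFieldProfile:
    # Assumption: None is a null; a blank is a non-null string that is empty
    # or whitespace-only. Distinct values and lengths use the remaining values.
    non_null = [v for v in values if v is not None]
    filled = [v for v in non_null if v.strip() != ""]
    lengths = [len(v) for v in filled]
    return TextFieldProfile(
        null_count=len(values) - len(non_null),
        blank_count=len(non_null) - len(filled),
        # Each value counts once, whether it appears in one row or several.
        distinct_values=len(set(filled)),
        min_length=min(lengths) if lengths else None,
        max_length=max(lengths) if lengths else None,
    )


def value_range(values: Sequence[Optional[T]]) -> Tuple[Optional[T], Optional[T]]:
    # Minimum/maximum value: lowest and highest number, or earliest and latest date.
    non_null = [v for v in values if v is not None]
    if not non_null:
        return None, None
    return min(non_null), max(non_null)


# The CITY example from the text, plus a repeated value, a null, and a blank.
cities = ["Sao Paulo", "Mexico", "Tokyo", "Shanghai", "Cairo", "Mumbai", "Tokyo", None, ""]
city_profile = profile_text_field(cities)
print(city_profile)
# TextFieldProfile(null_count=1, blank_count=1, distinct_values=6, min_length=5, max_length=9)

print(value_range([42, 7, None, 19]))                         # (7, 42)
print(value_range([date(2024, 5, 1), date(2023, 1, 15)]))     # earliest and latest date
```

Note that the repeated "Tokyo" contributes only once to the distinct count, while the lengths reproduce the minimum of 5 and maximum of 9 from the CITY example.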
Tip: Click the diamond icon next to a metric in the Profile tab to view its historical values over a period of time.
Limited Availability: This feature is currently available only in select workspaces and might be subject to change before general availability.
To view profiling details in the dataset side panel:
- From the main navigation menu, select Catalog and navigate to the Datasets tab.
- Click the dataset card to open its details in the side panel.
- Select the Profile tab. In the side panel, it displays the following details:
  - Summary: This section displays the following details:
    - Effective date: The timestamp of the latest profile run.
    - Total row count: The total number of rows considered for profiling.
    - Sample row count: The number of rows in the generated sample.
  - Table Summary: This section displays the following details:
    - Field: The name of the profiled field.
    - Quality: The quality bar gives a consolidated view of the Valid, Invalid, and Null counts for each profiled field.
  - To update the profiling results, click the refresh icon and select basic or advanced profiling to re-run the analysis.
  - To view profiling details for a specific time period, click the scheduling icon and choose the time frame. The available periods begin from the date on which the field was first profiled.
  - If the selected field has not been profiled yet, start profiling by selecting Run Basic Profile or Run Advanced Profile.
  - Click a field name to open the page that shows its field-level profiling details.
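This page does not describe how a value is classified as valid or invalid, so the Python sketch below only illustrates how the three counts in the quality bar partition a field's rows: every row is either null, valid, or invalid, and the bar segments are each count's share of the total. The validity rule `looks_like_email`, like the names `quality_bar` and `QualityBar`, is hypothetical and stands in for whatever rule actually applies to the field.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional, Sequence


@dataclass
class QualityBar:
    valid: int
    invalid: int
    null: int

    def shares(self) -> Dict[str, float]:
        # Fraction of rows in each bucket, e.g. for drawing the bar segments.
        total = self.valid + self.invalid + self.null
        if total == 0:
            return {"valid": 0.0, "invalid": 0.0, "null": 0.0}
        return {
            "valid": self.valid / total,
            "invalid": self.invalid / total,
            "null": self.null / total,
        }


def quality_bar(values: Sequence[Optional[str]], is_valid: Callable[[str], bool]) -> QualityBar:
    # Every row lands in exactly one bucket: null, valid, or invalid.
    nulls = sum(1 for v in values if v is None)
    valid = sum(1 for v in values if v is not None and is_valid(v))
    return QualityBar(valid=valid, invalid=len(values) - nulls - valid, null=nulls)


def looks_like_email(value: str) -> bool:
    # Hypothetical validity rule used only for this example.
    local, sep, domain = value.partition("@")
    return bool(local) and sep == "@" and "." in domain


emails = ["a@example.com", "b@example.org", "not-an-email", None, "c@example.net"]
bar = quality_bar(emails, looks_like_email)
print(bar)           # QualityBar(valid=3, invalid=1, null=1)
print(bar.shares())  # {'valid': 0.6, 'invalid': 0.2, 'null': 0.2}
```

Because the invalid count is derived as the remainder, the three counts always add up to the number of rows, which is what lets them be shown together as a single bar.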