Profiling scans the data records from all your data sources, irrespective of their volume and complexity. It identifies problems with the correctness, completeness, and validity of the data and suggests actions to fix them. This improves the quality and utility of your data with very little manual effort.
- It is the first step in analyzing your data and predicting how much effort is needed to make it usable.
- It improves your trust in the data sets you have.
- It is one of the mandatory steps for taking control of your organizational data and using it across the enterprise.
Profiling helps you cleanse and monitor the data behind a data connection. You can create a data profile after you set up a successful connection.
You can create a data profile from the Profilers page. You can also run, edit, delete, or create a copy of an existing data profile. The Profilers page lists the existing profiles created in Data Integrity Suite.
The profilers page
The Profilers page provides information about data profiles. On this page, you can create a new data profile or run, edit, duplicate, or delete any existing data profile. This page is displayed when you click its tab on the main navigation menu.
- Search: Type a complete or partial name of a data profile to show the profiles whose names match the text.
- Data Profile: Shows the name of a data profile. Click the context menu button to Run, Edit, Duplicate or Delete a data profile. You can also Stop a running profile.
- Scheduler: Shows the status of the profile schedule.
  - Enabled: The profile is scheduled. Click to view the schedule and profile run history. To disable the schedule, click the Enabled status, and then click the toggle.
  - Disabled: The profile is not scheduled. Click to view the schedule and profile run history. To enable the schedule, click the Disabled status, and then click the toggle.
  - Suspended: The scheduler has stopped after ten consecutive failures. To investigate the issue or enable the schedule, click the Suspended status.
  - Not Configured: No scheduler has been created and configured for the profile. To create a schedule, click the Not Configured status, and then click the Edit Scheduler button.
  - Unavailable: The scheduler service is unavailable and cannot provide scheduled profile run history.
- Completeness: Shows the completeness for profiles that have run successfully.
- Tables: Shows the number of tables profiled.
- Records: Shows the number of rows processed for the data profile.
- Last Run: Shows the date and time when the profile was most recently run.
- Duration (hh:mm:ss): Shows the time taken for the profile to run successfully.
- Created By: Shows the name of the user who created the data profile.
- Create Profile: Allows you to create a new data profile.
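The Scheduler statuses listed above can be read as a small decision procedure. The following Python sketch is illustrative only and is not the product's implementation: the `Schedule` class, the `scheduler_status` function, and the order in which the conditions are checked are assumptions made for this example. Only the five status names and the ten-failure threshold come from the description above.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class SchedulerStatus(Enum):
    ENABLED = "Enabled"
    DISABLED = "Disabled"
    SUSPENDED = "Suspended"
    NOT_CONFIGURED = "Not Configured"
    UNAVAILABLE = "Unavailable"


# From the page description: the scheduler stops after ten consecutive failures.
MAX_CONSECUTIVE_FAILURES = 10


@dataclass
class Schedule:
    """Hypothetical stand-in for a profile's schedule."""
    enabled: bool
    consecutive_failures: int = 0


def scheduler_status(schedule: Optional[Schedule],
                     service_available: bool = True) -> SchedulerStatus:
    """Map a schedule to one of the five statuses (check order is assumed)."""
    if not service_available:
        return SchedulerStatus.UNAVAILABLE
    if schedule is None:
        return SchedulerStatus.NOT_CONFIGURED
    if schedule.consecutive_failures >= MAX_CONSECUTIVE_FAILURES:
        return SchedulerStatus.SUSPENDED
    return SchedulerStatus.ENABLED if schedule.enabled else SchedulerStatus.DISABLED


if __name__ == "__main__":
    print(scheduler_status(Schedule(enabled=True)).value)
    print(scheduler_status(Schedule(enabled=True, consecutive_failures=10)).value)
    print(scheduler_status(None).value)
```

In this sketch, a schedule that reaches ten consecutive failures reports Suspended regardless of its toggle state, which mirrors the description that a suspended schedule must be investigated or re-enabled from the status itself.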
Steps to perform end-to-end profiling:
- Identify the data source you want to profile.
- Set up and catalog a data connection for the data source.
- Create a Profile.
  - Choose the cataloged data assets to profile
  - Define Profiler rules
  - Identify the types of rules to apply
  - Select parameters for each rule
  - Choose a schedule for running the Profile
- Run the Profile (manual or automatic).
  - View profiling details
  - Analyze data from statistics
Profiling rules overview
Profiling rules perform different types of analysis on your data. When setting up a profile, choose the profiling rules that perform the types of data analysis you are interested in.
Default Analysis rule
Default Analysis determines statistics such as completeness, uniqueness, numerical analysis, semantic type detection, confidence, percentile, histogram, frequency analysis and string length in the dataset.
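To make a few of these statistics concrete, here is a minimal Python sketch that computes completeness, uniqueness, frequency, string length, and basic numeric statistics for a single column. It is not the Default Analysis rule itself: the function name `profile_column` and the exact definitions used (completeness as the share of non-empty values, uniqueness as the share of distinct values among non-empty ones) are assumptions chosen for illustration, and the product may define them differently.

```python
from collections import Counter
import statistics


def profile_column(values):
    """Compute illustrative profiling statistics for one column of values."""
    total = len(values)
    # Treat None and empty strings as missing (an assumption for this sketch).
    present = [v for v in values if v is not None and v != ""]
    frequency = Counter(present)

    stats = {
        # Share of values that are present.
        "completeness": len(present) / total if total else 0.0,
        # Share of distinct values among the present values.
        "uniqueness": len(frequency) / len(present) if present else 0.0,
        # The three most frequent values with their counts.
        "frequency": frequency.most_common(3),
    }

    # String length statistics over the text form of each present value.
    lengths = [len(str(v)) for v in present]
    if lengths:
        stats["min_length"] = min(lengths)
        stats["max_length"] = max(lengths)

    # Numeric statistics, computed only over numeric (non-boolean) values.
    numbers = [v for v in present
               if isinstance(v, (int, float)) and not isinstance(v, bool)]
    if numbers:
        stats["min"] = min(numbers)
        stats["max"] = max(numbers)
        stats["mean"] = statistics.mean(numbers)
        stats["median"] = statistics.median(numbers)

    return stats


if __name__ == "__main__":
    print(profile_column(["a", "b", "a", None, ""]))
    print(profile_column([10, 20, 30, None]))
```

Statistics such as semantic type detection, confidence, percentiles, and histograms are omitted here because their definitions are not given in this section.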