Profiling categories

Data Integrity Suite

Copyright 2000-2026

Profiling in the Data Integrity Suite includes categories that provide insights into data characteristics. Each category displays relevant metrics, including counts and percentages of valid, invalid, and null entries, along with visual representations such as bar charts for data distribution and value counts.

The sample categories always display first and are expected to contain data. However, the API validates only one required field, profileSetDate, so any other field can be missing from the response.

Sample summary

Field | Description | Source
Effective Date | Date of the latest set of profiling information received for the asset. | profileSetDate
Total Row Count | Total number of rows in the entire data set. | totalCount
Sample Row Count | Number of rows that are profiled, shown with its percentage of the total count. | sampleCount
Base Type | Type of data. | type
Type Confidence | Confidence, shown as a percentage, that the profiling results accurately reflect the specified data type. The value is presented with two decimal places. For example, a value of .9753 from the API is shown as 97.53% in the user interface and indicates how certain the data type is (for example, that a field is a date). |
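The percentage formatting described for Type Confidence and Sample Row Count can be sketched as follows. This is an illustrative sketch only, not the suite's implementation: the function names are hypothetical, and the rounding behavior (Python's standard float formatting) is an assumption.

```python
def format_confidence(raw_confidence):
    """Format a fractional confidence (for example 0.9753) with two decimal places."""
    # Assumption: the API returns a fraction between 0 and 1.
    return f"{raw_confidence * 100:.2f}%"


def format_sample_row_count(sample_count, total_count):
    """Show the sample row count with its percentage of the total row count."""
    if not total_count:
        # totalCount is not validated by the API, so it may be missing or zero.
        return str(sample_count)
    return f"{sample_count} ({sample_count / total_count * 100:.2f}%)"


print(format_confidence(0.9753))
print(format_sample_row_count(500, 2000))
```

The guard on `total_count` reflects that only profileSetDate is validated, so the total may be absent.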

Sample quality

Next to each calculated percentage, a tooltip shows the total that the percentage is relative to. This total varies by metric; for example, it can be the sample total or the total of valid entries.

Field | Description | Source
Quality bar | A single horizontal bar that displays the counts of valid, invalid, and not populated rows in the sample data. |
Valid | Count and percentage of valid values in the sample, based on Type or Semantic Type. The percentage is calculated as Valid Count divided by Sample Count. | matchCount
Valid: Distinct | Number of unique valid values. | cardinality
Invalid/Outliers | Count and percentage of invalid or outlier values in the sample. The percentage is calculated as Invalid/Outliers Count divided by Sample Count. | outlierCount
Null/Blank | Count and percentage of null or blank entries in the sample. The percentage is calculated as Not Populated Count divided by Sample Count. | nullCount + blankCount

Note: During the onboarding flow, a maximum of 12,000 distinct values is captured for profiling. This approach helps streamline data analysis, allowing you to focus on the most relevant and meaningful data points without being overwhelmed by excessive details.
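The percentage calculations in the table above can be sketched as follows. The field names come from the Source column; the flat dictionary shape of the profile, the function name, and the rounding to two decimal places are assumptions made for illustration.

```python
def sample_quality(profile):
    """Return (count, percentage) pairs for the sample quality metrics."""
    # Missing fields default to 0 because the API does not validate them.
    sample = profile.get("sampleCount") or 0
    valid = profile.get("matchCount") or 0
    invalid = profile.get("outlierCount") or 0
    # Not Populated Count is the sum of null and blank entries.
    not_populated = (profile.get("nullCount") or 0) + (profile.get("blankCount") or 0)

    def pct(count):
        # Every percentage is relative to Sample Count.
        return round(count / sample * 100, 2) if sample else None

    return {
        "valid": (valid, pct(valid)),
        "invalid_outliers": (invalid, pct(invalid)),
        "null_blank": (not_populated, pct(not_populated)),
    }


example = {"sampleCount": 200, "matchCount": 150, "outlierCount": 30,
           "nullCount": 15, "blankCount": 5}
print(sample_quality(example))
```

When Sample Count is missing or zero, the sketch returns `None` for the percentage rather than dividing by zero.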

Sample distribution

The bar chart illustrates the distribution of samples based on their data type. Here's how it represents different types of data:

  • Date/Time: The chart displays the distribution across various time points.
  • String: It shows the distribution according to different string values.
  • Number: The chart presents the range distribution and includes the standard deviation and mean of the values.
  • Boolean: It indicates whether the values are true or false.

The chart uses green bars to represent valid data points. It also marks invalid data or outliers with red bars and represents null or blank values with gray bars.

Top values, bottom values, invalid/outliers, and shapes

These categories all function similarly, displaying only when data is available. Each category is represented as a bar chart that shows both the value and its count. Adjacent to the count, a percentage is shown. This percentage is calculated by dividing the value count by the total number of samples.

  • Top values: These are derived from topK and include the count of each in cardinalityDetail.
  • Bottom values: These originate from bottomK and also include the count of each in cardinalityDetail.
  • Invalid/Outlier values: Both the values and their counts are detailed in outlierDetail.
  • Shapes: Both the values and their counts are detailed in shapesDetail.
Note: Top and bottom sample sets preserve the order in which they are received. However, numerical values are an exception; they are sorted by their key value. This means top values are sorted in descending order, while bottom values are sorted in ascending order.
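The percentage and ordering rules above can be sketched as follows. The sketch assumes, hypothetically, that topK or bottomK arrives as a list of values and that cardinalityDetail is a mapping from value to count; the actual payload shapes are not specified here, and the function name is illustrative.

```python
def value_rows(values, counts, sample_count, numeric=False, descending=False):
    """Build (value, count, percentage) rows for a top or bottom values chart."""
    # Non-numeric values keep the order in which they were received.
    rows = [(value, counts.get(value, 0)) for value in values]
    if numeric:
        # Numeric values are sorted by their key value:
        # descending for top values, ascending for bottom values.
        rows.sort(key=lambda row: float(row[0]), reverse=descending)
    return [
        (value, count, round(count / sample_count * 100, 2) if sample_count else None)
        for value, count in rows
    ]


top_k = ["3", "10", "7"]                              # hypothetical topK values
cardinality_detail = {"3": 5, "10": 20, "7": 25}      # hypothetical counts
print(value_rows(top_k, cardinality_detail, 100, numeric=True, descending=True))
```

For bottom values, the same call with `descending=False` yields ascending order.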

Statistics

Several statistics are provided via the APIs. The availability of specific statistical values partly depends on the data type. For instance, if the data type is boolean, then only Blank Count, Null Count, and Validation Regular Expression will be displayed, provided they have values.

Label | Source
Null Count | nullCount
Blank Count | blankCount
Minimum Value | min
Maximum Value | max
Minimum Length | minLength
Maximum Length | maxLength
Mean | mean
Standard Deviation | standardDeviation
Multiline | multiline
Leading Whitespace | leadingWhiteSpace
Trailing Whitespace | trailingWhiteSpace
Leading Zero Count | leadingZeroCount
Validation Regular Expression | regExp
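The label mapping and the boolean rule described in this section can be sketched as follows. Only the boolean restriction is documented; showing every available statistic for other data types, and treating a missing or null field as "no value", are assumptions. The function name and the flat dictionary shape are hypothetical.

```python
# Source field -> display label, in the order listed in the table above.
STAT_LABELS = {
    "nullCount": "Null Count",
    "blankCount": "Blank Count",
    "min": "Minimum Value",
    "max": "Maximum Value",
    "minLength": "Minimum Length",
    "maxLength": "Maximum Length",
    "mean": "Mean",
    "standardDeviation": "Standard Deviation",
    "multiline": "Multiline",
    "leadingWhiteSpace": "Leading Whitespace",
    "trailingWhiteSpace": "Trailing Whitespace",
    "leadingZeroCount": "Leading Zero Count",
    "regExp": "Validation Regular Expression",
}

# For boolean data, only these statistics are displayed.
BOOLEAN_STATS = {"nullCount", "blankCount", "regExp"}


def visible_statistics(stats, data_type):
    """Return (label, value) rows for the statistics that have values."""
    rows = []
    for source, label in STAT_LABELS.items():
        value = stats.get(source)
        if value is None:
            continue  # assumption: a missing or null field has no value to show
        if data_type.lower() == "boolean" and source not in BOOLEAN_STATS:
            continue
        rows.append((label, value))
    return rows


print(visible_statistics({"nullCount": 2, "blankCount": 0, "min": 0}, "boolean"))
```

A count of 0 is treated as a value and is still displayed in this sketch.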