Data Integrity Suite glossary

Agent

A critical component of the Data Integrity Suite that enables secure communication between on-premises infrastructure and the Precisely Cloud. It facilitates data processing within the user's own environment and must be installed and configured on the user's infrastructure. Agents work alongside engines to support data operations and require regular maintenance, including periodic updates that must be downloaded and installed by users to ensure continued compatibility and performance.

Alert

A notification generated by an Observer when it detects unexpected changes, anomalies, or conditions in data assets that exceed the thresholds defined in its rules. Alerts are triggered by configured rules such as freshness, volume, data drift, or schema drift to promptly communicate data issues.

Alert level

A classification system that indicates the severity of an alert, typically displayed as Warning (yellow) or Critical (red), along with a confidence percentage. It helps users prioritize and respond appropriately to data anomalies based on their impact level.

API key

A credential required to access Data Integrity Suite APIs. Combined with an API secret, it authenticates requests via Basic Authentication or is used to generate OAuth 2.0 access tokens. It identifies and authorizes the caller across API services such as Address Autocomplete, Email Verification, and Data Graph.
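
For example, a minimal sketch of both authentication patterns, assuming hypothetical endpoint URLs (the real URLs are documented with each Suite API):

    import requests
    from requests.auth import HTTPBasicAuth

    API_KEY = "your-api-key"        # generated in the Data Integrity Suite
    API_SECRET = "your-api-secret"  # shown once at generation time; store securely

    # Option 1: Basic Authentication -- send the key/secret with each request.
    resp = requests.get(
        "https://api.example.com/v1/addresses/autocomplete",  # placeholder URL
        params={"query": "1700 District Ave"},
        auth=HTTPBasicAuth(API_KEY, API_SECRET),
    )

    # Option 2: OAuth 2.0 -- exchange the key/secret for a short-lived token.
    token_resp = requests.post(
        "https://api.example.com/oauth/token",  # placeholder token endpoint
        data={"grant_type": "client_credentials"},
        auth=HTTPBasicAuth(API_KEY, API_SECRET),
    )
    headers = {"Authorization": f"Bearer {token_resp.json()['access_token']}"}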

API secret

A confidential credential used alongside an API key to authenticate and authorize access to Data Integrity Suite APIs. It must be stored securely and cannot be retrieved once you navigate away from the generation page.

Asset

A structured data object that represents a specific instance of an asset type, containing detailed rules, configurations, or information about organizational resources such as policies, technical components, or infrastructure elements. Assets are created from predefined asset type templates and serve as the actual operational entities within the Data Integrity Suite.

Broker

A broker is a Kafka server component used in Data Integration pipelines when Kafka is the target. It handles message storage and delivery, acting as a middle layer between producers (sending data) and consumers (receiving data). Identified by a hostname or IP and port, the broker ensures reliable message flow and is essential for replication pipelines that write to Kafka.
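
As an illustration, a producer needs only the broker's host and port to start sending messages; a minimal sketch using the kafka-python package (the broker address and topic are placeholders):

    from kafka import KafkaProducer

    # The broker is identified by a hostname or IP and a port.
    producer = KafkaProducer(bootstrap_servers="kafka-broker.example.com:9092")

    # The broker stores the message and delivers it to subscribed consumers.
    producer.send("replication-events", b'{"op": "insert", "table": "customers"}')
    producer.flush()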

Cataloging

The process of identifying, organizing, and recording metadata about data sources, schemas, and their structures within a data catalog system. It involves automatically discovering and documenting database schemas, tables, fields, and their relationships to make data assets searchable and manageable.

Connections

Interfaces between the Data Integrity Suite and external datasources that enable cataloging and accessing associated databases or data warehouses. These connections serve as the foundation for organizing and documenting data assets in a centralized repository for improved data management and accessibility.

Completeness

A metric that represents the percentage of complete and incomplete rows detected in profiled data. It is used to assess the quality of data and to sort data profiles by how much of the data contains all required values versus missing or null values.
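
Conceptually, the metric is the share of rows with no missing required values; a sketch with pandas (column names are illustrative, not the service's internal computation):

    import pandas as pd

    df = pd.DataFrame({
        "name":  ["Ada", "Grace", None],
        "email": ["ada@example.com", None, "alan@example.com"],
    })

    # A row is complete when every required field is populated.
    complete_rows = df.dropna(subset=["name", "email"]).shape[0]
    completeness_pct = 100.0 * complete_rows / len(df)
    print(f"Completeness: {completeness_pct:.1f}%")  # 33.3% in this example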

Consolidation rule

A set of conditions and actions defined in the Consolidate Matches step of a Data Quality pipeline that determines how to select and merge the most accurate or relevant data from duplicate or similar records to create a golden record. While the consolidation occurs within the Data Quality pipeline, the resulting output is written as a dataset that can be accessed in the catalog.

Continuous replication

A data integration process that facilitates real-time or near-real-time synchronization of data between source and target systems through automated pipelines. It automatically replicates updates, deletions, and insertions while maintaining data integrity and availability throughout the replication process.

Continuous replication pipeline

An automated Data Integration pipeline that implements continuous replication, synchronizing data between source and target systems in real time or near-real time. It replicates updates, deletions, and insertions as they occur while maintaining data integrity and availability throughout the replication process.

Data Catalog

A service that organizes and manages data assets through automated metadata collection, providing users with searchable information about data sources, datasets, and fields without storing the actual data. It enables users to understand data lineage, definitions, and relationships while maintaining data governance and supporting informed decision-making.

Data Enrichment

A service that helps you enhance the value and usability of your data by adding information from many expertly curated datasets for locations in the United States of America. It adds attributes such as risk scores, demographics, or property details to improve data context and usability. It also enhances address-level data by joining it with licensed, domain-specific datasets using a unique PreciselyID.

Data Governance

A service that allows you to define, track, and manage your data assets within the Data Integrity Suite. It provides a flexible set of repeatable, scalable strategies and technologies that ensure important data assets stay in compliance with organizational policies and government regulations. It offers a comprehensive framework of policies, processes, responsibilities, and tools for managing data access, security, quality, and consistency across an organization.

Data Integration

A service that enables the management of data pipelines by connecting, designing, deploying, and monitoring data transfers between different systems and platforms. It allows organizations to move and synchronize data across various applications and environments without requiring extensive technical expertise.

Data Integrity Suite

A comprehensive set of interoperable cloud services designed to ensure business data maintains the highest levels of accuracy, consistency, and context through integrated data management capabilities. The suite combines advanced software capabilities with enrichment data to provide a modular SaaS structure for managing, improving, and extracting value from data throughout its lifecycle.

Data Observability

A service within the Data Integrity Suite that enhances trust in data and analytics by leveraging machine learning to detect anomalies and outliers, ensuring data reliability through comprehensive monitoring, profiling, and automated alerting. It provides continuous visibility into data environments with real-time monitoring capabilities across platforms like Databricks, Snowflake, and Amazon Redshift.

Data profile

A configuration within the Data Observability service that analyzes data quality across sources by scanning for completeness, validity, and correctness issues. It generates statistics and scores for selected data assets to help identify and address data quality problems.

Data Quality

A service within the Data Integrity Suite that validates, geocodes, and enriches critical data assets to ensure their accuracy, completeness, and consistency for business operations and analytics. It provides tools to identify, analyze, and rectify data issues through processes like standardization, deduplication, and validation against defined rules.

Data Quality pipelines

Automated workflows that ingest data from datasets and perform a series of transformation steps such as standardization, de-duplication, cleaning, validation, and reformatting to ensure data accuracy, consistency, and integrity. These pipelines resolve common data issues like duplicates, missing values, and inconsistent formats before outputting clean data to a designated destination.

Data sample storage

A dedicated space within the Data Integrity Suite where sample datasets are securely stored for testing, validation, and building data quality pipelines. It can be configured to use either Precisely Cloud or AWS S3 bucket storage options.

Data volume

A Data Observability rule that uses the number of rows in a dataset as a metric to monitor unexpected changes such as additions or deletions of data records.
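
The underlying check amounts to comparing the current row count against an expected value and flagging deviations; a hedged sketch (the threshold logic is illustrative, not the service's algorithm):

    def volume_changed(current_rows: int, expected_rows: int, tolerance_pct: float) -> bool:
        """Return True if the row count deviates beyond the allowed tolerance."""
        deviation_pct = 100.0 * abs(current_rows - expected_rows) / expected_rows
        return deviation_pct > tolerance_pct

    # 5,000 rows vanished from a table that usually holds 100,000 -> alert.
    if volume_changed(current_rows=95_000, expected_rows=100_000, tolerance_pct=2.5):
        print("Volume alert: unexpected change in row count")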

Dataset entities

Defined structures within a Data Quality pipeline that represent meaningful objects, such as a Location composed of multiple related fields, and include configurable field mappings used primarily in record matching operations.

Datasource

A defined reference to an external data system such as a database, file store, cloud service, or API. It is typically the first configuration step and identifies what data the Data Integrity Suite will work with. Each datasource is linked to a connection, which defines how the suite accesses the data. This structure supports flexible integration and reuse across services.

Datastore

A physical or logical data storage entity that serves as either a source or target in mainframe replication pipelines, characterized by specific data types such as relational databases, files, or messaging systems. It is defined in configuration scripts using the DATASTORE command and must be associated with one or more descriptions that specify its record structure and layout.

Diagnostic bundle

A collection of logs, configuration details, and other relevant data gathered into a single zip file for troubleshooting and performance analysis of replication projects. These bundles are retrieved from all runtime engines used by the project to aid in diagnosing issues or assessing performance.

Distribution key

A key mechanism used in the Data Integration service to determine how data is distributed across nodes or partitions. It ensures efficient data placement and retrieval in distributed database environments.
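
A common implementation of the idea is hash partitioning: hash the key value and take it modulo the node count, so rows with the same key always land on the same node; a simplified sketch (the service's actual distribution algorithm may differ):

    import hashlib

    NUM_NODES = 4  # illustrative cluster size

    def node_for(key_value: str) -> int:
        """Map a distribution-key value to a node via a stable hash."""
        digest = hashlib.md5(key_value.encode()).hexdigest()
        return int(digest, 16) % NUM_NODES

    # The same key value always maps to the same node/partition.
    print(node_for("customer-42") == node_for("customer-42"))  # True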

Drift

A significant change or deviation in data characteristics, structure, or patterns that triggers alerts in data monitoring systems. It encompasses changes in data values beyond specified ranges (data drift) or modifications to database schema elements like tables and columns (schema drift).

End index

A parameter that specifies the last position in a string for the Get Substring and Replace Between actions, where position counting starts at 1. If the end index exceeds the string length, the operation extends to the end of the string.
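
Translated into 0-based Python slicing, the 1-based, inclusive semantics and the clamping behavior look like this (a sketch; the actual action is configured in the pipeline, not coded):

    def get_substring(value: str, start_index: int, end_index: int) -> str:
        """1-based, inclusive substring; an end index past the string is clamped."""
        end_index = min(end_index, len(value))   # extend-to-end behavior
        return value[start_index - 1:end_index]  # convert to 0-based slicing

    print(get_substring("Precisely", 1, 4))   # "Prec"
    print(get_substring("Precisely", 5, 99))  # "isely" -- clamped to string end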

Enrich

A transformation step in a Data Quality pipeline that enhances existing records by adding additional information from external sources or reference datasets, such as location-based risk data, weather insights, or other attributes. This process makes data more valuable and useful for analysis and decision-making.

Enrichment pipeline

A Data Quality pipeline that enhances existing datasets by adding additional attributes or information from external sources or reference datasets. It typically includes steps like address verification, geocoding, and enrichment to append location-based risk data, demographics, or other relevant attributes to improve data quality and usability.

Exact match

A matching algorithm used in the Match and Group transformation step that determines whether two text strings are identical, including case, returning a score of 100 for a perfect match and 0 otherwise. It compares fields character by character to identify records with precisely matching values.
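
The scoring is binary; a minimal sketch of the comparison described above:

    def exact_match_score(a: str, b: str) -> int:
        """Case-sensitive, character-by-character comparison: 100 or 0."""
        return 100 if a == b else 0

    print(exact_match_score("Main St", "Main St"))  # 100
    print(exact_match_score("Main St", "main st"))  # 0 -- case differs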

Field mapping

A process that allows users to modify how fields from source tables are mapped to target destinations by selecting mapping options and setting default values. This operation enables users to specify which fields to include, how to handle metadata, and configure mapping defaults for tables that will be created during the integration process.

Freshness frequency

The expected update frequency for a data table, used in threshold-based freshness alerts to trigger notifications when the table is not updated within the defined time interval.
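
In effect, the rule compares the time since the last update with the expected interval; a hedged sketch (names and logic are illustrative):

    from datetime import datetime, timedelta, timezone

    def is_stale(last_updated: datetime, freshness_frequency: timedelta) -> bool:
        """True when the table missed its expected update window."""
        return datetime.now(timezone.utc) - last_updated > freshness_frequency

    # A table expected to refresh daily, last touched 30 hours ago -> alert.
    last = datetime.now(timezone.utc) - timedelta(hours=30)
    if is_stale(last, freshness_frequency=timedelta(hours=24)):
        print("Freshness alert: table not updated within the expected interval")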

Geo Addressing

A comprehensive solution for efficient address data management that provides global address verification and geocoding to improve data accuracy and streamline operations. It ensures address data is accurate, complete, and ready for use across applications, enhancing operational efficiency and decision-making.

Geocode Address

A step in a Data Quality pipeline that converts address information into geographic coordinates (latitude and longitude) using geocoding services. It requires a data subscription and returns coordinates in decimal degrees to 7 decimal places for precise location mapping.

Group condition

A conditional evaluation applied to a group of transformation steps in a data quality pipeline that determines when the transformations within that group should be executed. When both group conditions and individual step conditions exist, they are combined using the AND operator to create the overall condition for execution.
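
The combination rule is a plain logical AND; a one-line sketch of the evaluation (conditions are configured in the pipeline editor, not coded; the record fields are illustrative):

    record = {"country": "US", "email": "ada@example.com"}

    group_condition = record["country"] == "US"    # condition on the whole group
    step_condition = record["email"] is not None   # condition on one step
    run_step = group_condition and step_condition  # both must hold for the step to run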

Mainframe replication

A computing process designed to facilitate seamless data transfer and synchronization between mainframe systems and modern data environments. It enables real-time data replication, ensuring that updates on the mainframe are immediately reflected in target systems, thereby minimizing latency and maintaining data integrity.

Mainframe replication pipeline

A data integration process that facilitates seamless data transfer and synchronization between mainframe systems and modern data environments, enabling real-time data replication with immediate updates reflected in target systems. It maintains data integrity while minimizing latency and can be created, managed, started, and stopped through configuration interfaces.

Metabase

A repository of database tables and objects that defines, enables, and manages data distribution replication projects, containing replication backlogs and metadata about source tables. Each metabase is tied to a unique project and stores information such as log reader capture status, sync request copy status, and backlog tables with changed data updates.

Metadata

Essential information about a dataset that describes its structure and characteristics, such as dataset names, field names, and field types. The metadata enables organizations to understand their data assets and establish a standard data architecture across different modules without disclosing or storing the actual data.

Noise

Unwanted or irrelevant alerts generated by data monitoring systems when confidence levels or thresholds are set too low. It refers to false positive alerts that can overwhelm users and reduce the effectiveness of meaningful data quality notifications.

Observation

A data monitoring process that tracks data assets to identify anomalies, changes, or issues such as freshness, volume, data drift, and schema drift. Observations provide alerts and insights to help maintain data quality and support data-driven decision making.

Observer

A monitoring tool configured with rules to track data assets and generate alerts when anomalies or unexpected changes are detected. It includes scheduling, dataset selection, rule definitions, and notification settings to ensure data accuracy and reliability.

Observer rule

A configuration component of an observer that defines specific criteria for when alerts should be generated by detecting significant changes or anomalies in data assets. There are four types: freshness, volume, data drift, and schema drift rules, each monitoring different aspects of data quality and integrity.

Parsing

A data processing technique that breaks down structured text fields into their component parts for analysis or further processing. It involves extracting meaningful data elements from fields like names, emails, or phone numbers by separating them into constituent components.

Pipeline editor

An editor that allows users to create, modify, and manage data transformation pipelines by adding, editing, and configuring transformation steps. It provides a visual environment with multiple panes for viewing sample data, transformation steps, and editing options for pipeline configuration.

Pipeline engine

A processing engine that provides the computational resources and processing capabilities necessary for running Data Quality pipelines. It is created based on a connection and supports various connection types, such as Databricks, Precisely Agent, and Snowflake.

Popularity

A metric that shows the number of times a table or column is used in database queries within a specified time period, typically the last 24 hours. It includes both manual queries and queries run during Profile or Observer runs, and is available for datasources like Snowflake, Redshift, and BigQuery.

Precisely Cloud

A cloud-based platform that provides data integrity services and requires secure communication with on-premises environments through installed agents. It serves as the hosting environment for Precisely's Data Integrity Suite services and components.

Registration key

A unique key generated in the Precisely Data Integrity Suite and used during Agent installation. It authenticates the Agent with the Suite, allowing secure registration and communication between the on-premises environment and the Precisely Cloud.

Replication designer role

A user role that allows users to create, edit, and delete configurations for replication pipelines and related resources within the Data Integrity Suite. This role is assigned to users who need to design and configure data replication processes.

Replication engine

A replication engine is the core component that performs data replication from a source to a target system. It works with agents that manage communication and data transfer, while the engine handles the actual replication. Used across Data Integrity Suite services like Data Integration, it ensures real-time synchronization, data consistency, and automation, reducing manual effort and improving reliability.

Replication environment

A configured system setup that includes all necessary components such as agents, runtime engines, data connections, projects, and metabases required to perform data replication operations. It serves as the foundational infrastructure that enables the capture, transfer, and synchronization of data between source and target systems.

Replication operator role

A user role that allows users to start and stop quality and replication pipelines within the Data Integrity Suite. This role is assigned to users in the Operators and Replication Operators user groups to provide operational control over replication processes.

Replication options

Configuration settings that control how data replication pipelines handle errors and conflicts during the data transfer process. These options include transaction error mode settings and conflict resolution strategies between source and target systems.

Replication pipeline

A collection of mappings that define the relationship between source and target data, used to transfer data between systems either continuously, on schedule, or for mainframe-specific processing. It can move source data to a target database in bulk or initiate data capture and replication processes for efficient data management across different systems.

Replication user ID

A unique user identifier created for Oracle metabase operations that must be unique to the metabase or project and cannot be reused across projects. This user ID is also used as the metabase name and is required for establishing replication connections.

Runtime server

A server component in the Data Integrity Suite that hosts and executes agents and engines, identified by a host name and port number. It provides the runtime environment where these components operate and can be managed through APIs for operations such as starting, stopping, and retrieving details.

Sample data

A subset of records extracted from a larger dataset, used for testing, validation, and quality assurance purposes in data processing workflows. Sample data can be uploaded from external files or generated directly from cataloged datasets, and is stored securely in encrypted format for analysis and pipeline creation.

Schema registry

A centralized service that manages and stores data schemas to ensure data consistency and compatibility in Kafka messaging systems. It provides schema validation and evolution capabilities for data serialization and deserialization processes.
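
For example, registering a schema with a Confluent-compatible registry is a single REST call; a sketch assuming a placeholder registry URL and topic:

    import json
    import requests

    schema = {
        "type": "record",
        "name": "Customer",
        "fields": [{"name": "id", "type": "long"},
                   {"name": "email", "type": "string"}],
    }

    # Register the schema under the topic's value subject.
    resp = requests.post(
        "http://schema-registry.example.com:8081/subjects/customers-value/versions",
        headers={"Content-Type": "application/vnd.schemaregistry.v1+json"},
        data=json.dumps({"schema": json.dumps(schema)}),
    )
    print(resp.json())  # e.g. {"id": 1} -- the registered schema ID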

Security policy

A rule-based configuration that specifies who may access which data assets within a workspace, under what conditions, and what actions are permitted or denied. It expresses access control by linking subjects (user groups, individual users, and roles) to resources and operations, and may include constraints such as time, location, or device context. Security policies can be deployed as default (predefined templates automatically created to cover common scenarios) or custom (created by users from scratch to meet specific requirements).

Source dataset

A collection of data within a datasource that serves as the input to a pipeline, whether in Data Quality or Data Integration, and must conform to the pipeline’s input schema for successful execution.

Source server

A server that hosts the original data or resources to be accessed, replicated, or processed by another system or application. In a pipeline, it acts as the origin from which data is extracted.

Spatial Analytics

A service that extracts and visualizes insights from geographical data by analyzing spatial relationships and location-based patterns. It enables businesses to make informed decisions through advanced location intelligence, including risk assessment, proximity analysis, and strategic planning.

Start index

A parameter that specifies the starting position in a string where a replacement begins; position counting starts at 1. It marks the beginning of the substring that will be replaced with new characters. This parameter applies only to the Replace Between action in the Cleanse Data transformation step.
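
In 0-based Python slicing, the 1-based, inclusive span works out as follows (a sketch of the semantics; the actual action is configured in the pipeline editor):

    def replace_between(value: str, start_index: int, end_index: int, new_text: str) -> str:
        """Replace the 1-based, inclusive span [start_index, end_index]."""
        end_index = min(end_index, len(value))
        return value[:start_index - 1] + new_text + value[end_index:]

    print(replace_between("555-0100", 1, 3, "800"))  # "800-0100"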

Substitution variable

A configurable variable used in mainframe replication pipelines that follows the format %(<parm_name>) and can store string or encrypted string values for use within scripts. These variables enable dynamic configuration and parameterization of pipeline execution by allowing values to be defined once and referenced throughout the pipeline scripts.
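
A sketch of the substitution mechanism, resolving every %(<parm_name>) reference from a lookup table (the variable names and the script line are made-up illustrations, not actual pipeline script syntax):

    import re

    variables = {"SOURCE_HLQ": "PROD.DATA", "TARGET_TOPIC": "cdc-events"}
    script = "copy from %(SOURCE_HLQ).CUSTOMER to topic %(TARGET_TOPIC)"

    # Replace each %(name) reference with its configured value.
    resolved = re.sub(r"%\((\w+)\)", lambda m: variables[m.group(1)], script)
    print(resolved)  # copy from PROD.DATA.CUSTOMER to topic cdc-events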

Tablespace

A logical storage unit in an Oracle database that groups related data files together and provides a way to organize and manage database storage. It serves as a container for database objects like tables and indexes, with each tablespace having associated data files that physically store the data on the database server.

Target dataset

A dataset that serves as the destination for processed data output from a Data Quality pipeline run configuration. It must have a schema that matches the pipeline output schema for the job to execute successfully.

Target field name

The new name assigned to a column or field during a data replication process, such as when renaming fields in a pipeline or mapping fields between datasets.

Target server

The destination server in a replication pipeline where data from the source is transferred and stored. It receives replicated data and maintains connections during the replication process.

Temporary status

A status condition in a replication pipeline that indicates temporary communication issues between components, typically displayed as 'Unknown' status. This status suggests a transient problem that may resolve automatically without requiring immediate intervention.

Transformation step

A specific operation in a data quality pipeline that processes data by performing functions such as filtering, joining, parsing, or standardizing to transform input data into the desired output format. Each transformation step is organized into categories and can be configured, reordered, or deleted within the pipeline workflow.

Workspace

A centralized environment in Data Integrity Suite designed for users to manage and monitor data integrity processes efficiently, providing access to tools for data validation, cleansing, and profiling. Each workspace is tied to one subscription and includes monitoring features, auditing capabilities, and role-based access controls for collaboration among team members.

Workspace owner

A user role in the Data Integrity Suite that has administrative privileges to manage workspace settings including security policies, users, datasources, governance, and catalog management. Workspace owners are automatically assigned the Workspace Manager role and have comprehensive control over workspace operations.