A Data Catalog helps you organize and manage your data assets in an easy-to-understand format that is accessible to all users. A business-ready catalog lets users search for data, trace its origins, and follow its journey through the data supply chain without the complexity of technical terminology.
A Data Catalog must include:
- Business data lineage for tracking data origin and journey.
- Metadata transparency detailing data attributes, definitions, and synonyms.
- Standardized data definitions.
- Identification of data synonyms for alternative names.
- Key business attributes showing data classifications and descriptors.
- Usage information on data access and application.
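To make the list above concrete, the sketch below models one catalog entry that carries each required element. It is a hypothetical illustration, not the Data Integrity Suite's data model: the `CatalogEntry` class, its field names, the `matches` helper, and the "Customer ID" sample values are all assumptions chosen for the example. It also shows why synonyms matter: a search for an alternative name still finds the entry.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class CatalogEntry:
    """Hypothetical catalog entry covering the elements a Data Catalog should include."""

    name: str
    definition: str  # standardized business definition
    synonyms: List[str] = field(default_factory=list)  # alternative names
    attributes: Dict[str, str] = field(default_factory=dict)  # classifications and descriptors
    lineage: List[str] = field(default_factory=list)  # origin and journey, upstream first
    usage: List[str] = field(default_factory=list)  # where the data is accessed and applied

    def matches(self, term: str) -> bool:
        """Return True if the term equals the name or a synonym, ignoring case."""
        term = term.casefold()
        return term == self.name.casefold() or any(
            term == synonym.casefold() for synonym in self.synonyms
        )


# Hypothetical entry used only for illustration.
entry = CatalogEntry(
    name="Customer ID",
    definition="Unique identifier assigned to each customer.",
    synonyms=["Cust No", "Client ID"],
    attributes={"classification": "Internal"},
    lineage=["CRM", "Staging", "Data Warehouse"],
    usage=["Monthly churn report"],
)

print(entry.matches("client id"))  # True: found through a synonym
print(entry.matches("order id"))   # False: no name or synonym matches
```

Keeping synonyms and lineage on the entry itself, rather than in separate lookups, is one simple design choice; a real catalog may store them differently.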
Catalog assets overview
To view all your cataloged assets, navigate to Catalog from the main navigation menu. By default, information is displayed in card and list views, sorted alphabetically by asset name. For detailed insights into specific assets in the Data Integrity Suite, you can drill down further within each tab:
- Datasources: Lists connections with details such as datasource name, number of datasets, and number of fields.
  - Refer to Datasources for more information.
- Datasets: Displays datasets with details such as dataset name, dataset location, associated datasource, and number of fields.
  - When you hover over the Location name, you can view the asset path in the format Datasource>Schema>Dataset.
  - You can also upload or generate sample data for a dataset to create a Data Quality pipeline.
  - Refer to Datasets for more information.
- Fields: Lists the fields of a specific dataset with details such as field name, associated datasource name, and field type.
  - When you hover over the Dataset name, you can view the asset path in the format Datasource>Schema>Dataset>Field.
  - Refer to Fields for more information.
- Technical assets: Lists technical asset types with their descriptions and the number of assets of each type. Click an asset type name to create or manage technical assets of that type.
  - Refer to Technical assets for more information.
- Jobs: Lists the job history, giving you visibility into the status and details of the cataloging process so you can troubleshoot any issues that occur.
  - Refer to Jobs for more information.
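The hover paths described for datasets and fields follow a simple hierarchy: Datasource>Schema>Dataset for a dataset, with >Field appended for a field. The sketch below builds and splits such paths. The function names are hypothetical, and the schema name `public` is an assumed placeholder; the sketch also assumes no asset name contains the `>` character, since that would make the path ambiguous.

```python
from typing import Dict, Optional


def asset_path(datasource: str, schema: str, dataset: str, field: Optional[str] = None) -> str:
    """Join asset names into the Datasource>Schema>Dataset[>Field] display format."""
    parts = [datasource, schema, dataset]
    if field is not None:
        parts.append(field)
    return ">".join(parts)


def parse_asset_path(path: str) -> Dict[str, str]:
    """Split a dataset or field path back into named levels.

    Assumes no individual name contains '>'.
    """
    parts = path.split(">")
    if len(parts) not in (3, 4):
        raise ValueError(f"expected 3 or 4 levels, got {len(parts)}: {path!r}")
    levels = ["datasource", "schema", "dataset", "field"]
    # zip stops at the shorter list, so a dataset path yields only three keys.
    return dict(zip(levels, parts))


print(asset_path("SalesDB", "public", "SalesRecords"))            # SalesDB>public>SalesRecords
print(asset_path("SalesDB", "public", "SalesRecords", "SaleID"))  # SalesDB>public>SalesRecords>SaleID
print(parse_asset_path("SalesDB>public>SalesRecords>SaleID")["field"])  # SaleID
```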
Example: A SalesDB datasource tracks sales transactions and contains three datasets with fields such as SaleID, ProductID, Quantity, and SaleDate. One of these, the SalesRecords dataset, contains four fields and organizes the information needed for validation and reporting. Every dataset includes the SaleID field, which uniquely identifies each sale; this prevents duplicate records and keeps relationships between datasets accurate. Business assets such as the Sales Report provide an overview of operations, while technical assets such as the Data Warehouse serve as a central repository for integrated data. Users can define rules such as the Sales Amount Check to ensure that sales amounts are not negative, and they can monitor tasks through jobs such as Job ID 002, which is currently running data validation on the SalesDB datasource. Together, these elements uphold data integrity and support informed decision-making.
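The two checks in the example, SaleID uniqueness and the Sales Amount Check, can be sketched in a few lines. This is a minimal illustration, not how the Data Integrity Suite implements rules: the sample rows are invented, and `SaleAmount` is an assumed field name, since the example does not name the field that holds the sales amount.

```python
from collections import Counter
from typing import Dict, List

# Hypothetical sample rows. SaleAmount is an assumed field name.
records: List[Dict[str, float]] = [
    {"SaleID": 1, "SaleAmount": 40.0},
    {"SaleID": 2, "SaleAmount": 15.0},
    {"SaleID": 2, "SaleAmount": 27.5},   # duplicate SaleID
    {"SaleID": 3, "SaleAmount": -12.0},  # negative amount
]


def duplicate_sale_ids(rows: List[Dict[str, float]]) -> List[int]:
    """Return SaleID values that appear more than once (SaleID should be unique)."""
    counts = Counter(row["SaleID"] for row in rows)
    return sorted(sale_id for sale_id, n in counts.items() if n > 1)


def sales_amount_check(rows: List[Dict[str, float]]) -> List[int]:
    """Return the SaleID of every row whose amount is negative."""
    return [row["SaleID"] for row in rows if row["SaleAmount"] < 0]


print(duplicate_sale_ids(records))  # [2]
print(sales_amount_check(records))  # [3]
```

Returning the offending SaleIDs, rather than a single pass/fail flag, makes it easier to trace each failure back to the record that caused it.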