Upload sample data for pipeline preview and configuration - Precisely Data Integrity Suite

Upload or select sample data to define the pipeline input structure, validate data compatibility, and preview how the pipeline processes your data before running it on the full dataset.

Limited Availability: This feature is currently available only in select workspaces and might be subject to change before general availability.

Overview

Sample data is essential when setting up or modifying a pipeline. It defines the pipeline input structure, lets you validate that your data matches the pipeline's expected fields and format, and lets you preview how the pipeline processes your data before you commit to a full pipeline execution. You can provide sample data from two sources: upload a new CSV or TXT file directly, or select an existing dataset from your data catalog. The system automatically validates that the sample fields match the pipeline input requirements and reports any mismatches.

Getting Started

  1. Go to Quality, open the Pipelines tab, and click Create pipeline.
  2. Choose one of the following options:
    • To upload a new sample file: Click Upload Sample and then choose a file or drag and drop a CSV or TXT file into the upload area.
    • To select a sample from a cataloged dataset: Click Browse Dataset to view available datasets in your catalog.

Upload a New Sample File

Upload a CSV or TXT file directly to the system to use as sample data for your pipeline.

  1. In the Upload Sample dialog, click choose a file or drag and drop a CSV or TXT file into the upload area.
  2. Select your file from your local system. The file size appears after selection.
  3. In the Give sample a unique name field, enter a descriptive name for the sample.
  4. Click Upload & Preview Sample.
  5. Wait for the system to validate the uploaded file against the pipeline input requirements.

Results: If validation passes, the sample is configured and ready for preview. The system displays:

  • Sample Name: The unique identifier you provided (editable).
  • Sample Source: Indicates the sample is from a standalone upload.
  • Record Count: The number of records in the sample.
  • Input Fields: A list of all fields with their data types.
  • Stored in: Storage location and retention period (for example, "Precisely Cloud (Data sample will be purged in 89 days)").

If validation fails, an error message appears. Review the error, then upload a different sample with matching fields or modify the pipeline input configuration.

Click Create Pipeline to create the pipeline.
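
Before uploading, it can help to check locally what the file contains. The following is a minimal Python sketch that reports a record count and a list of fields with guessed data types, similar in spirit to the Record Count and Input Fields details shown after upload. The function names and the type labels (integer, decimal, string) are hypothetical, and the type guessing is a simple assumption for illustration; it is not how the Data Integrity Suite infers types.

```python
import csv
import io


def infer_type(values):
    """Guess a coarse type for a column: integer, decimal, or string."""
    non_empty = [v for v in values if v.strip()]
    if not non_empty:
        return "string"
    # Try the strictest type first; fall back when any value fails to parse.
    for type_name, cast in (("integer", int), ("decimal", float)):
        try:
            for v in non_empty:
                cast(v)
            return type_name
        except ValueError:
            continue
    return "string"


def summarize_sample(text):
    """Return (record_count, {field: guessed_type}) for CSV text with a header row."""
    reader = csv.DictReader(io.StringIO(text))
    rows = list(reader)
    fields = reader.fieldnames or []
    # Short rows yield None for missing fields, so treat None as empty.
    types = {f: infer_type([(r[f] or "") for r in rows]) for f in fields}
    return len(rows), types


# Hypothetical sample content; replace with the text of your own file.
sample_csv = "id,name,balance\n1,Ada,10.5\n2,Grace,7\n3,Linus,\n"
record_count, field_types = summarize_sample(sample_csv)
print(record_count, field_types)
```

To check a real file, read it with `open(path, newline="", encoding="utf-8").read()` and pass the text to `summarize_sample`. Adjust the encoding to match your file.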

Select a Sample from a Cataloged Dataset

Browse and select an existing dataset from your data catalog to use as sample data for your pipeline.

  1. In the Select or Add a Data Sample dialog, click Browse Dataset.
  2. The Sample from cataloged dataset view opens, displaying available datasets.
  3. Optionally, use the Filter options to narrow results by datasource type or storage location.
  4. Optionally, use the Search field to find a specific dataset by name.
  5. Review the dataset details displayed in the right panel, including:
    • Dataset name and description
    • Storage location
    • File format and properties (encoding, separators, headers)
    • Partitioning configuration
    • Number of records
  6. Click the radio button next to the dataset you want to use as a sample.
  7. Click Next.
  8. Wait for the system to validate the dataset fields against the pipeline input requirements.

Results: If validation passes, the sample is configured and ready for preview. The system displays the sample details including name, source, record count, input fields, and storage information.

If validation fails, an error message appears. Select a different dataset with matching fields or modify your pipeline input configuration.

Change an Existing Sample

  1. In the pipeline input panel, locate the Sample section.
  2. Click Change Sample.

Preview Sample Data in the Pipeline

After you configure a sample, you can preview the sample data in the pipeline editor to see how it flows through each transformation step.

  1. After sample validation passes, the pipeline editor displays the sample data in table format in the lower panel.
  2. Each column represents a data field, and each row represents a sample record.
  3. The top row shows the field name, semantic type, and field type for each data field.
  4. Select a transformation step in the upper panel to highlight the columns affected by that step.
  5. Optionally, select the Transformation Preview checkbox to display sample values as they appear before and after the transformation.
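
The before-and-after view can be pictured as pairing each sample value with the result of applying a step to it. The sketch below is only an illustration of that idea in Python; `preview_step` and the example cleanup step are hypothetical and do not represent the pipeline editor's implementation.

```python
def preview_step(rows, field, transform):
    """Return (before, after) value pairs for one field across the sample rows."""
    return [(row[field], transform(row[field])) for row in rows]


# Hypothetical sample records and a hypothetical cleanup step.
sample_rows = [{"name": "  ada "}, {"name": "GRACE"}]
pairs = preview_step(sample_rows, "name", lambda value: value.strip().title())

for before, after in pairs:
    print(repr(before), "->", repr(after))
```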

Understand Field Validation

After you provide a sample, the system automatically validates the sample fields against the pipeline input requirements.

  1. The system checks that all sample fields match the pipeline input fields in name and data type.
  2. If validation passes, the system confirms compatibility and allows you to proceed with preview and configuration.
  3. If validation fails, the system displays an error message indicating which fields do not match.
  4. To resolve field mismatches, you have two options:
    • Upload or select a different sample with matching fields.
    • Modify your pipeline input configuration to match your sample data.
  5. Click the View pipeline input fields link in the error message to review the expected field structure and ensure your sample aligns with it.
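
The name and data type check described above can be sketched as a comparison of two field-to-type mappings. This is a minimal Python illustration, not the product's validation logic: `find_mismatches`, the message wording, and the type labels are hypothetical, and treating extra sample fields as errors is an assumption based on the requirement that fields match exactly.

```python
def find_mismatches(expected, sample):
    """Compare two {field_name: data_type} mappings and list every difference."""
    problems = []
    for name, dtype in expected.items():
        if name not in sample:
            problems.append(f"missing field: {name}")
        elif sample[name] != dtype:
            problems.append(
                f"type mismatch: {name} (expected {dtype}, found {sample[name]})"
            )
    # Assumption: a field present only in the sample also counts as a mismatch.
    for name in sample:
        if name not in expected:
            problems.append(f"unexpected field: {name}")
    return problems


# Hypothetical pipeline input fields and sample fields.
expected_fields = {"id": "integer", "name": "string", "balance": "decimal"}
sample_fields = {"id": "integer", "name": "string", "balance": "string", "notes": "string"}

mismatches = find_mismatches(expected_fields, sample_fields)
print(mismatches or "sample matches the pipeline input")
```

An empty result means the two mappings agree in both name and data type; otherwise each entry names a field to fix in the sample or in the pipeline input configuration.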

Sample Data Considerations and Best Practices

Consider the following when selecting or uploading sample data:

  • File Format: The upload feature currently accepts CSV and TXT files only. If your sample data is in a different format (Excel, JSON, Parquet), convert it to CSV before uploading or select a pre-cataloged dataset in the desired format.

  • Sample Size: Use a representative sample of at least 50 to 100 records so that the preview is meaningful. Include every field type present in your full dataset, along with any edge cases or data quality issues you want to test.

  • Field Matching: Sample fields must match the pipeline input fields exactly. If you receive a field mismatch error, upload or select a different sample with matching fields, or modify the pipeline input configuration.

  • Data Retention: Uploaded samples are stored temporarily in Precisely Cloud and are automatically purged after a retention period (typically 89 days). If you need to retain sample data longer, consider cataloging it as a permanent dataset.

  • Cataloged Dataset Filtering: When browsing cataloged datasets, use the Filter options to narrow results by datasource type or storage location. This helps you find the right dataset quickly, especially in large catalogs.

  • Sample vs. Full Data: The sample is used only for preview and configuration. When you run the pipeline, it processes the actual data source, not the sample. Click the Run button to execute the pipeline on your full dataset.
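
One way to prepare a sample file of the recommended size from a larger CSV file is reservoir sampling, which picks a fixed number of rows uniformly at random in a single pass without loading the whole file. The sketch below is a minimal Python example under stated assumptions: the function names are hypothetical, the input has a header row, and the in-memory data stands in for a real file. Random selection alone may miss the edge cases you want to test, so add those records by hand if needed.

```python
import csv
import io
import random


def reservoir_sample(lines, k, seed=None):
    """Pick k data rows uniformly at random from CSV lines, keeping the header."""
    rng = random.Random(seed)
    reader = csv.reader(lines)
    header = next(reader)  # assumes the first row is a header
    reservoir = []
    for index, row in enumerate(reader):
        if index < k:
            # Fill the reservoir with the first k rows.
            reservoir.append(row)
        else:
            # Replace an existing entry with probability k / (index + 1).
            slot = rng.randint(0, index)  # inclusive on both ends
            if slot < k:
                reservoir[slot] = row
    return header, reservoir


def to_csv_text(header, rows):
    """Serialize the header and sampled rows back to CSV text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return buffer.getvalue()


# Hypothetical full dataset of 1,000 records, built in memory for illustration.
full_data = io.StringIO("id,value\n" + "".join(f"{i},{i * 2}\n" for i in range(1000)))
header, rows = reservoir_sample(full_data, k=100, seed=42)
sample_text = to_csv_text(header, rows)
print(len(rows), "records sampled")
```

With a real file, pass an open file object (opened with `newline=""`) instead of `full_data`, and write `sample_text` to a new CSV file for upload.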