Group and aggregate - Precisely Data Integrity Suite


The Group and Aggregate step combines data from multiple rows into summary values. You can group data by one or more fields and apply aggregation functions such as Sum, Count, Average, Minimum, or Maximum to calculate summary statistics.

Limited Availability: This feature is currently available only in select workspaces and might be subject to change before general availability.

For example, consider a retail company that tracks sales data across multiple regions and products. The raw data contains individual transaction records with the fields Region, Product, and Sales. By using the Group and Aggregate step, the company can group the records by Region and Product, then calculate the total sales for each combination. The result is a summary report of regional sales by product, which makes it easier to analyze performance and identify trends.

Sample input data

Region  Product  Sales
East    A        200
East    A        200
East    B        380
East    C        40
West    A        270

When you group by Region and Product, and apply the Sum function to the Sales field, the result is a summary table showing total sales for each region and product combination.

Sample output data

Region  Product  RegionalSalesByProduct
East    A        400
East    B        380
East    C        40
West    A        270
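The grouping and summing in this example can be reproduced outside the product to check the expected totals. The following is a minimal sketch in plain Python, not the product's implementation; the variable names `rows`, `totals`, and `summary` are chosen here for illustration.

```python
from collections import defaultdict

# Sample input rows from the table above.
rows = [
    {"Region": "East", "Product": "A", "Sales": 200},
    {"Region": "East", "Product": "A", "Sales": 200},
    {"Region": "East", "Product": "B", "Sales": 380},
    {"Region": "East", "Product": "C", "Sales": 40},
    {"Region": "West", "Product": "A", "Sales": 270},
]

# Group by (Region, Product) and sum Sales within each group.
totals = defaultdict(int)
for row in rows:
    totals[(row["Region"], row["Product"])] += row["Sales"]

# Build one output row per unique (Region, Product) combination.
summary = [
    {"Region": region, "Product": product, "RegionalSalesByProduct": total}
    for (region, product), total in totals.items()
]

for row in summary:
    print(row["Region"], row["Product"], row["RegionalSalesByProduct"])
```

The two East/A transactions collapse into a single row with a total of 400, which matches the sample output data.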

To add a Group and Aggregate step to your pipeline:

  1. While building the pipeline, select Add Step on the canvas where you want to add the aggregation.

  2. In the Add Step dialog, select Group and Aggregate.

  3. Configure the step properties as described in the following section.

  4. Click Preview to review the aggregated result.

  5. After you have reviewed the preview result, click Save to add the step to the pipeline.

Step properties

The Group and Aggregate step includes the following configuration options:

Group by (Optional)

  • Specifies one or more fields to group the data by.

  • Select fields from the dropdown list.

  • You can add multiple grouping fields by clicking the field selector.

  • If you do not specify any grouping fields, the aggregation functions are applied to all rows in the dataset.
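To illustrate the case with no grouping fields, the sketch below treats all rows as one implicit group and produces a single summary row. It uses the Sales values from the sample input data; the output field names `TotalSales` and `RowCount` are hypothetical.

```python
# Sales values from the sample input data.
sales = [200, 200, 380, 40, 270]

# With no Group by fields, every row belongs to one implicit group,
# so the aggregations yield exactly one output row.
summary = [{"TotalSales": sum(sales), "RowCount": len(sales)}]

print(summary)
```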

Aggregate Data

Defines the aggregation operations to perform on your data. For each aggregation, you must specify:

  • Method: The aggregation function to apply. Available methods include:

    • Minimum: Returns the minimum value in the field.

    • Maximum: Returns the maximum value in the field.

    • Average: Calculates the mean value of the field.

    • Standard Deviation: Returns the statistical standard deviation of the values in the field. Applicable to numeric fields.

    • Variance: Returns the statistical variance of the values in the field. Applicable to numeric fields.

    • Median: Returns the median of the values in the field. Applicable to numeric fields.

    • Count: Counts the number of rows in each group.

    • Count Distinct: Returns the number of unique values of the field in each group. Duplicate values are counted once.

  • Field: The source field to aggregate. Select from the available fields in your dataset.

  • New Field Name: The name of the output field that contains the aggregated result. This field appears in the output dataset.
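As a rough guide to what each method computes, the sketch below applies equivalent functions from the Python standard library to the Sales values in the sample input data. This is an approximation for checking results by hand, not the product's implementation. In particular, it assumes sample (n - 1) formulas for Standard Deviation and Variance; this page does not state whether the step uses sample or population formulas.

```python
import statistics

# Sales values from the sample input data.
values = [200, 200, 380, 40, 270]

results = {
    "Minimum": min(values),
    "Maximum": max(values),
    "Average": statistics.mean(values),
    # Assumption: sample (n - 1) formulas. Use statistics.pstdev and
    # statistics.pvariance instead for the population formulas.
    "Standard Deviation": statistics.stdev(values),
    "Variance": statistics.variance(values),
    "Median": statistics.median(values),
    "Count": len(values),
    "Count Distinct": len(set(values)),
}

for method, value in results.items():
    print(f"{method}: {value}")
```

Count returns 5 because there are five rows, while Count Distinct returns 4 because the value 200 appears twice.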

Add aggregate

Click this button to add another aggregation operation. You can define multiple aggregations in a single Group and Aggregate step to calculate different summary values from your data.

Output configuration

The output of the Group and Aggregate step includes:

  • All fields specified in the Group by section.

  • All new fields created by the aggregation operations, named according to the New Field Name you specified.

The output dataset contains one row for each unique combination of grouping field values, with the aggregated results calculated for each group.
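The shape of the output can be sketched with a small helper that takes grouping fields and a list of aggregations. The function `group_and_aggregate`, the `METHODS` mapping, and the output field names `LargestSale` and `TransactionCount` are hypothetical and exist only for this illustration; they are not part of the product.

```python
import statistics
from collections import defaultdict

# Hypothetical mapping from method names to Python functions.
METHODS = {
    "Minimum": min,
    "Maximum": max,
    "Average": statistics.mean,
    "Count": len,
    "Count Distinct": lambda values: len(set(values)),
}

def group_and_aggregate(rows, group_by, aggregations):
    """Return one dict per unique combination of group_by values.

    aggregations is a list of (method, field, new_field_name) tuples.
    """
    groups = defaultdict(list)
    for row in rows:
        key = tuple(row[field] for field in group_by)
        groups[key].append(row)

    output = []
    for key, members in groups.items():
        out_row = dict(zip(group_by, key))  # keep only the Group by fields
        for method, field, new_name in aggregations:
            values = [member[field] for member in members]
            out_row[new_name] = METHODS[method](values)
        output.append(out_row)
    return output

rows = [
    {"Region": "East", "Product": "A", "Sales": 200},
    {"Region": "East", "Product": "A", "Sales": 200},
    {"Region": "East", "Product": "B", "Sales": 380},
    {"Region": "East", "Product": "C", "Sales": 40},
    {"Region": "West", "Product": "A", "Sales": 270},
]

result = group_and_aggregate(
    rows,
    group_by=["Region"],
    aggregations=[
        ("Maximum", "Sales", "LargestSale"),
        ("Count", "Sales", "TransactionCount"),
    ],
)
for row in result:
    print(row)
```

Each output row holds only the Region field and the two new fields. Product and Sales do not appear because they are neither grouping fields nor aggregation outputs.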

Tips and best practices

  • Use meaningful names for your aggregated fields to make the output data easy to understand and use in downstream steps.

  • If you omit grouping fields, the aggregation functions run over the entire dataset and the output is a single row of summary data.

  • You can add multiple aggregation operations in a single step to calculate different summary statistics from the same or different fields.

  • Use the Preview feature to verify that your aggregation configuration produces the expected results before saving the step.

  • Aggregation steps are useful for creating summary reports, calculating key performance indicators, and preparing data for analysis and visualization.