Identify country - Precisely Data Integrity Suite

Copyright 2000-2026
Type: Addressing step

The Identify Country step uses available address field values to identify the country.

Step Properties

Step name: Defines the name for a step. Provide a meaningful name so that anyone who edits steps in a pipeline will be able to identify the purpose of a step.

Country Output Format: Specifies the format in which you want to display the names of countries. The available options are:

  • ISO2: Represents countries by their two-letter ISO country codes. For example:
    United States: US
    Canada: CA
    United Kingdom: GB
  • ISO3: Represents countries by their three-letter ISO country codes. For example:
    United States: USA
    Canada: CAN
    United Kingdom: GBR
  • Country Name: Displays the full name of the country in a human-readable form. For example:
    US: United States
    CA: Canada
    GB: United Kingdom
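
The three formats are alternative representations of the same country. As a minimal illustration (not the product's implementation), the lookup below covers only the three countries used in the examples above. The table and the `format_country` helper are hypothetical names introduced for this sketch.

```python
# Hypothetical lookup covering only the three example countries above.
COUNTRY_TABLE = {
    "US": {"ISO2": "US", "ISO3": "USA", "Country Name": "United States"},
    "CA": {"ISO2": "CA", "ISO3": "CAN", "Country Name": "Canada"},
    "GB": {"ISO2": "GB", "ISO3": "GBR", "Country Name": "United Kingdom"},
}


def format_country(iso2_code: str, output_format: str) -> str:
    """Return the country in the requested Country Output Format."""
    try:
        return COUNTRY_TABLE[iso2_code][output_format]
    except KeyError:
        raise ValueError(
            f"unknown country or format: {iso2_code!r}, {output_format!r}"
        )
```

For instance, `format_country("GB", "ISO3")` returns `GBR`, and `format_country("CA", "Country Name")` returns `Canada`.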

Enhance with AI: Geocode SDK may fail to identify the country for an address when the address information is incomplete or lacks sufficient detail. In such cases, the output fields for those addresses remain blank. To resolve this, select the Enhance with AI checkbox, which enables AI to identify the country from the available address information. When you select this option, an additional output column (Address_set_1_Source, Address_set_2_Source, and so forth) indicates the source of identification (AI or Geocode SDK, as applicable).

Note: To access the AI functionality in Data Quality pipelines, you must first enable it at the workspace level under the AI tab.
Warning: To run a Data Quality pipeline on Snowflake using the Identify Country step, you must create an external User Defined Function (UDF). See Create User Defined Functions (UDFs) in Snowflake.
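
The output column naming described for this step can be sketched as follows. This is an illustration only: `expected_output_columns` is a hypothetical helper, and it assumes that a renamed address set uses its new name as the column prefix, which this guide does not state explicitly.

```python
from typing import List


def expected_output_columns(address_set_names: List[str],
                            enhance_with_ai: bool) -> List[str]:
    """List the output columns the step is described as adding.

    Assumption: the address set name is used as the column prefix.
    """
    columns = []
    for name in address_set_names:
        columns.append(f"{name}_IdentifiedCountry")
        if enhance_with_ai:
            # The Source column reports AI or Geocode SDK as the origin.
            columns.append(f"{name}_Source")
    return columns
```

With two address sets and Enhance with AI selected, the sketch lists four columns: an IdentifiedCountry column and a Source column for each set.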

Address_set_1, Address_set_2, and so forth: Each address set maps schema names to input dataset fields. When an input dataset contains more than one set of address fields (such as home address, business address, corporate headquarters, and so forth), the address set schema is mapped to each set. The Schema column lists the schema names for standard address fields. The Map input field column maps input dataset fields to those schema names.

To map a dataset field to a schema name, click the column heading in the Map input field column entry that corresponds to the name in the Schema column. When no input field corresponds to a schema name, leave the Map input field value set to none for that schema name.

An address set schema contains the following standard address field types:

  • FirmName: Specifies an organization name, place, or building. Examples:
    • PRIME MINISTER & FIRST LORD OF THE TREASURY
    • ARAMARK Ltd.
    • UNITED NATIONS HEADQUARTERS
    • CHRYSLER BUILDING
  • AddressLine1: The street portion of the address, including the directional and street suffix, formatted to the country's standard. If no other address field is populated, then the AddressLine1 entry is treated as a single line input. Single line input can consist of multiple input address fields, such as 10 Downing St, London SW1A 2AB, GBR. This field is required. Examples:
    • 10 Downing Street
    • 630 W 168th St
    • PO Box 3554
  • AddressLine2: The unit, suite or apartment portion of an address that specifically locates an addressee at a street address. This is typically optional. Examples:
    • Suite 214
    • Apt 534
    • Maildrop A25
  • City: Specifies a city or town name. Examples:
    • London
    • New York
  • CitySubdivision: Specifies a city subdivision or locality. Typically a division such as neighborhood, hamlet, or borough. Examples:
    • Bromley
    • Brooklyn
  • StateProvince: The first-level administrative division in a country. Typically a division such as state, province, department, region, or territory. Examples:
    • New York
    • West Midlands
  • StateProvinceSubdivision: The second-level administrative division in a country. Typically a division such as county, municipality, parish, prefecture, or district. Examples:
    • Nassau County
    • Coventry
  • PostalCode: The postal code used by a country. Examples:
    • 10032-3725
    • SW1A 2AA
  • Country: The country name or the ISO alpha-3 code. Examples:
    • United States of America or USA
    • United Kingdom or GBR
    • France or FRA
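
One way to picture an address set is as a mapping from schema names to input dataset fields. The sketch below is illustrative only: the input field names (`home_street` and so on) and the `check_address_set` helper are hypothetical, and `None` stands in for the none setting of an unmapped schema name.

```python
from typing import Dict, List, Optional

# Schema names listed above for an address set.
SCHEMA_NAMES = [
    "FirmName", "AddressLine1", "AddressLine2", "City", "CitySubdivision",
    "StateProvince", "StateProvinceSubdivision", "PostalCode", "Country",
]


def check_address_set(mapping: Dict[str, Optional[str]]) -> List[str]:
    """Return a list of problems; an empty list means the mapping looks usable."""
    problems = [
        f"unknown schema name: {name}"
        for name in mapping
        if name not in SCHEMA_NAMES
    ]
    # AddressLine1 is the only field described as required.
    if not mapping.get("AddressLine1"):
        problems.append("AddressLine1 is required but not mapped")
    return problems


# Hypothetical input field names; None stands for the "none" setting.
home_address = {
    "AddressLine1": "home_street",
    "City": "home_city",
    "PostalCode": "home_zip",
    "Country": None,
}
```

Here `check_address_set(home_address)` returns an empty list, while a mapping without AddressLine1 is reported as a problem.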

Add address dataset: Click this button to add another set of address mappings. Address set mappings map schema names to input fields. Initially there is one set of mappings, titled Address_set_1 by default. If the data includes more than one address, such as shipping and billing addresses or home and business addresses, click this button to add an address set and map its schema names to the input fields for the additional address. The default name for each added address set increments the index (Address_set_2, Address_set_3, and so forth). You can rename any address set by editing the name shown above it. Address set names may contain alphanumeric and underscore characters. When there is more than one address set, you can click the delete button next to a name to delete that address set.
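
The naming rule can be checked before you type a new name. The following is a minimal sketch, assuming that "alphanumeric" means ASCII letters and digits only (the guide does not say whether other scripts are accepted). `is_valid_address_set_name` is a hypothetical helper, not part of the product.

```python
import re

# Assumption: "alphanumeric" means ASCII letters and digits only.
ADDRESS_SET_NAME = re.compile(r"[A-Za-z0-9_]+")


def is_valid_address_set_name(name: str) -> bool:
    """True when the name uses only letters, digits, and underscores."""
    return ADDRESS_SET_NAME.fullmatch(name) is not None
```

Under this assumption, `Billing_address` is accepted, while `Billing address` and `ship-to` are rejected because of the space and the hyphen.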

Output fields

This pipeline step populates a single IdentifiedCountry output field for each address set.

Address_set_1_IdentifiedCountry, Address_set_2_IdentifiedCountry, and so forth: The output field is populated with the ISO 3166-1 alpha-2 country code that best matches the address information in a record. Incomplete address information may result in an incorrect country code. For example, an input record that includes a value for AddressLine1 but omits values for all other fields may return an incorrect country code. You may therefore want to check for incomplete data.
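
A check for incomplete data could look like the sketch below, which flags records where only AddressLine1 has a value. It is an illustration under stated assumptions: records are represented as dictionaries keyed by schema name, and `only_address_line1` is a hypothetical helper.

```python
from typing import Dict, Optional

# Every schema name other than AddressLine1.
OTHER_FIELDS = (
    "FirmName", "AddressLine2", "City", "CitySubdivision",
    "StateProvince", "StateProvinceSubdivision", "PostalCode", "Country",
)


def _has_value(value: Optional[str]) -> bool:
    """Treat None, empty, and whitespace-only strings as missing."""
    return value is not None and value.strip() != ""


def only_address_line1(record: Dict[str, Optional[str]]) -> bool:
    """True when AddressLine1 is populated and every other field is empty."""
    if not _has_value(record.get("AddressLine1")):
        return False
    return not any(_has_value(record.get(field)) for field in OTHER_FIELDS)
```

A flagged record is a candidate for review rather than a confirmed error, because AddressLine1 may legitimately hold a complete single line address.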

Create User Defined Functions (UDFs) in Snowflake

In a Data Quality pipeline, using the Identify Country step requires you to create external UDFs in Snowflake. This is a one-time activity. Separate UDFs are required to run your pipeline successfully with Identify Country "without AI" and "with AI".

Define the Identify Country external functions in Snowflake

The Identify Country operator in Data Quality and Enrichment uses a Format 1 type user-defined external function for Geocode. This task is required only if you use the Identify Country pipeline operator. The name of the external function for the Identify Country operator must be precisely_geocode.

Note: When the Identify Country operator is added to a pipeline, address data will be sent to the Precisely Cloud through the Identify Country external function.

Define the AI Identify Country external functions in Snowflake

The Identify Country operator using Enhance with AI in Data Quality and Enrichment uses a Format 1 type user-defined external function for AI Geocode. This task is required only if you use the Identify Country pipeline operator with Enhance with AI. The name of the external function for the Identify Country operator using AI must be ai-geocode.

Note: When the Identify Country operator using AI is added to a pipeline, address data will be sent to the Precisely Cloud through the AI Geocode external function.

Follow the steps provided in Create API Integration by using the script given below:

/* An API integration is required for the external user-defined function. */

create or replace api integration DQ_AI_API_INTEGRATION
  api_provider = aws_api_gateway
  enabled = true
  api_aws_role_arn = 'arn:aws:iam::508747789874:role/prd-sf-geocode-assume-role'
  api_allowed_prefixes = ('https://9a3cydoxrc.execute-api.us-east-1.amazonaws.com/v1');

Follow the steps provided in Create External User Defined Function for Geocode by using the script given below. The function processes input addresses and takes four input parameters: the address, the country, the already identified country (populated if Geocode SDK already detected the country), and countryFormat (ISO2, ISO3, or CountryName):

/* Replace <<DIS API SECRET>> and <<DIS API KEY>> with your own values, entered as quoted strings. */
create or replace external function precisely_ai_countryid(
    mainaddressline string,
    country string,
    identifiedcountry string,
    countryformat string)
returns variant
IMMUTABLE
api_integration = DQ_AI_API_INTEGRATION
HEADERS = ('application'='dq-ai-geoaddressing','headers-api-secret'=<<DIS API SECRET>>,'headers-api-key'=<<DIS API KEY>>)
MAX_BATCH_ROWS = 25
as 'https://9a3cydoxrc.execute-api.us-east-1.amazonaws.com/v1/aigeocode';
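
To illustrate what MAX_BATCH_ROWS = 25 implies, the sketch below splits rows into batches and builds request bodies in the JSON shape that Snowflake uses for external functions: a `data` array in which each row begins with a 0-based row number within its batch. It only simulates that shape locally. `build_request_bodies` is a hypothetical helper, the sample address is taken from the examples in this guide, and the response returned by the Precisely endpoint is not shown here. Snowflake may also send batches smaller than the maximum.

```python
import json
from typing import Iterator, Optional, Sequence

MAX_BATCH_ROWS = 25  # same limit as in the function definition above

# Argument order: mainaddressline, country, identifiedcountry, countryformat
Row = Sequence[Optional[str]]


def build_request_bodies(rows: Sequence[Row],
                         max_rows: int = MAX_BATCH_ROWS) -> Iterator[str]:
    """Yield one JSON request body per batch of at most max_rows rows."""
    for start in range(0, len(rows), max_rows):
        batch = rows[start:start + max_rows]
        # Row numbers restart at 0 in every batch.
        body = {"data": [[number, *row] for number, row in enumerate(batch)]}
        yield json.dumps(body)
```

For 60 input rows, this sketch produces three bodies holding 25, 25, and 10 rows.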

grant usage on function