The Identify Country step uses available address field values to identify the country.
Step Properties
Step name: Defines the name for a step. Provide a meaningful name so that anyone who edits steps in a pipeline will be able to identify the purpose of a step.
Country Output Format: Specifies the format in which you want to display the names of countries. The available options are:
- ISO2: Represents countries by their two-letter ISO country codes. For example:
    United States: US
    Canada: CA
    United Kingdom: GB
- ISO3: Represents countries by their three-letter ISO country codes. For example:
    United States: USA
    Canada: CAN
    United Kingdom: GBR
- Country Name: Displays the full name of the country in a human-readable form. For example:
    US: United States
    CA: Canada
    GB: United Kingdom
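The three output formats can be read as different views of the same country record. The following Python sketch is illustrative only: the COUNTRIES table and the format_country helper are hypothetical names that are not part of the product, and the table covers only the three example countries listed above.

```python
# Hypothetical lookup table covering only the three example countries.
COUNTRIES = [
    {"ISO2": "US", "ISO3": "USA", "Country Name": "United States"},
    {"ISO2": "CA", "ISO3": "CAN", "Country Name": "Canada"},
    {"ISO2": "GB", "ISO3": "GBR", "Country Name": "United Kingdom"},
]


def format_country(iso2, output_format):
    """Return the country identified by iso2 in the requested output format."""
    if output_format not in ("ISO2", "ISO3", "Country Name"):
        raise ValueError(f"Unsupported output format: {output_format}")
    for record in COUNTRIES:
        if record["ISO2"] == iso2:
            return record[output_format]
    return None  # Country not in the illustrative table


print(format_country("GB", "ISO3"))
print(format_country("CA", "Country Name"))
```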
Enhance with AI: In some cases, Geocode SDK cannot identify the country for an address because the address information is incomplete or insufficient. The output fields for those addresses then remain blank. To resolve this, select the Enhance with AI checkbox, which enables AI to identify the country from the available address information. When you select this option, an additional output column (Address_set_1_Source, Address_set_2_Source, and so forth) is displayed, indicating the source of identification (AI or Geocode SDK, as applicable).
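The fallback behavior described for Enhance with AI can be sketched roughly as follows. This is a minimal Python illustration under stated assumptions, not the product's implementation: identify_country, sdk_lookup, ai_lookup, and the stub functions are all hypothetical, and the sketch assumes that the source value stays blank when neither lookup identifies a country.

```python
def identify_country(address, sdk_lookup, ai_lookup, enhance_with_ai=False):
    """Return (country, source); both are blank when nothing is identified.

    sdk_lookup and ai_lookup are hypothetical callables that take an address
    string and return a country code, or None when they cannot identify one.
    """
    country = sdk_lookup(address)
    if country:
        return country, "Geocode SDK"
    if enhance_with_ai:
        country = ai_lookup(address)
        if country:
            return country, "AI"
    return "", ""  # Assumption: unidentified countries leave both fields blank


# Stub lookups for illustration only; real identification is done by the step.
def stub_sdk(address):
    return "GB" if "SW1A" in address else None


def stub_ai(address):
    return "US" if "Brooklyn" in address else None


print(identify_country("10 Downing St, London SW1A 2AB", stub_sdk, stub_ai, True))
print(identify_country("Brooklyn", stub_sdk, stub_ai, True))
print(identify_country("Brooklyn", stub_sdk, stub_ai, False))
```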
Address_set_1, Address_set_2, and so forth: One or more address sets may map schema fields to input dataset fields. The address set schema is mapped to each set of address fields in an input dataset when there is more than one set (such as home address, business address, corporate headquarters, and so forth). The Schema column specifies schema names for standard address fields. The Map input field column maps input dataset fields to the Schema names.
To map a dataset field to a schema name, click the entry in the Map input field column that corresponds to the name in the Schema column. When there is no input field corresponding to a schema name, leave the Map input field value set to none for that schema name.
An address set schema contains the following standard address field types:
- FirmName: Specifies an organization name, place, or building. Examples:
    PRIME MINISTER & FIRST LORD OF THE TREASURY
    ARAMARK Ltd.
    UNITED NATIONS HEADQUARTERS
    CHRYSLER BUILDING
- AddressLine1: The street portion of the address, including the directional and street suffix, formatted to the country's standard. If no other address field is populated, then the AddressLine1 entry is treated as a single line input. Single line input can consist of multiple input address fields, such as 10 Downing St, London SW1A 2AB, GBR. This field is required. Examples:
    10 Downing Street
    630 W 168th St
    PO Box 3554
- AddressLine2: The unit, suite or apartment portion of an address that specifically locates an addressee at a street address. This is typically optional. Examples:
    Suite 214
    Apt 534
    Maildrop A25
- City: Specifies a city or town name. Examples:
    London
    New York
- CitySubdivision: Specifies a city subdivision or locality. Typically a division such as neighborhood, hamlet, or borough. Examples:
    Bromley
    Brooklyn
- StateProvince: The first-level administrative division in a country. Typically a division such as state, province, department, region, or territory. Examples:
    New York
    West Midlands
- StateProvinceSubdivision: The second-level administrative division in a country. Typically a division such as county, municipality, parish, prefecture, or district. Examples:
    Nassau County
    Coventry
- PostalCode: The postal code used by a country. Examples:
    10032-3725
    SW1A 2AA
- Country: The country name or the ISO alpha-3 code. Examples:
    United States of America or USA
    United Kingdom or GBR
    France or FRA
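An address set mapping can be thought of as a table from schema names to input dataset fields. The Python sketch below is only an illustration: the input field names (home_street, home_city, and so on) and the validate_mapping helper are hypothetical, and None stands in for a Map input field value left at none.

```python
SCHEMA_FIELDS = [
    "FirmName", "AddressLine1", "AddressLine2", "City", "CitySubdivision",
    "StateProvince", "StateProvinceSubdivision", "PostalCode", "Country",
]

# Hypothetical input dataset fields mapped to schema names; None means "none".
address_set_1 = {
    "FirmName": None,
    "AddressLine1": "home_street",
    "AddressLine2": "home_unit",
    "City": "home_city",
    "CitySubdivision": None,
    "StateProvince": "home_state",
    "StateProvinceSubdivision": None,
    "PostalCode": "home_zip",
    "Country": None,
}


def validate_mapping(mapping):
    """Return a list of problems found in an address set mapping."""
    problems = []
    for name in mapping:
        if name not in SCHEMA_FIELDS:
            problems.append(f"Unknown schema name: {name}")
    # AddressLine1 is the only field the schema marks as required.
    if not mapping.get("AddressLine1"):
        problems.append("AddressLine1 is required but is not mapped")
    return problems


print(validate_mapping(address_set_1))
print(validate_mapping({"City": "home_city"}))
```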
Add address dataset: Click this button to add an additional set of address mappings. Address set mappings map schema names to input fields. There is initially one set of mappings, titled Address_set_1 by default. If the data includes more than one address, such as shipping and billing addresses or home and business addresses, you can click this button to add an address set and map its schema names to the input fields for the additional address. The default name for each added address set increments the index (Address_set_2, Address_set_3, and so forth). You can rename any address set by editing the name that shows above the address set. Alphanumeric and underscore characters are allowed in an address set name. When there is more than one address set, you can click the delete button that appears next to a name to delete that address set.
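The naming rules above can be expressed as a small check. The following Python sketch is an assumption-laden illustration: is_valid_address_set_name and next_default_name are hypothetical helpers, "alphanumeric" is assumed to mean ASCII letters and digits, and the way the product picks the next index after a rename or delete is not documented here, so the sketch simply uses the highest existing default index plus one.

```python
import re

# Assumption: "alphanumeric" means ASCII letters and digits.
NAME_PATTERN = re.compile(r"[A-Za-z0-9_]+")
DEFAULT_PATTERN = re.compile(r"Address_set_(\d+)")


def is_valid_address_set_name(name):
    """True when the name contains only alphanumeric and underscore characters."""
    return NAME_PATTERN.fullmatch(name) is not None


def next_default_name(existing_names):
    """Return a default name whose index follows the highest default index in use."""
    indexes = [0]
    for name in existing_names:
        match = DEFAULT_PATTERN.fullmatch(name)
        if match:
            indexes.append(int(match.group(1)))
    return f"Address_set_{max(indexes) + 1}"


print(is_valid_address_set_name("Billing_Address"))
print(is_valid_address_set_name("Billing Address"))
print(next_default_name(["Address_set_1", "Shipping"]))
```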
Output fields
This pipeline step populates a single IdentifiedCountry output field for each address set.
Address_set_1_IdentifiedCountry, Address_set_2_IdentifiedCountry, and so forth: The output field is populated with the ISO 3166-1 alpha-2 country code that best matches address information in a record. An incorrect country code may be associated with incomplete address information. For example, an input field that includes a value for AddressLine1, but omits values for all other fields may return an incorrect country code. You may therefore want to check for incomplete data.
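One way to check for incomplete data is to flag records in which AddressLine1 is the only populated address field. The Python sketch below is a hypothetical helper, not a product feature; it assumes each record is a dictionary keyed by schema name. A flagged record is not necessarily wrong, because AddressLine1 may hold a complete single line input, but it is a reasonable candidate for review.

```python
OTHER_FIELDS = [
    "FirmName", "AddressLine2", "City", "CitySubdivision", "StateProvince",
    "StateProvinceSubdivision", "PostalCode", "Country",
]


def is_populated(record, field):
    """True when the field holds a non-blank value."""
    return bool((record.get(field) or "").strip())


def only_address_line1(record):
    """True when AddressLine1 is populated and every other field is blank."""
    if not is_populated(record, "AddressLine1"):
        return False
    return not any(is_populated(record, field) for field in OTHER_FIELDS)


records = [
    {"AddressLine1": "630 W 168th St", "City": "New York", "PostalCode": "10032-3725"},
    {"AddressLine1": "PO Box 3554", "City": " "},
]
flagged = [r for r in records if only_address_line1(r)]
print(len(flagged))
```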
Create User Defined Functions (UDFs) in Snowflake
In a Data Quality pipeline, the Identify Country step requires you to create external UDFs in Snowflake. This is a one-time activity. Two separate UDFs are required for your pipeline to run successfully: one for Identify Country without AI and one for Identify Country with AI.
Define the Identify Country external functions in Snowflake
The Identify Country operator in Data Quality and Enrichment uses a Format 1
type user-defined external function for Geocode. This task is required only
if you use the Identify Country pipeline operator. The name of the external function for
the Identify Country operator must be precisely_geocode.
Define the AI Identify Country external functions in Snowflake
The Identify Country operator using Enhance with AI in Data
Quality and Enrichment uses a Format 1 type user-defined external function for
AI Geocode. This task is required only if you use the Identify Country pipeline operator
with Enhance with AI. The name of the external function for
the Identify Country operator using AI must be ai-geocode.
Follow the steps provided in Create API Integration by using the script given below:
/*API Integration is required for External User Defined function.*/
create or replace API Integration DQ_AI_API_INTEGRATION
api_provider=aws_api_gateway
enabled = true
api_aws_role_arn =
'arn:aws:iam::508747789874:role/prd-sf-geocode-assume-role'
API_ALLOWED_PREFIXES = ('https://9a3cydoxrc.execute-api.us-east-1.amazonaws.com/v1');
Follow the steps provided in Create External User Defined Function for Geocode by using the script given below. The function processes input addresses and takes four input parameters: the address, the country, the already identified country (if Geocode SDK has already detected it), and countryFormat (ISO2, ISO3, or CountryName):
/*Replace the <<...>> placeholders with your DIS API secret and key. Header values must be string literals.*/
create or replace external function precisely_ai_countryid(mainaddressline string
,country string, identifiedcountry string, countryformat string)
returns variant
IMMUTABLE
api_integration = DQ_AI_API_INTEGRATION
HEADERS = ('application'='dq-ai-geoaddressing','headers-api-secret'='<<DIS API SECRET>>','headers-api-key'='<<DIS API KEY>>')
MAX_BATCH_ROWS = 25
as 'https://9a3cydoxrc.execute-api.us-east-1.amazonaws.com/v1/aigeocode';
grant usage on function