This step parses personal and business names into their constituent parts, such as given name, surname, and titles and qualifiers.
The Parse Name step completes several actions while parsing a name.
- Identifies names as business or personal names.
- Determines the proper boundary between the given name, middle name, and surname.
- Identifies and separates titles and qualifiers from personal names.
- Identifies and separates suffixes from firm names.
- Scores how likely it is that the parse is correct, based on similar names and the relative frequency of the name phrase as a given name or surname.
Configure the step properties and output fields on two tabs.
- Step properties: Selections on the Step properties tab specify behavior and output of this step.
- Output configuration: Selections on the Output configuration tab define the output schema for the Parse Name step.
Step properties
Selections on the Step properties tab specify behavior and output of this step.
Step name
Defines the name for a step. Provide a meaningful name so that anyone who edits steps in a pipeline will be able to identify the purpose of a step.
Columns
Specifies columns to transform. Click to view the list of column names. Column names listed here correspond to the column names in the dataset inspection table. Click in the drop-down list to select or clear check boxes next to column names. Alternatively, click column headings in the inspection table to select or clear check boxes next to the corresponding column names in this box. To add a column to the selection, you must press the Ctrl key when you click a column heading.
Save
Click this button to close settings and save changes to the transformation settings.
Preview
Click this button to preview the results of the transformation settings.
Cancel
Click this button to close settings for this transformation without saving any changes.
Output configuration
Selections on the Output configuration tab define the output schema for the Parse Name step.
This tab lists output fields defined by the Step properties settings. Clearing the check box next to a field name removes the corresponding field from the Step preview.
All of the fields displayed on this tab are initially selected by default. Clear the check box next to any field that you do not want to include in the step output. Select or clear the All fields check box at the top of the list to select or clear check boxes for all of the fields displayed on this tab.
You typically want to review selections on this tab after you edit settings on the Step properties tab. The fields displayed on this tab change as you configure settings on the Step properties tab. As you change settings on the Step properties tab, the number of selected fields and the total number of fields are displayed in parentheses on the Output configuration tab label, for example, Output configuration (3 of 6).
Output columns
For each of the input fields, the Parse Name step adds columns to the data.
Each added column is labeled with the original column name, shown here as ColumnName, followed by an identifying string.
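As a minimal sketch of this naming convention, the following Python snippet builds the output column names for one input column. The suffixes are the ones documented in the output column lists in this topic; the helper name `output_column_names` and the sample column name `Contact` are hypothetical and used only for illustration.

```python
from typing import List

# Identifying strings documented in this topic, grouped by output category.
GENERAL_SUFFIXES = ["_IsParsed", "_ParsingScore", "_ParsedAs"]
PERSONAL_SUFFIXES = [
    "_TitleOfRespect",
    "_FirstName",
    "_MiddleName",
    "_LastName",
    "_MaturitySuffix",
]
FIRM_SUFFIXES = ["_FirmName", "_FirmSuffix"]


def output_column_names(column_name: str) -> List[str]:
    """Return the output column names for one input column (hypothetical helper)."""
    suffixes = GENERAL_SUFFIXES + PERSONAL_SUFFIXES + FIRM_SUFFIXES
    return [column_name + suffix for suffix in suffixes]


if __name__ == "__main__":
    # "Contact" is an invented input column name.
    for name in output_column_names("Contact"):
        print(name)
```

Clearing a check box on the Output configuration tab would correspond to dropping the matching name from this list.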
General output columns
General output columns are appended for both firm and personal names. These columns provide information about the parsing operation.
ColumnName_IsParsed
Boolean that specifies whether the column field was parsed as a firm or person name.
- true: The step successfully parsed the input column as a firm or person name.
- false: The step was unable to parse the input column as a firm or person name.
ColumnName_ParsingScore
A score from 0 to 100 that indicates how likely it is that the name was parsed correctly.
ColumnName_ParsedAs
Specifies how the column field was parsed.
- firm: Field parsed as a firm name.
- personal: Field parsed as a person name.
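The general output columns can be used downstream to route records. The following Python sketch groups rows by how they were parsed and sets aside unparsed or low-scoring rows for review. It rests on several assumptions: the step output has been loaded as a list of dictionaries, the Boolean column holds Python `True`/`False` values, a higher score means a more reliable parse, and the threshold of 80 is an arbitrary illustrative value. The function name `split_by_parse_result` and the sample rows are hypothetical.

```python
def split_by_parse_result(rows, column_name, min_score=80):
    """Group rows into 'firm', 'personal', and 'review' lists.

    min_score is an illustrative cutoff, not a documented default.
    """
    groups = {"firm": [], "personal": [], "review": []}
    for row in rows:
        parsed = row.get(f"{column_name}_IsParsed", False)
        score = row.get(f"{column_name}_ParsingScore", 0)
        parsed_as = row.get(f"{column_name}_ParsedAs")
        if parsed and score >= min_score and parsed_as in ("firm", "personal"):
            groups[parsed_as].append(row)
        else:
            # Unparsed or low-confidence rows are set aside for manual review.
            groups["review"].append(row)
    return groups


if __name__ == "__main__":
    # Invented sample rows for a hypothetical input column named "Contact".
    sample = [
        {"Contact_IsParsed": True, "Contact_ParsingScore": 95, "Contact_ParsedAs": "personal"},
        {"Contact_IsParsed": True, "Contact_ParsingScore": 90, "Contact_ParsedAs": "firm"},
        {"Contact_IsParsed": True, "Contact_ParsingScore": 40, "Contact_ParsedAs": "personal"},
        {"Contact_IsParsed": False, "Contact_ParsingScore": 0, "Contact_ParsedAs": None},
    ]
    result = split_by_parse_result(sample, "Contact")
    for group, members in result.items():
        print(group, len(members))
```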
Personal output columns
When a column is identified as a personal name, the step separates the parts of the name into different output columns. Names are parsed in both natural order (first name first) and reverse order (last name first). Each output column name ends with a string that identifies the part: title, first name, middle name, last name, or suffix.
ColumnName_TitleOfRespect
An honorific title (such as Mr., Mrs., Ms., or Professor) identified for a person in the input column.
ColumnName_FirstName
The first name identified for a person in the input column.
ColumnName_MiddleName
The middle name identified for a person in the input column.
ColumnName_LastName
The last name identified for a person in the input column.
ColumnName_MaturitySuffix
The suffix identified for a person in the input column. Such suffixes are sometimes categorized as generational (such as Jr. or Sr.), professional (such as J.P. or PT), or educational (such as MBA or Ph.D.). This column collects any of these suffixes that follow a name in a personal name column.
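To make the shape of the personal output columns concrete, the following Python sketch splits a name string into the five documented parts. It is a toy illustration, not the algorithm used by the Parse Name step: it relies on small fixed lists of titles and suffixes taken from the examples above, and it assumes that reverse order is signalled by a comma after the last name. The function name `parse_personal_name` and the sample names are hypothetical.

```python
# Small lookup sets built from the examples in this topic (periods removed, lowercase).
TITLES = {"mr", "mrs", "ms", "professor"}
SUFFIXES = {"jr", "sr", "jp", "pt", "mba", "phd"}


def _key(token):
    """Normalize a token for lookup: drop periods and lowercase."""
    return token.replace(".", "").lower()


def parse_personal_name(value, column_name="ColumnName"):
    """Toy split of a personal name into the documented output columns."""
    head, comma, tail = value.partition(",")
    tail_tokens = tail.replace(",", " ").split()
    # Assumption: "Last, First Middle" is reverse order, while a comma
    # followed only by suffixes ("First Last, Jr.") is natural order.
    reverse = bool(comma) and any(_key(t) not in SUFFIXES for t in tail_tokens)
    if reverse:
        last = head.strip()
        tokens = tail_tokens
    else:
        last = ""
        tokens = head.split() + tail_tokens

    title = ""
    if tokens and _key(tokens[0]) in TITLES:
        title = tokens.pop(0)

    suffix_parts = []
    while tokens and _key(tokens[-1]) in SUFFIXES:
        suffix_parts.insert(0, tokens.pop())

    if not reverse and len(tokens) > 1:
        last = tokens.pop()

    return {
        f"{column_name}_TitleOfRespect": title,
        f"{column_name}_FirstName": tokens[0] if tokens else "",
        f"{column_name}_MiddleName": " ".join(tokens[1:]),
        f"{column_name}_LastName": last,
        f"{column_name}_MaturitySuffix": " ".join(suffix_parts),
    }


if __name__ == "__main__":
    # Invented sample names in natural and reverse order.
    print(parse_personal_name("Mr. John Quincy Public Jr.", "Contact"))
    print(parse_personal_name("Public, John Q.", "Contact"))
```

A real parser also has to weigh how often a phrase occurs as a given name or surname, which this sketch does not attempt.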
Firm output columns
Firm output columns append identifying strings starting with _Firm. These names are parsed from columns that are identified as firm names.
ColumnName_FirmName
A distinguishable firm name identified in a column field.
ColumnName_FirmSuffix
The corporate suffix (such as "Inc.", "Incorporated", "Ltd.", or "Limited") identified in a column field.
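The relationship between the two firm output columns can be illustrated with a short Python sketch that separates a trailing corporate suffix from the rest of a firm name. This is a simplified stand-in, not the step's own logic: the suffix list contains only the four examples given above, and the function name `split_firm_name` and the sample firm name are hypothetical.

```python
# Corporate suffixes from the examples in this topic (trailing period removed, lowercase).
FIRM_SUFFIXES = {"inc", "incorporated", "ltd", "limited"}


def split_firm_name(value, column_name="ColumnName"):
    """Toy split of a firm name into the documented firm output columns."""
    tokens = value.replace(",", " ").split()
    suffix = ""
    # Only treat the last token as a suffix if something remains for the name.
    if len(tokens) > 1 and tokens[-1].rstrip(".").lower() in FIRM_SUFFIXES:
        suffix = tokens.pop()
    return {
        f"{column_name}_FirmName": " ".join(tokens),
        f"{column_name}_FirmSuffix": suffix,
    }


if __name__ == "__main__":
    # Invented sample firm name for a hypothetical input column named "Vendor".
    print(split_firm_name("Example Widgets, Inc.", "Vendor"))
```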