Go to tab on the main navigation menu to view existing pipelines in table format. For each pipeline, the table shows the pipeline name, the number of records and columns in the sample data used to construct the pipeline, the number of steps in the pipeline, the last user to modify the pipeline, and the date and time of the last modification.
When you edit a pipeline, the pipeline editor initially displays two panels. The upper panel shows the steps in the pipeline. You can add, remove, or edit steps in this panel. The lower panel displays sample data in table format. Each column represents a data field and each row represents a sample record. The top row shows the name, semantic type, and field type for each data field.
When you select a step in the upper panel, the editor highlights the columns in the data that are affected by that step. Highlighted columns show data as it appears after the step is complete. While a step is selected, you can select the Transformation Preview checkbox to display sample values in the highlighted columns as they appear before and after the transformation.
You can add a new step anywhere in a pipeline: at the beginning or end, before or after an existing step, or between steps. When you add or edit a step, the pipeline editor expands a settings panel from the right side of the window.
After you configure transformation options, use the Preview button before you save the settings to view the result of the transformation.
When you are satisfied with the results of a transformation, save and include it in the pipeline. Semantic type detection is applied to new columns as they are added by a step. The same semantic type as the original column is normally applied to a copied column. If the analysis detects the semantic type for a new column, it is displayed beneath the column name.
Create quality pipeline
Use this procedure to create a pipeline from the Pipelines list page.
To create a Data Quality pipeline.
You can edit the pipeline name by clicking the default name. Add and configure transformation steps as needed.
Add input
After the pipeline is established, you can enhance it by incorporating more fields. To do this, select the Add Input option.
- You can choose only a dataset that belongs to the same Agent as the primary input.
- If an incompatible input is added to the pipeline, you may not be able to configure it in the Run configuration, and an error will be thrown stating that incompatible inputs have been added to the pipeline. This will also be reflected in the data validation errors.
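The same-Agent restriction on additional inputs can be pictured with a short sketch. This is only an illustration: the `Dataset` class, its `agent` attribute, and the `selectable_inputs` function are hypothetical names, not part of the product, and the product may apply further compatibility checks beyond the Agent.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Dataset:
    # Hypothetical model of a dataset: a name plus the Agent it belongs to.
    name: str
    agent: str


def selectable_inputs(primary, candidates):
    """Return the candidate datasets that share the primary input's Agent."""
    return [d for d in candidates if d.agent == primary.agent]


primary = Dataset("customers", agent="agent-east")
candidates = [
    Dataset("orders", agent="agent-east"),
    Dataset("vendors", agent="agent-west"),
]
allowed = selectable_inputs(primary, candidates)
```

In this sketch, only `orders` remains selectable because it shares the Agent of the primary input.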
Duplicate quality pipeline
You can duplicate an existing pipeline to use it as the basis for a new pipeline. A duplicate pipeline initially consists of the same dataset and transformation steps as the original pipeline. Configurations are not copied.
- On the main navigation menu, click .
- In the Name column, find the pipeline you want to duplicate. In the Search keyword box, you can enter a keyword to filter pipelines shown in the table.
- In the Name column, click the ellipsis.
- Click the Duplicate Pipeline command. This opens the Duplicate Pipeline dialog box.
- In the New pipeline name box, enter a name for the duplicated pipeline.
- Click the Duplicate button.
Edit quality pipeline
Edit a pipeline to view, add, or edit transforms that it applies to uploaded data.
- On the main navigation menu, click .
- In the Name column, find the pipeline that you want to edit. In the Search box, you can match all or part of a pipeline title to filter pipelines shown in the table.
- In the Name column, click the ellipsis, then click Edit.
Rename quality pipeline
You can rename a pipeline on the Pipelines page.
- On the main navigation menu, click .
- In the Name column, find the pipeline that you want to rename.
- In the Search keyword box, you can enter a keyword to filter pipelines shown in the table.
- In the Name column, click the ellipsis, then click Rename.
- In the edit box that appears in the Name column, type a new name, then press Enter. A name must start with a letter. It can contain letters, numbers, dashes, and underscores.
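The naming rule in the last step can be checked before you type a name into the edit box. The following is a minimal sketch, assuming that "letters" means ASCII letters; the pattern and the function name `is_valid_pipeline_name` are illustrative and are not taken from the product.

```python
import re

# Assumed rule: one leading letter, then any mix of letters, digits,
# dashes, and underscores. ASCII-only letters are an assumption.
PIPELINE_NAME = re.compile(r"[A-Za-z][A-Za-z0-9_-]*")


def is_valid_pipeline_name(name: str) -> bool:
    """Return True when the whole name satisfies the assumed naming rule."""
    return PIPELINE_NAME.fullmatch(name) is not None
```

Under these assumptions, a name such as `customer_cleanup-2` passes, while `2customers` (starts with a digit) and `my pipeline` (contains a space) do not.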
Delete quality pipeline
You can delete a pipeline from the Pipelines page.
- On the main navigation menu, click .
- Find the pipeline that you want to delete in the pipelines table.
- In the Search keyword box, you can enter a keyword to filter pipelines shown in the table.
- In the Name column, click the ellipsis, then click Delete.
- Click Yes to confirm the deletion.
Filter rows that match data in pipeline
While editing a pipeline, you can filter out rows in the inspection table that do not contain specified data.
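Conceptually, this kind of filter keeps only the rows whose value in a chosen column contains the text you specify. The sketch below illustrates the idea on a list of dictionaries. The function name `filter_rows` and the case-insensitive substring match are assumptions; the editor's actual matching rules may differ.

```python
def filter_rows(rows, column, text):
    """Keep rows whose value in `column` contains `text`, ignoring case."""
    needle = text.lower()
    return [
        row for row in rows
        # A missing column is treated as an empty value, so the row is dropped.
        if needle in str(row.get(column, "")).lower()
    ]


sample = [
    {"name": "Ada Lovelace", "city": "London"},
    {"name": "Alan Turing", "city": "Wilmslow"},
    {"name": "Grace Hopper", "city": "New London"},
]
matches = filter_rows(sample, "city", "london")
```

Here `matches` holds the first and third records, because both city values contain the specified text.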
Quality pipeline settings
- Entity type: Shows the type of entity.
- Entity name: Specifies the name for the entity that shows above columns on the pipeline page. You can edit the name shown in this field.
- Map fields for this entity: You can change fields mapped to the entity field types.
- Name: Specifies the name for the run configuration.
- Connection: Specifies the connection to the source data.
- Source dataset: Specifies the dataset on which to run the pipeline. By default, the dataset used to build the pipeline shows here. Click the Browse button to choose a dataset in a connected database. Input fields for the pipeline must match input fields in the source dataset schema.
- Target options:
  - Append to target dataset: Output is appended to columns in the target dataset. You can use this option when the schema of the pipeline output matches the target dataset schema. The pipeline engine verifies that the pipeline output schema and the target dataset schema match before it commences output to the target dataset.
  - Overwrite to target dataset: Output overwrites the target dataset. Choose one of two options from the drop-down list.
    - Truncate Dataset: Empties the dataset, then writes data to the same dataset. This choice deletes all data but preserves the dataset definition. You can use this option when the schema of the pipeline output matches the target dataset schema. The pipeline engine verifies that the pipeline output schema and the target dataset schema match before it commences output to the target dataset.
    - Drop Dataset: Drops the target dataset, then creates and writes data to a new dataset. This choice recreates the dataset definition on the host server. The pipeline output does not have to match the target dataset schema, and the pipeline engine performs no verification.
- Target dataset: Specifies the dataset in which to store the output results from the pipeline. Click the Browse button to choose a dataset in a connected database.
- Pipeline engine: Specifies the pipeline engine to run pipeline jobs. Select one of the pipeline engines from the list or click the Add button to create a new pipeline engine.
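The difference between the three target options described above (append, truncate, and drop) comes down to whether a schema check runs and what happens to the existing dataset. The sketch below models that decision only. `TargetOption`, `plan_write`, the action strings, and the representation of a schema as a dictionary of field names to types are all hypothetical; it does not reflect how the pipeline engine is implemented.

```python
from enum import Enum


class TargetOption(Enum):
    APPEND = "append"
    TRUNCATE = "truncate"
    DROP = "drop"


def plan_write(option, output_schema, target_schema):
    """Return the ordered actions for a target option.

    Schemas are modeled as dicts of field name -> type (an assumption).
    """
    if option is TargetOption.DROP:
        # No schema verification: the dataset definition is recreated.
        return ["drop target", "create target from output schema", "write rows"]
    # Append and truncate both require matching schemas before any output.
    if output_schema != target_schema:
        raise ValueError("pipeline output schema does not match target dataset schema")
    if option is TargetOption.TRUNCATE:
        return ["truncate target", "write rows"]
    return ["append rows"]
```

In this model, a schema mismatch stops an append or truncate run before any data is written, while a drop run proceeds regardless because the target is rebuilt from the pipeline output.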