In a Data Quality pipeline, use transformation steps to clean and standardize your data. Add as many steps as your use case requires to build a pipeline that solves a specific problem.
Transformation steps overview
When you create a pipeline, the input data is always the first step in the Data Quality pipeline. As you add transformation steps, the output of the preceding step serves as the input data for the next step. The steps are categorized according to their functions. When you add a step, you must configure its settings and output parameters. You can preview the output before you save the step. After you save and run the pipeline, the transformation steps run sequentially, and the final output is the output of the last step in the pipeline.
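The sequential behavior described above can be illustrated with a minimal Python sketch. This is not the product's implementation or API; the names `run_pipeline`, `trim_names`, and `upper_country` are hypothetical and exist only to show how each step consumes the output of the step before it.

```python
from typing import Callable, Dict, List

Row = Dict[str, object]
Step = Callable[[List[Row]], List[Row]]


def trim_names(rows: List[Row]) -> List[Row]:
    # Hypothetical step: strip surrounding whitespace from the "name" field.
    return [{**r, "name": str(r["name"]).strip()} for r in rows]


def upper_country(rows: List[Row]) -> List[Row]:
    # Hypothetical step: standardize the "country" field to upper case.
    return [{**r, "country": str(r["country"]).upper()} for r in rows]


def run_pipeline(original: List[Row], steps: List[Step]) -> List[List[Row]]:
    """Run steps in order and keep the data as it looks after every step.

    Index 0 holds the original dataset; index i holds the output of step i.
    """
    outputs = [original]
    for step in steps:
        # The output of the preceding step is the input of the next step.
        outputs.append(step(outputs[-1]))
    return outputs


original = [
    {"name": "  Ada ", "country": "gb"},
    {"name": "Lin", "country": "us"},
]
outputs = run_pipeline(original, [trim_names, upper_country])
final_output = outputs[-1]  # the output of the last step in the pipeline
```

Keeping every intermediate result mirrors the idea of clicking a step to inspect the sample data after that step, with `outputs[0]` playing the role of the original dataset.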
Quality pipeline page
The pipeline page allows you to view and edit transformation steps in a pipeline. This page is displayed when you choose to edit a pipeline on the page on the main navigation menu.
- Pipeline: The pipeline panel displays the steps in a pipeline. A new pipeline initially shows only the Original Dataset and Add Step buttons.
- Click the Add Step button to add a transformation step to the pipeline. This button always adds a step at the end of the pipeline.
- To add, edit, rename, move, or delete a step in the pipeline, click the step, then click the vertical ellipsis in the top-right corner to access these commands. If you add a step, you can add it before or after the selected step.
- To quickly add a step between two steps already in a pipeline, point at the connection between the two steps, and click the add step button.
- To view sample data after a step completes its operation, click the step in the pipeline. You can always click Original Dataset to view the original sample data that was read into the pipeline.
- A step that requires a data subscription (such as addressing data) displays the Needs Data Description icon. For more information, hover over this icon to open a pop-up window.
- Search: You can use the Search field name box on this page to identify and select columns in the table or to filter rows by data value. In the Search box, choose one of the following options:
- Column—In the adjacent Search box, enter a column name to select a column. Matching columns are displayed as you type the name. Choose a column in the drop-down list to select it. If you are editing a transform, the selection is added to the Columns box.
- Data—In the adjacent Search box, enter data that you want to view in the inspection table. As you type, rows that do not match the search string in any column are filtered from the table.
- Sample data:
Columns in the table show sample data from dataset fields. Each column heading shows the field name for a column, the base type (such as "string" or "integer"), and the semantic type (such as "First Name" or "Postal Code"). Click a heading to select a column. Press the Ctrl key and click another heading to extend the selection. When you add a transformation step to the pipeline, columns selected in this table initially populate the Columns box in the transformation settings.
The profiling row under the heading displays a vertical bar chart that shows frequency for representative data values and a horizontal bar that displays relative numbers of valid values in green, invalid/outlier values in red, and null or blank values in gray. Hover the pointer over any bar in the vertical chart to view the value and frequency represented by the bar. Hover the pointer over any colored section of the horizontal bar to view the frequency for any of the three categories. You can click anywhere in the profiling section to expand the Profiling panel, which displays profiling summary and detail information for a sample data column. For more information, see Profiling panel.
The data in the table show the outcome of steps in the pipeline. Click a step in the pipeline to view the sample data after the selected step completes its operation. Click Original Dataset in the pipeline to show the data before it is processed by the pipeline.
- Step Preview:
The Step Preview toggle enables you to preview operations performed by steps in a pipeline. Select a step in the pipeline, then turn this switch on to view data before and after it is transformed by the step. As long as the switch is on, you can click other steps in the pipeline to show data before and after the operations performed by those steps. While you edit a step, you can also click the Preview button to turn on Step Preview and observe how the step affects the data.
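To make the Data search and the profiling row more concrete, the following Python sketch models them on a small in-memory sample. It is an illustration only, not the product's logic: the function names, the case-insensitive matching, and the validity rule for postal codes are all assumptions.

```python
from collections import Counter
from typing import Callable, Dict, List, Optional

Row = Dict[str, Optional[str]]


def filter_rows(rows: List[Row], query: str) -> List[Row]:
    # Keep a row when the search string occurs in at least one column.
    # Case-insensitive matching is an assumption made for this sketch.
    q = query.lower()
    return [
        r for r in rows
        if any(v is not None and q in str(v).lower() for v in r.values())
    ]


def profile_column(rows: List[Row], column: str,
                   is_valid: Callable[[str], bool]) -> Dict[str, int]:
    # Tally the three categories shown by the horizontal bar.
    counts = {"valid": 0, "invalid": 0, "null_or_blank": 0}
    for r in rows:
        value = r.get(column)
        if value is None or str(value).strip() == "":
            counts["null_or_blank"] += 1
        elif is_valid(value):
            counts["valid"] += 1
        else:
            counts["invalid"] += 1
    return counts


def value_frequencies(rows: List[Row], column: str) -> Counter:
    # Frequency of each non-blank value, as in the vertical bar chart.
    return Counter(r[column] for r in rows
                   if r.get(column) not in (None, ""))


rows = [
    {"first_name": "Ada", "postal_code": "12345"},
    {"first_name": "Lin", "postal_code": "ABCDE"},
    {"first_name": "Ada", "postal_code": ""},
    {"first_name": "Sam", "postal_code": None},
]

# Hypothetical validity rule: exactly five digits.
def five_digits(value: str) -> bool:
    return value.isdigit() and len(value) == 5

matches = filter_rows(rows, "ada")
profile = profile_column(rows, "postal_code", five_digits)
frequencies = value_frequencies(rows, "first_name")
```

In this sample, `matches` holds the two rows whose first name is "Ada", and `profile` reports one valid, one invalid, and two null or blank postal codes.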
The Add Step dialog reference
Choose a pipeline step in this dialog box to configure and add it to a pipeline.
The Add Step dialog box opens when you choose to add a step to a pipeline. The dialog box lists available pipeline steps by category. Click a step to configure and add it to the current pipeline. When you save the settings, the step is added to the pipeline.
Category headings are displayed on the left panel. You can click a category heading to scroll to that category of steps. To search for a particular step, enter any part of the operator name in the search box. As you type, the dialog box displays only those steps that match the search string.
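The search behavior in this dialog box amounts to a substring match over step names grouped by category. The sketch below shows one way to model it in Python. The catalog contents, the `search_steps` function, and the case-insensitive comparison are hypothetical and do not reflect the actual list of steps or how the product implements the search.

```python
from typing import Dict, List

# Hypothetical catalog: category heading -> step names.
CATALOG: Dict[str, List[str]] = {
    "Cleanse": ["Trim Whitespace", "Change Case"],
    "Filter": ["Filter Rows"],
}


def search_steps(catalog: Dict[str, List[str]],
                 query: str) -> Dict[str, List[str]]:
    """Return only the steps whose name contains the query, by category.

    Categories with no matching step are omitted from the result.
    """
    q = query.lower()  # case-insensitive matching is an assumption
    result: Dict[str, List[str]] = {}
    for category, steps in catalog.items():
        hits = [name for name in steps if q in name.lower()]
        if hits:
            result[category] = hits
    return result


case_hits = search_steps(CATALOG, "case")
all_steps = search_steps(CATALOG, "")
```

Here `case_hits` contains only "Change Case" under "Cleanse", and an empty search string leaves the full catalog visible.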