You can upload sample data to ensure that a sample includes specific characteristics, such as duplicates or addressing errors. When these or other concerns do not apply, you can quickly generate sample data from the source data.
Upload sample
To upload sample data from a file whose fields match the fields in a dataset:
- Navigate to .
- Click the cataloged dataset name for which you want to upload sample data.
- Click +Upload Sample.
- Choose the file that contains the sample data, then click Open.
The data is uploaded and stored in encrypted format in the Precisely Cloud, and displayed in a table. The table columns appear in the same order as the fields appear in the file. The storage location (such as Stored in the Precisely Cloud) shows in the margin above the table. The number of days until the sample data will be purged if it is left unused also shows above the table.
- If previously uploaded sample data appears on the tab, you must first click the Delete button to clear the existing data.
- If the upload fails, a dialog box will describe problems with the upload. Common issues that prevent an upload include the following: sample and dataset fields do not match, file size exceeds the maximum allowed file size, or unsupported file type.
- By default, sample datasets are stored in encrypted format in the Precisely Cloud. They are deleted 90 days after they were last used or browsed.
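The common upload failures listed above can be checked locally before you upload. The following is a minimal sketch, not part of the product: the function name `check_sample_file`, the 10 MB limit, and the `.csv`-only extension set are all assumptions for illustration, because the actual maximum file size and supported file types are not stated here. It also assumes that fields match when the header names are the same as the dataset field names, regardless of order.

```python
import csv
import os

# Hypothetical limits: the real maximum size and supported types are not
# documented in this section, so adjust these to your environment.
MAX_BYTES = 10 * 1024 * 1024
ALLOWED_EXTENSIONS = {".csv"}


def check_sample_file(path, dataset_fields,
                      max_bytes=MAX_BYTES,
                      allowed_extensions=ALLOWED_EXTENSIONS):
    """Return a list of problems that would likely prevent an upload."""
    problems = []

    # Unsupported file type.
    ext = os.path.splitext(path)[1].lower()
    if ext not in allowed_extensions:
        problems.append(f"unsupported file type: {ext or '(none)'}")

    # File size exceeds the maximum allowed file size.
    if os.path.getsize(path) > max_bytes:
        problems.append("file size exceeds the maximum allowed file size")

    # Sample and dataset fields do not match (header check for CSV only).
    if ext == ".csv":
        with open(path, newline="", encoding="utf-8") as handle:
            header = next(csv.reader(handle), [])
        if set(header) != set(dataset_fields):
            problems.append("sample and dataset fields do not match")

    return problems
```

An empty list means that none of these three checks found a problem. The upload itself can still fail for other reasons.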
Generate sample
When you choose to generate sample data from a cataloged dataset, you can retrieve from 1 to 2000 records from the cataloged dataset. Currently, records can only be sampled sequentially, starting with the first record in the cataloged dataset. When you generate sample data, you can select the dataset fields that you want to include in the sample data. Names of fields in the generated sample data always match the names of the selected fields. The data is encrypted and stored in the Precisely Cloud.
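The sampling behavior described above (sequential retrieval from the first record, a row count from 1 to 2000, and a chosen subset of fields) can be pictured with a short sketch. This is an illustration only, not the product's implementation: `generate_sample` is a hypothetical helper, and records are assumed to be dictionaries keyed by field name. The default of 100 rows matches the default of the Number of rows setting.

```python
from itertools import islice

MIN_ROWS, MAX_ROWS, DEFAULT_ROWS = 1, 2000, 100


def generate_sample(records, fields, num_rows=DEFAULT_ROWS):
    """Take the first num_rows records in order and keep only the selected fields."""
    if not MIN_ROWS <= num_rows <= MAX_ROWS:
        raise ValueError(f"num_rows must be between {MIN_ROWS} and {MAX_ROWS}")

    # islice reads sequentially from the start and stops after num_rows records;
    # field names in the sample are exactly the selected field names.
    return [
        {name: record[name] for name in fields}
        for record in islice(records, num_rows)
    ]
```

Because the records are taken in order from the start, a sorted source yields a sample that covers only the beginning of the sort range. A larger row count is one way to capture more variability.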
To generate sample data from a connected data source:
- Navigate to .
- Click the cataloged dataset name for which you want to create a sample dataset from source data.
- Click Generate Sample. This expands the Generate Sample panel, where you can see the Fields in the dataset.
- Enter a value for the Number of rows to include in the sample.
Note:
- The sample sequentially retrieves the number of records specified here, starting with the first record in the dataset. You can specify between 1 and 2000 rows. The default value is 100. You may choose to increase this value to capture additional variability from the source data.
- In the Sample Preview and Pipeline Grid section, 500 records are displayed if the sample was generated with a row count of 500 or more.
- In the Connections box, you can choose the connection to the host that you want to use. You may choose to use a connection that provides the authentication, cluster, or other settings that you need to connect to the host.
- Click Generate Sample.
This procedure starts a job that retrieves the specified number of rows of data from the source dataset. Upon successful completion of the job, the sample data appears in table format on the tab. The storage location (such as Stored in the Precisely Cloud) shows in the margin above the table. The number of days until the sample data will be purged if it is left unused also shows above the table. After you complete this procedure, you can click + Create Pipeline to use the sample data to create a pipeline.
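Two of the numbers shown with a sample can be estimated with simple arithmetic. The sketch below is a hedged illustration, not product code: both helper names are hypothetical. It assumes that the 90-day retention stated for sample datasets applies to generated samples, and that the preview shows at most 500 records.

```python
from datetime import date, timedelta

RETENTION_DAYS = 90   # samples are deleted 90 days after last use or browse
PREVIEW_LIMIT = 500   # assumed cap on records shown in the preview


def days_until_purge(last_used, today, retention_days=RETENTION_DAYS):
    """Days remaining before an unused sample is purged (never negative)."""
    purge_date = last_used + timedelta(days=retention_days)
    return max(0, (purge_date - today).days)


def preview_row_count(sample_rows, preview_limit=PREVIEW_LIMIT):
    """Number of records expected in the preview for a sample of sample_rows."""
    return min(sample_rows, preview_limit)
```

For example, a sample last used on January 1, 2024 and checked on January 31, 2024 has 60 days remaining, and a 2000-row sample shows 500 records in the preview under these assumptions.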
Delete sample
- Navigate to .
- Click the cataloged dataset name whose sample you want to delete. This opens the page where you can view the sample for that dataset.
- Click Delete Sample.
- When prompted whether to delete the sample, click Delete.