A datasource represents the type of data you wish to connect to, such as BigQuery, Oracle, Tableau Cloud, Snowflake, or Databricks. It allows you to set up and manage the connection and access details needed to interact with the data stored in that particular datasource type. Each datasource type provides different capabilities and access methods.
- Configuring connections: Within a single datasource, you can set up multiple connections, each with its own configuration settings and credentials. This flexibility allows you to tailor access according to various needs. For example, different connections can be set up for different user roles, environments (e.g., development vs. production), or integration purposes, ensuring secure and efficient access to the data.
- Defining data assets: Once a connection is established, you specify the data assets you want to work with. Data assets include core components such as databases, tables, views, or datasets within the datasource. During configuration, you provide essential connection details such as the Host URL, Port, and Database Name to accurately access and interact with these data assets.
- Analyzing insights: Insights are the valuable information and analysis derived from the data assets. After configuring your datasource and connections, set up visualizations like charts, graphs, or dashboards to represent these insights effectively. This helps transform raw data into meaningful information, aiding decision-making and strategic planning.
- Scheduling updates: To keep your data accurate and up to date, configure separate schedules for discovery and insights. Discovery catalogs assets and updates metadata, while insights profiles data and applies data quality rules to selected assets.
By following these steps, you can efficiently connect to a datasource, configure the necessary settings, and leverage the data to generate actionable insights and maintain data accuracy.
Add a new datasource
- Go to the main navigation menu and select Configuration > Datasources.
- Click the +Add Datasource button and choose Datasource Type.
- After selecting the datasource type, set the Datasource name and Description.
- Click Next to proceed to add a connection to your datasource.
Note: Agents provide communication between your environment and the Precisely Cloud. If your data can only be accessed from within your environment, you need to install and set up an agent. During datasource configuration, you will select the appropriate agent.
Add a connection to the datasource:
- Configuring Connections:
- On the Add Datasource wizard, provide a Name and Description for the connection.
- Specify properties for the connection, depending on your datasource type. Note: A connection type can have its data in either Cloud or On-Premises.
- For the Cloud connector type, the Agent field is set to none by default.
- For the On-Premises connector type, if there are multiple registered agents, you can choose the appropriate agent for this connection. If only one agent is available in the list, it is selected by default.
- Click Test to verify the connection.
- Upon successful connection, a confirmation message will appear. Select Next to proceed.
- Adding Data Assets: When adding data assets to a datasource, a list of available schemas for the selected datasource is displayed.
- Cataloging options:
You can choose to catalog all datasets or select specific datasets.
- Catalog All Datasets:
In the Data Assets step, check the box next to All datasets to catalog.
This selects all available data assets to associate with the datasource.
- Selective cataloging of datasets:
In the Data Assets step, all available datasets are displayed by default.
Click All datasets to open the Datasets to Catalog window.
From this window, select the specific datasets you want to catalog.
Click Save to apply your selections.
Note: The selective cataloging feature is only available for RDBMS, Snowflake, and Databricks datasources.
- Click Next to proceed.
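The two cataloging choices above amount to a simple selection over the available datasets. The sketch below is illustrative only; the dataset names and the helper function are hypothetical, not part of the product:

```python
# Hypothetical list of datasets exposed by the selected schema.
available = ["public.orders", "public.customers", "staging.events"]

def datasets_to_catalog(available, selected=None):
    """None models the 'All datasets' checkbox; a list models the
    'Datasets to Catalog' window, keeping only the chosen subset."""
    if selected is None:
        return list(available)
    chosen = set(selected)
    return [d for d in available if d in chosen]

print(datasets_to_catalog(available))                     # all three datasets
print(datasets_to_catalog(available, ["public.orders"]))  # only the selection
```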
- Analyzing Insights: During the datasource configuration, you can analyze data by enabling profiling and quality rule execution to gain insights into patterns, statistics, and data quality scores.
- Configure pipeline engine:
- In the 'Pipeline engine for performing data analysis' dropdown, a list of compatible pipeline engines for the selected datasource is displayed. Note: Under Insights, the Pipeline Engine dropdown displays all pipeline engines associated with the same user and the same host as the current connection.
Once a pipeline engine is selected, the Edit button is enabled.
Click Edit to open the Edit Pipeline Engine window and configure engine settings.
Tip: If no compatible engine is available, click Create Pipeline Engine to add a new engine.
- Enable data analysis options:
- Profile Datasets:
Turn on the Profile Datasets toggle to initiate profiling on the selected data assets.
Profiling helps you understand various characteristics of your data, such as data types, distributions, and completeness.
The profiling duration depends on the volume of data in the selected assets.
- Run Quality Rules and Calculate Scores:
Turn on the Run Quality Rules and Calculate Scores toggle to evaluate data quality.
This action runs predefined data quality rules and generates quality scores at the field and dataset levels.
The scores help assess the overall quality and trustworthiness of the data.
- Semantic type identification (Limited Availability):
Enabling this toggle detects the semantic type associated with the fields upon successful discovery. Semantic type identification runs once for each asset and is not re-run during subsequent discovery processes.
Note: When you turn on the Profile Datasets toggle, the semantic type identification toggle is disabled, since profiling automatically collects semantic type information. If you only need semantic type detection without full profiling, disable the Profile Datasets toggle and enable the Semantic type identification toggle instead.
- Click Next to proceed.
Limited Availability: This feature is currently available only in select workspaces and might be subject to change before general availability.
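To make the profiling and scoring ideas concrete, here is a minimal sketch. The sample rows, the rules, and the averaging formula are illustrative assumptions for this example, not the product's actual implementation:

```python
# Tiny sample dataset standing in for a profiled asset.
rows = [
    {"id": 1, "email": "a@example.com", "age": 34},
    {"id": 2, "email": None,            "age": 29},
    {"id": 3, "email": "c@example.com", "age": None},
]

def profile(rows, field):
    """Completeness and observed Python types for one field,
    mirroring the kind of characteristics profiling reports."""
    values = [r[field] for r in rows]
    present = [v for v in values if v is not None]
    return {
        "completeness": len(present) / len(values),
        "types": sorted({type(v).__name__ for v in present}),
    }

def quality_score(rows, rules):
    """Field-level score = fraction of rows passing that field's rule;
    dataset-level score = average of the field scores (one simple
    way to aggregate; the product's formula may differ)."""
    field_scores = {
        field: sum(rule(r) for r in rows) / len(rows)
        for field, rule in rules.items()
    }
    dataset_score = sum(field_scores.values()) / len(field_scores)
    return field_scores, dataset_score

# Example rules: email must be present and contain '@'; age must be plausible.
rules = {
    "email": lambda r: r["email"] is not None and "@" in r["email"],
    "age":   lambda r: r["age"] is not None and 0 < r["age"] < 120,
}

print(profile(rows, "email"))
print(quality_score(rows, rules))
```

Each field fails one of the three rows here, so both field scores and the dataset score come out to 2/3, showing how per-field results roll up into a single dataset-level score.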
- Scheduling Updates:
Schedule Discovery: To activate scheduled discovery, toggle the Schedule Discovery switch to ON. This initiates discovery on a recurring basis at the configured schedule.
Schedule Insight: To schedule the execution of rules and profiling, turn on the Profile Datasets or Run Data Quality Rules toggle (or both) on the Insights page. This enables the insight scheduling option. Profiling and rule execution are carried out at the configured insight schedule.
Note: To access the insights schedule, the Schedule Discovery toggle must be enabled when a new data source is added.
Table 1. Scheduler settings

| Occurrence | Description |
| --- | --- |
| Daily | The scheduler runs every day. Local time: the time at which scheduling starts (hh:mm). Example: if the schedule is set to daily at local time <hh:mm>, the scheduler runs every day at <hh:mm>. |
| Weekly | The scheduler runs every seven days. Days: the day(s) of the week on which the scheduler runs. Local time: the time at which scheduling starts (hh:mm). Example: if the schedule is set to On <selected days> at local time <hh:mm>, the scheduler runs on the selected days at <hh:mm> every week. |
| Monthly | The scheduler runs every month, using one of two options. The <day> of every month at <time>: the scheduler runs on the selected date of each month. Example: if set to The 12 of every month at 11:00 AM, profiling or observer runs on the 12th of every month at 11:00 AM. The <first or last> <day> of every month at <time>: the scheduler runs on the selected occurrence of that weekday. Example: if set to The first Wednesday of every month at 11:00 AM, profiling or observer runs on the first Wednesday of every month at 11:00 AM. |

Note: The scheduler uses a 12-hour time format and the default time is 1:00 AM.
Note: Use the Edit Schedule feature to edit the schedule for discovery and insight. If the insight schedule toggle is disabled while establishing a new datasource, the edit option for the insight schedule is also disabled; in this case, only the discovery schedule is shown by default.
Tip: On the Schedule page, you can use the same schedule for insight as for discovery by selecting the Same as discovery schedule checkbox on the insight schedule page. For existing connections already added to the workspace, the discovery schedule is automatically applied to the insight schedule only if the Profile Datasets or Run Data Quality Rules toggle (or both) is turned ON. You can run profiling based on the insight schedule independently of the configured discovery schedule.
- To activate scheduled discovery and scheduled insight, turn ON the Schedule Discovery and Schedule Insight toggles. You can enable recurring discovery using the Enable discovery toggle and recurring execution of rules and profiling using the Enable insights toggle. When the Start discovery now and Start Insight now toggles are turned ON, the system immediately begins discovery, cataloging, profiling, and rule execution for all selected data assets upon adding the datasource. Note: When default rules are executed in the onboarding flow for Databricks connections with large datasets, rule execution can fail for the selected datasets.
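The monthly "first <day> of every month at <time>" rule described above can be sketched with the standard library. The computation is illustrative only; the product performs scheduling server-side:

```python
import calendar
from datetime import date, datetime, time

def first_weekday_of_month(year, month, weekday):
    """Return the date of the first given weekday (0=Monday) in a month."""
    first = date(year, month, 1)
    offset = (weekday - first.weekday()) % 7
    return first.replace(day=1 + offset)

def monthly_run(year, month, weekday=calendar.WEDNESDAY, at=time(11, 0)):
    """'The first Wednesday of every month at 11:00 AM' as a datetime."""
    d = first_weekday_of_month(year, month, weekday)
    return datetime.combine(d, at)

print(monthly_run(2024, 6))  # 2024-06-05 11:00:00 (first Wednesday of June 2024)
```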
- Select Finish to complete the process.
Once created, you can add additional connections to the datasource. The datasource will now include all associated connections. You can easily access any of these connections directly from the datasource without having to search for the individual connections.
Add connection to an existing datasource
Additional connections can be added to existing datasources. You must have an existing datasource configured with at least one connection.
To add a new connection to an existing datasource:
- Go to the main navigation menu and select Configuration > Datasources.
- Search for the datasource you want to add a connection to and click on its name.
- Alternatively, select the ellipsis next to the datasource name, and select Details to open its page.
- Click +Add to add a new connection.
- Follow the same steps you used initially to configure the new connection details.
The datasource will now include all associated connections. You can easily access any of these connections directly from the datasource without having to search for the individual connections.