- 20 Mar 2024
- 2 Minutes to read
What is a Data Source?
- Updated on 20 Mar 2024
Overview
A Data Source is a connection to a database or another data structure. Validatar supports the following connection types:
- SQL Server
- Snowflake
- OLE DB
- PostgreSQL
- ODBC
- Python Script
What Should I Add as a Data Source?
If you have data that you want to explore, test for quality, or monitor continuously, add a connection to it in Validatar. This includes:
- Source Systems
- Legacy Data Warehouses
- Cloud Data Warehouses
- Metadata Repositories
- Flat Files
- Tableau
- and more!
Understanding Data Sources in Testing
A Validatar test has two essential components: the Test Data Set and the Control Data Set. Validatar compares the two to measure the variance between them and to determine whether the test data set aligns with the control data set. Examining the variance shows you how data quality is affected. To make this comparison, you need two data sets, and this is where Data Sources come in: when you configure a test query or profile, you select the Data Source for each data set so that Validatar runs the test against the right data.
In a Snowflake migration, for example, the Test Data Set is the target Snowflake warehouse and the Control Data Set is the legacy SQL Server warehouse.
You can also use the same Data Source for both data sets. You'll typically do this to validate data in a single database, like checking row counts after a transformation on a table.
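The comparison described above can be sketched in a few lines of Python. This is only an illustration of the idea, not Validatar's implementation: the two in-memory SQLite databases stand in for the legacy and target warehouses, and the `row_count` and `compare_counts` helpers, the `orders` table, and the result fields are all hypothetical names chosen for this example.

```python
import sqlite3


def row_count(conn: sqlite3.Connection, table: str) -> int:
    """Return the number of rows in `table` (the name is trusted in this sketch)."""
    return conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]


def compare_counts(control: int, test: int) -> dict:
    """Compare a test value against a control value and report the variance."""
    variance = test - control
    # Percentage variance is undefined when the control value is zero.
    pct = (variance * 100 / control) if control else None
    return {
        "control": control,
        "test": test,
        "variance": variance,
        "variance_pct": pct,
        "passed": variance == 0,
    }


# Two in-memory databases stand in for the legacy and target warehouses.
legacy = sqlite3.connect(":memory:")
target = sqlite3.connect(":memory:")
for conn, rows in ((legacy, 5), (target, 4)):
    conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY)")
    conn.executemany("INSERT INTO orders (id) VALUES (?)",
                     [(i,) for i in range(rows)])

# The legacy warehouse is the control; the migrated warehouse is under test.
result = compare_counts(control=row_count(legacy, "orders"),
                        test=row_count(target, "orders"))
print(result)
```

Here the target holds one row fewer than the legacy table, so the variance is -1 and the check does not pass. Pointing both connections at the same database gives the single-source case, such as checking row counts before and after a transformation.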
Understanding Data Sources in the Data Catalog
Exploring Data Sources
In the Data Catalog, you can explore each of your data sources. It shows both technical metadata and custom metadata, and lets you browse a data source at the schema, table, view, and column levels.
Visualized Profile Information: Easy Digestion of Data Insights
Each level of the Data Catalog displays profile information, presented visually, that is generated when you run Validatar profile sets. You can quickly see details such as the number of columns and records in each table, as well as the Validatar tests associated with each metadata object. Together, these details give you a clear picture of what a data source contains.
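To make the table-level details concrete, here is a minimal sketch of how column and record counts could be gathered for each table. It assumes a SQLite database as a stand-in data source; the `profile_tables` function and the sample tables are hypothetical and do not reflect how Validatar profile sets are implemented.

```python
import sqlite3


def profile_tables(conn: sqlite3.Connection) -> dict:
    """Return {table_name: {"columns": n, "records": m}} for every user table."""
    tables = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master "
        "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' ORDER BY name")]
    profile = {}
    for name in tables:
        # PRAGMA table_info returns one row per column of the table.
        columns = conn.execute(f'PRAGMA table_info("{name}")').fetchall()
        records = conn.execute(f'SELECT COUNT(*) FROM "{name}"').fetchone()[0]
        profile[name] = {"columns": len(columns), "records": records}
    return profile


# Sample data source with two small tables (illustrative only).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER, name TEXT, region TEXT)")
conn.executemany("INSERT INTO customers VALUES (?, ?, ?)",
                 [(1, "Ada", "EU"), (2, "Grace", "US")])
conn.execute("CREATE TABLE orders (id INTEGER, customer_id INTEGER)")

table_profile = profile_tables(conn)
print(table_profile)
```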
Detailed Profiling and Profile History at the Column Level
The column-level view provides in-depth data profiling information and the profile history, so you can examine data profiles in greater detail. By monitoring how profiles change over time, you can detect potential issues and anomalies in your data. Unusual spikes or drops in profile metrics may indicate data irregularities or problems in the data processing pipeline.
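As a rough illustration of spotting spikes in a profile history, the sketch below flags any value that sits more than a chosen number of standard deviations from the mean of the earlier runs. The `flag_anomalies` function, the threshold of 3.0, and the sample null counts are assumptions made for this example; Validatar's own detection logic is not described here and may differ.

```python
from statistics import mean, stdev


def flag_anomalies(history: list[float], threshold: float = 3.0) -> list[int]:
    """Return indexes whose value deviates from the mean of all earlier
    values by more than `threshold` sample standard deviations."""
    flagged = []
    # At least two earlier points are needed to compute a standard deviation.
    for i in range(2, len(history)):
        baseline = history[:i]
        mu, sigma = mean(baseline), stdev(baseline)
        if sigma == 0:
            # A perfectly flat baseline: any change at all is notable.
            if history[i] != mu:
                flagged.append(i)
        elif abs(history[i] - mu) / sigma > threshold:
            flagged.append(i)
    return flagged


# Hypothetical null counts for one column across seven profile runs.
null_counts = [12, 11, 13, 12, 12, 95, 13]
print(flag_anomalies(null_counts))  # → [5]
```

One design caveat: once a spike enters the baseline it inflates the standard deviation, so later runs are judged more leniently. A production check would likely exclude flagged values or use a rolling window.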
Data Source Settings
Setting up a Data Source involves more than creating a connection to a database. The following settings let you get the value from Data Sources described in the previous sections.
Settings | Description |
---|---|
General Settings | Where you name and define the connection to the data you want to test and monitor. |
Schema Metadata | Where you can pull and view the metadata from your data source connection. You can refresh metadata manually or schedule automatic refreshes. |
Custom Metadata | Where you provide additional information about objects in your data source. You can import custom metadata from a file or a SQL query. |
Profiling | Where you configure data profiles for your data source, review the execution results, and create custom profiles. |
Projects | Where you assign the Data Source to relevant Projects and manage those assignments. |
Permissions | Where you view and manage the permissions for the Data Source. |