Overview
The preceding articles in this section each cover one component of a data source template in depth. This article steps back to show how all those components work together — from the moment a template is configured to the point where users are running tests and viewing profiling results against live data sources.
If you haven't read the other articles yet, this one provides a useful map of the territory. If you've already read them, this is where the pieces connect.
The Data Source Lifecycle
A data source template enables a lifecycle that flows from platform configuration through to continuous data quality monitoring:
┌───────────────────────┐
│ Data Source Template  │
│ (Platform Blueprint)  │
└───────────┬───────────┘
            │ defines how to work with this platform
            ▼
┌───────────────────────┐
│ Connection            │
│ (Specific Instance)   │
└───────────┬───────────┘
            │ connects to a specific database/API/file system
            ▼
┌───────────────────────┐
│ Data Source           │
│ (Validatar Object)    │
└───────────┬───────────┘
            │ uses template scripts and definitions
            ▼
┌───────────────────────┐
│ Metadata Ingestion    │
│ (Schema Discovery)    │
└───────────┬───────────┘
            │ populates the data catalog
            ▼
┌───────────────────────┐
│ Data Profiling        │
│ (Statistical Analysis)│
└───────────┬───────────┘
            │ measures data characteristics
            ▼
┌───────────────────────────────────────┐
│ Test Generation & Execution           │
│ (Recommendations, Templates, Macros)  │
└───────────┬───────────────────────────┘
            │ validates data quality
            ▼
┌───────────────────────┐
│ Trust Scores &        │
│ Monitoring            │
└───────────────────────┘
Each stage depends on the one before it, and the template's components contribute at every step.
How Each Component Contributes
Data Types → Correct Metadata Classification
When metadata ingestion discovers columns, the data type mappings translate platform-specific types into Validatar's internal type system. This classification determines:
- Which profiling metrics apply to each column (numeric profiles for numeric columns, string profiles for string columns)
- What test recommendations are relevant
- How columns appear in the data explorer
If data types are wrong: Columns get classified incorrectly. Profiling skips applicable metrics or runs inappropriate ones. Test recommendations miss the mark.
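Conceptually, the mapping behaves like a lookup from platform type names to internal classifications. The sketch below illustrates the idea only — the internal type names and the fallback behavior are assumptions for this example, not Validatar's actual identifiers:

```python
# Hypothetical sketch of a platform-to-internal type mapping.
# The internal names ("Integer", "String", ...) and the "Unknown"
# fallback are illustrative, not Validatar's real type system.
TYPE_MAP = {
    "BIGINT": "Integer",
    "NUMERIC": "Decimal",
    "VARCHAR": "String",
    "TIMESTAMP": "DateTime",
}

def classify(platform_type: str) -> str:
    """Look up a platform type; fall back to a generic bucket if unmapped."""
    return TYPE_MAP.get(platform_type.upper(), "Unknown")

print(classify("varchar"))    # String
print(classify("GEOGRAPHY"))  # Unknown — unmapped types degrade profiling
```

An unmapped type is exactly the failure mode described above: the column lands in a generic bucket, and type-specific profiling and recommendations never apply to it.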
Metadata Ingestion Scripts → Catalog Population
The ingestion scripts (SQL or Python) discover the structure of each data source and populate Validatar's data catalog. This catalog is the foundation for everything downstream:
- Profile sets need to know which tables and columns exist before they can profile them
- Test recommendations analyze the catalog to suggest relevant tests
- Template tests use metadata to find matching structures and generate child tests
- Macro parameters (Schema, Table, Column dropdowns) are populated from the catalog
If ingestion is wrong: The catalog is incomplete or inaccurate. Profiling misses objects, test recommendations are irrelevant, and macro dropdowns show incorrect options.
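The shape of an ingestion result can be sketched as three result sets — schemas, tables, and columns — that downstream stages join on. Python templates return these as DataFrames; plain lists of dicts stand in here to keep the sketch dependency-free, and the field names are assumptions for illustration:

```python
# Hypothetical shape of a metadata ingestion result: three levels of
# structure that populate the catalog. Field names are illustrative;
# Python templates would return these as DataFrames.
def ingest_metadata():
    schemas = [{"schema_name": "sales"}]
    tables = [{"schema_name": "sales", "table_name": "orders"}]
    columns = [
        {"schema_name": "sales", "table_name": "orders",
         "column_name": "order_id", "data_type": "BIGINT"},
        {"schema_name": "sales", "table_name": "orders",
         "column_name": "status", "data_type": "VARCHAR"},
    ]
    return schemas, tables, columns

schemas, tables, columns = ingest_metadata()
print(len(schemas), len(tables), len(columns))  # 1 1 2
```

Note that every downstream consumer — profile sets, recommendations, macro dropdowns — sees only what appears in these result sets, which is why an incomplete ingestion script silently shrinks the rest of the lifecycle.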
Profile Definitions → Data Quality Metrics
The profiling configuration (SQL definitions or Python scripts) determines what data quality metrics are available. Profile sets on each data source select which metrics to run and how often.
Profile results serve multiple purposes:
- Data explorer — users browse current and historical profile values to understand their data
- Trust scores — profiling metrics feed into the quality scoring system
- Test data sets — tests can use profile results as dynamic thresholds or comparison baselines
- Anomaly detection — historical profile trends help identify unexpected changes
If profiling is misconfigured: Key metrics are missing, trust scores are incomplete, and users lack the statistical foundation for data quality decisions.
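To make the metrics concrete, here is the arithmetic behind three of the most common profile values, computed in Python over sample rows. In a SQL template the same metrics would be SQL expressions (roughly `COUNT(*)`, a null count, and `COUNT(DISTINCT col)`); the function name and result shape here are assumptions for the sketch:

```python
# Minimal sketch of basic profile metrics over in-memory sample rows.
# In a SQL template these would be profile definition expressions
# evaluated by the database, not Python.
rows = [{"status": "open"}, {"status": None},
        {"status": "open"}, {"status": "closed"}]

def profile_column(rows, col):
    values = [r[col] for r in rows]
    non_null = [v for v in values if v is not None]
    return {
        "record_count": len(values),           # COUNT(*)
        "null_count": len(values) - len(non_null),
        "distinct_count": len(set(non_null)),  # COUNT(DISTINCT col)
    }

print(profile_column(rows, "status"))
# {'record_count': 4, 'null_count': 1, 'distinct_count': 2}
```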
Macros → Reusable Test Patterns
Macros bridge the gap between template-level platform knowledge and day-to-day test creation. They encode SQL patterns as parameterized snippets with metadata-linked dropdowns, enabling users to create effective tests without writing SQL from scratch.
Macros depend on metadata ingestion to populate their parameter dropdowns. A macro with Schema, Table, and Column parameters is only as useful as the metadata catalog is complete.
If macros are poorly designed: Users fall back to writing custom SQL, losing the consistency and reusability benefits. The barrier to creating tests goes up.
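The mechanic behind a macro can be pictured as a stored SQL snippet whose placeholders are filled from the user's parameter choices. The placeholder syntax (`{{...}}`) and the macro body below are assumptions for this sketch, not Validatar's actual macro format:

```python
# Illustrative macro expansion: a parameterized SQL snippet is filled
# in from metadata-linked dropdown selections. The {{...}} placeholder
# syntax is an assumption for this sketch.
MACRO_BODY = (
    "SELECT COUNT(*) AS violation_count "
    "FROM {{schema}}.{{table}} WHERE {{column}} IS NULL"
)

def expand(body, params):
    for name, value in params.items():
        body = body.replace("{{" + name + "}}", value)
    return body

sql = expand(MACRO_BODY, {"schema": "sales", "table": "orders",
                          "column": "order_id"})
print(sql)
```

The dropdown values for `schema`, `table`, and `column` come from the metadata catalog — which is why, as noted above, a macro is only as useful as the catalog is complete.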
Parameters and Execution Scripts → Session Setup
Default parameters provide Python templates with the configuration they need to connect to external systems. Execution scripts ensure database sessions are in the right state before any query runs.
These are the "plumbing" that makes everything else work. They're consumed by ingestion scripts, profiling, and macro execution.
If parameters are missing or execution scripts are wrong: Ingestion fails to connect, profiling queries error out, and test execution produces unexpected results.
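The execution order is the important part: every pre-execution statement runs on the session before the actual query does. The sketch below uses SQLite's `PRAGMA` as a stand-in for platform session statements like `USE WAREHOUSE` or `ALTER SESSION` — the target platform and statement list are assumptions for illustration:

```python
import sqlite3

# Sketch of session setup ordering: pre-execution statements run first,
# then the real query executes on the prepared session. SQLite's PRAGMA
# stands in for statements like USE WAREHOUSE or ALTER SESSION.
PRE_EXECUTION = ["PRAGMA case_sensitive_like = ON;"]

conn = sqlite3.connect(":memory:")
for stmt in PRE_EXECUTION:
    conn.execute(stmt)  # session setup first, every time
result = conn.execute("SELECT 'ready'").fetchone()[0]
print(result)  # ready
```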
SQL Template vs. Python Template: Side by Side
| Component | SQL Template | Python Template |
|---|---|---|
| General | Same: name, version, category, delimiters | Same (delimiters less relevant) |
| Data Types | Maps SQL types to Validatar types | Maps script return types to Validatar types |
| Parameters | Rarely used (connection string handles config) | Essential — API keys, file paths, connection info |
| Execution Scripts | Session setup SQL (SET statements, USE commands) | Less common — scripts manage their own setup |
| Metadata Ingestion | Three SQL scripts (schema, table, column) | One Python script returning up to 3 DataFrames |
| Profiling | Individual profile definitions with SQL expressions | Profile scripts returning metric DataFrames |
| Macros | SQL snippets with metadata-linked parameters | Same — macros work identically on both |
The key insight: macros and the general configuration are the same regardless of template type. The differences are in how the template connects to data (parameters vs. connection strings), discovers metadata (SQL vs. Python), and calculates profiles (SQL expressions vs. Python scripts).
Common Scenarios
Setting Up a Snowflake Template
Snowflake is a SQL template with some platform-specific considerations:
- General — Category: Database. Delimiters: `"/"`. Connection type: Snowflake.
- Data Types — Map Snowflake types including `VARIANT`, `OBJECT`, `ARRAY`, and `NUMBER` variants.
- Execution Scripts — Use pre-execution to set warehouse, role, and session parameters: `USE WAREHOUSE VALIDATAR_WH; USE ROLE VALIDATAR_ROLE; ALTER SESSION SET TIMEZONE = 'UTC';`
- Metadata Ingestion — Query `INFORMATION_SCHEMA.SCHEMATA`, `INFORMATION_SCHEMA.TABLES`, and `INFORMATION_SCHEMA.COLUMNS`. Filter out the `INFORMATION_SCHEMA` schema.
- Profiling — Standard profile definitions work with Snowflake SQL. Platform-specific functions like `APPROX_PERCENTILE` can improve performance for large tables.
- Macros — Standard SQL macros. Use Snowflake-specific syntax where needed (e.g., `FLATTEN` for semi-structured data).
Creating a Python Template for an API Data Source
For a REST API with Swagger documentation:
- General — Category: Script. Connection type: Python Script.
- Parameters — Define `api_base_url` (String), `api_key` (Secret), `page_size` (Integer), `environment` (Dropdown: prod/staging/dev).
- Metadata Ingestion — Python script that reads the Swagger spec to discover endpoints (schemas), resources (tables), and fields (columns).
- Profiling — Python script that samples each endpoint and calculates record counts, null counts, and distinct counts.
- Data Types — Map JSON types (`string`, `integer`, `number`, `boolean`, `array`, `object`) to Validatar types.
- Macros — May be limited if the API doesn't support ad-hoc queries. Consider macros that construct API filter parameters.
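The ingestion step for this scenario amounts to walking the spec and mapping its levels onto the catalog's levels. The sketch below uses a minimal inline Swagger 2.0-style fragment — the spec content, and the decision to treat paths as "tables" and schema properties as "columns", are assumptions for this example:

```python
# Hypothetical sketch: deriving catalog entries from a Swagger-style
# spec. The inline spec is a toy example; a real script would fetch
# and parse the API's published document.
spec = {
    "paths": {
        "/customers": {"get": {}},
        "/orders": {"get": {}},
    },
    "definitions": {
        "Customer": {"properties": {"id": {"type": "integer"},
                                    "name": {"type": "string"}}},
    },
}

# Paths become catalog "tables"; each definition's properties become
# that resource's "columns".
resources = sorted(p.strip("/") for p in spec["paths"])
fields = {name: sorted(d["properties"])
          for name, d in spec["definitions"].items()}

print(resources)           # ['customers', 'orders']
print(fields["Customer"])  # ['id', 'name']
```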
Customizing an Existing Template
When the built-in template works for most purposes but needs adjustments:
- Export the existing template as a backup
- Modify the specific component that needs adjustment:
- Add new data type mappings for custom types
- Adjust ingestion scripts to include or exclude specific schemas
- Add custom macros for domain-specific testing patterns
- Add platform-specific profile definitions
- Test with a single data source before rolling out changes
- Consider whether the customization should be a template modification or a per-data-source override
Tip: If the customization only applies to one or a few data sources, use per-data-source overrides (on the Schema Metadata or Profile Sets pages) rather than modifying the template. Template changes affect all data sources using that template.
The Template as Living Infrastructure
Data source templates aren't set-and-forget configuration. They evolve with your data platform:
- New database versions may introduce new data types that need mappings
- Schema changes may require ingestion script adjustments
- New testing patterns become macros that benefit all users on the platform
- Performance tuning may improve profiling efficiency for large databases
- Platform features (like Snowflake's semi-structured data or Databricks' Unity Catalog) may warrant new profile definitions or macro patterns
Treat templates as shared infrastructure. Version them, test changes carefully, and export backups before modifying production templates. When you develop a template that works well, consider exporting it for use in other environments or publishing it to the Validatar Marketplace.
Related Articles
- What Is a Data Source Template?
- Creating and Managing Data Source Templates
- Data Types
- Default Parameters and Execution Scripts
- Metadata Ingestion Scripts — SQL Templates
- Metadata Ingestion Scripts — Python Templates
- Macros
- Profiling Configuration — SQL Templates
- Profiling Configuration — Python Templates