Documentation Index

Fetch the complete documentation index at: https://docs.validatar.com/llms.txt

Use this file to discover all available pages before exploring further.

Data Processing Engine Schema Overview

Prev Next

Overview

When you configure repository replication, Validatar continuously syncs its internal data to your data processing engine (Snowflake or Databricks). This gives your team direct SQL access to metadata catalogs, test definitions, execution results, profiling metrics, trust scores, and security permissions — enabling custom reporting, BI tool integration, and any analysis that benefits from having Validatar's operational data in a queryable warehouse.

This reference section documents every view and table available in the replicated schema, including column definitions, relationships between objects, and practical SQL examples.

Schema Layout

The replicated database contains two schemas:

REPOSITORY Schema

Contains pre-built views that present Validatar's data in a query-friendly format. These are the primary interface for most use cases.

View Purpose
VW_DATASET_TABLES All metadata tables/views with trust scores, custom fields, and profile summaries
VW_DATASET_COLUMNS All metadata columns with data types, custom fields, and detailed profile metrics
VW_DATASET_TESTS Test definitions (standard, template, materialized) with quality dimensions and overall status
VW_DATASET_TEST_EXECUTIONS Test execution history with results, durations, and row-level counts
VW_DATASET_JOBS Job definitions and execution history with pass/fail summaries
VW_DATASET_METADATA_OBJECTS Flat list of all schemas, tables, and columns with paths and URLs
VW_DATASET_EVENT_HISTORY Audit log of all platform events
VW_SECURE_DATASET_* Row-level security filtered versions of the above views

RESULTS Schema

Contains raw test result data and profiling metrics, stored in tables rather than views:

Table Purpose
TEST_EXECUTION_RESULTS Row-level results for single-value tests (one row per comparison)
TEST_EXECUTION_TABLE_COLUMNS Column definitions for multi-row test results
TEST_EXECUTION_TABLE_RESULTS Row-level pass/fail for multi-row tests
TEST_EXECUTION_TABLE_RESULT_KEYS Key column values for each result row (long format)
TEST_EXECUTION_TABLE_RESULT_VALUES Value column comparisons for each result row (long format)
META_DATA_LATEST_PROFILES Most recent profile values per column/table
META_DATA_PROFILE_SET_EXECUTION_RESULTS Historical profile execution results

How the Schemas Relate

┌─────────────────────────────────────────────────────────────────────┐
│ REPOSITORY Schema (Views)                                           │
│                                                                     │
│  VW_DATASET_TABLES ──┐                                              │
│        │             │                                              │
│        ▼             ▼                                              │
│  VW_DATASET_COLUMNS  VW_DATASET_TESTS                               │
│                           │                                         │
│                           ▼                                         │
│                    VW_DATASET_TEST_EXECUTIONS ──────────────┐        │
│                           │                                │        │
│                           ▼                                ▼        │
│                    VW_DATASET_JOB_EXECUTIONS    RESULTS Schema       │
└─────────────────────────────────────────────────────────────────────┘
                                                       │
                                                       ▼
┌─────────────────────────────────────────────────────────────────────┐
│ RESULTS Schema (Tables)                                             │
│                                                                     │
│  TEST_EXECUTION_RESULTS          (single-value tests)               │
│                                                                     │
│  TEST_EXECUTION_TABLE_RESULTS ───┐ (multi-row tests)                │
│       │                          │                                  │
│       ▼                          ▼                                  │
│  TABLE_RESULT_KEYS         TABLE_RESULT_VALUES                      │
│                                                                     │
│  META_DATA_LATEST_PROFILES       (profiling)                        │
│  META_DATA_PROFILE_SET_EXECUTION_RESULTS                            │
└─────────────────────────────────────────────────────────────────────┘

Key Join Patterns

REPOSITORY to RESULTS

Join test executions to their row-level results using TEST_EXECUTION_ID:

SELECT
    te.TEST_NAME,
    te.RESULT,
    r.NUMERIC_SOURCE_VALUE,
    r.NUMERIC_TARGET_VALUE,
    r.PERCENT_DIFFERENCE
FROM REPOSITORY.VW_DATASET_TEST_EXECUTIONS te
JOIN RESULTS.TEST_EXECUTION_RESULTS r
    ON r.TEST_EXECUTION_ID = te.TEST_EXECUTION_ID
WHERE te.LATEST_TEST_EXECUTION_INDEX = 1;

Tables to Columns

Join using TABLE_ID:

SELECT
    t.TABLE_NAME,
    t.TRUST_SCORE,
    c.COLUMN_NAME,
    c.PROFILE_NULL_PERCENT
FROM REPOSITORY.VW_DATASET_TABLES t
JOIN REPOSITORY.VW_DATASET_COLUMNS c
    ON c.TABLE_ID = t.TABLE_ID
WHERE t.DATA_SOURCE_NAME = 'My Warehouse';

Tests to Metadata Objects

Join using the metadata link object key:

SELECT
    t.TEST_NAME,
    t.OVERALL_RESULT_STATUS,
    mo.METADATA_PATH
FROM REPOSITORY.VW_DATASET_TESTS t
JOIN REPOSITORY.VW_DATASET_METADATA_OBJECTS mo
    ON mo.OBJECT_KEY = t.TEST_PRIMARY_METADATA_LINK_OBJECT_KEY
WHERE t.IS_ACTIVE_VERSION = TRUE;

Connecting to the Schema

Snowflake

Use any Snowflake-compatible client or BI tool. The database name, warehouse, and schema are determined by your Validatar data processing engine configuration.

USE DATABASE <your_validatar_database>;
USE SCHEMA REPOSITORY;

-- Verify you can query
SELECT COUNT(*) FROM VW_DATASET_TABLES;

Databricks

Connect via the Databricks SQL endpoint configured as your data processing engine. The catalog and schema names match your Validatar configuration.

USE CATALOG <your_validatar_catalog>;
USE SCHEMA repository;

SELECT COUNT(*) FROM vw_dataset_tables;

Snowflake vs Databricks Differences

The view and table structures are identical between Snowflake and Databricks deployments. Minor SQL syntax differences to be aware of:

Operation Snowflake Databricks
Date difference TIMESTAMPDIFF(second, start, end) UNIX_TIMESTAMP(end) - UNIX_TIMESTAMP(start)
Boolean cast col::boolean CAST(col AS BOOLEAN)
String concat CONCAT(a, b) or a || b CONCAT(a, b)
Current timestamp CURRENT_TIMESTAMP() CURRENT_TIMESTAMP()
Date arithmetic DATEADD(day, -30, CURRENT_DATE) DATE_ADD(CURRENT_DATE(), -30)

Custom Fields

Many views (VW_DATASET_TABLES, VW_DATASET_COLUMNS) include columns prefixed with CUSTOM_. These correspond to custom metadata fields defined in your Validatar environment and will vary by deployment. The column names are derived from the custom field names configured by your administrator.

Refresh Frequency

Data in the replicated schema is updated according to the sync schedule configured in Settings > Configuration > Repository Replication. See Replication of Your Metadata to a Data Processing Engine for configuration details.

Results data (RESULTS schema) is loaded at test execution time — results appear in the replicated tables as soon as the test completes and the data processing engine processes the staged files.