Getting Started as a Data Architect

Overview

As a Data Architect or Data Engineer, you've invested significant effort designing your data warehouse, building ETL/ELT pipelines, and ensuring data flows correctly from source systems to analytical models. Validatar helps you automatically validate that your data transformations work as expected—without writing mountains of test scripts.

This guide will help you get immediate value from Validatar by focusing on the testing patterns most relevant to your role:

  • Source-to-Target Validation - Ensure data moves correctly between staging and your warehouse
  • Row Count and Aggregate Checks - Quick wins to validate loads completed successfully
  • Schema Drift Detection - Catch unexpected metadata changes before they break downstream processes
  • Template-Based Testing - Scale your test coverage across hundreds of tables efficiently

What You'll Accomplish

By the end of this guide, you'll have:

  • Connected your data warehouse to Validatar
  • Created your first template test covering multiple tables
  • Set up automated row count validation for your key fact tables
  • Understood how to detect schema changes automatically

Prerequisites

Before starting, ensure you have:

  1. Access to Validatar - Either Cloud or Server edition (log in here)
  2. A Project created - Tests live within projects (create a project)
  3. At least one Data Source configured - Your data warehouse connection (create a data source)
  4. Schema metadata ingested - Validatar needs to know your tables and columns (ingest metadata)

Part 1: Understanding Key Features for Data Architects

Template Tests - Scale Your Testing

The biggest time-saver for Data Architects is Template Testing. Instead of writing individual tests for each table, you define a test pattern once and apply it across all matching objects.

Example Use Cases:

  • Validate all fact tables have non-null surrogate keys
  • Check all staging tables have matching row counts in target tables
  • Ensure all date columns fall within expected ranges

Template tests use metadata references to dynamically generate child tests. When you add new tables to your warehouse, Validatar can automatically create tests for them.

Source-to-Target Testing

One of the most common validation needs is ensuring data flows correctly from source to target. Validatar's comparison testing makes this straightforward:

Test Type              Use Case
Row Count Match        Verify staging table count matches target after ETL
Aggregate Comparison   Compare SUM, AVG, MIN, MAX between source and target
Full Data Comparison   Row-by-row validation for critical tables
Key-Based Comparison   Match records by business key and compare specific columns
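The first two comparison types in the table reduce to comparing a single scalar from each side. As a minimal sketch of that idea, here is a self-contained Python example using an in-memory SQLite database; the table names (`staging_customers`, `dw_dim_customer`) and the `amount` column are invented stand-ins for your staging and warehouse objects, not anything Validatar defines:

```python
import sqlite3

# Hypothetical in-memory tables standing in for a staging database and a
# data warehouse; names and data are illustrative only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE staging_customers (id INTEGER, amount REAL);
    CREATE TABLE dw_dim_customer  (id INTEGER, amount REAL);
    INSERT INTO staging_customers VALUES (1, 10.0), (2, 20.0), (3, 30.0);
    INSERT INTO dw_dim_customer   VALUES (1, 10.0), (2, 20.0), (3, 30.0);
""")

def scalar(sql):
    """Run a single-value query and return that value."""
    return conn.execute(sql).fetchone()[0]

# Row Count Match: staging and target should hold the same number of rows.
assert scalar("SELECT COUNT(*) FROM staging_customers") == \
       scalar("SELECT COUNT(*) FROM dw_dim_customer")

# Aggregate Comparison: SUM of a measure should survive the load unchanged.
assert scalar("SELECT SUM(amount) FROM staging_customers") == \
       scalar("SELECT SUM(amount) FROM dw_dim_customer")

print("row count and aggregate checks passed")
```

In Validatar you configure these as two data sets with a comparison result criterion rather than writing the harness yourself; the sketch just shows what is being compared.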

Part 2: Quick Wins - Your First Tests

Let's create tests that deliver immediate value. We'll start with simple validations and progress to template-based testing.

Quick Win #1: Row Count Validation

Row count checks are the fastest way to catch ETL failures. Here's how to create one:

  1. Navigate to your project's Test Repository
  2. Click New Test > New Standard Test
  3. Configure the test:
    • Name: Fact_Sales - Row Count Check
    • Test Data Set: Select your fact table
    • Data Retrieval: Choose "Row Count"
  4. Set the Result Criteria:
    • Compare to Defined Rule
    • Condition: Row Count > 0
  5. Save and run the test

Pro Tip

For ongoing validation, compare today's row count against yesterday's using "Compare to Previous Runs" with an acceptable variance threshold (e.g., +/- 10%).
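The variance check above is simple arithmetic. As a sketch of what a ±10% threshold means (Validatar applies the threshold for you when you choose "Compare to Previous Runs"; the function below is only illustrative):

```python
def within_variance(today: int, yesterday: int, pct: float = 10.0) -> bool:
    """Return True if today's row count is within +/- pct% of yesterday's."""
    if yesterday == 0:
        # Avoid division by zero: from an empty baseline, only zero passes.
        return today == 0
    return abs(today - yesterday) / yesterday * 100 <= pct

# 1,045,000 rows today vs 1,000,000 yesterday is a 4.5% change: inside +/- 10%.
print(within_variance(1_045_000, 1_000_000))  # True
# A 25% jump breaches the threshold and is worth investigating.
print(within_variance(1_250_000, 1_000_000))  # False
```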

Quick Win #2: Source-to-Target Count Match

Validate that your ETL loaded all records:

  1. Create a new Standard Test
  2. Test Data Set:
    • Data Source: Staging_DB
    • Query: SELECT COUNT(*) as row_count FROM staging.customers
  3. Compare To: Another Data Set
  4. Comparison Data Set:
    • Data Source: Data_Warehouse
    • Query: SELECT COUNT(*) as row_count FROM dw.dim_customer
  5. Result Criteria: Data sets should match exactly

Quick Win #3: Null Check on Required Columns

Ensure critical columns aren't null:

  1. Create a new Standard Test
  2. Test Data Set:
    SELECT COUNT(*) as null_count
    FROM dw.dim_customer
    WHERE customer_key IS NULL
    
  3. Result Criteria: null_count = 0

Part 3: Template Testing - Scale Your Coverage

Now let's leverage templates to create tests across multiple tables at once.

Creating a Row Count Template

This template will create row count validation tests for all your fact tables:

  1. Navigate to Test Repository > New Test > New Template Test
  2. Template Configuration:
    • Name: Fact Table Row Count Validation
    • Target Object Type: Table
  3. Object Selection: Choose "Specific Metadata Objects"
    • Filter tables where name starts with Fact_ or matches your naming convention
  4. Test Data Set:
    SELECT COUNT(*) as row_count
    FROM {{schema.name}}.{{table.name}}
    
  5. Result Criteria: row_count > 0
  6. Materialize the child tests

Validatar will create individual tests for each matching fact table. When you add new fact tables, simply re-materialize to include them.
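Materialization is essentially string substitution driven by ingested metadata. A minimal sketch, assuming a metadata list you'd get from filtering to your Fact_ naming convention (the dictionaries below are invented for illustration; Validatar resolves the `{{...}}` references internally):

```python
# Template SQL as written in the steps above.
TEMPLATE_SQL = "SELECT COUNT(*) AS row_count FROM {{schema.name}}.{{table.name}}"

# Pretend this came from metadata ingestion, filtered to Fact_ tables.
fact_tables = [
    {"schema.name": "dw", "table.name": "Fact_Sales"},
    {"schema.name": "dw", "table.name": "Fact_Inventory"},
]

def materialize(template: str, metadata: dict) -> str:
    """Substitute each {{key}} placeholder with its metadata value."""
    sql = template
    for key, value in metadata.items():
        sql = sql.replace("{{" + key + "}}", value)
    return sql

# One child test per matching table.
child_tests = [materialize(TEMPLATE_SQL, t) for t in fact_tables]
for sql in child_tests:
    print(sql)
```

Adding a third fact table to the metadata list yields a third child test on the next materialization, which is why re-materializing picks up new tables automatically.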

Creating a Primary Key Uniqueness Template

Ensure all your dimension tables have unique surrogate keys:

  1. Create a new Template Test
  2. Name: Dimension PK Uniqueness
  3. Target Object Type: Table (filter to Dim_* tables)
  4. Test Data Set:
    SELECT {{table.cf.primary_key_column}}, COUNT(*) as duplicates
    FROM {{schema.name}}.{{table.name}}
    GROUP BY {{table.cf.primary_key_column}}
    HAVING COUNT(*) > 1
    
  5. Result Criteria: Row count = 0 (no duplicates)

Metadata Reference Syntax

Template tests use {{placeholder}} syntax for metadata references. Ensure your metadata has been properly ingested so these references resolve correctly.

Part 4: Schema Drift Detection

Data Architects need to know when source schemas change unexpectedly. Validatar's metadata profiling helps you detect:

  • New columns added to source tables
  • Columns dropped or renamed
  • Data type changes
  • New tables appearing in schemas

Setting Up Schema Monitoring

  1. Navigate to Settings > Data Sources > Your Source System
  2. Go to the Metadata tab
  3. Configure Scheduled Refresh:
    • Set a daily or weekly schedule
    • Enable change tracking
  4. Review changes in the Catalog Explorer

You can also create tests that explicitly validate schema expectations:

-- Test: Customer table should have exactly 15 columns
SELECT COUNT(*) as column_count
FROM INFORMATION_SCHEMA.COLUMNS
WHERE TABLE_NAME = 'dim_customer'
  AND TABLE_SCHEMA = 'dw'

With result criteria: column_count = 15
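To see the same check run end to end, here is a sketch against an in-memory SQLite table (SQLite exposes column metadata via `PRAGMA table_info` rather than `INFORMATION_SCHEMA`; the three-column `dim_customer` below is illustrative):

```python
import sqlite3

# Stand-in for the warehouse table whose shape we want to pin down.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE dim_customer (customer_key INTEGER, name TEXT, email TEXT)"
)

# The baseline you locked in when the test was created.
expected_columns = 3

# SQLite's equivalent of counting rows in INFORMATION_SCHEMA.COLUMNS.
actual_columns = len(conn.execute("PRAGMA table_info(dim_customer)").fetchall())

# A new or dropped column changes the count and flags schema drift.
assert actual_columns == expected_columns
print("schema check passed:", actual_columns, "columns")
```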

Part 5: Organizing Your Tests

As your test suite grows, organization becomes critical.

Recommended Folder Structure

Project: Data Warehouse Validation
├── Source-to-Target Tests/
│   ├── Staging to DW/
│   └── Source System to Staging/
├── Data Quality Tests/
│   ├── Null Checks/
│   ├── Uniqueness/
│   └── Referential Integrity/
├── Row Count Validations/
│   ├── Fact Tables/
│   └── Dimension Tables/
└── Schema Tests/
    └── Structure Validation/

Part 6: Automating Test Execution

Creating a Job

Jobs group tests for scheduled execution:

  1. Navigate to Job Repository > New Job
  2. Add your tests to the job
  3. Configure execution settings:
    • Parallel execution for faster runs
  4. Set up a schedule aligned with your ETL runs

Integration with Data Pipelines

For pipeline integration, use the Validatar API to:

  • Trigger test execution after ETL completes
  • Poll for results and fail pipelines on test failures
  • Include data quality metrics in your observability platform
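The gating logic your pipeline needs after polling is small. A minimal sketch, noting that the Validatar API's actual endpoints and payload shape are not covered in this guide, so the `results` structure and `status` values below are hypothetical:

```python
# Hypothetical job-run results as your pipeline might assemble them after
# polling the Validatar API; field names are illustrative, not the real schema.
def pipeline_should_fail(results: list[dict]) -> bool:
    """Fail the pipeline if any test in the job run did not pass."""
    return any(r["status"] != "Passed" for r in results)

results = [
    {"test": "Fact_Sales - Row Count Check",    "status": "Passed"},
    {"test": "Staging to DW - Customer Count",  "status": "Failed"},
]

if pipeline_should_fail(results):
    print("data quality gate failed; stopping pipeline")
```

In an orchestrator, raising an exception (or exiting nonzero) at that point is what actually halts downstream tasks.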

Next Steps

You've now established a solid foundation for automated data validation. Here's where to go next:

Goal                                       Resource
Explore all test configuration options     Configuring a Test
Learn about data profiling                 Data Profiles
Set up notifications for test failures     Turning On Notifications
Build custom reports                       Creating Reports
Understand the test lifecycle              Test Automation Lifecycle

Summary

As a Data Architect, Validatar helps you:

  • Save time with template-based testing across hundreds of objects
  • Catch issues early with source-to-target validation
  • Prevent surprises with automated schema drift detection
  • Scale confidently knowing your data pipelines are validated

Start with the quick wins to prove immediate value, then progressively expand your test coverage using templates. Your future self (and your stakeholders) will thank you.

Need Help?

Check out our FAQs or contact support if you have questions about implementing these testing patterns.