META_DATA Profiling Tables

Overview

Validatar stores column and table profiling results in two tables within the RESULTS schema. These tables contain the raw profile data that powers the PROFILE_* columns in VW_DATASET_TABLES and VW_DATASET_COLUMNS. Querying them directly gives you access to historical profiling trends and the full set of profile definitions — including values not surfaced in the repository views.

Table	Purpose
META_DATA_LATEST_PROFILES	Current (most recent) profile value for each metric per column/table
META_DATA_PROFILE_SET_EXECUTION_RESULTS	Historical record of every profiling execution result

META_DATA_LATEST_PROFILES

Stores the latest profile value for each combination of data source, table, column, and profile definition. This is the backing data for the PROFILE_* columns in the repository views.

Column	Type	Nullable	Description
ID	NUMBER	No	Unique row identifier
META_DATA_SOURCE_ID	NUMBER	No	Data source ID
META_DATA_PROFILE_SET_EXECUTION_ID	NUMBER	No	ID of the profiling execution that produced this value
DATA_PROFILE_DEF_ID	NUMBER	No	Profile definition ID — identifies which metric this is
META_SCHEMA_ID	NUMBER	No	Schema ID
META_TABLE_ID	NUMBER	No	Table ID
META_COLUMN_ID	NUMBER	Yes	Column ID (NULL for table-level profiles)
VALUE	VARCHAR	Yes	The profile value (stored as string regardless of actual type)
DATE_CREATED	TIMESTAMP	No	When this profile result was created
DATE_MODIFIED	TIMESTAMP	No	When this row was last updated
SF_DATE_CREATED	TIMESTAMP	No	When loaded into Snowflake
SF_IMPORT_FILENAME	VARCHAR	Yes	Staging file name

META_DATA_PROFILE_SET_EXECUTION_RESULTS

Stores every historical profiling result — one row per metric per execution. Unlike LATEST_PROFILES (which only keeps the most recent value), this table retains the full history for trend analysis.

Column	Type	Nullable	Description
ID	NUMBER	No	Unique row identifier
META_DATA_SOURCE_ID	NUMBER	No	Data source ID
META_DATA_PROFILE_SET_EXECUTION_ID	NUMBER	No	ID of the profiling execution
META_DATA_PROFILE_SET_EXECUTION_STEP_ID	NUMBER	No	Specific step within the profiling execution
DATA_PROFILE_DEF_ID	NUMBER	No	Profile definition ID
META_SCHEMA_ID	NUMBER	No	Schema ID
META_TABLE_ID	NUMBER	No	Table ID
META_COLUMN_ID	NUMBER	Yes	Column ID (NULL for table-level profiles)
GROUPING_KEY	VARCHAR	Yes	Grouping key for partitioned profiles
VALUE	VARCHAR	Yes	The profile value
DATE_CREATED	TIMESTAMP	No	When this result was calculated
SF_DATE_CREATED	TIMESTAMP	No	When loaded into Snowflake
SF_IMPORT_FILENAME	VARCHAR	Yes	Staging file name

Profile Definition ID Reference

The DATA_PROFILE_DEF_ID column identifies which metric a row represents. Here are the common profile definitions:

Table-Level Profiles

DEF_ID	Metric	Type	Description
1	Record Count	Numeric	Total row count
2	Total Data MB	Numeric	Table size in megabytes
43	Column Count	Numeric	Number of columns
47	Last Refresh Date	Timestamp	When profiling last ran
48	Concatenated Row Text Calc	String	Concatenated row checksum
57	Sum Checksum Table	Numeric	Sum-based table checksum

Column-Level Profiles

DEF_ID	Metric	Type	Description
3	Distinct Count	Numeric	Number of distinct values
4	Distinct Percent	Decimal	Distinct as fraction of total (multiply by 100 for percentage)
5	Most Common Value	String	Most frequently occurring value
6	Most Common Count	Numeric	Frequency of the most common value
7	Min Numeric	Numeric	Minimum numeric value
8	Max Numeric	Numeric	Maximum numeric value
9	Mean Numeric	Numeric	Average numeric value
10	Median Numeric	Numeric	Median numeric value
12	Min String	String	Minimum string value
13	Max String	String	Maximum string value
14	Standard Deviation	Numeric	Standard deviation
15	Max Length	Numeric	Longest string length
16	Min Length	Numeric	Shortest string length
17	Mean Length	Numeric	Average string length
20	Null Count	Numeric	Number of NULLs
21	Null Percent	Decimal	NULL fraction (multiply by 100 for percentage)
22	Blank Count	Numeric	Number of blank/empty strings
23	Blank Percent	Decimal	Blank fraction
24	Numeric Count	Numeric	Values that parse as numeric
25	Numeric Percent	Decimal	Numeric fraction
26	Zero Count	Numeric	Number of zeros
27	Zero Percent	Decimal	Zero fraction
28	Negative Count	Numeric	Number of negative values
29	Negative Percent	Decimal	Negative fraction
30	Min Date	Timestamp	Earliest date value
31	Max Date	Timestamp	Latest date value
32	Longest Value	String	The actual longest value
33	Shortest Value	String	The actual shortest value
41	Lower Quartile	Numeric	25th percentile
42	Upper Quartile	Numeric	75th percentile
54	Std Dev (Large)	Numeric	Population standard deviation
56	Sum Checksum Column	Numeric	Sum checksum
58	Sum Distinct Checksum	Numeric	Distinct value checksum

How Repository Views Surface Profile Data

The VW_DATASET_TABLES and VW_DATASET_COLUMNS views join to internal helper views (VW_DATASET_HELPER_TABLE_PROFILES, VW_DATASET_HELPER_COLUMN_PROFILES) which pivot META_DATA_LATEST_PROFILES by DATA_PROFILE_DEF_ID into named columns. The helper views filter by profile_def_id and cast the string VALUE to the appropriate type.

For example, PROFILE_RECORD_COUNT in VW_DATASET_TABLES comes from:

CAST(p.value AS DECIMAL(38, 8))
-- where p.profile_def_id = 1

And PROFILE_NULL_PERCENT in VW_DATASET_COLUMNS comes from:

CAST(p.value AS DECIMAL(16, 4)) * 100
-- where p.profile_def_id = 21

Relationships

META_DATA_LATEST_PROFILES
    │
    ├── META_TABLE_ID → REPOSITORY.VW_DATASET_TABLES.TABLE_ID
    ├── META_COLUMN_ID → REPOSITORY.VW_DATASET_COLUMNS.COLUMN_ID
    └── META_DATA_SOURCE_ID → REPOSITORY.VW_DATASET_TABLES.DATA_SOURCE_ID

META_DATA_PROFILE_SET_EXECUTION_RESULTS
    │
    ├── Same join keys as META_DATA_LATEST_PROFILES
    └── META_DATA_PROFILE_SET_EXECUTION_ID links to execution history

Example Queries

Get Latest Profile Values for a Specific Table

SELECT
    p.DATA_PROFILE_DEF_ID,
    CASE p.DATA_PROFILE_DEF_ID
        WHEN 1 THEN 'Record Count'
        WHEN 2 THEN 'Total Data MB'
        WHEN 43 THEN 'Column Count'
        WHEN 47 THEN 'Last Refresh Date'
    END AS METRIC_NAME,
    p.VALUE,
    p.DATE_MODIFIED
FROM RESULTS.META_DATA_LATEST_PROFILES p
WHERE p.META_TABLE_ID = 8482
    AND p.META_COLUMN_ID IS NULL  -- table-level only
ORDER BY p.DATA_PROFILE_DEF_ID;

Historical Profile Trends for a Column

SELECT
    h.DATE_CREATED,
    h.DATA_PROFILE_DEF_ID,
    CASE h.DATA_PROFILE_DEF_ID
        WHEN 3 THEN 'Distinct Count'
        WHEN 20 THEN 'Null Count'
        WHEN 7 THEN 'Min Numeric'
        WHEN 8 THEN 'Max Numeric'
    END AS METRIC,
    h.VALUE
FROM RESULTS.META_DATA_PROFILE_SET_EXECUTION_RESULTS h
WHERE h.META_TABLE_ID = 8482
    AND h.META_COLUMN_ID = 101530
    AND h.DATA_PROFILE_DEF_ID IN (3, 20, 7, 8)
ORDER BY h.DATE_CREATED, h.DATA_PROFILE_DEF_ID;

Row Count Trend for a Table Over Time

SELECT
    h.DATE_CREATED::DATE AS PROFILE_DATE,
    CAST(h.VALUE AS DECIMAL(38,0)) AS RECORD_COUNT
FROM RESULTS.META_DATA_PROFILE_SET_EXECUTION_RESULTS h
WHERE h.META_TABLE_ID = 8482
    AND h.META_COLUMN_ID IS NULL
    AND h.DATA_PROFILE_DEF_ID = 1  -- Record Count
ORDER BY h.DATE_CREATED;

Compare Profiles Across Tables in a Schema

SELECT
    t.TABLE_NAME,
    MAX(CASE WHEN p.DATA_PROFILE_DEF_ID = 1 THEN CAST(p.VALUE AS DECIMAL(38,0)) END) AS RECORD_COUNT,
    MAX(CASE WHEN p.DATA_PROFILE_DEF_ID = 2 THEN CAST(p.VALUE AS DECIMAL(38,8)) END) AS TOTAL_DATA_MB,
    MAX(CASE WHEN p.DATA_PROFILE_DEF_ID = 43 THEN CAST(p.VALUE AS DECIMAL(38,0)) END) AS COLUMN_COUNT
FROM RESULTS.META_DATA_LATEST_PROFILES p
JOIN REPOSITORY.VW_DATASET_TABLES t
    ON t.TABLE_ID = p.META_TABLE_ID
WHERE p.META_COLUMN_ID IS NULL
    AND t.SCHEMA_NAME = 'PUBLIC'
    AND p.DATA_PROFILE_DEF_ID IN (1, 2, 43)
GROUP BY t.TABLE_NAME
ORDER BY RECORD_COUNT DESC NULLS LAST;

Databricks Notes

Replace DATE_CREATED::DATE with CAST(DATE_CREATED AS DATE)
CAST(VALUE AS DECIMAL(...)) syntax is identical
The SF_DATE_CREATED and SF_IMPORT_FILENAME columns are Snowflake-specific; Databricks uses its own internal loading metadata
Percentage values (DEF_IDs 4, 21, 23, 25, 27, 29) are stored as fractions (e.g., 0.5 = 50%) — multiply by 100 for display

Documentation Index