Overview
Validatar stores column and table profiling results in two tables within the RESULTS schema. These tables contain the raw profile data that powers the PROFILE_* columns in VW_DATASET_TABLES and VW_DATASET_COLUMNS. Querying them directly gives you access to historical profiling trends and the full set of profile definitions — including values not surfaced in the repository views.
| Table | Purpose |
|---|---|
| META_DATA_LATEST_PROFILES | Current (most recent) profile value for each metric per column/table |
| META_DATA_PROFILE_SET_EXECUTION_RESULTS | Historical record of every profiling execution result |
META_DATA_LATEST_PROFILES
Stores the latest profile value for each combination of data source, table, column, and profile definition. This is the backing data for the PROFILE_* columns in the repository views.
| Column | Type | Nullable | Description |
|---|---|---|---|
| ID | NUMBER | No | Unique row identifier |
| META_DATA_SOURCE_ID | NUMBER | No | Data source ID |
| META_DATA_PROFILE_SET_EXECUTION_ID | NUMBER | No | ID of the profiling execution that produced this value |
| DATA_PROFILE_DEF_ID | NUMBER | No | Profile definition ID — identifies which metric this is |
| META_SCHEMA_ID | NUMBER | No | Schema ID |
| META_TABLE_ID | NUMBER | No | Table ID |
| META_COLUMN_ID | NUMBER | Yes | Column ID (NULL for table-level profiles) |
| VALUE | VARCHAR | Yes | The profile value (stored as string regardless of actual type) |
| DATE_CREATED | TIMESTAMP | No | When this profile result was created |
| DATE_MODIFIED | TIMESTAMP | No | When this row was last updated |
| SF_DATE_CREATED | TIMESTAMP | No | When loaded into Snowflake |
| SF_IMPORT_FILENAME | VARCHAR | Yes | Staging file name |
META_DATA_PROFILE_SET_EXECUTION_RESULTS
Stores every historical profiling result — one row per metric per execution. Unlike LATEST_PROFILES (which only keeps the most recent value), this table retains the full history for trend analysis.
| Column | Type | Nullable | Description |
|---|---|---|---|
| ID | NUMBER | No | Unique row identifier |
| META_DATA_SOURCE_ID | NUMBER | No | Data source ID |
| META_DATA_PROFILE_SET_EXECUTION_ID | NUMBER | No | ID of the profiling execution |
| META_DATA_PROFILE_SET_EXECUTION_STEP_ID | NUMBER | No | Specific step within the profiling execution |
| DATA_PROFILE_DEF_ID | NUMBER | No | Profile definition ID |
| META_SCHEMA_ID | NUMBER | No | Schema ID |
| META_TABLE_ID | NUMBER | No | Table ID |
| META_COLUMN_ID | NUMBER | Yes | Column ID (NULL for table-level profiles) |
| GROUPING_KEY | VARCHAR | Yes | Grouping key for partitioned profiles |
| VALUE | VARCHAR | Yes | The profile value |
| DATE_CREATED | TIMESTAMP | No | When this result was calculated |
| SF_DATE_CREATED | TIMESTAMP | No | When loaded into Snowflake |
| SF_IMPORT_FILENAME | VARCHAR | Yes | Staging file name |
Profile Definition ID Reference
The DATA_PROFILE_DEF_ID column identifies which metric a row represents. Here are the common profile definitions:
Table-Level Profiles
| DEF_ID | Metric | Type | Description |
|---|---|---|---|
| 1 | Record Count | Numeric | Total row count |
| 2 | Total Data MB | Numeric | Table size in megabytes |
| 43 | Column Count | Numeric | Number of columns |
| 47 | Last Refresh Date | Timestamp | When profiling last ran |
| 48 | Concatenated Row Text Calc | String | Concatenated row checksum |
| 57 | Sum Checksum Table | Numeric | Sum-based table checksum |
Column-Level Profiles
| DEF_ID | Metric | Type | Description |
|---|---|---|---|
| 3 | Distinct Count | Numeric | Number of distinct values |
| 4 | Distinct Percent | Decimal | Distinct as fraction of total (multiply by 100 for percentage) |
| 5 | Most Common Value | String | Most frequently occurring value |
| 6 | Most Common Count | Numeric | Frequency of the most common value |
| 7 | Min Numeric | Numeric | Minimum numeric value |
| 8 | Max Numeric | Numeric | Maximum numeric value |
| 9 | Mean Numeric | Numeric | Average numeric value |
| 10 | Median Numeric | Numeric | Median numeric value |
| 12 | Min String | String | Minimum string value |
| 13 | Max String | String | Maximum string value |
| 14 | Standard Deviation | Numeric | Standard deviation |
| 15 | Max Length | Numeric | Longest string length |
| 16 | Min Length | Numeric | Shortest string length |
| 17 | Mean Length | Numeric | Average string length |
| 20 | Null Count | Numeric | Number of NULLs |
| 21 | Null Percent | Decimal | NULL fraction (multiply by 100 for percentage) |
| 22 | Blank Count | Numeric | Number of blank/empty strings |
| 23 | Blank Percent | Decimal | Blank fraction |
| 24 | Numeric Count | Numeric | Values that parse as numeric |
| 25 | Numeric Percent | Decimal | Numeric fraction |
| 26 | Zero Count | Numeric | Number of zeros |
| 27 | Zero Percent | Decimal | Zero fraction |
| 28 | Negative Count | Numeric | Number of negative values |
| 29 | Negative Percent | Decimal | Negative fraction |
| 30 | Min Date | Timestamp | Earliest date value |
| 31 | Max Date | Timestamp | Latest date value |
| 32 | Longest Value | String | The actual longest value |
| 33 | Shortest Value | String | The actual shortest value |
| 41 | Lower Quartile | Numeric | 25th percentile |
| 42 | Upper Quartile | Numeric | 75th percentile |
| 54 | Std Dev (Large) | Numeric | Population standard deviation |
| 56 | Sum Checksum Column | Numeric | Sum checksum |
| 58 | Sum Distinct Checksum | Numeric | Distinct value checksum |
How Repository Views Surface Profile Data
The VW_DATASET_TABLES and VW_DATASET_COLUMNS views join to internal helper views (VW_DATASET_HELPER_TABLE_PROFILES, VW_DATASET_HELPER_COLUMN_PROFILES) which pivot META_DATA_LATEST_PROFILES by DATA_PROFILE_DEF_ID into named columns. The helper views filter by profile_def_id and cast the string VALUE to the appropriate type.
For example, PROFILE_RECORD_COUNT in VW_DATASET_TABLES comes from:
CAST(p.value AS DECIMAL(38, 8))
-- where p.profile_def_id = 1
And PROFILE_NULL_PERCENT in VW_DATASET_COLUMNS comes from:
CAST(p.value AS DECIMAL(16, 4)) * 100
-- where p.profile_def_id = 21
Relationships
META_DATA_LATEST_PROFILES
│
├── META_TABLE_ID → REPOSITORY.VW_DATASET_TABLES.TABLE_ID
├── META_COLUMN_ID → REPOSITORY.VW_DATASET_COLUMNS.COLUMN_ID
└── META_DATA_SOURCE_ID → REPOSITORY.VW_DATASET_TABLES.DATA_SOURCE_ID
META_DATA_PROFILE_SET_EXECUTION_RESULTS
│
├── Same join keys as META_DATA_LATEST_PROFILES
└── META_DATA_PROFILE_SET_EXECUTION_ID links to execution history
Example Queries
Get Latest Profile Values for a Specific Table
SELECT
p.DATA_PROFILE_DEF_ID,
CASE p.DATA_PROFILE_DEF_ID
WHEN 1 THEN 'Record Count'
WHEN 2 THEN 'Total Data MB'
WHEN 43 THEN 'Column Count'
WHEN 47 THEN 'Last Refresh Date'
END AS METRIC_NAME,
p.VALUE,
p.DATE_MODIFIED
FROM RESULTS.META_DATA_LATEST_PROFILES p
WHERE p.META_TABLE_ID = 8482
AND p.META_COLUMN_ID IS NULL -- table-level only
ORDER BY p.DATA_PROFILE_DEF_ID;
Historical Profile Trends for a Column
SELECT
h.DATE_CREATED,
h.DATA_PROFILE_DEF_ID,
CASE h.DATA_PROFILE_DEF_ID
WHEN 3 THEN 'Distinct Count'
WHEN 20 THEN 'Null Count'
WHEN 7 THEN 'Min Numeric'
WHEN 8 THEN 'Max Numeric'
END AS METRIC,
h.VALUE
FROM RESULTS.META_DATA_PROFILE_SET_EXECUTION_RESULTS h
WHERE h.META_TABLE_ID = 8482
AND h.META_COLUMN_ID = 101530
AND h.DATA_PROFILE_DEF_ID IN (3, 20, 7, 8)
ORDER BY h.DATE_CREATED, h.DATA_PROFILE_DEF_ID;
Row Count Trend for a Table Over Time
SELECT
h.DATE_CREATED::DATE AS PROFILE_DATE,
CAST(h.VALUE AS DECIMAL(38,0)) AS RECORD_COUNT
FROM RESULTS.META_DATA_PROFILE_SET_EXECUTION_RESULTS h
WHERE h.META_TABLE_ID = 8482
AND h.META_COLUMN_ID IS NULL
AND h.DATA_PROFILE_DEF_ID = 1 -- Record Count
ORDER BY h.DATE_CREATED;
Compare Profiles Across Tables in a Schema
SELECT
t.TABLE_NAME,
MAX(CASE WHEN p.DATA_PROFILE_DEF_ID = 1 THEN CAST(p.VALUE AS DECIMAL(38,0)) END) AS RECORD_COUNT,
MAX(CASE WHEN p.DATA_PROFILE_DEF_ID = 2 THEN CAST(p.VALUE AS DECIMAL(38,8)) END) AS TOTAL_DATA_MB,
MAX(CASE WHEN p.DATA_PROFILE_DEF_ID = 43 THEN CAST(p.VALUE AS DECIMAL(38,0)) END) AS COLUMN_COUNT
FROM RESULTS.META_DATA_LATEST_PROFILES p
JOIN REPOSITORY.VW_DATASET_TABLES t
ON t.TABLE_ID = p.META_TABLE_ID
WHERE p.META_COLUMN_ID IS NULL
AND t.SCHEMA_NAME = 'PUBLIC'
AND p.DATA_PROFILE_DEF_ID IN (1, 2, 43)
GROUP BY t.TABLE_NAME
ORDER BY RECORD_COUNT DESC NULLS LAST;
Databricks Notes
- Replace
DATE_CREATED::DATEwithCAST(DATE_CREATED AS DATE) CAST(VALUE AS DECIMAL(...))syntax is identical- The
SF_DATE_CREATEDandSF_IMPORT_FILENAMEcolumns are Snowflake-specific; Databricks uses its own internal loading metadata - Percentage values (DEF_IDs 4, 21, 23, 25, 27, 29) are stored as fractions (e.g., 0.5 = 50%) — multiply by 100 for display