Documentation Index

Fetch the complete documentation index at: https://docs.validatar.com/llms.txt

Use this file to discover all available pages before exploring further.

META_DATA Profiling Tables

Prev Next

Overview

Validatar stores column and table profiling results in two tables within the RESULTS schema. These tables contain the raw profile data that powers the PROFILE_* columns in VW_DATASET_TABLES and VW_DATASET_COLUMNS. Querying them directly gives you access to historical profiling trends and the full set of profile definitions — including values not surfaced in the repository views.

Table Purpose
META_DATA_LATEST_PROFILES Current (most recent) profile value for each metric per column/table
META_DATA_PROFILE_SET_EXECUTION_RESULTS Historical record of every profiling execution result

META_DATA_LATEST_PROFILES

Stores the latest profile value for each combination of data source, table, column, and profile definition. This is the backing data for the PROFILE_* columns in the repository views.

Column Type Nullable Description
ID NUMBER No Unique row identifier
META_DATA_SOURCE_ID NUMBER No Data source ID
META_DATA_PROFILE_SET_EXECUTION_ID NUMBER No ID of the profiling execution that produced this value
DATA_PROFILE_DEF_ID NUMBER No Profile definition ID — identifies which metric this is
META_SCHEMA_ID NUMBER No Schema ID
META_TABLE_ID NUMBER No Table ID
META_COLUMN_ID NUMBER Yes Column ID (NULL for table-level profiles)
VALUE VARCHAR Yes The profile value (stored as string regardless of actual type)
DATE_CREATED TIMESTAMP No When this profile result was created
DATE_MODIFIED TIMESTAMP No When this row was last updated
SF_DATE_CREATED TIMESTAMP No When loaded into Snowflake
SF_IMPORT_FILENAME VARCHAR Yes Staging file name

META_DATA_PROFILE_SET_EXECUTION_RESULTS

Stores every historical profiling result — one row per metric per execution. Unlike LATEST_PROFILES (which only keeps the most recent value), this table retains the full history for trend analysis.

Column Type Nullable Description
ID NUMBER No Unique row identifier
META_DATA_SOURCE_ID NUMBER No Data source ID
META_DATA_PROFILE_SET_EXECUTION_ID NUMBER No ID of the profiling execution
META_DATA_PROFILE_SET_EXECUTION_STEP_ID NUMBER No Specific step within the profiling execution
DATA_PROFILE_DEF_ID NUMBER No Profile definition ID
META_SCHEMA_ID NUMBER No Schema ID
META_TABLE_ID NUMBER No Table ID
META_COLUMN_ID NUMBER Yes Column ID (NULL for table-level profiles)
GROUPING_KEY VARCHAR Yes Grouping key for partitioned profiles
VALUE VARCHAR Yes The profile value
DATE_CREATED TIMESTAMP No When this result was calculated
SF_DATE_CREATED TIMESTAMP No When loaded into Snowflake
SF_IMPORT_FILENAME VARCHAR Yes Staging file name

Profile Definition ID Reference

The DATA_PROFILE_DEF_ID column identifies which metric a row represents. Here are the common profile definitions:

Table-Level Profiles

DEF_ID Metric Type Description
1 Record Count Numeric Total row count
2 Total Data MB Numeric Table size in megabytes
43 Column Count Numeric Number of columns
47 Last Refresh Date Timestamp When profiling last ran
48 Concatenated Row Text Calc String Concatenated row checksum
57 Sum Checksum Table Numeric Sum-based table checksum

Column-Level Profiles

DEF_ID Metric Type Description
3 Distinct Count Numeric Number of distinct values
4 Distinct Percent Decimal Distinct as fraction of total (multiply by 100 for percentage)
5 Most Common Value String Most frequently occurring value
6 Most Common Count Numeric Frequency of the most common value
7 Min Numeric Numeric Minimum numeric value
8 Max Numeric Numeric Maximum numeric value
9 Mean Numeric Numeric Average numeric value
10 Median Numeric Numeric Median numeric value
12 Min String String Minimum string value
13 Max String String Maximum string value
14 Standard Deviation Numeric Standard deviation
15 Max Length Numeric Longest string length
16 Min Length Numeric Shortest string length
17 Mean Length Numeric Average string length
20 Null Count Numeric Number of NULLs
21 Null Percent Decimal NULL fraction (multiply by 100 for percentage)
22 Blank Count Numeric Number of blank/empty strings
23 Blank Percent Decimal Blank fraction
24 Numeric Count Numeric Values that parse as numeric
25 Numeric Percent Decimal Numeric fraction
26 Zero Count Numeric Number of zeros
27 Zero Percent Decimal Zero fraction
28 Negative Count Numeric Number of negative values
29 Negative Percent Decimal Negative fraction
30 Min Date Timestamp Earliest date value
31 Max Date Timestamp Latest date value
32 Longest Value String The actual longest value
33 Shortest Value String The actual shortest value
41 Lower Quartile Numeric 25th percentile
42 Upper Quartile Numeric 75th percentile
54 Std Dev (Large) Numeric Population standard deviation
56 Sum Checksum Column Numeric Sum checksum
58 Sum Distinct Checksum Numeric Distinct value checksum

How Repository Views Surface Profile Data

The VW_DATASET_TABLES and VW_DATASET_COLUMNS views join to internal helper views (VW_DATASET_HELPER_TABLE_PROFILES, VW_DATASET_HELPER_COLUMN_PROFILES) which pivot META_DATA_LATEST_PROFILES by DATA_PROFILE_DEF_ID into named columns. The helper views filter by profile_def_id and cast the string VALUE to the appropriate type.

For example, PROFILE_RECORD_COUNT in VW_DATASET_TABLES comes from:

CAST(p.value AS DECIMAL(38, 8))
-- where p.profile_def_id = 1

And PROFILE_NULL_PERCENT in VW_DATASET_COLUMNS comes from:

CAST(p.value AS DECIMAL(16, 4)) * 100
-- where p.profile_def_id = 21

Relationships

META_DATA_LATEST_PROFILES
    │
    ├── META_TABLE_ID → REPOSITORY.VW_DATASET_TABLES.TABLE_ID
    ├── META_COLUMN_ID → REPOSITORY.VW_DATASET_COLUMNS.COLUMN_ID
    └── META_DATA_SOURCE_ID → REPOSITORY.VW_DATASET_TABLES.DATA_SOURCE_ID

META_DATA_PROFILE_SET_EXECUTION_RESULTS
    │
    ├── Same join keys as META_DATA_LATEST_PROFILES
    └── META_DATA_PROFILE_SET_EXECUTION_ID links to execution history

Example Queries

Get Latest Profile Values for a Specific Table

SELECT
    p.DATA_PROFILE_DEF_ID,
    CASE p.DATA_PROFILE_DEF_ID
        WHEN 1 THEN 'Record Count'
        WHEN 2 THEN 'Total Data MB'
        WHEN 43 THEN 'Column Count'
        WHEN 47 THEN 'Last Refresh Date'
    END AS METRIC_NAME,
    p.VALUE,
    p.DATE_MODIFIED
FROM RESULTS.META_DATA_LATEST_PROFILES p
WHERE p.META_TABLE_ID = 8482
    AND p.META_COLUMN_ID IS NULL  -- table-level only
ORDER BY p.DATA_PROFILE_DEF_ID;

Historical Profile Trends for a Column

SELECT
    h.DATE_CREATED,
    h.DATA_PROFILE_DEF_ID,
    CASE h.DATA_PROFILE_DEF_ID
        WHEN 3 THEN 'Distinct Count'
        WHEN 20 THEN 'Null Count'
        WHEN 7 THEN 'Min Numeric'
        WHEN 8 THEN 'Max Numeric'
    END AS METRIC,
    h.VALUE
FROM RESULTS.META_DATA_PROFILE_SET_EXECUTION_RESULTS h
WHERE h.META_TABLE_ID = 8482
    AND h.META_COLUMN_ID = 101530
    AND h.DATA_PROFILE_DEF_ID IN (3, 20, 7, 8)
ORDER BY h.DATE_CREATED, h.DATA_PROFILE_DEF_ID;

Row Count Trend for a Table Over Time

SELECT
    h.DATE_CREATED::DATE AS PROFILE_DATE,
    CAST(h.VALUE AS DECIMAL(38,0)) AS RECORD_COUNT
FROM RESULTS.META_DATA_PROFILE_SET_EXECUTION_RESULTS h
WHERE h.META_TABLE_ID = 8482
    AND h.META_COLUMN_ID IS NULL
    AND h.DATA_PROFILE_DEF_ID = 1  -- Record Count
ORDER BY h.DATE_CREATED;

Compare Profiles Across Tables in a Schema

SELECT
    t.TABLE_NAME,
    MAX(CASE WHEN p.DATA_PROFILE_DEF_ID = 1 THEN CAST(p.VALUE AS DECIMAL(38,0)) END) AS RECORD_COUNT,
    MAX(CASE WHEN p.DATA_PROFILE_DEF_ID = 2 THEN CAST(p.VALUE AS DECIMAL(38,8)) END) AS TOTAL_DATA_MB,
    MAX(CASE WHEN p.DATA_PROFILE_DEF_ID = 43 THEN CAST(p.VALUE AS DECIMAL(38,0)) END) AS COLUMN_COUNT
FROM RESULTS.META_DATA_LATEST_PROFILES p
JOIN REPOSITORY.VW_DATASET_TABLES t
    ON t.TABLE_ID = p.META_TABLE_ID
WHERE p.META_COLUMN_ID IS NULL
    AND t.SCHEMA_NAME = 'PUBLIC'
    AND p.DATA_PROFILE_DEF_ID IN (1, 2, 43)
GROUP BY t.TABLE_NAME
ORDER BY RECORD_COUNT DESC NULLS LAST;

Databricks Notes

  • Replace DATE_CREATED::DATE with CAST(DATE_CREATED AS DATE)
  • CAST(VALUE AS DECIMAL(...)) syntax is identical
  • The SF_DATE_CREATED and SF_IMPORT_FILENAME columns are Snowflake-specific; Databricks uses its own internal loading metadata
  • Percentage values (DEF_IDs 4, 21, 23, 25, 27, 29) are stored as fractions (e.g., 0.5 = 50%) — multiply by 100 for display