Azure Blob Storage Data Source Template


Overview

The Azure Blob Storage data source template enables Validatar to discover and profile data files stored in Azure Blob Storage containers and Azure Data Lake Storage (ADLS) Gen2. It uses Python scripts with the azure-storage-blob library to list blobs, read file schemas, and calculate data quality metrics.

Platform: Azure Blob Storage / ADLS Gen2
Connection Category: Script
Template Category: Marketplace

What's Included

Default Parameters

Parameter          Type       Description
account_name       String     Storage account name
container_name     String     Blob container name
prefix             String     Blob prefix to scope discovery
connection_string  Secret     Azure Storage connection string
file_format        Dropdown   Expected file format (CSV, Parquet, JSON)
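A script receiving these parameters will usually validate them before connecting. The sketch below is hypothetical: BlobSourceParams, parse_connection_string, and the specific checks are illustrative and not part of the template. It checks that file_format is one of the dropdown values and that account_name agrees with the AccountName setting inside the connection string (connection strings built from a SAS BlobEndpoint may omit AccountName, so that check is skipped when the key is absent).

```python
from dataclasses import dataclass

# Allowed values for the file_format dropdown, as listed in the parameter table.
FILE_FORMATS = {"CSV", "Parquet", "JSON"}


def parse_connection_string(conn_str: str) -> dict:
    """Split an Azure Storage connection string into its key=value settings."""
    settings = {}
    for part in conn_str.split(";"):
        if not part.strip():
            continue  # tolerate a trailing semicolon
        key, sep, value = part.partition("=")
        if not sep:
            raise ValueError(f"malformed setting: {part!r}")
        # partition splits on the first '=', so base64 padding in keys survives.
        settings[key.strip()] = value.strip()
    return settings


@dataclass
class BlobSourceParams:
    """Hypothetical container for the template's default parameters."""
    account_name: str
    container_name: str
    prefix: str
    connection_string: str
    file_format: str

    def validate(self) -> None:
        if self.file_format not in FILE_FORMATS:
            raise ValueError(f"unsupported file_format: {self.file_format}")
        if not self.container_name:
            raise ValueError("container_name is required")
        conn_account = parse_connection_string(self.connection_string).get("AccountName")
        if conn_account and conn_account != self.account_name:
            raise ValueError(
                f"account_name {self.account_name!r} does not match "
                f"connection string account {conn_account!r}"
            )
```

Once validated, the connection string and container name can be handed to ContainerClient.from_connection_string(conn_str, container_name) from azure-storage-blob.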

Data Type Mappings

The template maps the column types inferred from file schemas to Validatar data types, in the same way as the AWS S3 template.
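Parquet files carry a typed schema, but CSV (and often JSON) values arrive as text, so a type has to be inferred before it can be mapped. The sketch below assumes a simple "most specific type that every sampled value satisfies" rule. The target names in TYPE_MAP are placeholders, not the template's actual mapping table.

```python
from datetime import date

# Placeholder target names; the real targets come from the template's
# Data Type Mappings, which this sketch does not reproduce.
TYPE_MAP = {
    "boolean": "BOOLEAN",
    "integer": "BIGINT",
    "decimal": "DECIMAL",
    "date": "DATE",
    "string": "VARCHAR",
}


def _parses(parse):
    """Build a check that is True when parse(value) does not raise ValueError."""
    def check(value):
        try:
            parse(value)
        except ValueError:
            return False
        return True
    return check


# Candidates from most to least specific; the first one every value satisfies wins.
CANDIDATES = [
    ("boolean", lambda v: v.lower() in {"true", "false"}),
    ("integer", _parses(int)),
    ("decimal", _parses(float)),
    ("date", _parses(date.fromisoformat)),
]


def infer_type(values):
    """Infer a column type from string samples, treating blanks as nulls."""
    present = [v.strip() for v in values if v and v.strip()]
    if not present:
        return "string"
    for name, check in CANDIDATES:
        if all(check(v) for v in present):
            return name
    return "string"


def map_column(values):
    """Map sampled values to a (placeholder) target type name."""
    return TYPE_MAP[infer_type(values)]
```

Ordering matters here: "1" parses as both an integer and a float, so integer is tried first.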

Metadata Ingestion

The ingestion script:

  • Lists blobs in the container that match the prefix and file format
  • Maps each blob prefix (virtual directory) to a schema
  • Maps each blob, or group of related blobs, to a table
  • Reads file headers or schemas to discover columns
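The steps above can be sketched as two pure functions, assuming blob names have already been listed. Everything here is illustrative: the extension filters, the rule that groups part files such as sales_001.csv and sales_002.csv into one table, and the 64 KB header read are assumptions, not the template's documented behavior. The SDK calls in the trailing comment (ContainerClient.list_blobs and download_blob) are real azure-storage-blob methods but are not executed.

```python
import csv
import io
import posixpath
import re
from collections import defaultdict

# Assumed extension filters for each file_format dropdown value.
EXTENSIONS = {"CSV": (".csv",), "Parquet": (".parquet",), "JSON": (".json", ".jsonl")}

# Hypothetical grouping rule: "sales_001.csv" and "sales_002.csv" form table "sales".
PART_SUFFIX = re.compile(r"[_-]\d+$")


def discover_tables(blob_names, prefix, file_format):
    """Group blob names into {schema: {table: [blob names]}}."""
    extensions = EXTENSIONS[file_format]
    catalog = defaultdict(lambda: defaultdict(list))
    for name in blob_names:
        # Keep only blobs under the prefix with the expected extension.
        if not name.startswith(prefix) or not name.lower().endswith(extensions):
            continue
        directory, filename = posixpath.split(name)
        stem, _ = posixpath.splitext(filename)
        schema = directory or "/"  # blobs at the container root
        table = PART_SUFFIX.sub("", stem) or stem
        catalog[schema][table].append(name)
    return {schema: dict(tables) for schema, tables in catalog.items()}


def read_csv_header(leading_bytes, encoding="utf-8-sig"):
    """Return the column names from the first record of a CSV blob.

    leading_bytes must be a ranged download large enough to hold the header row;
    utf-8-sig strips a byte-order mark if one is present.
    """
    text = leading_bytes.decode(encoding, errors="replace")
    return next(csv.reader(io.StringIO(text)), [])


# Wiring with azure-storage-blob (not executed here):
#   container = ContainerClient.from_connection_string(conn_str, container_name)
#   names = [b.name for b in container.list_blobs(name_starts_with=prefix)]
#   head = container.download_blob(name, offset=0, length=64 * 1024).readall()
```

Parquet and JSON would need their own column readers; only the CSV header path is shown.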

Profiling

The profiling script downloads sample data and calculates standard metrics.
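The template does not list its metrics here, so the sketch below computes a few common ones as examples: row count, null count, distinct count, and min/max/mean for columns whose sampled values are all numeric. It assumes the sample is a ranged CSV download (for example, download_blob(name, offset=0, length=...).readall()), which can end mid-row; complete_lines drops that partial row. Quoted fields containing newlines are not handled by this trimming step.

```python
import csv
import io
import statistics


def complete_lines(data, encoding="utf-8-sig"):
    """Decode a ranged download and drop the trailing partial line, if any."""
    text = data.decode(encoding, errors="replace")
    return text[: text.rfind("\n") + 1]


def _as_numbers(values):
    """Return the values as floats, or [] if any value is not numeric."""
    try:
        return [float(v) for v in values]
    except ValueError:
        return []


def profile_csv_sample(sample):
    """Compute example column metrics over a CSV sample that includes its header."""
    reader = csv.DictReader(io.StringIO(sample))
    rows = list(reader)
    profile = {"row_count": len(rows), "columns": {}}
    for column in reader.fieldnames or []:
        # Short rows yield None for missing fields; blanks also count as null.
        values = [row.get(column) for row in rows]
        present = [v for v in values if v not in (None, "")]
        stats = {
            "null_count": len(values) - len(present),
            "distinct_count": len(set(present)),
        }
        numbers = _as_numbers(present)
        if numbers:
            stats.update(min=min(numbers), max=max(numbers), mean=statistics.fmean(numbers))
        profile["columns"][column] = stats
    return profile
```

Because metrics come from a sample, counts describe the sampled rows only, not the whole blob.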

Installation

Customization

  • Managed identity — Modify the script to use Azure Managed Identity instead of connection strings
  • ADLS Gen2 hierarchical namespace — Adjust for directory-aware storage
  • Delta Lake — Extend to read Delta Lake tables stored in ADLS
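For the managed identity option, one possible approach is to build the client from an account URL and an Azure AD credential instead of connection_string. The sketch assumes the azure-identity package is installed and the identity has a data-plane role such as Storage Blob Data Reader on the account. DefaultAzureCredential tries several sources, managed identity among them; ManagedIdentityCredential can be used instead to require it. The account_url helper assumes the public Azure cloud endpoint suffix; ADLS Gen2 APIs (azure-storage-file-datalake) use the dfs endpoint rather than the blob endpoint.

```python
def account_url(account_name: str, hierarchical: bool = False) -> str:
    """Public-cloud endpoint; ADLS Gen2 (Data Lake) APIs use the dfs endpoint."""
    service = "dfs" if hierarchical else "blob"
    return f"https://{account_name}.{service}.core.windows.net"


def container_client_with_identity(account_name: str, container_name: str):
    """Build a ContainerClient authenticated with Azure AD instead of a connection string."""
    # Imported lazily so this module loads even where the SDKs are not installed.
    from azure.identity import DefaultAzureCredential
    from azure.storage.blob import BlobServiceClient

    service = BlobServiceClient(
        account_url=account_url(account_name),
        credential=DefaultAzureCredential(),
    )
    return service.get_container_client(container_name)
```

The returned client exposes the same list_blobs and download_blob methods used with a connection string, so the rest of a script can stay unchanged.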
