Overview
The Azure Blob Storage data source template enables Validatar to discover and profile data files stored in Azure Blob containers and Azure Data Lake Storage Gen2. It uses Python scripts with the azure-storage-blob library to list blobs, read file schemas, and calculate data quality metrics.
Platform: Azure Blob Storage / ADLS Gen2
Connection Category: Script
Template Category: Marketplace
What's Included
Default Parameters
| Parameter | Type | Description |
|---|---|---|
| account_name | String | Storage account name |
| container_name | String | Blob container name |
| prefix | String | Blob prefix to scope discovery |
| connection_string | Secret | Azure Storage connection string |
| file_format | Dropdown | Expected file format (CSV, Parquet, JSON) |
Data Type Mappings
Maps the types inferred from file schemas to Validatar data types, in the same way as the AWS S3 template.
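As a rough illustration of type inference for schemaless formats such as CSV, the sketch below guesses a generic type from sample string values. The function name `infer_type` and the target type names (`boolean`, `integer`, `decimal`, `datetime`, `string`) are assumptions made for this example; the template's actual mapping table is not shown on this page and may differ.

```python
from datetime import datetime


def infer_type(values):
    """Guess a generic type name from raw string values (illustrative only)."""
    # Ignore missing and blank values when inferring the type.
    non_empty = [v.strip() for v in values if v is not None and v.strip() != ""]
    if not non_empty:
        return "string"

    def all_parse(parser):
        # True only if every sample value parses without error.
        for v in non_empty:
            try:
                parser(v)
            except ValueError:
                return False
        return True

    # Check the narrowest types first and fall back to string.
    if all(v.lower() in ("true", "false") for v in non_empty):
        return "boolean"
    if all_parse(int):
        return "integer"
    if all_parse(float):
        return "decimal"
    if all_parse(datetime.fromisoformat):
        return "datetime"
    return "string"
```

Parquet files carry their own schema, so a check like this would mainly apply to CSV and JSON samples.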
Metadata Ingestion
The ingestion script:
- Lists blobs in the container that match the prefix and file format
- Maps blob prefixes (virtual directories) to schemas
- Maps each blob or blob group to a table
- Reads file headers or embedded schemas to discover columns
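The discovery steps above can be sketched roughly as follows. This is not the template's actual script: the helper names (`blob_to_schema_table`, `matches_format`, `discover`) and the `(root)` placeholder for blobs without a virtual directory are assumptions made for illustration. The sketch treats each blob as one table and does not cover grouping several blobs into a single table. It relies on `ContainerClient.from_connection_string` and `list_blobs(name_starts_with=...)` from the azure-storage-blob library.

```python
import posixpath


def blob_to_schema_table(blob_name):
    """Split a blob name into (schema, table): directory part and file stem."""
    directory, filename = posixpath.split(blob_name.lstrip("/"))
    table = posixpath.splitext(filename)[0]
    # "(root)" is a hypothetical placeholder for blobs with no virtual directory.
    return (directory or "(root)", table)


def matches_format(blob_name, file_format):
    """Check the blob's extension against the file_format parameter."""
    return blob_name.lower().endswith("." + file_format.lower())


def discover(connection_string, container_name, prefix, file_format):
    """List matching blobs and return (schema, table, blob_name) tuples."""
    # Imported lazily so the pure helpers above work without the SDK installed.
    from azure.storage.blob import ContainerClient

    container = ContainerClient.from_connection_string(
        connection_string, container_name
    )
    results = []
    for blob in container.list_blobs(name_starts_with=prefix or None):
        if matches_format(blob.name, file_format):
            schema, table = blob_to_schema_table(blob.name)
            results.append((schema, table, blob.name))
    return results
```

With this mapping, a blob named `sales/2024/orders.csv` would appear as table `orders` in schema `sales/2024`.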
Profiling
The profiling script downloads sample data and calculates standard metrics.
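A minimal sketch of sample-based profiling for a CSV blob is shown below. The metric set (row count, null count, distinct count, min, max), the 1 MB sample size, and the function names are assumptions for illustration; the template's own metrics may differ. The download step uses `BlobClient.download_blob(offset=..., length=...)` from azure-storage-blob to fetch only the start of the file.

```python
import csv
import io


def profile_csv_sample(text):
    """Compute simple per-column metrics for CSV text that has a header row."""
    reader = csv.DictReader(io.StringIO(text))
    columns = reader.fieldnames or []
    stats = {c: {"rows": 0, "nulls": 0, "values": set()} for c in columns}
    for row in reader:
        for c in columns:
            value = row.get(c)
            stats[c]["rows"] += 1
            if value is None or value == "":
                stats[c]["nulls"] += 1  # empty or missing fields count as null
            else:
                stats[c]["values"].add(value)
    result = {}
    for c, s in stats.items():
        values = s["values"]
        result[c] = {
            "row_count": s["rows"],
            "null_count": s["nulls"],
            "distinct_count": len(values),
            # min/max compare raw strings here; cast first for numeric ordering.
            "min": min(values) if values else None,
            "max": max(values) if values else None,
        }
    return result


def download_sample(connection_string, container_name, blob_name,
                    max_bytes=1_000_000):
    """Download up to max_bytes from the start of a blob as text."""
    # Imported lazily so profile_csv_sample works without the SDK installed.
    from azure.storage.blob import BlobClient

    blob = BlobClient.from_connection_string(
        connection_string, container_name, blob_name
    )
    data = blob.download_blob(offset=0, length=max_bytes).readall()
    text = data.decode("utf-8", errors="replace")
    # If the sample was cut off, drop the trailing partial line.
    if len(data) == max_bytes and "\n" in text:
        text = text[: text.rfind("\n") + 1]
    return text
```

Profiling only a leading sample keeps download cost bounded, at the price of metrics that describe the sample and not the whole file.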
Installation
Customization
- Managed identity — Modify the script to use Azure Managed Identity instead of connection strings
- ADLS Gen2 hierarchical namespace — Adjust for directory-aware storage
- Delta Lake — Extend to read Delta Lake tables stored in ADLS
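For the managed identity customization, one possible approach is to build the client from the account_name parameter and a token credential in place of the connection string. The sketch below assumes the azure-identity package is installed alongside azure-storage-blob and that the account uses the public-cloud endpoint suffix; the function names are hypothetical.

```python
def account_url(account_name):
    """Build the blob endpoint URL (public Azure cloud suffix assumed)."""
    return f"https://{account_name}.blob.core.windows.net"


def make_service_client(account_name):
    """Create a BlobServiceClient that authenticates without a connection string."""
    # Imported lazily so account_url can be used without the SDKs installed.
    from azure.identity import DefaultAzureCredential
    from azure.storage.blob import BlobServiceClient

    # DefaultAzureCredential tries several sources, including managed identity
    # when the script runs on an Azure resource that has one assigned.
    return BlobServiceClient(
        account_url=account_url(account_name),
        credential=DefaultAzureCredential(),
    )
```

The identity would also need a data-plane role on the storage account, such as Storage Blob Data Reader, before listing or downloading blobs succeeds.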