Overview
The Windows Directory data source template enables Validatar to discover and profile files in local or network Windows directories. It uses Python scripts to scan directory structures, read file headers (CSV, Excel, Parquet), and extract schema information.
Platform: Windows file system (local and UNC paths)
Connection Category: Script
Template Category: Marketplace
What's Included
Default Parameters
| Parameter | Type | Description |
|---|---|---|
| directory_path | String | Root directory path (local or UNC) |
| file_pattern | String | Glob pattern for file matching (e.g., *.csv, *.xlsx) |
| recursive | Boolean | Whether to scan subdirectories |
| encoding | Dropdown | File encoding (UTF-8, Latin-1, etc.) |
Data Type Mappings
Maps inferred column types from file headers:
- string, object → String
- int64, integer → Integer
- float64, decimal → Decimal
- datetime64, datetime → DateTime
- bool → Boolean
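
The mappings above can be expressed as a simple lookup. This is a minimal sketch, not the template's actual script: the names `TYPE_MAP` and `map_inferred_type` are hypothetical, and falling back to String for unrecognized types is an assumption rather than documented behavior.

```python
# Hypothetical lookup table built from the mappings listed above.
TYPE_MAP = {
    "string": "String",
    "object": "String",
    "int64": "Integer",
    "integer": "Integer",
    "float64": "Decimal",
    "decimal": "Decimal",
    "datetime64": "DateTime",
    "datetime": "DateTime",
    "bool": "Boolean",
}


def map_inferred_type(inferred: str, default: str = "String") -> str:
    """Return the mapped type name for an inferred column type.

    The String fallback for unknown types is an assumption.
    """
    key = inferred.strip().lower()
    # Some libraries (pandas, for example) report a unit suffix such as
    # "datetime64[ns]"; drop everything from "[" onward before the lookup.
    key = key.split("[", 1)[0]
    return TYPE_MAP.get(key, default)
```

For example, `map_inferred_type("datetime64[ns]")` resolves to `DateTime`, while an unlisted type name resolves to the assumed default.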
Metadata Ingestion
The ingestion script:
- Scans the directory for files matching the pattern
- Registers each file as a "table" in the metadata catalog
- Reads file headers to discover column names and infer data types
- Optionally treats subdirectories as schemas
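
The ingestion steps above can be sketched with the Python standard library. This is an illustrative outline under stated assumptions, not the template's shipped script: it handles CSV files only, the function names and the returned dictionary shape are hypothetical, and using "default" as the schema for files in the root directory is an assumption.

```python
import csv
from pathlib import Path


def discover_files(directory_path, file_pattern="*.csv", recursive=False):
    """Return the files under directory_path that match the glob pattern."""
    root = Path(directory_path)
    # rglob descends into subdirectories; glob stays at the top level.
    matches = root.rglob(file_pattern) if recursive else root.glob(file_pattern)
    return sorted(p for p in matches if p.is_file())


def read_csv_header(path, encoding="utf-8"):
    """Read only the first row of a CSV file and return it as column names."""
    with open(path, newline="", encoding=encoding) as handle:
        return next(csv.reader(handle), [])


def build_catalog(directory_path, file_pattern="*.csv", recursive=False,
                  encoding="utf-8"):
    """Describe each matching file as a 'table' with a schema and columns."""
    root = Path(directory_path)
    catalog = []
    for path in discover_files(root, file_pattern, recursive):
        parent = path.parent.relative_to(root)
        # Assumption: files directly in the root go into a "default" schema;
        # files in subdirectories use the relative subdirectory path.
        schema = parent.as_posix() if parent.parts else "default"
        catalog.append({
            "schema": schema,
            "table": path.name,
            "columns": read_csv_header(path, encoding),
        })
    return catalog
```

A call such as `build_catalog(r"\\server\share\exports", "*.csv", recursive=True)` (a hypothetical UNC path) would return one entry per matching file. Excel and Parquet headers would need a third-party reader and are left out of this sketch.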
Profiling
The profiling script reads files and calculates:
- Record count per file
- Null count and null percentage per column
- Distinct count per column
- Min/max for numeric and date columns
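
As a rough illustration of how these statistics could be computed for a CSV file, the sketch below uses only the standard library. It is not the template's profiling script: `profile_csv` is a hypothetical name, empty cells are assumed to count as nulls, distinct counts are assumed to exclude nulls, and min/max is computed only for columns whose non-null values all parse as numbers (date columns are omitted for brevity).

```python
import csv


def profile_csv(path, encoding="utf-8"):
    """Compute per-file and per-column statistics for one CSV file."""
    with open(path, newline="", encoding=encoding) as handle:
        reader = csv.DictReader(handle)
        columns = reader.fieldnames or []
        values = {name: [] for name in columns}
        record_count = 0
        for row in reader:
            record_count += 1
            for name in columns:
                values[name].append(row.get(name))

    profile = {"record_count": record_count, "columns": {}}
    for name, cells in values.items():
        # Assumption: missing fields (None) and empty strings are nulls.
        non_null = [c for c in cells if c not in (None, "")]
        null_count = len(cells) - len(non_null)
        stats = {
            "null_count": null_count,
            "null_pct": (100.0 * null_count / record_count) if record_count else 0.0,
            "distinct_count": len(set(non_null)),
        }
        try:
            numbers = [float(c) for c in non_null]
        except ValueError:
            numbers = []  # at least one value is not numeric; skip min/max
        if numbers:
            stats["min"], stats["max"] = min(numbers), max(numbers)
        profile["columns"][name] = stats
    return profile
```

The result is a nested dictionary keyed by column name, which keeps the per-file record count separate from the per-column statistics.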
Installation
Customization
- File types — Extend the ingestion script to support additional formats (JSON, XML, fixed-width)
- Sampling — For large files, configure row sampling instead of full reads
- Network paths — Ensure the Validatar service account has read access to UNC paths
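
For the sampling customization above, one simple approach is to read only the first rows of a file. The sketch below assumes CSV input; `sample_rows` and its `max_rows` parameter are hypothetical and do not correspond to a documented Validatar setting. A head sample is cheap but biased toward the start of the file, so statistics from it are estimates.

```python
import csv
from itertools import islice


def sample_rows(path, max_rows=1000, encoding="utf-8"):
    """Return at most max_rows data rows from the top of a CSV file."""
    with open(path, newline="", encoding=encoding) as handle:
        reader = csv.DictReader(handle)
        # islice stops pulling from the reader after max_rows rows,
        # so the rest of the file is never parsed.
        return list(islice(reader, max_rows))
```

If representativeness matters more than speed, a random or reservoir sample could replace the head sample, at the cost of scanning the whole file.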