Windows Directory Data Source Template


Overview

The Windows Directory data source template enables Validatar to discover and profile files in local or network Windows directories. It uses Python scripts to scan directory structures, read file headers (CSV, Excel, Parquet), and extract schema information.

Platform: Windows file system (local and UNC paths)
Connection Category: Script
Template Category: Marketplace

What's Included

Default Parameters

Parameter         Type       Description
directory_path    String     Root directory path (local or UNC)
file_pattern      String     Glob pattern for file matching (e.g., *.csv, *.xlsx)
recursive         Boolean    Whether to scan subdirectories
encoding          Dropdown   File encoding (UTF-8, Latin-1, etc.)
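The ingestion and profiling scripts both depend on these four parameters. The following is a minimal sketch of how a script might bundle and sanity-check them, assuming they arrive as plain Python values. The class name DirectorySourceParams, the default values, and the validation rules are hypothetical illustrations, not the template's actual code.

```python
import codecs
from dataclasses import dataclass
from pathlib import PureWindowsPath


@dataclass(frozen=True)
class DirectorySourceParams:
    """Hypothetical container for the template's default parameters."""
    directory_path: str
    file_pattern: str = "*.csv"   # assumed default
    recursive: bool = False       # assumed default
    encoding: str = "utf-8"       # assumed default

    def __post_init__(self) -> None:
        if not self.directory_path:
            raise ValueError("directory_path must not be empty")
        # Raises LookupError if Python has no codec for the chosen encoding.
        codecs.lookup(self.encoding)

    @property
    def is_unc(self) -> bool:
        # For a UNC path such as \\server\share\folder, the drive is "\\server\share".
        return PureWindowsPath(self.directory_path).drive.startswith("\\\\")
```

Flagging UNC paths up front makes it easier to report a missing share permission clearly instead of as an empty scan.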

Data Type Mappings

Column types inferred from each file are mapped to Validatar data types as follows:

  • string, object → String
  • int64, integer → Integer
  • float64, decimal → Decimal
  • datetime64, datetime → DateTime
  • bool → Boolean
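CSV files carry no type information, so a type has to be guessed from the cell values before this mapping can be applied. The following sketch shows one way to do both steps. TYPE_MAP, infer_type, and map_type are hypothetical names; the value-based inference rules and the fallback to String for unlisted types are assumptions, not documented template behavior.

```python
from datetime import datetime

# Hypothetical lookup table built from the mappings listed above.
TYPE_MAP = {
    "string": "String", "object": "String",
    "int64": "Integer", "integer": "Integer",
    "float64": "Decimal", "decimal": "Decimal",
    "datetime64": "DateTime", "datetime": "DateTime",
    "bool": "Boolean",
}


def map_type(inferred: str) -> str:
    """Map an inferred type name to a Validatar type; unknown names fall back to String."""
    # pandas reports dtypes such as "datetime64[ns]", so drop any bracketed unit.
    base = inferred.strip().lower().split("[", 1)[0]
    return TYPE_MAP.get(base, "String")


def _all_parse(values, convert) -> bool:
    """True if every value converts without raising ValueError."""
    try:
        for v in values:
            convert(v)
    except ValueError:
        return False
    return True


def infer_type(samples: list[str]) -> str:
    """Guess a type name from sample text cells; blank cells are ignored."""
    values = [s.strip() for s in samples if s and s.strip()]
    if not values:
        return "string"
    if all(v.lower() in ("true", "false") for v in values):
        return "bool"
    if _all_parse(values, int):
        return "integer"
    if _all_parse(values, float):
        return "decimal"
    if _all_parse(values, datetime.fromisoformat):
        return "datetime"
    return "string"
```

For example, map_type(infer_type(["1", "2", ""])) returns "Integer", and map_type("datetime64[ns]") returns "DateTime".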

Metadata Ingestion

The ingestion script:

  • Scans the root directory (and, if recursive is enabled, its subdirectories) for files matching the pattern
  • Registers each matching file as a "table" in the metadata catalog
  • Reads each file's header to discover column names, and infers a data type for each column
  • Optionally treats subdirectories as schemas
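The following is a stdlib-only sketch of the scan described above, assuming CSV files whose first row holds the column names. Excel and Parquet headers would need a third-party reader and are left out. The function name discover_tables and the shape of each catalog entry are hypothetical; the relative subdirectory path is used as the schema name, with an empty string for files in the root.

```python
import csv
from pathlib import Path


def discover_tables(root: str, pattern: str = "*.csv", recursive: bool = False,
                    encoding: str = "utf-8") -> list[dict]:
    """Return one hypothetical catalog entry per file that matches the pattern."""
    base = Path(root)
    # rglob also matches files directly in root, so recursive scans include them.
    matches = base.rglob(pattern) if recursive else base.glob(pattern)
    tables = []
    for path in sorted(p for p in matches if p.is_file()):
        # The subdirectory (relative to root) becomes the schema; top-level files get "".
        rel_parent = path.parent.relative_to(base)
        schema = "" if rel_parent == Path(".") else rel_parent.as_posix()
        with path.open(newline="", encoding=encoding) as fh:
            header = next(csv.reader(fh), [])  # first row = column names
        tables.append({"schema": schema, "table": path.name, "columns": header})
    return tables
```

With recursive=False, only files directly under the root are returned and every entry has an empty schema.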

Profiling

The profiling script reads files and calculates:

  • Record count per file
  • Null count and null percentage per column
  • Distinct count per column
  • Min/max for numeric and date columns
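The following sketch shows how these four statistics could be computed for a single CSV file using only the standard library. It is not the template's profiling script. The names profile_csv and _parse are hypothetical, and several rules are assumptions: empty cells count as nulls, distinct counts exclude nulls, and min/max are reported only when every non-null value parses as a number or as an ISO date.

```python
import csv
from datetime import datetime


def _parse(value: str):
    """Convert text to int, float or datetime where possible; otherwise return it unchanged."""
    for convert in (int, float, datetime.fromisoformat):
        try:
            return convert(value)
        except ValueError:
            continue
    return value


def profile_csv(path: str, encoding: str = "utf-8", null_tokens=("",)) -> dict:
    """Hypothetical profiler: record count plus per-column null, distinct and min/max stats."""
    with open(path, newline="", encoding=encoding) as fh:
        reader = csv.DictReader(fh)
        rows = list(reader)  # full read of the file
        columns = reader.fieldnames or []
    total = len(rows)
    result = {"record_count": total, "columns": {}}
    for col in columns:
        # Short rows yield None for missing cells; count those as nulls too.
        present = [r[col].strip() for r in rows
                   if r[col] is not None and r[col].strip() not in null_tokens]
        nulls = total - len(present)
        stats = {
            "null_count": nulls,
            "null_pct": round(100.0 * nulls / total, 2) if total else 0.0,
            "distinct_count": len(set(present)),
        }
        parsed = [_parse(v) for v in present]
        numeric = all(isinstance(p, (int, float)) for p in parsed)
        dates = all(isinstance(p, datetime) for p in parsed)
        if parsed and (numeric or dates):
            stats["min"], stats["max"] = min(parsed), max(parsed)
        result["columns"][col] = stats
    return result
```

Parsing before comparing matters: as plain strings, "12" would sort before "9.5", so a text-based min/max would report the wrong values.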

Installation

Customization

  • File types — Extend the ingestion script to support additional formats (JSON, XML, fixed-width)
  • Sampling — For large files, configure row sampling instead of full reads
  • Network paths — Ensure the Validatar service account has read access to UNC paths
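The first two customizations can be combined in a small reader registry. The following sketch assumes that "JSON" means JSON Lines (one object per line) and that sampling means taking the first N rows. READERS, read_rows, and sample_rows are hypothetical names, not template settings. Supporting another format means adding one entry to the registry. A random sample, for example by reservoir sampling, would still need a full pass over the file.

```python
import csv
import json
from itertools import islice
from pathlib import Path
from typing import Callable, Iterator, Optional


def read_csv(path: Path, encoding: str) -> Iterator[dict]:
    with path.open(newline="", encoding=encoding) as fh:
        yield from csv.DictReader(fh)


def read_jsonl(path: Path, encoding: str) -> Iterator[dict]:
    # Assumes JSON Lines: one JSON object per non-blank line.
    with path.open(encoding=encoding) as fh:
        for line in fh:
            if line.strip():
                yield json.loads(line)


# Hypothetical registry keyed by lowercase file extension.
READERS: dict[str, Callable[[Path, str], Iterator[dict]]] = {
    ".csv": read_csv,
    ".jsonl": read_jsonl,
}


def read_rows(path, encoding: str = "utf-8", sample_rows: Optional[int] = None) -> list[dict]:
    """Read all rows, or only the first sample_rows rows, from a supported file."""
    path = Path(path)
    reader = READERS.get(path.suffix.lower())
    if reader is None:
        raise ValueError(f"unsupported file type: {path.suffix}")
    rows = reader(path, encoding)
    # islice stops consuming the generator after sample_rows rows, so later rows are never parsed.
    return list(rows if sample_rows is None else islice(rows, sample_rows))
```

Because each reader is a generator, taking a head sample keeps memory use proportional to the sample size rather than to the file size.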
