Overview
The REST API data source template enables Validatar to discover and profile data exposed through REST APIs that publish Swagger/OpenAPI specifications. It uses Python scripts to read the API specification, map endpoints to tables and response fields to columns, and sample data for profiling.
Platform: Any REST API with Swagger/OpenAPI spec
Connection Category: Script
Template Category: Marketplace
What's Included
Default Parameters
| Parameter | Type | Description |
|---|---|---|
| api_base_url | String | Base URL of the API |
| swagger_url | String | URL of the Swagger/OpenAPI spec (e.g., /swagger/v1/swagger.json) |
| api_key | Secret | API authentication key or token |
| auth_type | Dropdown | Authentication type (Bearer, API Key Header, Basic) |
| auth_header_name | String | Custom header name for API key authentication |
| page_size | Integer | Records per page for paginated endpoints |
| max_sample_pages | Integer | Maximum pages to fetch for profiling samples |
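
As a rough illustration of how the authentication parameters might combine into request headers, here is a minimal sketch. The function name `build_auth_headers`, the fallback header name `X-API-Key`, and the assumption that `api_key` holds `user:password` for Basic authentication are all hypothetical and are not taken from the template's scripts.

```python
import base64


def build_auth_headers(auth_type, api_key, auth_header_name="X-API-Key"):
    """Return HTTP headers for the given auth_type (hypothetical helper)."""
    if auth_type == "Bearer":
        return {"Authorization": f"Bearer {api_key}"}
    if auth_type == "API Key Header":
        # auth_header_name is the custom header configured on the data source.
        return {auth_header_name: api_key}
    if auth_type == "Basic":
        # Assumption: api_key carries "user:password" for Basic authentication.
        token = base64.b64encode(api_key.encode("utf-8")).decode("ascii")
        return {"Authorization": f"Basic {token}"}
    raise ValueError(f"Unsupported auth_type: {auth_type!r}")
```

The returned dictionary can be passed as the `headers` argument of whichever HTTP client the script uses.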
Data Type Mappings
Maps OpenAPI/JSON Schema types:
- string → String
- integer → Integer
- number → Decimal
- boolean → Boolean
- array, object → Other
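The mapping above can be expressed as a simple lookup. This is a sketch only: the names `OPENAPI_TYPE_MAP` and `map_openapi_type` are hypothetical, and falling back to Other for a missing or unrecognized type is an assumption rather than documented behavior.

```python
# OpenAPI/JSON Schema type -> mapped data type, as listed in this section.
OPENAPI_TYPE_MAP = {
    "string": "String",
    "integer": "Integer",
    "number": "Decimal",
    "boolean": "Boolean",
    "array": "Other",
    "object": "Other",
}


def map_openapi_type(property_schema):
    """Map one response property schema to a data type name."""
    # Assumption: unknown or missing types fall back to "Other".
    return OPENAPI_TYPE_MAP.get(property_schema.get("type"), "Other")
```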
Metadata Ingestion
The ingestion script:
- Fetches the Swagger/OpenAPI specification from the configured URL
- Maps API tags or resource groups to schemas
- Maps each GET operation to a table
- Maps response schema properties to columns, using the types declared in the spec
- Supports OpenAPI 2.0 (Swagger) and 3.0 specifications
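The discovery steps above can be sketched as follows, working on a spec that has already been fetched and parsed into a dictionary. All function names are hypothetical, and several choices are assumptions rather than the template's actual behavior: only the `200` response and the `application/json` media type are inspected, the first tag is used as the schema, untagged operations fall under a `default` schema, and only local `$ref` pointers are resolved.

```python
def resolve_ref(spec, schema):
    """Follow local '$ref' pointers such as '#/definitions/Pet'."""
    while isinstance(schema, dict) and "$ref" in schema:
        node = spec
        for part in schema["$ref"][2:].split("/"):  # drop the leading '#/'
            node = node[part]
        schema = node
    return schema


def response_schema(spec, operation):
    """Return the object schema describing one row of a GET response."""
    response = operation.get("responses", {}).get("200") or {}
    if "schema" in response:
        # OpenAPI 2.0 (Swagger): the schema sits directly on the response.
        schema = response["schema"]
    else:
        # OpenAPI 3.0: the schema sits under content/<media type>.
        content = response.get("content", {})
        schema = content.get("application/json", {}).get("schema", {})
    schema = resolve_ref(spec, schema)
    if schema.get("type") == "array":
        # A list endpoint: each array item is treated as one row.
        schema = resolve_ref(spec, schema.get("items", {}))
    return schema


def discover_tables(spec):
    """Map each GET operation to a table with columns from its response."""
    tables = []
    for path, path_item in spec.get("paths", {}).items():
        operation = path_item.get("get")
        if not operation:
            continue  # only GET operations become tables
        properties = response_schema(spec, operation).get("properties", {})
        columns = {
            name: resolve_ref(spec, prop).get("type", "object")
            for name, prop in properties.items()
        }
        tables.append({
            "schema": (operation.get("tags") or ["default"])[0],
            "table": path,
            "columns": columns,
        })
    return tables
```

Keeping the 2.0 and 3.0 differences inside `response_schema` means the rest of the discovery logic does not need to know which version of the spec it is reading.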
Profiling
The profiling script:
- Calls each GET endpoint with pagination
- Samples response data up to the configured page limit
- Calculates record counts, null counts, distinct counts, and type-specific metrics
- Respects rate limits with configurable delays between requests
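A minimal sketch of the sampling and metric steps listed above. To keep it independent of any particular HTTP client, the page fetch is passed in as a callable; `fetch_page`, `sample_endpoint`, `profile_column`, and `delay_seconds` are hypothetical names, and 1-based page-number pagination is an assumption that will not fit every API.

```python
import time


def sample_endpoint(fetch_page, page_size, max_sample_pages, delay_seconds=0.0):
    """Collect rows page by page, stopping at the configured page limit."""
    rows = []
    for page in range(1, max_sample_pages + 1):
        batch = fetch_page(page, page_size)  # expected to return a list of dicts
        rows.extend(batch)
        if len(batch) < page_size:
            break  # a short page means there is no more data
        time.sleep(delay_seconds)  # pause between requests to respect rate limits
    return rows


def profile_column(rows, column):
    """Compute record, null, and distinct counts for one column."""
    values = [row.get(column) for row in rows]
    non_null = [value for value in values if value is not None]
    return {
        "record_count": len(values),
        "null_count": len(values) - len(non_null),
        # repr() lets unhashable values (lists, dicts) be counted too.
        "distinct_count": len({repr(value) for value in non_null}),
    }
```

In a real script, `fetch_page` would wrap the HTTP call for one endpoint, and `page_size` and `max_sample_pages` would come from the data source parameters.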
Installation
Customization
- Authentication — Modify for OAuth2, JWT, or custom authentication flows
- Pagination — Adjust pagination logic for APIs that use cursor-based, offset, or link-header pagination
- Endpoint filtering — Include or exclude specific endpoints from discovery
- Nested objects — Extend column discovery to flatten nested response objects
- POST endpoints — Modify the ingestion script to include POST endpoints that return data (e.g., search/filter endpoints)
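Two of the customizations above, endpoint filtering and nested-object flattening, might look like the following. This is an illustrative sketch, not the template's code: the function names, the glob-style include/exclude patterns, and the dotted column naming are all assumptions.

```python
from fnmatch import fnmatchcase


def include_endpoint(path, include=("*",), exclude=()):
    """Decide whether a path takes part in discovery; exclusions win."""
    if any(fnmatchcase(path, pattern) for pattern in exclude):
        return False
    return any(fnmatchcase(path, pattern) for pattern in include)


def flatten_properties(properties, prefix="", separator="."):
    """Expand nested object properties into dotted column names."""
    columns = {}
    for name, prop in properties.items():
        full_name = f"{prefix}{separator}{name}" if prefix else name
        if prop.get("type") == "object" and "properties" in prop:
            # Recurse into the nested object instead of emitting one column.
            nested = flatten_properties(prop["properties"], full_name, separator)
            columns.update(nested)
        else:
            columns[full_name] = prop.get("type", "object")
    return columns
```

With this approach, a property `address` containing `city` and `zip` would surface as the columns `address.city` and `address.zip` rather than a single column of type Other.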