- Updated on 20 Mar 2024
- 2 Minutes to read
Data Profiles
What are Data Profiles?
Data profiles are summary information about existing data that helps determine the accuracy, completeness, and quality of that data. Data profiling is typically used within the broader context of ELT, monitoring, and Data Governance. When done properly, data profiling can play a significant part in cleansing, enriching, and maintaining quality data within an organization.
Some common data profiles that are useful in many cases are:
- Record count
- Distinct value count
- Nulls
- Range of values
- Average values
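As a rough illustration of what these profiles measure, the sketch below computes them for a single column in plain Python. The function name `profile_column` and the sample values are hypothetical and not part of Validatar; `None` stands in for a database NULL, and the choice to exclude NULLs from the distinct count is an assumption.

```python
from statistics import mean


def profile_column(values):
    """Compute a few common data profiles for one column of values.

    Hypothetical helper for illustration only; None represents a NULL.
    """
    non_null = [v for v in values if v is not None]
    return {
        "record_count": len(values),
        # Assumption: NULLs are not counted as a distinct value.
        "distinct_count": len(set(non_null)),
        "null_count": len(values) - len(non_null),
        # Range of values, reported as a minimum and a maximum.
        "min": min(non_null) if non_null else None,
        "max": max(non_null) if non_null else None,
        "mean": mean(non_null) if non_null else None,
    }


# Made-up sample column with one NULL and one repeated value.
order_totals = [10, 25, None, 25, 40]
profile = profile_column(order_totals)
print(profile)
```

In practice these values would normally be computed by the database itself; the sketch is only meant to make the definitions concrete.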
Validatar contains a standard set of 40 data profiles. They are listed on the Data Profiles page, where the columns can be filtered. To delete a data profile from the list, click the white space next to its Name, then click the Delete button at the top of the page.
Creating a New Data Profile
Creating a new data profile only establishes that the profile exists. To configure the data profile definition, navigate to Settings > Database Engines > Profiling and choose the data profile. This allows different database engines to use the same data profile name while being configured differently.
- Click New Profile on the Data Profiles page. A new page opens with fields for the profile's settings.
- Configure the settings in Table 1. Name, Reference Key, Profiled Object, and Results Format are required; Description is optional.
- Once the required fields are populated, the Save button becomes clickable. Click Save to create the new profile.
Table 1
Setting | Description |
---|---|
Name | The data profile name. |
Reference Key | The data profile's unique key identifier. |
Description (optional) | A description of the data profile. |
Profiled Object | Whether the profile applies at the table level or the column level. |
Results Format | The data type and format of the result, chosen from the available options. |
Restrict data types (checkbox) | When checked, the data profile is valid only for the data types selected from the list of options. |
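The split described above, where a profile is created once and its definition is configured separately for each database engine, can be pictured with a small sketch. Everything here is hypothetical: the `DataProfile` class, the `can_save` check, the engine names, the `"number"` format value, and the SQL templates are illustrative assumptions, not Validatar's internal model or API.

```python
from dataclasses import dataclass, field


@dataclass
class DataProfile:
    """Hypothetical model of the settings in Table 1."""
    name: str = ""
    reference_key: str = ""
    profiled_object: str = ""   # assumed values: "table" or "column"
    results_format: str = ""    # assumed value below; real options not shown here
    description: str = ""       # optional
    restricted_data_types: list = field(default_factory=list)

    def can_save(self):
        """Mirror the Save button: enabled only when required fields are set."""
        required = (self.name, self.reference_key,
                    self.profiled_object, self.results_format)
        return all(value.strip() for value in required)


# One profile name, different definitions per engine (illustrative SQL only).
definitions = {
    ("null_count", "engine_a"):
        "SELECT COUNT(*) FROM {table} WHERE {column} IS NULL",
    ("null_count", "engine_b"):
        "SELECT SUM(CASE WHEN {column} IS NULL THEN 1 ELSE 0 END) FROM {table}",
}

draft = DataProfile(name="Null Count", reference_key="null_count")
print(draft.can_save())  # required fields are still missing

draft.profiled_object = "column"
draft.results_format = "number"
print(draft.can_save())  # all required fields are now populated

sql = definitions[(draft.reference_key, "engine_a")].format(
    table="orders", column="total")
print(sql)
```

Keying the definitions by both reference key and engine is one simple way to let the same profile name resolve to engine-specific logic.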
Data Profile List
The complete list of the standard profiles pre-written in Validatar is:
Record Count | Lower Quartile | Null Percent | Maximum (Date) |
Total Data MB | Upper Quartile | Blank Count | Longest Value |
Distinct Count | Minimum (String) | Blank Percent | Shortest Value |
Distinct Percent | Maximum (String) | Numeric Count | Distribution (String) |
Most Common Value | Standard Deviation | Numeric Percent | Distribution (Numeric) |
Most Common Count | Max Length | Zero Count | Top 10 Values |
Minimum (Numeric) | Min Length | Zero Percent | Bottom 10 Values |
Maximum (Numeric) | Mean Length | Negative Count | Binned (Numeric) |
Mean (Numeric) | Length Distribution | Negative Percent | Year Distribution |
Median (Numeric) | Null Count | Minimum (Date) | Year Month Distribution |
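The list above gives only profile names, so the exact definitions Validatar uses are not shown here. As one plausible reading, the sketch below computes Null Percent, Distinct Percent, Max Length, and Top 10 Values for a string column. The function name, the choice of denominator (all records, including NULLs), and the sample data are assumptions.

```python
from collections import Counter


def string_profiles(values):
    """Assumed definitions of a few listed profiles; None represents a NULL."""
    total = len(values)
    non_null = [v for v in values if v is not None]
    counts = Counter(non_null)
    return {
        # Assumption: percentages are taken over all records, NULLs included.
        "null_percent": 100 * (total - len(non_null)) / total if total else None,
        "distinct_percent": 100 * len(counts) / total if total else None,
        "max_length": max((len(v) for v in non_null), default=None),
        # Most frequent values first, as (value, count) pairs.
        "top_10_values": counts.most_common(10),
    }


# Made-up sample column with two NULLs.
cities = ["Austin", "Dallas", None, "Austin", "Waco", "Austin", None, "Dallas"]
result = string_profiles(cities)
print(result)
```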
Data Profile Details
Clicking a data profile name opens a details page for that profile. The details include Name, Reference Key, Description, Profiled Object, and Results Format. There is also an optional Restrict Data Type checkbox; when checked, it displays a list of data type options, and more than one option can be selected. The details can be updated and saved from this page.