Validation Helper

Validate various list data structures

Overview

The Validation Helper validates CSV files, spreadsheets (e.g., XLSX files), lists of lists, and lists of dictionaries.

Actions

The Validation Helper has a single default action: Validate.

The Data Reference field specifies the data to be validated. It takes a reference based only on the flow step's ID, for example get_cities.

When a row's field doesn't meet the specified conditions, the user can choose what action to take. The options are:

Terminate flow - immediately terminates the entire flow if a field fails the specified validation.
Drop invalid rows - drops every row in which a field fails the specified validation and returns the remaining rows under a key called data. The errors are returned under a key called errors.
Return errors - returns all the input data under a key called data, while the errors are returned under a key called errors.
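The three options above can be sketched in Python as follows. This is only an illustration of the described behavior, not the helper's actual implementation: the function name apply_failure_mode, the mode strings, and the ValidationFailed exception are hypothetical, and the return value of the terminate mode when every row passes is an assumption.

```python
from typing import Any, Callable

Row = dict[str, Any]


class ValidationFailed(Exception):
    """Hypothetical error standing in for 'Terminate flow'."""


def apply_failure_mode(
    rows: list[Row], is_valid: Callable[[Row], bool], mode: str
) -> dict[str, list[Row]]:
    """Mimic the three failure options with hypothetical mode names."""
    invalid = [row for row in rows if not is_valid(row)]
    if mode == "terminate_flow":
        if invalid:
            # Stop everything as soon as validation has failed.
            raise ValidationFailed(f"{len(invalid)} row(s) failed validation")
        # Assumption: with no failures, all rows are passed through.
        return {"data": list(rows), "errors": []}
    if mode == "drop_invalid_rows":
        # Only the rows that passed remain in "data".
        valid = [row for row in rows if is_valid(row)]
        return {"data": valid, "errors": invalid}
    if mode == "return_errors":
        # All input rows stay in "data"; failures are reported separately.
        return {"data": list(rows), "errors": invalid}
    raise ValueError(f"unknown mode: {mode!r}")


cities = [
    {"city": "Zurich", "population": 434008},
    {"city": "Berlin", "population": 3654802},
]
result = apply_failure_mode(
    cities, lambda row: row["population"] > 500_000, "drop_invalid_rows"
)
print(result["data"])    # [{'city': 'Berlin', 'population': 3654802}]
print(result["errors"])  # [{'city': 'Zurich', 'population': 434008}]
```

With "return_errors" instead, both cities would stay in data and Zurich would additionally appear in errors.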

In the Column input, enter the column or field to which the validation should be applied: either the column name (if the data comes from a spreadsheet or a list of dictionaries) or the column's index (if the data comes from a CSV or a list of lists). Then select the Field Type for this column or field and a corresponding Condition. Finally, enter the Condition Value.

You can also enable Ignore blank. To validate additional fields or columns, click the + button to add a new set of fields and repeat this process.

Examples:

If the data comes from an XLSX file, it is stored as a list of dictionaries:

List of dicts

[
  {
    "city": "Zurich",
    "population": 434008
  },
  {
    "city": "Berlin",
    "population": 3654802
  }
]

If the data comes from a CSV file, the flow's data handling logic treats it as a list of lists:

List of lists

[
  [
    "Zurich",
    434008
  ],
  [
    "Berlin",
    3654802
  ]
]

Columns in a list of lists are referenced by their index, e.g., 0 for the first column and 2 for the third column (indexes start at 0). For the list of lists above, for example, one can check that the population (column index 1) is greater than 500 000.
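The index-based check can be illustrated with a short Python sketch. The function validate_column_greater_than is a hypothetical name chosen for this example and is not part of the helper; it only shows how a zero-based column index selects the value to test.

```python
from typing import Any


def validate_column_greater_than(
    rows: list[list[Any]], column_index: int, threshold: float
) -> tuple[list[list[Any]], list[list[Any]]]:
    """Split rows into (valid, invalid) by comparing one column to a threshold."""
    valid, invalid = [], []
    for row in rows:
        # column_index is zero-based: 0 is the first column, 1 the second.
        if row[column_index] > threshold:
            valid.append(row)
        else:
            invalid.append(row)
    return valid, invalid


rows = [
    ["Zurich", 434008],
    ["Berlin", 3654802],
]
valid, invalid = validate_column_greater_than(rows, 1, 500_000)
print(valid)    # [['Berlin', 3654802]]
print(invalid)  # [['Zurich', 434008]]
```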

Validating more complex data

The Validation Helper can also work with more complex, nested dictionaries, such as:

{
  "contacts": [
    {
      "id": 123,
      "name": "Michael",
      "address": {
        "country": "United States",
        "city": "New York"
      }
    },
    {
      "id": 124,
      "name": "Maria",
      "address": {
        "country": "Spain",
        "city": "Madrid"
      }
    }
  ],
  "companies": [
    {
      "id": 200,
      "name": "Tech Company",
      "website": "www.tech.com"
    }
  ]
}

To access and validate only the contacts in the above example, specify the path to that nested list in the Data Reference field. For example, if the data comes from connector xls1, the reference would be xls1.contacts. In the same way, nested fields within each contact can be referenced, such as the field country (address.country).

The resulting output in this case is:

{
  "data": [
    {
      "id": 124,
      "name": "Maria",
      "address": {
        "country": "Spain",
        "city": "Madrid"
      }
    }
  ],
  "errors": [
    {
      "id": 123,
      "name": "Michael",
      "address": {
        "country": "United States",
        "city": "New York"
      },
      "errors": {
        "address.country": {
          "United States": [
            "max length is 6"
          ]
        }
      }
    }
  ]
}
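
The output above can be reproduced with a minimal Python sketch. It assumes, based on the error message, that the condition was a maximum length of 6 on address.country and that invalid rows were dropped. The functions get_by_path and validate_max_length are hypothetical names for illustration and do not describe how the helper is implemented.

```python
import copy
from functools import reduce
from typing import Any


def get_by_path(record: dict[str, Any], path: str) -> Any:
    """Follow a dotted path such as 'address.country' into nested dicts."""
    return reduce(lambda node, key: node[key], path.split("."), record)


def validate_max_length(
    records: list[dict[str, Any]], path: str, max_length: int
) -> dict[str, list[dict[str, Any]]]:
    """Drop records whose field at `path` is too long and report them."""
    result: dict[str, list[dict[str, Any]]] = {"data": [], "errors": []}
    for record in records:
        value = str(get_by_path(record, path))
        if len(value) <= max_length:
            result["data"].append(record)
        else:
            # Copy the record so the input is not modified, then attach
            # the error details keyed by field path and offending value.
            failed = copy.deepcopy(record)
            failed["errors"] = {path: {value: [f"max length is {max_length}"]}}
            result["errors"].append(failed)
    return result


contacts = [
    {"id": 123, "name": "Michael",
     "address": {"country": "United States", "city": "New York"}},
    {"id": 124, "name": "Maria",
     "address": {"country": "Spain", "city": "Madrid"}},
]
result = validate_max_length(contacts, "address.country", 6)
print([c["name"] for c in result["data"]])  # ['Maria']
print(result["errors"][0]["errors"])
# {'address.country': {'United States': ['max length is 6']}}
```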

