Open Navigation

CSV Reader

The CSV Reader is a connector for importing and querying large sets of data
On This Page

Overview

The CSV Reader currently allows for the querying of very large CSV files and then paginating through the results, or exporting the to a new CSV file.

If heavy manipulation of or adding to the data within the CSV file itself is required, then it is recommended that the CSV Editor be used.

Please note: The CSV Reader and CSV Editor cannot be used interchangeably. A CSV instance created in the CSV Reader is not compatible with an instance created in the CSV Editor and vice versa.

For example, a query execution ID created in the Reader is not valid if used in the Export CSV operation in the Editor

Basic Operations List

  • Create CSV
  • Export CSV file
  • Export split CSV file
  • Get query execution
  • Get query results
  • Start query

Example - Basic use of the CSV Reader

Step 1 - Create a CSV instance

The first step when using the CSV Reader is to create a CSV instance. This step prepares the CSV file for querying. This operation expects that the CSV file has been downloaded using one of the other connectors. In this example, we will use the Google Drive connector. The Create CSV operation accepts a file object from the output of this connector.

File input for create CSV

The column names and data typesalso have to be specified when Creating a CSV instance. Please do this in the Columns input option in the property panel. If any column requires specific querying as a number, then that must be specified here, otherwise the query may not behave as expected.

Enter column names and types

Once the CSV file and columns names/types have been specified, some more information about the CSV file to be queried is needed. This is specified through the following options:

  • Header row - specifies if the CSV to be query contains the column names as the first row. If this is the case, the CSV Reader will need to ignore the first row when returning results.
  • Delimiter - the character that separates values in the file, usually a comma
  • Enclose character - the character used to enclose values within the CSV file, usually double quotes
  • Escape character - the character used to escape other reserved characters. For example, if double quotes are used as the enclose character, and there is also a double quote in a value, then this needs to be escaped, usually by another double quote e.g "Value ""with"" quote".

CSV format options

Step 2 - Start a query

The CSV Reader works by setting up query executions that run in the background. This means results will not be returned in the Start query operation. Instead, the workflow needs to poll the status of the query execution until the status shows as SUCCEEDED. The query execution must be in this state in order for results to be obtained.

To use the Start query operation, firstly a CSV ID must be specified. The CSV ID can be obtained from the Create CSV operation.

Specify CSV ID

When starting a query, conditions can be provided to filter on specified columns and values. The first condition is to specify the Columns that you would like returned in the results. If this section is left blank, all columns will be returned.

Columns to be returned

The Filters section can be used to filter out rows based on their values. Instead of individual filters, filters can be combined in Filter groups to give more flexibility over the query. There are two types of filter group:

  • Match all - all filters specified in a match all group will mean that all filter criteria must be satisfied for a row to be return e.g. Header 1 equal to more than 5 AND Header 2 equal to 'example'
  • Match any - rows will be returned if any of the criteria within a match any group are met e.g. Header 1 equal to more than 5 OR Header 2 equal to 'example'

Multiple filter groups can be specified.

Each filter has three components, the Column name (the column to apply filter), Operator (how the value should be compared to the cell value in the CSV) and Value. The Like operator should be used with text columns. The % wildcard can also be used here to filter on parts of a value e.g. %test would return all values with the word test at the end.

If the column data type is number, then the Equal operator would be more appropriate.

Filter conditions

The Condition strictness option specifies if all filter groups should be satisfied before returning a row in the query results.

Conditions strictness options

The last option that can be specified is the Order by option. This is important if you would like to retrieve results in a certain order. Results will not necessarily be returned in the order that they appear in the CSV so use of the order by option is advised.

Order by

Once the columns and filters are specified, the query can then be executed. This operations returns a query execution ID, which can be used in other operations to find out the state of a query and to obtain the results once the query has finished.

Step 3 - Get query execution

To find out the state a of a query execution, the get query execution operation can be used. This takes a query execution ID as input and returns the status of that execution.

Poll get query execution

If combined with the Loop and Boolean condition connectors, get query execution can be polled until the status of the query execution has SUCCEEDED, as shown in the below screenshot.

Please be aware that the query execution may show a FAILED state, in which case the workflow should also check for this using a boolean condition and Terminate the workflow.

Query execution boolean

Once the operation has succeeded, the loop can be broken and the workflow can be proceed to the next CSV Reader operation.

Step 4 - Get query results

This operation accepts a Query execution ID and, as long as that query has completed, will return a list of results. If the expected result set is very large, the results can be truncated.

When results are truncated, a Next token will be provided with the output. This next token can be used to obtain the next page of results. This can continue until all results have been obtained. No next token indicates the end of the results.

The results are returned as a list of items/objects. Each of these objects represents a row in the CSV file. Within these row items are a further list of items/objects which represent the individual values/cells of data.

Get query results output

The order of items in the Data property correspond to the order of columns in the CSV file e.g. the first item would be the first column. The exception to this rule is when Columns were specified in the Start query operation. If columns were specified, the order of the items will correspond to the order of the columns entered in the input property panel.

Column order

For example, in the above screenshot, we can see that Header 3, Header 6, and Header 2 have been entered. This means that, despite the order in the CSV file being Header 2, Header 3, Header 6, the values in the VarCharValue property will be in the order Header 3, Header 6, and Header 2, as shown below.

Column order with filter

Step 5 - Export CSV file

This step may not be required, but if you would like to export the entire result set into another CSV file, the Export CSV file operation can be used to retrieve a CSV file. Provide a Query execution id as input, and as long as the execution has completed the results should be available to export to CSV.

CSV files can also be split by file size. Using the Export split CSV file operation, users can split files into a file size of their choice. However, there is a limit on the number of files that can be generated. Please make sure when splitting by a file size, the number of split files will not exceed 100. The minimum file size that can be entered is 0.5. The file size is in megabytes.

Was this article helpful?
Yes
No
On This Page