CSV Editor

A connector for manipulating CSV files (formerly known as CSV Processor)

Overview

The CSV Editor allows you to manipulate CSV files in various ways.

You can import and edit CSV files from outside your workflow, or you can dynamically create the CSV files from data being pulled in from previous workflow steps.

Think of the CSV Editor as a data storage connector for spreadsheets! It is most commonly used as a means of temporarily storing and manipulating structured information, en route from one service to another.

Importing CSV files

Please see our section on Managing CSV files for guidance on the different ways in which you can import CSV files into your workflows.

CSV Editor vs CSV Reader

The CSV Reader is much more efficient at reading large files and file content. It uses fewer resources and less processing power, meaning your results are delivered more quickly and efficiently. Please don't use the CSV Editor if no actual data manipulation is required; use the CSV Reader instead.
Please note that the CSV Reader and CSV Editor cannot be used interchangeably. A CSV instance created in the CSV Reader is not compatible with an instance created in the CSV Editor and vice versa. For example, a CSV instance ID created in the Editor is not valid if used in the Export CSV operation in the CSV Reader.
Connector | When to use
CSV Editor | When heavy manipulation of, or adding to, the data within the CSV file itself is required.
CSV Reader | When read-only access is needed, particularly when querying very large CSV files, paginating through the results and exporting said results.

Example scenarios for usage

Some example scenarios for the CSV Editor include:

  • As a lookup table. Again, if only reading is required, then please use the CSV Reader instead.

  • If you have data to manipulate in the future - e.g. some 'contacts' have been input to your CSV with a 'registered date' field that is not yet available from the source you are pulling data from. You can store the CSV at account level and run a scheduled trigger to check for the updated data, updating the CSV when it is found.

  • As an alternative API service. If you don't have an API that is fit for purpose, the CSV Editor can be used as a small-scale, simple API service.

  • To run a Custom Query

Temp vs Persistent CSVs

By default, the operations 'Create CSV' and 'Create CSV from file' create a temporary CSV that is deleted after the current run of the workflow.

So you can only perform operations on this CSV within a single run of your workflow.

However, it is possible to export a CSV in order to make it available for use in another workflow. A common scenario for this is:

  • In your main workflow, you have done the initial work of pulling in data and creating a CSV
  • You now want to pass the CSV on to a second callable workflow in order to carry out some secondary processing, perhaps involving another third-party service

This 'modularized' way of working can help simplify your workflows and give you a clearer view of what you are working towards.

Please see the section on persisting across workflows below for details on this method.

Note, however, that it is not possible to permanently save information back into an exported CSV - you can only edit it and then send the finalized data on to e.g. a third-party service.

If you wish to keep a CSV for repeated editing - over a period of, say, 4 weeks - you should store it at account level.

Intro Examples

After going through the intro examples please read on through the notes and examples on efficiency and workflow optimization!

There are four 'Create CSV from...' operations (including 'XLSX' and 'XML' variants). However, the operations in the first two examples here are the most commonly used.

Create an empty CSV

Create a temporary CSV for use within the current workflow execution (a conceptual sketch of the whole flow is given after these steps):

  1. Set up the workflow with a Manual trigger and add a CSV Editor connector. Set the operation to 'Create CSV' and the scope to 'Current run'.

  2. Create a CSV with three columns and zero rows:

  3. The 'Create CSV' operation returns a unique ID that must be used in every subsequent step that manipulates the CSV data. The operation allows you to create a CSV with a certain number (and type) of columns, as well as some default blank rows. The returned ID is viewable in the Debug panel:

  4. Add another CSV Editor step to the workflow and set its operation to 'Add row'. Use the $.steps.csv-1.id jsonpath to set its 'CSV ID' parameter to the ID found in the output of the previous step. Add some Cell Values for the row you wish to add.

  5. Then you can use 'Get rows' to check that your addition has worked:

    As shown in the debug panel:
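
The flow in these steps can be pictured with a minimal local sketch in plain JavaScript. The createCsv, addRow and getRows helpers (and the shapes they return) are illustrative assumptions only; they are not the connector's actual API, but they mirror the way the ID returned by 'Create CSV' is reused by the later steps.

// Hypothetical stand-ins for the CSV Editor steps above; names and return
// shapes are assumptions for illustration, not the real connector API.
function createCsv(columns) {
  // 'Create CSV' returns a unique ID that later steps must reference
  return { id: `csv-${Date.now()}`, columns, rows: [] };
}

function addRow(csv, cellValues) {
  // 'Add row' takes the CSV reference plus one cell value per column
  csv.rows.push(cellValues);
}

function getRows(csv) {
  // 'Get rows' returns the current contents so you can check the addition worked
  return csv.rows;
}

const csv = createCsv(['first_name', 'last_name', 'email']); // step 2: three columns, zero rows
addRow(csv, ['Malinde', 'Monier', 'mmonier0@netvibes.com']); // step 4: add one row
console.log(getRows(csv)); // step 5: [['Malinde', 'Monier', 'mmonier0@netvibes.com']]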

Create CSV from file

Create a temporary CSV file for use within the current workflow execution, from a pre-existing file.

  1. Add the Google Drive connector to the workflow and set up your authentication (see the Google Drive docs for more details). Set the operation to 'List files'.
  2. Set the 'Filter by Folder' drop-down option to your target folder and 'Filter by name' to the name of the file you wish to download.
  3. Add another Google Drive connector to the workflow and set the operation to 'Download Google file'. Use the connector-snake to set the 'File ID' to the ID returned by the previous Google Drive step. The 'File type' should be set to 'CSV (first sheet only)'.
  4. When the workflow is run, the output from the 'Download Google file' step will provide a URL to the downloaded file.
  5. Add a CSV Editor step to your workflow and set the operation to 'Create CSV From File'. The pre-existing file must be referenced by URL, as the CSV Editor requires the URL of the file in order to bring it into the workflow. Use the connector-snake to set the 'File URL' field to the URL output from the download step. If you now run the workflow, your CSV file will be available for manipulation in the output of the CSV Editor step (a rough conceptual sketch of this operation follows below).
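
Conceptually, 'Create CSV From File' needs a URL it can fetch the file from, which is why the Google Drive download URL is passed in. The sketch below is a rough, hypothetical local equivalent (the URL is a placeholder and the parsing is simplified); it is not how the connector is implemented.

// Rough local equivalent of 'Create CSV From File': fetch the file from a URL,
// then split it into rows of cells. Requires an environment with fetch
// (e.g. a modern browser or Node 18+).
async function createCsvFromFile(fileUrl) {
  const response = await fetch(fileUrl);              // the file must be reachable by URL
  const text = await response.text();
  return text
    .split('\n')
    .filter(line => line.trim() !== '')
    .map(line => line.split(','));                    // naive comma split for illustration
}

// e.g. the URL output by the 'Download Google file' step (placeholder URL)
createCsvFromFile('https://example.com/exports/employees.csv')
  .then(rows => console.log(`CSV created with ${rows.length} rows`));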

Persisting across workflows

By default, the operations 'Create CSV' and 'Create CSV from file' create a temporary CSV that is deleted after the current run of the workflow.

You can perform operations on this CSV within a single run of your workflow.

If you wish to use your CSV in another workflow, you can pass the file URL to a Callable Workflow (an example of the data payload passed is shown after the steps below).

Note, however, that it is not possible to permanently save information back into an exported CSV - you can only edit it and then send the finalized data on to e.g. a third-party service.

If you wish to keep a CSV for repeated editing - over a period of, say, 4 weeks - you should store it at account level.

The process for persisting a CSV is:

  1. Use the Export CSV operation, pulling in the ID from a previous Create CSV step (using $.steps.csv-2.file in this case):

  2. The resulting file is then passed in as Data in the Call workflow step:

  3. Then in the workflow being called, you can recreate the CSV from the file url passed to the trigger (using $.steps.trigger.url):
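
For illustration, the Data passed to the Call workflow step in step 2 might look something like the following. The url property name is an assumption made to match the $.steps.trigger.url jsonpath used in step 3, and the file URL itself is a placeholder.

{
  "url": "https://example.com/exports/my-exported-file.csv"
}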

You are now free to build your secondary processing workflow!

Account level data

When it comes to having multiple workflows access a single CSV, your needs will often be met by persisting a CSV, as per the method above.

However, there may be times when you need to store a CSV at account level for a period of time. For example, you may need to store a CSV which is available for editing by several manual- or scheduled-trigger workflows, before being finally processed and deleted.

The following workflows demonstrate a typical life-cycle for an account-level CSV:

Create the account-level CSV

This workflow creates an account-level CSV from an email attachment:

And stores the ID in account-level data storage:

Edit CSV with other workflows

Any number of workflows can then access the account-level CSV, such as this one which sanitizes the data - using 'Delete rows by filter' to delete any entries which do not contain an email address:

  1. It gets the CSV ID from account-level data storage:
  2. Then uses the ID for the 'Delete rows by filter' operation:

Process and delete CSV at the end of the month

The final workflow in this example runs on a scheduled trigger (a simple sketch of the whole life-cycle is given after these steps).

  1. It gets the CSV ID from data storage again.

  2. Then it exports the CSV to file:

  3. Attaches the file to an email:
  4. Then, finally, deletes the account-level CSV:
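
The life-cycle above can be sketched in plain JavaScript, with a Map standing in for account-level data storage. The key name and helper functions are illustrative assumptions, not the actual Data Storage or CSV Editor operations.

// A Map stands in for account-level data storage; names are illustrative only.
const accountData = new Map();

// Workflow 1: create the CSV and store its ID at account level
function storeCsvId(id) {
  accountData.set('account_csv_id', id);
}

// Workflows 2..n: look the ID up before each editing operation
function getCsvId() {
  return accountData.get('account_csv_id');
}

// Final (scheduled) workflow: export and email the CSV, then delete it
function finishWithCsv() {
  const id = getCsvId();
  console.log(`exporting ${id}, attaching to email, then deleting the CSV`);
  accountData.delete('account_csv_id');          // clean up the stored ID too
}

storeCsvId('csv_123');                           // created from the email attachment
console.log(getCsvId());                         // 'csv_123', available to any workflow
finishWithCsv();                                 // end-of-month processing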

Parse text from CSV

In order to handle raw CSV text, the CSV Editor has a 'Parse Text' operation which will convert raw text into a list of rows.

Often the incoming data will be in the format of the 'body' in the webhook Input panel below. Notice how the delimiter in the example uses a backslash \ to separate the raw data.

To simulate the use of raw CSV text in a workflow, and to keep things simple, this example will use the JavaScript (previously known as 'Script') connector in conjunction with the following mock data:

1,Malinde,Monier,mmonier0@netvibes.com,Female
2,Fiann,Ayling,fayling1@i2i.jp,Female
3,Jasmine,Schulze,jschulze2@sourceforge.net,Female
4,Clerkclaude,Tedman,ctedman3@ox.ac.uk,Male
5,Tina,Asey,tasey4@thetimes.co.uk,Female
  1. Drop the JavaScript connector into a workflow with a Manual trigger, and set a 'Variable' -> 'Name' for the mock data. Running the workflow should now return the mock data.
  2. Drag a CSV Editor step into the workflow and set the operation to 'Parse Text'. Set the 'Delimiter' attribute to reflect the data you are using; in this instance the text is delimited by a comma (,).
  3. Running the workflow will parse the data and return a list of rows, as illustrated in the sketch below.
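
As a rough illustration of what 'Parse Text' does with the mock data above (a conceptual sketch, not the connector's implementation): the raw text is split into lines, and each line is split on the chosen delimiter.

// Conceptual sketch of 'Parse Text': raw text in, list of rows out.
const raw = `1,Malinde,Monier,mmonier0@netvibes.com,Female
2,Fiann,Ayling,fayling1@i2i.jp,Female
3,Jasmine,Schulze,jschulze2@sourceforge.net,Female`;

const delimiter = ',';                               // the 'Delimiter' attribute
const rows = raw
  .split('\n')                                       // one entry per line of raw text
  .filter(line => line.trim() !== '')
  .map(line => line.split(delimiter));               // one cell per delimited value

console.log(rows[0]); // ['1', 'Malinde', 'Monier', 'mmonier0@netvibes.com', 'Female']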

Update rows by filter

The CSV Editor has operations which allow you to apply a filter to find specific rows and then modify the rows that are returned.

As mentioned in the note on running queries efficiently below, setting up indices for your filters helps your workflow run faster, as it gives the workflow less to search through.

In this example we will search for all the rows where Office location is equal to Seattle (Central). Where this returns true, we will update the address to Seattle (Northeast).

This walk-through builds upon the previous example, Create CSV from file.

  1. The key step here is Update Rows by Filter. In this example we add Office location to the 'Cell Values' attribute and set it to Seattle (Northeast), then add a filter to find all entries where Office location is currently Seattle (Central) (a code sketch of this logic is given after these steps):

  2. Running the workflow now will update the targeted cells within the filtered rows. If the run is successful, an object will be given in the Output panel providing details of how many rows were updated.

  3. As an extra step, you can use the 'Find Rows by Filter' operation and inspect the logs to see the updated cells / text in the output panel.
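
The logic of this walk-through can be sketched as follows. The row objects, the updateRowsByFilter helper and its rowsUpdated output are illustrative assumptions, not the connector's actual input or output schema.

// Illustrative filter-and-update logic: find rows matching the filter,
// overwrite the given cell values, and report how many rows were changed.
const employees = [
  { last_name: 'Monier', 'Office location': 'Seattle (Central)' },
  { last_name: 'Ayling', 'Office location': 'Portland' },
  { last_name: 'Schulze', 'Office location': 'Seattle (Central)' },
];

function updateRowsByFilter(rows, filter, cellValues) {
  let rowsUpdated = 0;
  for (const row of rows) {
    const matches = Object.entries(filter).every(([col, value]) => row[col] === value);
    if (matches) {
      Object.assign(row, cellValues);
      rowsUpdated++;
    }
  }
  return { rowsUpdated };
}

const result = updateRowsByFilter(
  employees,
  { 'Office location': 'Seattle (Central)' },    // the filter
  { 'Office location': 'Seattle (Northeast)' }   // the new cell values
);
console.log(result); // { rowsUpdated: 2 }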

Efficiency and scaling

Running queries efficiently

Workflows that use CSV Editor steps should be designed in a way that maximizes efficiency to avoid lengthy execution times. Below are some general points to help you optimize CSV Editor usage in your workflows.

Typical use cases that require at least some amount of optimization are as follows (see our Example efficient workflows section for more details):

  • Loading in and paginating CSV data.
  • Retrieving data from an existing CSV.
  • Creating and adding (batching) rows to a CSV.

You will find the following tips useful in such scenarios:

  • Generally, the number of requests to the CSV Editor can be reduced by batching up rows. This increases overall efficiency for the workflow and the platform.
  • Certain operations require more resources than others. For example, the 'Find row' operation uses more resources than the 'Find rows by filter' operation, as it has to calculate the location where the row was found.
  • When it is known that only one or two columns will be searched frequently, it is best to add these header names to the 'Column indices' field. This will significantly improve the speed of searches.
  • Calling concurrent workflow executions against the same CSV ID should generally be avoided, i.e. if you are using an account-level CSV, make sure that multiple workflows (or parallel runs of one workflow) are not attempting to access the same CSV (this is particularly relevant when you are using workflow threading).

Querying a row value

Remember - if you only need to query data throughout your workflow, and do not need to edit it, please use the CSV Reader instead of the CSV Editor.

That said, if you wish to query a particular row value, we advise you to use the "Column indices" field option found within the "Create CSV from file" operation.

The row value will be found more quickly, as this reduces the index processing time. So make sure you set up your table indices correctly.

This means the column headings and data types must be accurate.

Find Row vs Find Rows by Filter

This can be confusing, as both operations can technically utilise a filter method. It is important to understand the difference so that you can choose the best operation for your use case.

Find Row returns the first row that matches the filter, and calculates which row number it found the data at. It can be used in conjunction with a filter, but should ideally only be used when there is just one result expected.

Find Rows by Filter returns all of the rows which match the filter parameters, but doesn't return the row numbers associated with them. This operation should be used when searching for multiple matches, or when the row number itself is not relevant to the results.

If multiple rows must be retrieved (one at a time or in batches, the latter being optimal), it is strongly recommended to use the 'Find rows by filter' operation. Such searches should be backed by an index specified in the 'Column indices' creation input.
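
The difference in what the two operations return can be illustrated with a small sketch (plain JavaScript, not the connector's API; row numbers here are 1-based for readability):

const rows = [
  { name: 'Ada', office: 'Seattle (Central)' },
  { name: 'Bo',  office: 'Austin' },
  { name: 'Cy',  office: 'Seattle (Central)' },
];
const matchesFilter = row => row.office === 'Seattle (Central)';

// 'Find row': the first match only, plus the row number it was found at
const index = rows.findIndex(matchesFilter);
const findRow = { rowNumber: index + 1, row: rows[index] };
console.log(findRow);               // { rowNumber: 1, row: { name: 'Ada', ... } }

// 'Find rows by filter': every match, with no row numbers attached
const findRowsByFilter = rows.filter(matchesFilter);
console.log(findRowsByFilter);      // [ { name: 'Ada', ... }, { name: 'Cy', ... } ]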

Minimize

Unless you specifically need to store data for later retrieval, store CSVs at "current run" as opposed to "account" scope.

Results will be delivered more quickly and efficiently this way.

Essentially you should grab, manipulate, use and throw away your CSV data per workflow run.

We also advise that you keep the number of table rows and columns to a minimum.

Accuracy

Naming

Try to be specific (e.g. specify column names in the properties panel, even if they’re also in the CSV and should be picked up automatically).

Naming column headings accurately also makes your workflows easier to understand and therefore easier to manipulate.

Data types

Please make sure that the Data Type for each column corresponds with that expected by any third party service you may be transferring data to.

There are a lot of different formats and requirements for CSV files: delimiter, quotes, etc. Research the API documentation of the third-party service you are working with for example data requests and for further clarification.

If you accidentally mix data types during manipulation (e.g. use the CSV Editor to add a string to a number field in a third-party service), the service API's response may not be reliable. In some cases the attempt will simply fail, while in others it might succeed but produce unexpected results.

Technical limits

The CSV Editor is subject to the following technical limits:

Case | Limit | Notes
Data that can be extracted from, or sent to, the CSV Editor in a single call | 1 GB | more is likely to cause issues
File size limit when importing to the CSV Editor | 1 GB |
Max payload between steps | 1 MB | as per standard connectors in the builder
How long the download URL for an exported CSV remains active | 6 hours | as per standard link persistence policy

Example efficient workflows

Handling huge data files (>100MB / 1 million+ rows, use case dependent) requires special workflow considerations in order to make sure the process is optimized.

Retrieving data effectively can cut down on the number of API calls necessary, shorten the feedback loop and make your workflow run more efficiently overall.

It is important to retrieve data succinctly and to make sure you import and paginate effectively. If you wish to add numerous rows to your CSV file, batching requests helps speed up the process.

Below are three workflow examples that each highlight one of the efficiency points mentioned above and utilise a particular CSV Editor operation to do so.

Searching CSV data

It is strongly recommended to add a Column Index when creating the CSV file that will be queried. Instead of querying the whole file, the CSV Editor need only check the column referenced.

To illustrate this we can look at the following table, where we know that the last_name column will be the primary means of searching for employees within the CSV:

This can be set up by specifying a column index when you are creating a CSV:

The 'Create CSV From File' operation does not require the 'Column Indices' fields to be set, but doing so cuts down the workload the CSV Editor has to process.

The 'Find Rows By Filter' operation returns the rows which match the parameters set in the 'Filters' fields. Matching the filter with the 'Column Indices' fields set earlier speeds up workflow returns by reducing the overhead.
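
To picture why this helps, here is a conceptual sketch of a column index in plain JavaScript. It is purely illustrative of the idea, not the CSV Editor's internals: the index is built once, and subsequent searches only touch the rows it points to.

// Build a map from last_name to row positions, then look matches up directly
// instead of scanning every row on every search.
function buildColumnIndex(rows, column) {
  const index = new Map();
  rows.forEach((row, i) => {
    if (!index.has(row[column])) index.set(row[column], []);
    index.get(row[column]).push(i);
  });
  return index;
}

const rows = [
  { first_name: 'Malinde', last_name: 'Monier' },
  { first_name: 'Fiann',   last_name: 'Ayling' },
  { first_name: 'Jasmine', last_name: 'Monier' },
];

const byLastName = buildColumnIndex(rows, 'last_name');
console.log(byLastName.get('Monier')); // [0, 2] - only these rows need checking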

Batch data from a CSV to a service

When handling large amounts of rows or data in a CSV file, it is necessary to create a 'pagination' system. The point of this is to break the results down into manageable chunks (of e.g. 100 records each) so that the workflow load is handled more effectively.

The method below is particularly useful for transferring data from one service to another when the file size is too large for a single run.

Note that this method makes use of config data to set the size of each batch being paginated:

Note that it must be set as an integer - not a string!

This is best practice for a variable that you want to use in repeated steps throughout your workflow, and means it is easy to adjust if necessary.

For simplicity and demo purposes we have set the batch size to 10, but in reality you can use larger batches of 100, 1000, etc. A code sketch of the pagination loop is given after the numbered steps below.

Click here to download an export of the following workflow to test and run yourself:
  1. Paginate batches of 10 is a loop connector set to 'loop forever'

  2. Get rows_done gets the rows_done at the start of each loop, defaulting to 0 on the first loop:

  3. start = rows_done +1 sets the row to start on for each loop (1, 11, 21, 31 etc.) by adding 1 to rows_done:

  4. Get 10 rows from CSV grabs the id from the first CSV step, feeds in the start row and the batch size (from config data):

  5. rows_done + 10 adds the batch size onto rows_done to calculate the new total for rows_done

  6. Continue fetching rows checks if the new rows done total is still less than the total rows from the original CSV file:

  7. Set rows_done sets the new rows_done total for the next loop to retrieve:
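
The loop above boils down to the following logic, sketched in plain JavaScript. getRows and sendToService are stand-ins for the CSV Editor step and the downstream service call; the names and the in-memory data are assumptions for illustration, not the connector's API.

// Pagination sketch: fetch batches of rows until the whole CSV is processed.
const allRows = Array.from({ length: 36 }, (_, i) => [`row ${i + 1}`]);
const totalRows = allRows.length;                        // total rows in the original CSV
const BATCH_SIZE = 10;                                   // from config data (an integer)

// Stand-in for the 'Get rows' CSV Editor step (start is 1-based, as above)
const getRows = (start, count) => allRows.slice(start - 1, start - 1 + count);
const sendToService = batch => console.log(`sending ${batch.length} rows`);

let rowsDone = 0;                                        // 'Get rows_done' defaults to 0
while (true) {
  const start = rowsDone + 1;                            // 1, 11, 21, 31 ...
  sendToService(getRows(start, BATCH_SIZE));             // 'Get 10 rows from CSV'
  rowsDone += BATCH_SIZE;                                // 'rows_done + 10'
  if (rowsDone >= totalRows) break;                      // 'Continue fetching rows' check
  // 'Set rows_done' would persist rowsDone for the next loop iteration
}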

Batch data from a service to a CSV

When pulling data from a third-party service to input to a CSV, it is important to divide lengthy lists of records into smaller batches.

In this way you can process e.g. 10 batches of 100 rows instead of 1000 single calls to create 1 row at a time.

While batching is important, what matters most for optimization is not running too many concurrent executions against the same table, as this could cause locking issues.

So if you have created an account-level CSV, you should make sure that you do not have multiple workflows or threaded executions of the same workflow attempting to write to the CSV at the same time.

The following workflow is available for download and testing here and gives a simple example of batching records being pulled from Salesforce for input to a CSV.

This example shows only one method of paginating results coming from a third-party service. Exactly how you will do this depends on the service you are using.

The key parts of this workflow are as follows (a short sketch of the batching arithmetic is given after these steps):

  1. Count records uses the Salesforce Find records operation to fetch all records of a certain type.

  2. Create empty CSV creates a CSV with appropriate columns to match the data being returned by Salesforce.

  3. Get no. of batches uses the List Helpers 'Get list of page numbers' operation. Here we use $.steps.salesforce-1.total to get the total number of records returned by Salesforce, and specify that we want to process 20 records per page / batch:

    This will return e.g. 11 batches of 20 for 216 records.

  4. Loop through batches is a loop connector which uses $.steps.list-helpers-1.result to loop through the number of pages / batches, as returned by Get no. of batches (e.g. the loop will run 11 times for 216 records divided into batches of 20)

  5. Page offset is a Math Helpers step which uses the loop number and batch size to calculate how many records should be skipped:

    For example:

    • Loop 1 = skip 0
    • Loop 2 = skip 20
    • Loop 3 = skip 40 etc.
  6. Get batch of records is a Salesforce 'Find Records' step which sets the 'limit' to e.g. 20 records and uses $.steps.math-helpers-1.result to pass the number of records to skip for this run of the loop:

    Salesforce also has a standard 'page offset token' method which can be used for larger batches of records (min 200, max 2000). If using this method, you simply store the token at the end of the loop, get it at the start, and then break the loop when no token is returned.
  7. Loop thru single batch then loops through the records returned by $.steps.salesforce-2.records

  8. Map keys then, for each record, maps the keys in Salesforce to the appropriate column headers in your CSV:

  9. Append to list of 20 uses the Data Storage 'Append to list' operation to add the current record to the current list.

    It is important to note here that the list uses record_list_{$.steps.loop-1.value} to increment the list name with each run of the loop.

    So the lists for each loop will be:

    • record_list_1
    • record_list_2
    • record_list_3 etc.

    This is an important technique to use to make sure that you never risk surpassing the data storage limit under a single key (400KB). This is only a potential concern when you are processing thousands of rows in a batch, and each row may contain a lot of data.

  10. Get list of 20 then uses record_list_{$.steps.loop-1.value} to retrieve the list for this batch

  11. Add 20 rows then inputs the retrieved list for this batch before the loop begins for the next batch:
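
The batching arithmetic used in steps 3, 5 and 9 can be sketched as follows. This is illustrative JavaScript only; the figures and the per-batch list naming convention are taken from the walk-through above.

const totalRecords = 216;                         // from 'Count records'
const batchSize = 20;

// 'Get no. of batches': 216 records at 20 per page = 11 pages
const numberOfBatches = Math.ceil(totalRecords / batchSize);
const pages = Array.from({ length: numberOfBatches }, (_, i) => i + 1);

for (const page of pages) {
  const offset = (page - 1) * batchSize;          // 'Page offset': 0, 20, 40 ...
  const listName = `record_list_${page}`;         // per-batch data storage list name
  console.log(`batch ${page}: skip ${offset}, fetch up to ${batchSize}, append to ${listName}`);
}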

Latest version: 6.3