Tray Platform / Standard / Best Practices / Pagination / Batch processing / Using batch update operations

Using batch update operations

Introduction

Many connectors have batch methods, for instance in Salesforce and Marketo you can batch update records such as leads and contacts. In the case of Salesforce, there is a limit of 25 records that can be updated in a single call. The question is, if we start with a list of more than 25 records, how do we efficiently divide the list into batches of 25 that can be processed in a single call?

Note this example makes extensive use of the Script connector, and the Lodash Javascript library to perform data transformations.

Workflow example: Pull data from CSV reader and push to Salesforce

Workflow Summary

In this example, we will be using the CSV Reader to retrieve the data to be used in a batch update.

We will be showing how to use the Salesforce Batch update records operation, pulling data from a CSV structured in such a way as:

A key point to batch update operations - which will be common to all services - is that you will need the ID of the record/object being updated. This is shown above, along with the data (phone and city) that you are wanting to update for each record.

Note that the column names in the above example may not match the correct field names as they are expected by the API in the service you are sending the updates to (Salesforce in this example).

You can 'map' the batch update data to the correct fields by manually creating an override list using a script step similar to the Set field list in the workflow below, which then gets used in the subsequent Key-pair rows and Format update scripts.

The Set field list step creates the correct field list of object_id, Phone and BillingCity.

The complete workflow looks like:

The first section of the workflow shows the basic process of creating a CSV and pulling results from it in the Check execution loop:

  1. Pull CSV file from a source (such as Google Drive)
  2. Use the CSV Editor Create CSV operation to create a CSV that can be accessed in the workflow
  3. Use the CSV Editor Start query operation to being the process of extracting rows from the CSV
  4. Use the CSV Editor Get query execution operation to check if the query has finished

Once the query has finished the results can be pulled from the created CSV. So we then set up the Paginate CSV loop which:

  1. Uses Get Query Results to pull a number of rows from the CSV connector equal to the max allowed by the service API
  2. Uses Get next page and Set next page to store and retrieve a token which indicates if there are more rows to be pulled from the CSV
  3. Uses Set field list to set the correct field names expected by the service API
  4. Uses Key-pair rows to map the individual rows with the field list
  5. Uses Format update to put the data for each record in the exact format expected by the API
  6. Uses Batch update to send the formatted batch of records to the service, using the 'batch update'-type operation
  7. Uses Has next page? to check if the CSV Reader Get Query results step returned a next token
  8. Uses Set next page to set the next token for the next loop iteration, if it is found
  9. Uses Break loop if no token is found

1- Getting CSV query results and a next page token

Please see our CSV Reader documentation for instructions on setting up the first part of the workflow, up to breaking the loop after the Get query execution has returned a state of 'SUCCEEDED'.

Once the CSV has been exported using the Start Query operation, we can return rows by using the Get query results operation and specifying how many rows to fetch, as well as an offset.

By looking at the output schema for this operation, we can see that the operation will return a property called 'NextToken' if there are more results to return. If there is no next page, the value of this property will be Null:

With the data storage helper, we can use this token to drive the pagination of results, as explained below.

2 - Setup the basic query results pagination loop

The start of this construction is a loop step, using the operation Loop Forever, which we use to loop through the CSV:

Get query results (CSV Reader)

Inside this loop, we place a CSV reader step, with the operation set to Get query results. The Query execution ID parameter will be set to reference the 'QueryExecutionId' of a previous CSV reader step set to Start query. The next parameter is the maximum number of results to return. As we are sending batches of 25 to Salesforce in this example, we set this field to the value 25. The last parameter is where we set a Next token to retrieve the next page of results on each subsequent request.

Get next page (data storage)

In order to set a value for this field, we need some way of referencing the token from the previous run of the loop, for each run after the first. In order to accomplish this, we use data storage to store and retrieve the token. First, we add a data storage step with the operation set to Get Value, and set the Key to something informative such as 'token'. The Default Value of this variable will be (null). We place this step just before the CSV reader step. Then in the CSV reader step, we set the value of Next token to a jsonPath referencing the value from this data storage step:

Has next page? (boolean)

Next, we need to set the value for this variable. Firstly, we add a boolean condition step after the CSV reader step, to check whether this step has output a token, implying there is a subsequent page of results to fetch. The value of 1st Value will be a jsonPath to the 'NextToken' property output by the Get Query Results CSV step. We wish to check that the value of this property is not null:

Set next page (TRUE branch)

Inside the TRUE branch of this boolean condition, we use data storage to store the value of 'NextToken' for the next run. Duplicate the data storage step you have already created, and move this duplicated step in to the boolean condition. Then, change the operation of this data storage step to Set Value, and then set the value of Value to a jsonPath referencing the 'NextToken' property from the CSV reader step (the same jsonPath that was used in the boolean condition).

Break loop if no token found (FALSE branch)

Inside the FALSE branch of the boolean condition, we need to stop the loop from continuing when there are no more records to retrieve. We achieve this by using the Break Loop connector, and its Break operation, specifying the name of the loop step to break:

3 - Process results and send to a service

We now have the minimum setup necessary for paging results from the CSV reader. This requires adding the script and Salesforce steps to the above pagination system. The full loop for this section looks like this:

Set field list (script)

Before we process the rows of results we need to set the field list, as expected by the Salesforce API. We can do this using a simple script such as:

exports.step = function() {
return {
"salesforce_fields": ["object_id", "Phone", "BillingCity"]
}
};

Key-pair rows (script)

The next stage is to process these rows to send to a service, in this case Salesforce. The CSV reader outputs each column as a list, rather than an object, and so we need to transform the output to use in Salesforce. Given a list of column headers in the correct order, we can transform the data from the CSV reader into rows of key-value pairs. For instance, in our workflow we have a script which outputs a property called salesforce_fields, which is an ordered list of column names. After the CSV reader step to get query results, we set up a script step as follows:

The csvRows variable comes from the CSV reader step's RowData property, and the columnHeaders variable contains our list of ordered column names. The contents the Script are as follows:

// Maps all 25 rows
exports.step = (input) => _.map(
input.csvRows,
(row) => _.reduce(
// Within each row, we loop through the list of values and assign each value to an object, with the key name being the column name
row.Data,
(acc, data, index) => {
acc[input.columnHeaders[index]] = data.VarCharValue;
return acc;
},
{}
),
[]
);

This will then output a list of rows such as:

Format update

Note that the column headers we have used are the schema field names that Salesforce expects (if you were to fill in the input panel for the Salesforce connector, then these would be the field names that appear in the Input part of the Debug when the workflow is run). Hence we can simply map these rows to a schema structure for the Salesforce Batch update records operation. If we have a Salesforce step using the Batch update records operation in a standalone operation, we run the workflow, and then look at the Input part of the Debug panel, we should see a payload being sent such as the following:

{
"result": [
{
"object_id": "0014J000007k2moQAA",
"fields": [
{
"key": "Phone",
"value": "869-246-0198"
},
{
"key": "BillingCity",
"value": "Seattle"
}
]
},
{
"object_id": "0014J000006m8EVQAY",
"fields": [
{
"key": "Phone",
"value": "996-312-0009"
},
{
"key": "BillingCity",
"value": "Chicago"
}
]
},
{
"object_id": "0014J000006m89GQAQ",
"fields": [
{
"key": "Phone",
"value": "135-198-2336"
},
{
"key": "BillingCity",
"value": "Belfast"
}
]
}
],
"console": []
}

As we can see, object_id is a special top-level field, but for all other fields, we can put them inside the fields list, where the key is the field name and value is the value to assign to the field. In order to achieve this structure, we place a another script step in the workflow just after the previous script step. We define a variable called 'rows', and set its value to the output of the previous script:

Then we set the contents of Script to:

exports.step = (input) => _.map(
input.rows,
(row) => ({
object_id: row.object_id,
fields: _.reduce(
_.omitBy(row, (value, key) => key === 'object_id'), // remove the object_id property which we have placed above
(acc, fieldValue, fieldKey) => {
acc.push({
key: fieldKey,
value: fieldValue
});
return acc;
},
[]
)
})
);

This will output the list we need for the batch_update_list property in our Salesforce step.

Batch update (Salesforce)

Now, we can add a Salesforce step to the workflow just after this script step:

This uses the Batch update records operation. Set the value of Record type to the type of object we are updating, 'Account' in this case. Then the value of Batch update list will be a jsonPath referencing the output of the Format update script step.

Tip: In order to reduce your task usage, you could combine both script steps into one.