
Processing large volumes of data

(Pagination tutorial 2)

Introduction

This tutorial is similar to the first pagination tutorial on updating missing customer info: it takes you through using pagination to loop through customer data and find records which match a particular condition.

The difference, however, is that this tutorial assumes you are processing large volumes of data and may want to carry out more complex processing operations (e.g. enrichment) on the customer records.

The way to deal with this is to send each batch off to a separate workflow for processing using a callable workflow. This has two benefits:

  1. The main workflow which is responsible for creating the batches can work much faster because it is only focused on this one task.
  2. The second processing/enrichment workflow can process multiple runs simultaneously. So your workflow batches do not have to wait in a queue until the previous batch is finished.
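
If it helps to see the pattern outside the workflow builder, here is a minimal TypeScript sketch of the same fan-out idea. `fetchBatch` and `callWorkerWorkflow` are hypothetical stand-ins for the Salesforce query step and the Call Workflow step:

```typescript
// Hypothetical stand-ins for the Salesforce query step and the Call Workflow step.
async function fetchBatch(page: number, perPage: number): Promise<object[]> {
  return Array.from({ length: perPage }, (_, i) => ({ id: (page - 1) * perPage + i }));
}

async function callWorkerWorkflow(payload: object[]): Promise<void> {
  console.log(`dispatched a batch of ${payload.length} records`);
}

// Workflow #1: create batches and hand each one to the worker workflow.
// Because each dispatch is not awaited before the next begins, worker runs
// overlap instead of queuing behind one another.
async function createBatches(totalRecords: number, perPage: number): Promise<void> {
  const pageCount = Math.ceil(totalRecords / perPage);
  const dispatches: Promise<void>[] = [];
  for (let page = 1; page <= pageCount; page++) {
    const batch = await fetchBatch(page, perPage);
    dispatches.push(callWorkerWorkflow(batch)); // fire, then keep batching
  }
  await Promise.all(dispatches);
}

createBatches(1000, 100).catch(console.error);
```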

The following diagram is a good illustration of how this works:

This can massively reduce the overall processing time and could mean that a task which would have taken days takes less than 24 hours.

It also opens up possibilities in terms of the amount of 'enrichment' you might want to carry out with customers. For example, within your second processing/enrichment workflow you could enrich each Salesforce opportunity by pulling in a load of extra data from Clearbit, while also creating a list of tasks to be carried out in an Asana project.

This is also an example of how you might modularize your workflows. This approach makes for more trackable logs, and clearer workflow design. Other developers on your team may choose to re-use these modularized flows in larger flows of their own, or clone your flow to test a new logical sequence in Sandbox mode.

Note also that this workflow uses a slightly different pagination method: it uses a 'count' operation in conjunction with a List Helpers connector to create a set number of batches and run a Loop connector a set number of times.

Workflow #1: Create Batches

1 - Create Batches

1. Using a Salesforce connector, calculate a total count of all your opportunities.
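
Behind the scenes this is just a SOQL count query. For reference, here is a sketch of the equivalent call against the Salesforce REST API; the instance URL, API version, and access token are placeholder values you would supply from your own Salesforce connection:

```typescript
// SOQL count query via the Salesforce REST API. INSTANCE_URL, API_VERSION
// and ACCESS_TOKEN are placeholders for your own Salesforce connection.
const INSTANCE_URL = "https://example.my.salesforce.com";
const API_VERSION = "v58.0";
const ACCESS_TOKEN = "<token>";

async function countOpportunities(): Promise<number> {
  const soql = encodeURIComponent("SELECT COUNT() FROM Opportunity");
  const res = await fetch(
    `${INSTANCE_URL}/services/data/${API_VERSION}/query?q=${soql}`,
    { headers: { Authorization: `Bearer ${ACCESS_TOKEN}` } },
  );
  // For a COUNT() query, totalSize holds the count and records is empty.
  const body = (await res.json()) as { totalSize: number };
  return body.totalSize;
}
```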

2. Paginate the total count. Add a List Helpers connector with Operation: Get List of Page Numbers. In the Per Page section, enter the number of records you'd like in each batch. This will split the total record count into pages of the batch size you select.
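
In code terms, 'Get List of Page Numbers' does something like the following (a sketch, not the connector's actual implementation):

```typescript
// Sketch of "Get List of Page Numbers": given a total record count and a
// per-page size, return [1, 2, ..., N] where N = ceil(total / perPage).
function getListOfPageNumbers(totalRecords: number, perPage: number): number[] {
  const pageCount = Math.ceil(totalRecords / perPage);
  return Array.from({ length: pageCount }, (_, i) => i + 1);
}

console.log(getListOfPageNumbers(950, 100)); // [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
```

The Loop connector in step 3 then runs once per page number in this list.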

3. Loop through each page returned by the List Helpers connector.

You are now ready to set up the loop for each batch - described in section '2 - Loop through the batch' below.

2 - Loop through the batch

4. Grab a batch of Salesforce records matching the batch size you set in Step 2. The Salesforce connector returns a pagination token for a query if the query's size limit is smaller than the total number of records that query could return.
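
This mirrors how the underlying Salesforce REST API pages query results: when the batch is smaller than the total matching records, the response comes back with `done: false` and a `nextRecordsUrl` that acts as the pagination token. A sketch, reusing the placeholder connection values from the count example:

```typescript
// Placeholder connection values, as in the count example above.
const INSTANCE_URL = "https://example.my.salesforce.com";
const API_VERSION = "v58.0";
const ACCESS_TOKEN = "<token>";

interface QueryResult {
  records: object[];
  done: boolean;           // false when more batches remain
  nextRecordsUrl?: string; // the pagination token for the next batch
}

// Fetch one batch; pass the stored token to resume where the last batch ended.
async function fetchOpportunityBatch(paginationToken?: string): Promise<QueryResult> {
  const url = paginationToken
    ? `${INSTANCE_URL}${paginationToken}`
    : `${INSTANCE_URL}/services/data/${API_VERSION}/query?q=` +
      encodeURIComponent("SELECT Id, Name FROM Opportunity");
  const res = await fetch(url, { headers: { Authorization: `Bearer ${ACCESS_TOKEN}` } });
  return (await res.json()) as QueryResult;
}
```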

5. Use Data Storage to store the pagination token returned from Salesforce and pass it into your next batch. Name the key salesforce_offset.

6. For subsequent iterations of the loop, you'll want to pass the pagination token you've just stored in Data Storage into your Salesforce batching step. Do this by performing a GET operation with the same key you used in (5).

7. Finally, you'll want to ensure you reset the pagination token to zero before each run. That way, each run starts from the first record.
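
Taken together, steps 5 to 7 give the token a simple set/get/reset lifecycle. Here is a sketch, with an in-memory map standing in for the Data Storage connector:

```typescript
// In-memory stand-in for the Data Storage connector.
const dataStorage = new Map<string, string | number>();
const KEY = "salesforce_offset";

// Step 7: reset the token before each run so batching starts from the first record.
function resetToken(): void {
  dataStorage.set(KEY, 0);
}

// Step 5: after each batch, store the pagination token Salesforce returned.
function storeToken(token: string): void {
  dataStorage.set(KEY, token);
}

// Step 6: on later loop iterations, read the stored token back out.
function getToken(): string | number {
  return dataStorage.get(KEY) ?? 0;
}
```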

Workflow #2: Process Batches

  1. Create Workflow #2 with a Callable Trigger. Once this is configured, we can select it from the dropdown in the Call Workflow step in Workflow #1.
  2. The final step in Workflow #1 is to send the entire payload of the batch you've created to a secondary flow. To do this, select Workflow #2 in the dropdown in the Call Workflow connector and pass in the entire payload of your Salesforce call.
  3. Now, in Workflow #2, use a Loop Collection connector to process each data element. You can use the $.steps.trigger jsonpath to pass in the entire Trigger as the List, which at this point will contain the data payload you sent to the workflow in the previous step.
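
As a sketch of Workflow #2's shape in code, with `enrichRecord` as a hypothetical stand-in for whatever enrichment steps (Clearbit, Asana, etc.) you add:

```typescript
// Sketch of Workflow #2. The callable trigger receives the batch payload
// sent by Workflow #1; the Loop Collection step corresponds to the for...of.
interface SalesforceRecord {
  Id: string;
  Name: string; // the exact shape depends on your Salesforce query
}

// Hypothetical stand-in for your enrichment steps, e.g. a Clearbit lookup
// followed by creating tasks in an Asana project.
async function enrichRecord(record: SalesforceRecord): Promise<void> {
  console.log(`enriching ${record.Id}`);
}

async function processBatch(trigger: { records: SalesforceRecord[] }): Promise<void> {
  for (const record of trigger.records) {
    await enrichRecord(record);
  }
}
```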

Process your Data

At this point, you have successfully configured a set of workflows that will process your data asynchronously in multiple batches. An example enrichment process for Workflow #2 will be coming soon - but in the meantime, feel free to experiment!

In Workflow #2, you can open the Debug tab to see multiple instances of the same flow run at one time, in parallel.