
Error Handling

Note that this page is for Tray Embedded customers. If you are not an Embedded customer, please see the Error Handling guide for the main Tray Platform.

Available methods

Debugging and troubleshooting Embedded solution instances involves several stakeholders: integration builders, customer support, and the clients themselves. Tray.io has several features that allow integration notifications or errors to be surfaced to these stakeholders.

Which combination of features you use will ultimately depend on the following:

  • What triggers do your solution workflows use?

  • Who's going to be receiving & handling execution errors?

  • Does each solution require different error handling, or do error notifications need to include execution context from an external system?

  • Does the workflow execute & respond synchronously or asynchronously? For instance, if a solution instance is executed by sending a webhook request to Tray.io, does the calling app expect to receive an immediate response and then poll for updates, or does it expect the full execution information to be returned in the call?

  • Do you have a log ingestion tool (such as Datadog)?

Decision process charts

The following process charts show how to take these factors into consideration when scoping your implementation.

Webhook-triggered workflows

Service or scheduled trigger workflows

Example implementations

1 - Realtime service-triggered solution instance workflows (partner's support team handles errors)

For example, listening for incoming lead events from Salesforce. As the partner's support team will be handling all errors on behalf of customers, they can use Alert / Solution Alert triggers to send a message to their support team for follow-up.


2 - Realtime service-triggered solution instance workflows (partner's End Users self-serve errors)

For example, listening for incoming lead events from Salesforce. As the partner wants their customers to self-serve errors, they will likely need to store execution / error logs on their end and then surface them to the customer.

If the customer has a log ingestion tool, such as Datadog, they can leverage Log streaming to ingest all errors.

If not, the customer can leverage Alert / Partner Alert triggers along with a Database or queue connector to send logs to their system.


3 - Solution instance workflows triggered from partner's app (webhook trigger), execution is synchronous, partner's End Users self-serve errors

In this case the partner's application will send event payloads to Tray.io for each solution instance execution, allowing realtime syncs or data fetches.

As the application is waiting for the workflow to execute and return a response, step-level error handling can be used to handle errors and send a customised message to the calling application.
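As a rough illustration of the calling application's side of this pattern, the sketch below interprets the response returned by a synchronous workflow. The response shape (a JSON body with `status`, `message` and `data` fields) and the names `SyncResult` and `parse_workflow_response` are assumptions made for this example, not a format defined by Tray.io; the workflow itself would need to be built to return such a body.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass
class SyncResult:
    ok: bool
    message: str
    data: Optional[dict] = None


def parse_workflow_response(status_code: int, body: str) -> SyncResult:
    """Turn a synchronous webhook response into a result the calling app can show.

    Assumes (hypothetically) that the workflow replies with JSON such as
    {"status": "success" | "error", "message": "...", "data": {...}}.
    """
    try:
        payload = json.loads(body) if body else {}
    except json.JSONDecodeError:
        return SyncResult(False, "Unreadable response from workflow")
    if not isinstance(payload, dict):
        return SyncResult(False, "Unexpected response shape")

    if 200 <= status_code < 300 and payload.get("status") == "success":
        return SyncResult(True, payload.get("message", "OK"), payload.get("data"))

    # Prefer the customised message set by step-level error handling, if any.
    fallback = f"Workflow failed with HTTP {status_code}"
    return SyncResult(False, payload.get("message", fallback))
```

With this shape, a customised error message set in the workflow reaches the End User unchanged, while malformed or unexpected responses fall back to a generic message.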


4 - Solution instance workflows triggered from partner's app (webhook trigger), execution is asynchronous, partner's End Users self-serve errors

In this case the partner's application will also send event payloads to Tray.io for each solution instance execution, allowing realtime syncs or data fetches.

However, the application expects an instant response just to confirm that the webhook has been received by Tray.io.

It then likely polls an endpoint, awaits logs or polls a file bucket to check when the execution has finished.
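The polling side of this pattern could look something like the sketch below, which checks a status source with exponential backoff until the execution reaches a final state. The status values (`"success"`, `"error"`) and the `fetch_status` callable are placeholders for whatever endpoint, log store or file bucket the partner's application actually checks; none of them are Tray.io APIs.

```python
import time
from typing import Callable, Optional

# Hypothetical status values written by the solution instance on completion.
TERMINAL_STATES = {"success", "error"}


def poll_until_finished(
    fetch_status: Callable[[], str],
    max_attempts: int = 5,
    initial_delay: float = 1.0,
    backoff: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[str]:
    """Call fetch_status until it returns a terminal state.

    Returns the terminal state, or None if max_attempts is exhausted.
    The wait between attempts grows by the `backoff` factor each time.
    """
    delay = initial_delay
    for attempt in range(max_attempts):
        status = fetch_status()
        if status in TERMINAL_STATES:
            return status
        if attempt < max_attempts - 1:  # no need to wait after the last try
            sleep(delay)
            delay *= backoff
    return None
```

A `None` result means the execution did not finish within the polling window, which the application may want to surface to the End User as a timeout rather than as an error.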

In this scenario, a service connector with the partner's static auth will likely be used in solution instances to send success / error payloads to an external service.

Errors can either be handled using workflow step-level error handling, or more scalably through a Solution Alert trigger.

A common approach to this is:

  1. Listen for an incoming request ID that is unique to each webhook event, and use Data Storage to save it under a key containing the Tray.io execution ID.
  2. Reference this key inside the Alert trigger workflow to retrieve the original request ID.
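The correlation logic in these two steps can be sketched in plain Python as follows. In a real implementation this lives in workflow steps rather than code: the dictionary stands in for Data Storage, and the field names (`request_id`, `execution_id`, `message`) and the key format are assumptions made for illustration only.

```python
from typing import Dict, Optional

# Stand-in for Data Storage; in Tray.io this would be a Data Storage step.
storage: Dict[str, str] = {}


def storage_key(execution_id: str) -> str:
    """Build a key that contains the execution ID (hypothetical format)."""
    return f"request_id:{execution_id}"


def on_webhook(execution_id: str, event: dict) -> None:
    """Step 1: save the caller's request ID under the execution ID."""
    storage[storage_key(execution_id)] = event["request_id"]


def on_alert(alert: dict) -> dict:
    """Step 2: in the Alert trigger workflow, look the request ID back up
    and build the error payload to send to the external service."""
    request_id: Optional[str] = storage.get(storage_key(alert["execution_id"]))
    return {
        "request_id": request_id,  # None if no mapping was stored
        "status": "error",
        "message": alert.get("message", "Unknown error"),
    }
```

Because the key is derived from the execution ID, the alert workflow needs no knowledge of the original webhook payload to tell the calling application which request failed.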

Note on services failing

If a third-party service used in your workflow is having network issues, this can cause your whole workflow to fail.

A best practice approach to consider here is to set your service connectors to use Manual error handling so that you can take immediate, appropriate action should this be the case.

Please see the section in our main error handling docs for guidance on this.

Note on points of failure

You should be aware that setting up error handling systems within your workflow can add more points of failure. For example, somebody might change the login credentials for the MySQL database you are using to store your status messages, in which case the error handling itself would start to fail.