Retry steps execution
A step that fails might cause the entire workflow to fail, but errors are often expected and the step can be safely retried. Think of an HTTP request that times out because of transient network congestion, or an external API call that hits a rate limiter.
For all those situations where you want the step to try again, you can use a “Retry Policy”. A retry policy is an object that instructs the workflow to execute a step multiple times and dictates how much time has to pass before each new attempt. Policies take into account how much time has passed since the first failure, how many consecutive failures have happened, and which error occurred last.
To set a policy for a specific step, all you have to do is pass a policy object to the @step decorator:
from llama_index.core.workflow import Context, StartEvent, StopEvent, Workflow, step
from llama_index.core.workflow.retry_policy import ConstantDelayRetryPolicy


class MyWorkflow(Workflow):
    # ...more workflow definition...

    # This policy will retry this step on failure every 5 seconds for at most 10 times
    @step(retry_policy=ConstantDelayRetryPolicy(delay=5, maximum_attempts=10))
    async def flaky_step(self, ctx: Context, ev: StartEvent) -> StopEvent:
        result = flaky_call()  # this might raise
        return StopEvent(result=result)
You can see the API docs for a detailed description of the policies available in the framework.
If you can’t find a policy that’s suitable for your use case, you can easily write a custom one. The only requirement for custom policies is to write a Python class that respects the RetryPolicy protocol. In other words, your custom policy class must have a method with the following signature:
def next(
    self, elapsed_time: float, attempts: int, error: Exception
) -> Optional[float]:
    ...
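To make the three arguments concrete, here is a minimal sketch of how a runner could consult a method with this signature. It is not the framework's actual implementation: `RetryPolicyLike` and `run_with_retries` are hypothetical names introduced for illustration, and the sketch assumes that `attempts` counts the failures seen so far and that a return value of `None` means "stop retrying".

```python
import asyncio
import time
from typing import Awaitable, Callable, Optional, Protocol


class RetryPolicyLike(Protocol):
    """Hypothetical stand-in that mirrors the signature shown above."""

    def next(
        self, elapsed_time: float, attempts: int, error: Exception
    ) -> Optional[float]: ...


async def run_with_retries(
    func: Callable[[], Awaitable[object]], policy: RetryPolicyLike
) -> object:
    """Illustrative driver loop (an assumption, not the library's code)."""
    attempts = 0
    first_failure: Optional[float] = None
    while True:
        try:
            return await func()
        except Exception as error:
            now = time.monotonic()
            if first_failure is None:
                first_failure = now  # elapsed time is measured from the first failure
            attempts += 1
            delay = policy.next(now - first_failure, attempts, error)
            if delay is None:
                raise  # the policy gave up: surface the last error
            await asyncio.sleep(delay)
```

Read this way, a policy is a pure decision function: it does no waiting itself, it only tells the caller how long to wait or that it should give up.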
For example, this is a retry policy that’s excited about the weekend and only retries a step if it’s Friday:
from datetime import datetime
from typing import Optional


class RetryOnFridayPolicy:
    def next(
        self, elapsed_time: float, attempts: int, error: Exception
    ) -> Optional[float]:
        if datetime.today().strftime("%A") == "Friday":
            # retry in 5 seconds
            return 5
        # tell the workflow we don't want to retry
        return None
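A more practical custom policy might use all three arguments. The following is a sketch under stated assumptions rather than a policy shipped with the framework: `TransientBackoffPolicy` and its parameters are hypothetical, and it assumes `attempts` is 1 on the first failure. It retries only errors that look transient, doubles the delay after each failure up to a cap, and gives up after a maximum number of attempts or a total time budget.

```python
from typing import Optional, Tuple, Type


class TransientBackoffPolicy:
    """Hypothetical policy: exponential backoff for transient errors only."""

    def __init__(
        self,
        base_delay: float = 1.0,
        max_delay: float = 30.0,
        maximum_attempts: int = 5,
        max_elapsed: float = 120.0,
        retry_on: Tuple[Type[Exception], ...] = (TimeoutError, ConnectionError),
    ) -> None:
        self.base_delay = base_delay
        self.max_delay = max_delay
        self.maximum_attempts = maximum_attempts
        self.max_elapsed = max_elapsed
        self.retry_on = retry_on

    def next(
        self, elapsed_time: float, attempts: int, error: Exception
    ) -> Optional[float]:
        # Errors outside the allow-list are treated as permanent: do not retry.
        if not isinstance(error, self.retry_on):
            return None
        # Stop once either the attempt budget or the time budget is used up.
        if attempts >= self.maximum_attempts or elapsed_time >= self.max_elapsed:
            return None
        # Assumes attempts == 1 on the first failure: delays grow 1x, 2x, 4x, ...
        return min(self.base_delay * 2 ** (attempts - 1), self.max_delay)
```

Like the other policies, an instance would be passed to the decorator, for example `@step(retry_policy=TransientBackoffPolicy())`.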