What happens if an SQS message fails in Lambda
Lambda can handle failed SQS messages individually within a batch by using partial batch responses. Here’s how it works:
- You enable ReportBatchItemFailures for your Lambda function’s event source mapping. This tells Lambda that your function can report specific failures within a batch.
- When Lambda invokes your function with an SQS batch, your code processes each message. If any messages fail, you capture their IDs.
- Your function response includes a batchItemFailures array containing the IDs of failed messages.
For example:
{
  "batchItemFailures": [
    {
      "itemIdentifier": "message1-id"
    },
    {
      "itemIdentifier": "message4-id"
    }
  ]
}
- Lambda deletes the successfully processed messages from the queue. Only the failed messages (message1-id and message4-id in the example) become visible again once their visibility timeout expires.
- If your function throws an exception, the entire batch is considered failed and all messages become visible again.
This approach helps reduce unnecessary retries for messages that were successfully processed.
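The steps above can be sketched as a Python handler. This is a minimal illustration, not a definitive implementation: process_message is a hypothetical stand-in for your own business logic, and the sketch assumes ReportBatchItemFailures is already enabled on the event source mapping.

```python
import json


def process_message(body: str) -> None:
    """Hypothetical business logic: raises if the message cannot be handled."""
    payload = json.loads(body)  # malformed JSON raises an exception
    if payload.get("fail"):
        raise ValueError("simulated processing failure")


def handler(event, context):
    """Process each SQS record and report only the ones that failed."""
    batch_item_failures = []
    for record in event.get("Records", []):
        try:
            process_message(record["body"])
        except Exception:
            # Catch per message so one failure does not fail the whole batch;
            # report the message ID so only this message is retried.
            batch_item_failures.append({"itemIdentifier": record["messageId"]})
    # An empty list tells Lambda that every message in the batch succeeded.
    return {"batchItemFailures": batch_item_failures}
```

Catching exceptions per message is the key design choice here: an exception that escapes the handler would cause the whole batch to be retried.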
You can also manually send failed messages back to the queue using the SQS API, but the partial batch response approach is simpler and leverages Lambda’s built-in retry mechanism.
Some key things to keep in mind:
- Set the visibility timeout on your SQS queue high enough to accommodate retries. AWS recommends at least six times your function’s timeout.
- Make your Lambda function code idempotent to handle messages being reprocessed.
- Monitor CloudWatch metrics to ensure your function is correctly reporting failures.
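As a rough sketch of the idempotency point, the helper below skips messages whose ID has already been processed. The names handle_once and processed_ids are hypothetical, and the in-memory set is an assumption made to keep the example self-contained. It only survives within a single execution environment, so a real function would more likely record processed IDs in a durable store.

```python
# Assumption: an in-memory set stands in for a durable deduplication store.
processed_ids = set()


def handle_once(record, process):
    """Run process on the record body unless its message ID was already handled.

    Returns True if the message was processed, False if it was a duplicate.
    """
    message_id = record["messageId"]
    if message_id in processed_ids:
        return False  # redelivered message: skip the side effects
    process(record["body"])
    # Mark as done only after success, so a failed message can still be retried.
    processed_ids.add(message_id)
    return True
```

Because the ID is recorded only after process returns, a message that raises is left unmarked and can be retried on redelivery.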