AI Agent Failure Detection and Root Cause Analysis with Strands Evals

https://aws.amazon.com/blogs/machine-learning/ai-agent-failure-detection-and-root-cause-analysis-with-strands-evals/

Publish Date: 2026-06-15 14:07:00

Source Domain: aws.amazon.com

  • Automatic Failure Detection: Strands Evals SDK Detectors automatically identify failures in agent execution traces, reducing the diagnosis time from hours to minutes.

  • Root Cause Analysis: Detectors perform root cause analysis to separate primary causes from downstream symptoms and provide recommendations for fixes.

  • Types of Failures: Detectors classify failures into nine categories, including hallucination, incorrect actions, orchestration errors, and execution errors, and report an associated confidence score with each classification.

  • Diagnostic Workflow: Detectors operate in two phases, failure detection followed by root cause analysis, and use large language model (LLM) analysis to handle sessions of different sizes.

  • Integration with Evaluation Pipelines: Analysts can integrate detectors into their evaluation pipelines to automate diagnosis on every test run, so each run immediately shows what to fix.

  • Prerequisites and Setup: To use the detectors, you need Python 3.10 or later, the Strands Evals SDK, Amazon Bedrock model access, and configured AWS credentials.

  • Recommendations for Use: Start with a medium confidence level for routine use, and use both the ON_FAILURE and ALWAYS modes, depending on the evaluation context. Fix primary failures first, since doing so often resolves secondary issues.

  • Best Practices and Cleanup: Follow best practices for configuring diagnosis settings, monitor the costs of Amazon Bedrock and CloudWatch Logs usage, and retain any log data you still need before deleting resources during cleanup.
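
The recommendations above (a confidence cutoff, the ON_FAILURE and ALWAYS modes, and fixing primary failures first) can be illustrated with a small, self-contained sketch. This is not the Strands Evals SDK API: the names `Finding`, `DiagnosisMode`, `should_diagnose`, and `triage`, the `caused_by` field, and the 0.5 cutoff standing in for a "medium" confidence level are all hypothetical and exist only to show the triage logic.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional, Tuple


class DiagnosisMode(Enum):
    """Hypothetical stand-in for the two modes named in the summary."""
    ON_FAILURE = "on_failure"  # diagnose only when a test case fails
    ALWAYS = "always"          # diagnose every run, pass or fail


@dataclass(frozen=True)
class Finding:
    """Hypothetical shape of one detected failure (not the SDK's type)."""
    id: str
    category: str                    # e.g. "hallucination", "execution error"
    confidence: float                # assumed to lie in [0.0, 1.0]
    caused_by: Optional[str] = None  # id of an upstream finding; None if primary


def should_diagnose(mode: DiagnosisMode, test_passed: bool) -> bool:
    """Decide whether to run diagnosis for a given test result."""
    return mode is DiagnosisMode.ALWAYS or not test_passed


def triage(
    findings: List[Finding], min_confidence: float = 0.5
) -> Tuple[List[Finding], List[Finding]]:
    """Drop low-confidence findings, then split primary causes from symptoms.

    Primary findings come back sorted by descending confidence so the most
    likely root cause is listed first.
    """
    kept = [f for f in findings if f.confidence >= min_confidence]
    primary = sorted(
        (f for f in kept if f.caused_by is None),
        key=lambda f: f.confidence,
        reverse=True,
    )
    secondary = [f for f in kept if f.caused_by is not None]
    return primary, secondary


if __name__ == "__main__":
    findings = [
        Finding("f1", "execution error", 0.9),
        Finding("f2", "hallucination", 0.7, caused_by="f1"),
        Finding("f3", "orchestration error", 0.3),
    ]
    if should_diagnose(DiagnosisMode.ON_FAILURE, test_passed=False):
        primary, secondary = triage(findings)
        print("fix first:", [f.id for f in primary])
        print("likely downstream:", [f.id for f in secondary])
```

In this sketch, raising `min_confidence` trades recall for fewer false alarms, and handling the primary list first mirrors the advice that resolving root causes often clears the downstream symptoms as well.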