AI Agent Failure Detection and Root Cause Analysis with Strands Evals
Publish Date: 2026-06-15 14:07:00
Source Domain: aws.amazon.com
-
Automatic Failure Detection: Strands Evals SDK Detectors automatically identify failures in agent execution traces, reducing diagnosis time from hours to minutes.
-
Root Cause Analysis: Detectors perform root cause analysis to separate primary causes from downstream symptoms and provide recommendations for fixes.
-
Types of Failures: Detectors classify failures into nine categories, including hallucination, incorrect actions, orchestration errors, and execution errors, and attach a confidence score to each classification.
-
Diagnostic Workflow: Detectors operate in two phases, failure detection followed by root cause analysis, and use large language model (LLM) analysis to handle sessions of different sizes.
-
Integration with Evaluation Pipelines: Analysts can integrate detectors into their evaluation pipelines to automate diagnosis on every test run, so each run shows immediately what to fix.
-
Prerequisites and Setup: To use the detectors, you need Python 3.10 or later, the Strands Evals SDK, Amazon Bedrock model access, and configured AWS credentials.
-
Recommendations for Use: Start with a medium confidence level for routine use, and choose between the ON_FAILURE and ALWAYS modes depending on the evaluation context. Fix primary failures first, since doing so often resolves secondary issues.
-
Best Practices and Cleanup: Follow best practices for configuring diagnosis settings, monitor the costs of Amazon Bedrock and CloudWatch Logs usage, and make sure any log data you still need is retained before you delete it during cleanup.
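
The recommendations above (run diagnosis in ON_FAILURE or ALWAYS mode, filter by confidence, fix primary failures first) can be illustrated with a minimal, self-contained sketch. It does not use the Strands Evals SDK: `DiagnosisMode`, `Failure`, `should_diagnose`, `triage`, the field names, and the 0.5 threshold standing in for "medium" are all hypothetical names and values chosen for illustration, not the SDK's API.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional


class DiagnosisMode(Enum):
    """Hypothetical stand-in for the ON_FAILURE / ALWAYS modes named above."""
    ON_FAILURE = "on_failure"
    ALWAYS = "always"


@dataclass(frozen=True)
class Failure:
    """Illustrative failure record; the field names are assumptions."""
    failure_id: str
    category: str                     # e.g. "hallucination", "execution_error"
    confidence: float                 # 0.0 to 1.0
    caused_by: Optional[str] = None   # id of the upstream failure, if any


def should_diagnose(mode: DiagnosisMode, test_passed: bool) -> bool:
    """ALWAYS diagnoses every run; ON_FAILURE only diagnoses failed runs."""
    return mode is DiagnosisMode.ALWAYS or not test_passed


def triage(failures: List[Failure], min_confidence: float = 0.5) -> List[Failure]:
    """Drop low-confidence findings, then list primary failures first.

    A failure with no `caused_by` is treated as primary; within each group,
    higher-confidence findings come first. The 0.5 default is an assumed
    value for a "medium" confidence level.
    """
    kept = [f for f in failures if f.confidence >= min_confidence]
    return sorted(kept, key=lambda f: (f.caused_by is not None, -f.confidence))


failures = [
    Failure("f2", "execution_error", 0.9, caused_by="f1"),
    Failure("f1", "hallucination", 0.8),
    Failure("f3", "orchestration_error", 0.3),
]

if should_diagnose(DiagnosisMode.ON_FAILURE, test_passed=False):
    for f in triage(failures):
        role = "primary" if f.caused_by is None else "secondary"
        print(role, f.failure_id, f.category, f.confidence)
```

In this sketch the low-confidence finding `f3` is filtered out, and `f1` is listed before `f2` because `f2` is recorded as a downstream symptom of `f1`, even though `f2` has the higher confidence score.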