`OSSS.ai.dependencies.failure_manager`¶

`OSSS.ai.dependencies.failure_manager` ¶

Advanced failure management with cascade prevention and retry strategies.

This module provides sophisticated failure handling capabilities including cascade prevention, intelligent retry strategies, failure impact analysis, and recovery mechanisms for agent execution failures.

`FailureType` ¶

Bases: Enum

Types of failures that can occur during agent execution.

`CascadePreventionStrategy` ¶

Bases: Enum

Strategies for preventing failure cascades.

`RetryStrategy` ¶

Bases: Enum

Retry strategies for failed agents.

`RetryConfiguration` ¶

Bases: BaseModel

Configuration for retry behavior.

Migrated from dataclass to Pydantic BaseModel for enhanced validation, serialization, and integration with the OSSS Pydantic ecosystem.

`calculate_delay(attempt, recent_failures=0)` ¶

Calculate delay for the given attempt number.

`FailureRecord` ¶

Bases: BaseModel

Record of a failure event.

Migrated from dataclass to Pydantic BaseModel for enhanced validation, serialization, and integration with the OSSS Pydantic ecosystem.

`to_dict()` ¶

Convert to dictionary representation.

`FailureImpactAnalysis` ¶

Bases: BaseModel

Analysis of failure impact on the execution graph.

Migrated from dataclass to Pydantic BaseModel for enhanced validation, serialization, and integration with the OSSS Pydantic ecosystem.

`get_total_affected_count()` ¶

Get total number of affected agents.

`has_recovery_options()` ¶

Check if recovery options are available.

`DependencyCircuitBreaker` ¶

Circuit breaker for preventing cascade failures.

`can_execute()` ¶

Check if execution is allowed through the circuit breaker.

`record_success()` ¶

Record a successful execution.

`record_failure()` ¶

Record a failed execution.

`get_state()` ¶

Get current circuit breaker state.

`FailureManager` ¶

Advanced failure manager with cascade prevention and recovery strategies.

Provides comprehensive failure handling including impact analysis, cascade prevention, retry management, and recovery coordination.

`configure_retry(agent_id, config)` ¶

Configure retry behavior for a specific agent.

`configure_circuit_breaker(agent_id, failure_threshold=5, recovery_timeout_ms=60000)` ¶

Configure circuit breaker for a specific agent.

`set_cascade_prevention(strategy)` ¶

Set the cascade prevention strategy.

`add_fallback_chain(agent_id, fallback_agents)` ¶

Add fallback chain for an agent.

`can_execute_agent(agent_id, context)` ¶

Check if an agent can execute given current failure state.

Returns¶

Tuple[bool, str] (can_execute, reason)

`handle_agent_failure(agent_id, error, context, attempt_number=1)` `async` ¶

Handle an agent failure with comprehensive recovery strategies.

Parameters¶

agent_id : str ID of the failed agent error : Exception The error that occurred context : AgentContext Current execution context attempt_number : int Current attempt number

Returns¶

Tuple[bool, Optional[str]] (should_retry, recovery_action)

`analyze_failure_impact(failed_agent, context)` ¶

Analyze the impact of an agent failure on the execution graph.

`analyze_dependency_failures(agent_id, context)` ¶

Analyze how dependency failures affect a specific agent.

`attempt_recovery(agent_id, recovery_action, context)` `async` ¶

Attempt to recover from a failure using the specified recovery action.

`create_checkpoint(checkpoint_id, context)` ¶

Create a checkpoint for recovery purposes.

`get_failure_statistics()` ¶

Get comprehensive failure statistics.

OSSS.ai.dependencies.failure_manager¶