AgentX

SRE Incident Triage in 90 Seconds

3-agent pipeline · Codebase-aware · Fully automated routing

Marylin Alarcon | SoftServe AgentX Hackathon 2026 | #AgentXHackathon

02 — The Problem

Manual triage is killing your SREs

Up to 2.5 hrs / day

15–30 min × 5 incidents = 75–150 min of engineering time lost every day

Reading logs

Sifting through 100s of log lines to find the root cause

Writing tickets

Manually crafting Linear issues with severity, labels, assignment

Pinging Slack

Finding the right channel, tagging the right people at 3 AM

03 — The Solution

Three agents, one pipeline, zero manual steps

Report an incident → get a triaged, routed, and tracked issue in about 90 seconds. Fully autonomous.

Intake Agent

Multimodal extraction
Text, images, logs, video

Triage Agent

Codebase analysis
6 investigation tools

Router Agent

Linear, Slack, Email
Priority-based routing

Report → Triaged → Routed in ~90 seconds

04 — Agent 1

Intake Agent

Multimodal extraction — understands text, images, logs, and screen recordings

Text Description

Free-form incident descriptions parsed into structured format

Screenshot Analysis

Claude Vision extracts error messages, stack traces, UI state

Log Parsing

Structured extraction from raw log dumps and stack traces

Screen Recordings

Browser-recorded sessions analyzed for reproduction steps

Input → Output

IN

Raw incident report (any format)

Claude Sonnet 4

Multimodal processing + structured extraction

OUT

Structured incident object

{ title, description, component,
error_messages[], affected_systems[],
reproduction_steps, visual_evidence }
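
A minimal sketch of the structured incident object, using the field names listed above. The slide does not give field types, so the types below (in particular for reproduction_steps and visual_evidence) are assumptions, and a stdlib dataclass stands in for whatever model class the project actually uses.

```python
from dataclasses import asdict, dataclass, field


@dataclass
class Incident:
    """Structured incident object handed from the Intake Agent to the Triage Agent."""

    title: str
    description: str
    component: str
    error_messages: list[str] = field(default_factory=list)
    affected_systems: list[str] = field(default_factory=list)
    # Assumed types: the slide only names these two fields.
    reproduction_steps: list[str] = field(default_factory=list)
    visual_evidence: list[str] = field(default_factory=list)


# Illustrative values only; not taken from a real incident.
incident = Incident(
    title="Checkout returns 500",
    description="Users see a 500 error when submitting payment.",
    component="checkout",
    error_messages=["KeyError: 'currency'"],
    affected_systems=["payments-api"],
)
payload = asdict(incident)  # plain dict, ready to serialize as JSON
```

Fields the reporter did not supply default to empty lists, so downstream agents can iterate over them without null checks.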

05 — Agent 2

Triage Agent

Codebase-aware analysis — investigates like an SRE, not just an LLM

module_search

Find relevant source files by component or keyword

code_reader

Read source files and understand implementation details

error_patterns

Match error signatures against known pattern database

dependency_graph

Trace module dependencies to assess blast radius

config_analysis

Check environment configs, feature flags, rate limits

doc_lookup

Search knowledge base for known issues and solutions

Query

Structured incident

Agentic Loop

Tool calls (avg 4-6)

Output

Severity + Root cause + Fix suggestion
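
The Query → Agentic Loop → Output flow could be sketched roughly as below. This is not the project's code: the two tool bodies are stubs, `decide` is a hypothetical callable standing in for the model's tool-use decision, and the budget of 6 calls is simply the upper end of the "avg 4-6" figure.

```python
from typing import Callable


# Stub tools: the real ones would search and read the codebase.
def module_search(query: str) -> str:
    return f"files matching {query!r}: checkout/payment.py"


def code_reader(path: str) -> str:
    return f"contents of {path}"


TOOLS: dict[str, Callable[[str], str]] = {
    "module_search": module_search,
    "code_reader": code_reader,
}

MAX_TOOL_CALLS = 6  # assumed budget


def run_triage(incident: dict, decide: Callable[[dict, list], dict]) -> dict:
    """Run tools chosen by `decide` until it returns a final verdict."""
    findings: list[tuple[str, str]] = []
    for _ in range(MAX_TOOL_CALLS):
        step = decide(incident, findings)
        if step["type"] == "final":
            return step["result"]
        tool = TOOLS.get(step["tool"])
        if tool is None:
            # Unknown tools are reported back instead of being executed.
            findings.append((step["tool"], "error: unknown tool"))
            continue
        findings.append((step["tool"], tool(step["argument"])))
    return {"severity": "unknown", "root_cause": "tool budget exhausted", "fix": None}


def scripted_decide(incident: dict, findings: list) -> dict:
    """Scripted stand-in for the model: search, read, then conclude."""
    if len(findings) == 0:
        return {"type": "tool", "tool": "module_search", "argument": incident["component"]}
    if len(findings) == 1:
        return {"type": "tool", "tool": "code_reader", "argument": "checkout/payment.py"}
    return {
        "type": "final",
        "result": {"severity": "P2", "root_cause": "missing currency key", "fix": "add a default currency"},
    }


result = run_triage({"component": "checkout"}, scripted_decide)
```

Capping the number of tool calls keeps a confused model from looping forever and bounds cost per incident.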

06 — Agent 3

Router Agent

Priority-based routing to the right people, on the right channel, at the right urgency

Severity	Linear Priority	Slack	Email
P1 Critical	Urgent	#incidents @channel	Team lead + on-call
P2 High	High	#incidents @channel	Team lead
P3 Medium	Medium	#incidents	—
P4 Low	Low	—	—
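
The routing table above maps naturally onto a lookup keyed by severity. A minimal sketch, assuming the recipient names (`team_lead`, `on_call`) and the action strings, which are placeholders rather than the project's real identifiers:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Route:
    linear_priority: str
    slack_channel: Optional[str]
    slack_mention: Optional[str]
    email_to: tuple


# One entry per row of the routing table.
ROUTES = {
    "P1": Route("Urgent", "#incidents", "@channel", ("team_lead", "on_call")),
    "P2": Route("High", "#incidents", "@channel", ("team_lead",)),
    "P3": Route("Medium", "#incidents", None, ()),
    "P4": Route("Low", None, None, ()),
}


def plan_actions(severity: str) -> list:
    """Return the ordered list of routing actions for a severity level."""
    route = ROUTES[severity]
    actions = [f"linear:create priority={route.linear_priority}"]
    if route.slack_channel:
        mention = f" {route.slack_mention}" if route.slack_mention else ""
        actions.append(f"slack:post {route.slack_channel}{mention}")
    for recipient in route.email_to:
        actions.append(f"email:send {recipient}")
    return actions
```

Keeping the table as data means a change in escalation policy is a one-line edit rather than a code change in the router.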

Linear

Slack

Resend

Langfuse

07 — Architecture

System Architecture

React + Vite

Tailwind · TypeScript

5 views

FastAPI

Python 3.12

REST + SSE

3 Agents

Anthropic SDK · Claude Sonnet 4

Tool-use pattern

Integrations

Linear · Slack · Resend · Langfuse

4 services

PostgreSQL 16

Langfuse Cloud

Knowledge Base

State Machine

received → triaging → triaged → routed → resolved
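
A minimal sketch of the incident state machine, assuming transitions are strictly linear and that resolved is terminal. The slide lists only these five states, so any failure or retry states are outside this sketch.

```python
from enum import Enum


class State(str, Enum):
    RECEIVED = "received"
    TRIAGING = "triaging"
    TRIAGED = "triaged"
    ROUTED = "routed"
    RESOLVED = "resolved"


# Assumed: each state has exactly one successor.
TRANSITIONS = {
    State.RECEIVED: State.TRIAGING,
    State.TRIAGING: State.TRIAGED,
    State.TRIAGED: State.ROUTED,
    State.ROUTED: State.RESOLVED,
}


def advance(current: State) -> State:
    """Return the next state, or raise if `current` is terminal."""
    try:
        return TRANSITIONS[current]
    except KeyError:
        raise ValueError(f"{current.value} is a terminal state") from None
```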

08 — Security

4-Layer Security Model

Defense in depth — every layer independently prevents a class of attacks

L1

Input Validation

• Pydantic models with strict field validation
• File type allowlisting & size limits
• Rate limiting on all endpoints
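
File type allowlisting and size limits might look like the sketch below. The extension set and the 10 MiB cap are assumed values, not the project's configuration, and plain Python is used here in place of the Pydantic models the slide mentions.

```python
from pathlib import PurePosixPath

# Assumed allowlist and limit, for illustration only.
ALLOWED_EXTENSIONS = {".png", ".jpg", ".jpeg", ".log", ".txt", ".webm"}
MAX_UPLOAD_BYTES = 10 * 1024 * 1024


def validate_upload(filename: str, size_bytes: int) -> None:
    """Raise ValueError unless the upload has an allowed type and size."""
    extension = PurePosixPath(filename).suffix.lower()
    if extension not in ALLOWED_EXTENSIONS:
        raise ValueError(f"file type {extension or '(none)'} is not allowed")
    if size_bytes <= 0 or size_bytes > MAX_UPLOAD_BYTES:
        raise ValueError(f"file size {size_bytes} is outside the accepted range")
```

An allowlist rejects anything unexpected by default, which is safer than trying to enumerate every dangerous extension.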

L2

Prompt Injection Defense

• Input scanning for injection patterns
• System prompt isolation
• Output validation before routing
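
Input scanning for injection patterns can be approximated with a small regex list. The patterns below are illustrative guesses, not the project's rule set, and pattern matching of this kind is a heuristic first pass rather than a complete defense.

```python
import re

# Hypothetical patterns; a real deployment would maintain and tune its own list.
INJECTION_PATTERNS = [
    re.compile(r"ignore\s+(all\s+)?(previous|prior|above)\s+instructions", re.IGNORECASE),
    re.compile(r"disregard\s+(the\s+)?system\s+prompt", re.IGNORECASE),
    re.compile(r"reveal\s+(your\s+)?system\s+prompt", re.IGNORECASE),
]


def find_injection_patterns(text: str) -> list:
    """Return the source of every pattern that matches the incident text."""
    return [pattern.pattern for pattern in INJECTION_PATTERNS if pattern.search(text)]
```

A non-empty result could be used to reject the report or flag it for review before any agent sees it.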

L3

Tool Allowlisting

• Explicit tool registry — no dynamic tool loading
• Read-only codebase access (no writes)
• Path traversal prevention
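
Path traversal prevention for the read-only codebase tools can be sketched with `pathlib`. The function name and the repository root used in the usage note are hypothetical; `Path.is_relative_to` requires Python 3.9 or later.

```python
from pathlib import Path


def resolve_inside(root: Path, requested: str) -> Path:
    """Resolve `requested` under `root`, refusing anything that escapes it."""
    root = root.resolve()
    # resolve() collapses ".." segments and follows symlinks before the check.
    candidate = (root / requested).resolve()
    if not candidate.is_relative_to(root):
        raise PermissionError(f"{requested!r} is outside the allowed root")
    return candidate
```

With a root of `/srv/repo`, a request for `src/app.py` resolves normally, while `../secrets.env` or an absolute path such as `/etc/passwd` raises `PermissionError` before any file is opened.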

L4

Data Protection

• Secrets redaction in logs and traces
• API keys checked at runtime, not hardcoded
• PII-aware structured logging
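
Secrets redaction in logs could be done with a `logging.Filter`, as in this sketch. The regex covers only `key=value` and `key: value` shapes for a few assumed key names, so it illustrates the idea rather than the project's actual redaction rules.

```python
import logging
import re

# Assumed key names; matches e.g. "api_key=abc" or "token: abc".
SECRET_RE = re.compile(r"(?i)\b(api[_-]?key|token|password|secret)\b(\s*[=:]\s*)(\S+)")


def redact(text: str) -> str:
    """Replace the value part of any secret-looking pair with a marker."""
    return SECRET_RE.sub(r"\1\2[REDACTED]", text)


class RedactingFilter(logging.Filter):
    """Redact secrets from the fully formatted message of every record."""

    def filter(self, record: logging.LogRecord) -> bool:
        record.msg = redact(record.getMessage())
        record.args = ()  # message is already formatted
        return True
```

Attaching the filter to a handler with `handler.addFilter(RedactingFilter())` applies it to every record that handler emits.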

09 — Observability

Every agent step is a traced span

Full pipeline visibility with Langfuse + structured JSON logging

Langfuse Traces

Per-incident trace with nested spans for each agent, tool call, and LLM invocation

Structured Logging

JSON logs with incident_id, agent, tool, duration, and severity context on every line
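
A minimal sketch of a JSON log formatter that carries the context fields named above. The field names come from the slide; the `level` and `message` keys and the unit of `duration` are assumptions.

```python
import json
import logging

CONTEXT_FIELDS = ("incident_id", "agent", "tool", "duration", "severity")


class JsonFormatter(logging.Formatter):
    """Render each record as one JSON object with incident context."""

    def format(self, record: logging.LogRecord) -> str:
        entry = {"level": record.levelname, "message": record.getMessage()}
        for name in CONTEXT_FIELDS:
            # Context arrives through the `extra=` argument of the logging call.
            if hasattr(record, name):
                entry[name] = getattr(record, name)
        return json.dumps(entry)
```

A call such as `logger.info("tool finished", extra={"incident_id": "inc-1", "agent": "triage_agent", "tool": "code_reader", "duration": 1.2})` then produces a single machine-parseable line.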

Live Metrics

Avg triage time, cost per incident, token usage, severity distribution — all in the dashboard

// Example Langfuse trace structure

intake_agent    12.3s | 2,140 tokens
triage_agent    45.1s | 8,420 tokens
    module_search    0.8s
    code_reader      1.2s
router_agent    18.7s | 3,890 tokens
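
The example trace can be modeled as a small span tree to total time and tokens per incident. This sketch does not use the Langfuse SDK; it assumes the two tool spans are children of triage_agent and that a parent's duration already includes its children.

```python
from dataclasses import dataclass, field


@dataclass
class Span:
    name: str
    seconds: float
    tokens: int = 0
    children: list = field(default_factory=list)


# Figures copied from the example trace.
TRACE = [
    Span("intake_agent", 12.3, 2140),
    Span("triage_agent", 45.1, 8420, [Span("module_search", 0.8), Span("code_reader", 1.2)]),
    Span("router_agent", 18.7, 3890),
]


def render(spans: list, depth: int = 0) -> list:
    """Return one indented text line per span, depth-first."""
    lines = []
    for span in spans:
        lines.append(f"{'  ' * depth}{span.name} {span.seconds}s")
        lines.extend(render(span.children, depth + 1))
    return lines


# Only top-level spans are summed, since children are assumed to be included.
total_seconds = round(sum(span.seconds for span in TRACE), 1)  # 76.1
total_tokens = sum(span.tokens for span in TRACE)  # 14450
```

The three agent spans add up to 76.1 s and 14,450 tokens for this example.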

10 — Live Demo

See it in action

Live at ssagentx.up.railway.app


Frontend — 5 Views

Report Form

Triage Results

Incidents List

Metrics

Health Grid

Integration Proof — Real APIs, Real Data

Linear Ticket

Slack Alert

Email (Resend)

Langfuse Traces

11 — By the Numbers

Production-grade from day one

~90s

End-to-end pipeline

Report → Triage → Route

$0.08

Avg cost per incident

Claude Sonnet · $0.04–0.12 range

62

Automated tests

Unit + integration · pytest

3

Specialized agents

Intake · Triage · Router

4

Real integrations

Linear · Slack · Resend · Langfuse

6

Investigation tools

Codebase-aware triage

AgentX

Because your on-call deserves to sleep.

github.com/marylin/softserve-agentx · Live demo: ssagentx.up.railway.app
