AgentX

SRE Incident Triage in 90 Seconds

3-agent pipeline · Codebase-aware · Fully automated routing

Marylin Alarcon | SoftServe AgentX Hackathon 2026 | #AgentXHackathon

02 — The Problem

Manual triage is killing your SREs

Up to 2.5 hrs / day

15–30 min × 5 incidents = 1.25–2.5 hrs of engineering time lost per day

Reading logs

Sifting through 100s of log lines to find the root cause

Writing tickets

Manually crafting Linear issues with severity, labels, assignment

Pinging Slack

Finding the right channel, tagging the right people at 3 AM

03 — The Solution

Three agents, one pipeline, zero manual steps

Report an incident → get a triaged, routed, and tracked issue in ~90 seconds. Fully autonomous.

Intake Agent

Multimodal extraction
Text, images, logs, video

Triage Agent

Codebase analysis
6 investigation tools

Router Agent

Linear, Slack, Email
Priority-based routing

Report → Triaged → Routed in ~90 seconds

04 — Agent 1

Intake Agent

Multimodal extraction — understands text, images, logs, and screen recordings

Text Description

Free-form incident descriptions parsed into structured format

Screenshot Analysis

Claude Vision extracts error messages, stack traces, UI state

Log Parsing

Structured extraction from raw log dumps and stack traces

Screen Recordings

Browser-recorded sessions analyzed for reproduction steps

Input → Output

IN
Raw incident report (any format)

Claude Sonnet 4

Multimodal processing + structured extraction

OUT
Structured incident object
{ title, description, component,
  error_messages[], affected_systems[],
  reproduction_steps, visual_evidence }
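
As a rough illustration of the OUT object above, here is a minimal sketch using only the standard library. The field names come from the slide; the class name `StructuredIncident`, the field types, and the empty-field check are assumptions, and the real project may well use Pydantic models instead, as the security slide suggests.

```python
from dataclasses import asdict, dataclass, field


@dataclass
class StructuredIncident:
    """Hypothetical shape of the Intake Agent output (types are assumed)."""

    title: str
    description: str
    component: str
    # Fields shown with [] on the slide are modelled as lists.
    error_messages: list[str] = field(default_factory=list)
    affected_systems: list[str] = field(default_factory=list)
    # Shown without [] on the slide, so kept as plain strings here.
    reproduction_steps: str = ""
    visual_evidence: str = ""

    def __post_init__(self) -> None:
        # Reject empty required fields before the Triage Agent sees them.
        for name in ("title", "description", "component"):
            if not getattr(self, name).strip():
                raise ValueError(f"{name} must not be empty")


# Example values are invented for illustration only.
incident = StructuredIncident(
    title="Checkout returns 500",
    description="Payment step fails after submitting card details.",
    component="payments",
    error_messages=["TimeoutError: upstream gateway"],
)
incident_dict = asdict(incident)
```

Calling `asdict` gives a plain dictionary that can be serialized to JSON and handed to the next agent.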

05 — Agent 2

Triage Agent

Codebase-aware analysis — investigates like an SRE, not just an LLM

module_search

Find relevant source files by component or keyword

code_reader

Read source files and understand implementation details

error_patterns

Match error signatures against known pattern database

dependency_graph

Trace module dependencies to assess blast radius

config_analysis

Check environment configs, feature flags, rate limits

doc_lookup

Search knowledge base for known issues and solutions

Query

Structured incident

Agentic Loop

Tool calls (avg 4–6)

Output

Severity + Root cause + Fix suggestion
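
The Query → Agentic Loop → Output flow above could be sketched roughly as follows. This is not the project's implementation: the two tool bodies are stubs, the step format is invented, and `scripted_policy` stands in for the LLM call that would normally decide the next tool.

```python
from typing import Callable


# Stub versions of two of the six tools; real ones would inspect the codebase.
def module_search(query: str) -> str:
    return f"src/{query}/client.py"


def code_reader(path: str) -> str:
    return f"# contents of {path}"


# Explicit registry: only tools listed here can ever be called.
TOOLS: dict[str, Callable[[str], str]] = {
    "module_search": module_search,
    "code_reader": code_reader,
}


def run_triage(incident: dict, choose_next_step: Callable, max_steps: int = 6) -> dict:
    """Call tools until the policy returns a final answer or the budget runs out."""
    history: list[tuple[str, str, str]] = []
    for _ in range(max_steps):
        step = choose_next_step(incident, history)
        if step["type"] == "final":
            return {**step["result"], "tool_calls": len(history)}
        tool = TOOLS.get(step["tool"])
        if tool is None:
            observation = f"error: unknown tool {step['tool']}"
        else:
            observation = tool(step["input"])
        history.append((step["tool"], step["input"], observation))
    return {"severity": "unknown", "root_cause": "step budget exhausted",
            "fix": None, "tool_calls": len(history)}


def scripted_policy(incident: dict, history: list) -> dict:
    """Hypothetical stand-in for the model: search, read, then conclude."""
    if len(history) == 0:
        return {"type": "tool", "tool": "module_search", "input": incident["component"]}
    if len(history) == 1:
        return {"type": "tool", "tool": "code_reader", "input": history[0][2]}
    return {"type": "final", "result": {
        "severity": "P2",
        "root_cause": f"gateway timeout in {history[0][2]}",
        "fix": "add a retry with backoff",
    }}


report = run_triage({"component": "payments"}, scripted_policy)
```

The `max_steps` cap is one simple way to keep the loop bounded; the slide's average of 4–6 tool calls would fit inside a budget of that size.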

06 — Agent 3

Router Agent

Priority-based routing to the right people, on the right channel, at the right urgency

Severity      Linear Priority   Slack                 Email
P1 Critical   Urgent            #incidents @channel   Team lead + on-call
P2 High       High              #incidents @channel   Team lead
P3 Medium     Medium            #incidents            none
P4 Low        Low               none                  none

Linear · Slack · Resend · Langfuse
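
One way to encode the routing rules above is a small lookup table. The sketch below only plans the notifications as strings; the names `Route`, `ROUTES`, and `plan_notifications`, and the recipient labels, are hypothetical, and no real Linear, Slack, or Resend call is made.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Route:
    linear_priority: str
    slack_channel: Optional[str]
    slack_mention: Optional[str]
    email_to: tuple


# Values mirror the severity table on the slide.
ROUTES = {
    "P1": Route("Urgent", "#incidents", "@channel", ("team_lead", "on_call")),
    "P2": Route("High", "#incidents", "@channel", ("team_lead",)),
    "P3": Route("Medium", "#incidents", None, ()),
    "P4": Route("Low", None, None, ()),
}


def plan_notifications(severity: str) -> list[str]:
    """Return the actions the Router Agent would take for a severity."""
    route = ROUTES[severity]
    # Every severity gets a Linear issue; Slack and email depend on the row.
    actions = [f"linear:{route.linear_priority}"]
    if route.slack_channel:
        mention = f" {route.slack_mention}" if route.slack_mention else ""
        actions.append(f"slack:{route.slack_channel}{mention}")
    actions.extend(f"email:{recipient}" for recipient in route.email_to)
    return actions


p1_actions = plan_notifications("P1")
p4_actions = plan_notifications("P4")
```

Keeping the rules in data rather than in branching code makes the table on the slide and the behavior easy to compare side by side.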

07 — Architecture

System Architecture

React + Vite

Tailwind · TypeScript

5 views

FastAPI

Python 3.12

REST + SSE

3 Agents

Anthropic SDK · Claude Sonnet 4

Tool-use pattern

Integrations

Linear · Slack · Resend · Langfuse

4 services

PostgreSQL 16
Langfuse Cloud
Knowledge Base

State Machine

received → triaging → triaged → routed → resolved
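
The five states above can be modelled as an enum with a guarded transition function. This sketch assumes the states advance strictly in the order listed and that the last one is terminal; the actual service may allow other transitions (for example a failure state), which the slide does not show.

```python
from enum import Enum


class IncidentState(str, Enum):
    RECEIVED = "received"
    TRIAGING = "triaging"
    TRIAGED = "triaged"
    ROUTED = "routed"
    RESOLVED = "resolved"


# Enum members iterate in definition order, which matches the slide.
_ORDER = list(IncidentState)


def advance(current: IncidentState) -> IncidentState:
    """Return the next state, refusing to move past the terminal state."""
    index = _ORDER.index(current)
    if index == len(_ORDER) - 1:
        raise ValueError(f"{current.value} is terminal")
    return _ORDER[index + 1]


state = IncidentState.RECEIVED
visited = [state.value]
while state is not IncidentState.RESOLVED:
    state = advance(state)
    visited.append(state.value)
```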

08 — Security

4-Layer Security Model

Defense in depth — every layer independently prevents a class of attacks

L1

Input Validation

  • Pydantic models with strict field validation
  • File type allowlisting & size limits
  • Rate limiting on all endpoints
L2

Prompt Injection Defense

  • Input scanning for injection patterns
  • System prompt isolation
  • Output validation before routing
L3

Tool Allowlisting

  • Explicit tool registry — no dynamic tool loading
  • Read-only codebase access (no writes)
  • Path traversal prevention
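
A common way to implement the path traversal check named above is to resolve the requested path and confirm it still sits under the codebase root. The function name `resolve_inside` and the `REPO_ROOT` value are assumptions for illustration, not the project's code; `Path.is_relative_to` needs Python 3.9 or later.

```python
from pathlib import Path

# Hypothetical root of the read-only codebase checkout.
REPO_ROOT = Path(".")


def resolve_inside(root: Path, requested: str) -> Path:
    """Resolve `requested` under `root`, rejecting anything that escapes it."""
    root = root.resolve()
    # resolve() collapses ".." segments and follows symlinks, so the check
    # below runs against the real target rather than the raw string.
    candidate = (root / requested).resolve()
    if not candidate.is_relative_to(root):
        raise PermissionError(f"path escapes codebase root: {requested}")
    return candidate


safe_path = resolve_inside(REPO_ROOT, "src/app.py")

try:
    resolve_inside(REPO_ROOT, "../secrets.env")
    escaped = True
except PermissionError:
    escaped = False
```

An absolute input such as `/etc/passwd` is rejected by the same check, because joining an absolute path onto the root discards the root.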
L4

Data Protection

  • Secrets redaction in logs and traces
  • API keys checked at runtime, not hardcoded
  • PII-aware structured logging
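
Secrets redaction in logs and traces could look something like the sketch below. The two regular expressions are illustrative guesses at what counts as a secret; a real deployment would need patterns tuned to its own key formats.

```python
import re

# Assumed patterns: "key=value" style secrets and HTTP bearer tokens.
_SECRET_PATTERNS = [
    re.compile(r"(?i)\b(api[_-]?key|token|password|secret)\b(\s*[=:]\s*)(\S+)"),
    re.compile(r"(?i)\b(bearer)(\s+)([A-Za-z0-9._\-]+)"),
]


def redact(text: str) -> str:
    """Replace the value part of anything that looks like a secret."""
    for pattern in _SECRET_PATTERNS:
        # Keep the label and separator so the log line stays readable.
        text = pattern.sub(lambda m: f"{m.group(1)}{m.group(2)}[REDACTED]", text)
    return text


clean_kv = redact("api_key=abc123 user=bob")
clean_header = redact("Authorization: Bearer eyJ.abc")
```

Running every log message and trace payload through a function like this before it leaves the process is one way to keep the redaction rule in a single place.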

09 — Observability

Every agent step is a traced span

Full pipeline visibility with Langfuse + structured JSON logging

Langfuse Traces

Per-incident trace with nested spans for each agent, tool call, and LLM invocation

Structured Logging

JSON logs with incident_id, agent, tool, duration, and severity context on every line

Live Metrics

Avg triage time, cost per incident, token usage, severity distribution — all in the dashboard
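
The structured logging described above can be approximated with the standard `logging` module and a custom formatter. The context field names are taken from the slide; the formatter class, the logger name, and the sample values are assumptions made for this sketch.

```python
import io
import json
import logging


class JsonFormatter(logging.Formatter):
    """Emit one JSON object per log line with the incident context fields."""

    FIELDS = ("incident_id", "agent", "tool", "duration", "severity")

    def format(self, record: logging.LogRecord) -> str:
        payload = {"level": record.levelname, "message": record.getMessage()}
        # Values passed via `extra=` become attributes on the record.
        for name in self.FIELDS:
            if hasattr(record, name):
                payload[name] = getattr(record, name)
        return json.dumps(payload)


# Write to an in-memory stream so the example has no side effects.
stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.setFormatter(JsonFormatter())

logger = logging.getLogger("agentx.sketch")
logger.setLevel(logging.INFO)
logger.addHandler(handler)
logger.propagate = False

logger.info(
    "tool finished",
    extra={"incident_id": "inc-001", "agent": "triage_agent",
           "tool": "module_search", "duration": 0.8, "severity": "P2"},
)
log_line = json.loads(stream.getvalue())
```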

// Example Langfuse trace structure

intake_agent       12.3s | 2,140 tokens
triage_agent       45.1s | 8,420 tokens
  module_search     0.8s
  code_reader       1.2s
router_agent       18.7s | 3,890 tokens

10 — Live Demo

See it in action

Live at ssagentx.up.railway.app

Frontend — 5 Views

Report Form (incident report form)

Triage Results

Incidents List

Metrics (metrics dashboard)

Health Grid (health dashboard)

Integration Proof — Real APIs, Real Data

Linear Ticket (with full details)

Slack Alert (critical alert)

Email (Resend) notification

Langfuse Traces

11 — By the Numbers

Production-grade from day one

~90s

End-to-end pipeline

Report → Triage → Route

$0.08

Avg cost per incident

Claude Sonnet · $0.04–0.12 range

62

Automated tests

Unit + integration · pytest

3

Specialized agents

Intake · Triage · Router

4

Real integrations

Linear · Slack · Resend · Langfuse

6

Investigation tools

Codebase-aware triage

AgentX

Because your on-call deserves to sleep.
