Show HN: Steadwing – Your Autonomous On-Call Engineer (steadwing.com)
10 points by abejith 1 day ago | hide | past | favorite | 2 comments
Hey HN! We’re Abejith and Dev, and we’re building Steadwing (https://www.steadwing.com) - an autonomous on-call engineer that diagnoses production incidents/alerts, correlates evidence across your stack, and resolves them. You can try it at https://app.steadwing.com/signup (no credit card required and a demo mode is available).

Every on-call engineer knows the pain. It’s 2am, PagerDuty fires, you open the laptop and start the scramble - Datadog for metrics, GitHub for recent commits, Slack to see who’s awake, Elasticsearch for logs. 45 minutes later you find it was a config change that reduced the connection pool size. The fix took 2 minutes. The diagnosis took almost an hour.

The problem isn’t fixing things, it’s the correlation. The signal is scattered across a dozen tools and nobody has the full picture. My co-founder, Dev, and I met through Entrepreneurs First and both felt that incident response was fundamentally broken and could be significantly improved, with a long-term vision of making software self-healing.

So we built Steadwing. When an alert fires, it pulls context simultaneously from logs, metrics, traces, and recent commits, correlates the signals, and delivers a structured RCA in under 5 minutes: a plain-language root cause, evidence linked back to source tools, a timeline, an impact assessment, and both short-term and long-term fixes.
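To make "pulls context simultaneously" concrete, here is a minimal Python sketch of fanning out to several sources at once. It is not Steadwing's implementation: the `fetch_*` functions, the `gather_context` name, and the returned findings are all hypothetical stand-ins for real integrations.

```python
import asyncio

# Hypothetical stand-ins for real integrations (logs, metrics, commits).
# Each one sleeps briefly to simulate a network call.
async def fetch_logs(service: str) -> dict:
    await asyncio.sleep(0.01)
    return {"source": "logs", "finding": f"{service}: connection pool exhausted"}

async def fetch_metrics(service: str) -> dict:
    await asyncio.sleep(0.01)
    return {"source": "metrics", "finding": f"{service}: p99 latency spike"}

async def fetch_commits(service: str) -> dict:
    await asyncio.sleep(0.01)
    return {"source": "commits", "finding": f"{service}: pool size config changed"}

async def gather_context(service: str) -> list[dict]:
    # Run all fetches concurrently; a failing source is dropped
    # instead of aborting the whole investigation.
    results = await asyncio.gather(
        fetch_logs(service),
        fetch_metrics(service),
        fetch_commits(service),
        return_exceptions=True,
    )
    return [r for r in results if not isinstance(r, Exception)]

context = asyncio.run(gather_context("checkout"))
```

The point of `return_exceptions=True` is that one slow or broken tool should not block the evidence from the others.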

For noisy environments: say a bad deploy causes cascading failures across 5 microservices and triggers 30+ alerts. Steadwing groups them into one incident and tells you what the actual root cause is vs. what’s just a side effect. It doesn’t just diagnose - it suggests safe fixes ranked by risk, and can handle rollbacks, scaling adjustments, and config changes for you. You can also ask follow-up questions about any incident or general infra questions conversationally.
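As a rough illustration of grouping cascading alerts and separating cause from side effect, here is a small Python sketch. It assumes a simple heuristic that is not taken from Steadwing: alerts firing close together form one incident, and a root-cause candidate is an alerting service none of whose dependencies are also alerting. The `Alert` type, the `depends_on` map, and the 5-minute window are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Alert:
    service: str
    fired_at: float  # seconds since some reference time

def group_alerts(alerts, window_s=300.0):
    """Group alerts into incidents: a gap longer than window_s starts a new one."""
    incidents = []
    for alert in sorted(alerts, key=lambda a: a.fired_at):
        if incidents and alert.fired_at - incidents[-1][-1].fired_at <= window_s:
            incidents[-1].append(alert)
        else:
            incidents.append([alert])
    return incidents

def root_cause_candidates(incident, depends_on):
    """Alerting services whose own dependencies are not alerting."""
    alerting = {a.service for a in incident}
    return sorted(
        s for s in alerting
        if not (set(depends_on.get(s, ())) & alerting)
    )

# Hypothetical topology: web -> api -> db
depends_on = {"web": ["api"], "api": ["db"]}
alerts = [Alert("web", 20), Alert("db", 0), Alert("api", 10), Alert("billing", 1000)]

incidents = group_alerts(alerts)
candidates = root_cause_candidates(incidents[0], depends_on)
```

In this toy setup the `web` and `api` alerts are treated as side effects because something they depend on is also alerting, leaving `db` as the candidate.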

All 20+ integrations (Datadog, PagerDuty, Slack, GitHub, Sentry, AWS, K8s, etc.) connect via OAuth or API Key - no agents, no code changes, live in a few seconds. We also built an MCP server so AI coding agents can interact with Steadwing from your dev environment, and we open-sourced OpenAlerts (https://github.com/steadwing/openalerts, https://openalerts.dev) - a monitoring layer for agentic frameworks with real-time alert rules for LLM errors, infra failures, stuck sessions, and queue buildup, with multi-channel notifications via Slack, Discord, and Telegram.

We have a free tier and would love feedback, especially from folks who are on-call regularly.

Let us know what works, what’s missing, and what you’d want next :)




We've been using this in our prod microservices and it has been delivering value from day one! Really in for the vision of AI SREs.

Truly speaks to the real issues faced even in dev, and obviously in prod. I really love how the product simplifies the diagnosis burden :))


