FixFlow

AI-Powered SRE Platform

An AI-powered Site Reliability Engineering (SRE) platform that automates incident lifecycle logging, root-cause analysis (RCA) generation, and real-time status pushes over Socket.io.

Technologies

React, Express, MongoDB, Socket.io, Gemini AI, Framer Motion, TypeScript

Overview

FixFlow is an automated Site Reliability Engineering (SRE) monitoring and alert management platform. Its automated pipelines help engineering teams detect critical outages, isolate root causes, and resolve incidents with minimal manual intervention.

Architecture & Decisions

FixFlow is built on a modern web stack: React and Framer Motion on the frontend for high-fidelity animations, and Node.js/Express on the backend. It uses the Gemini API for diagnostic analysis and Socket.io to push system status alerts from server to client as they occur. Data persists in MongoDB.

Key Features

Automated zero-touch incident lifecycle detection
Gemini AI root-cause analysis (RCA) postmortems
Real-time Socket.io bi-directional status streams
Scroll-driven glassmorphic APM dashboards

Challenges

The main challenge was keeping the real-time Socket.io stream from overloading React state when dozens of microservice alerts fire simultaneously. Batching state updates solved it: alerts are collected briefly and applied to state together instead of one at a time.
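
The batching idea can be sketched in a few lines of TypeScript. This is a minimal illustration, not FixFlow's actual code: the `Alert` shape, the `AlertBatcher` class, and the 100 ms default window are all assumptions made for the example.

```typescript
// Hypothetical alert shape; the real FixFlow payload is not shown in the source.
interface Alert {
  service: string;
  severity: "info" | "warning" | "critical";
  message: string;
}

// Collects items and hands them to `onFlush` as one array per time window,
// so a burst of N alerts causes one downstream update instead of N.
class AlertBatcher<T> {
  private buffer: T[] = [];
  private timer: ReturnType<typeof setTimeout> | null = null;
  private readonly onFlush: (batch: T[]) => void;
  private readonly windowMs: number;

  // windowMs defaults to 100 ms, an assumed value to tune per dashboard.
  constructor(onFlush: (batch: T[]) => void, windowMs: number = 100) {
    this.onFlush = onFlush;
    this.windowMs = windowMs;
  }

  // Queue one item; start the flush timer if none is pending.
  push(item: T): void {
    this.buffer.push(item);
    if (this.timer === null) {
      this.timer = setTimeout(() => this.flush(), this.windowMs);
    }
  }

  // Deliver everything queued so far as a single batch and reset the timer.
  flush(): void {
    if (this.timer !== null) {
      clearTimeout(this.timer);
      this.timer = null;
    }
    if (this.buffer.length === 0) return;
    const batch = this.buffer;
    this.buffer = [];
    this.onFlush(batch);
  }
}
```

In a React component, the Socket.io alert handler would call `batcher.push(alert)`, and `onFlush` would call the state setter once per batch, for example `setAlerts(prev => [...prev, ...batch])` (names assumed). Calling `flush()` on unmount avoids a timer firing after the component is gone.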

Lessons Learned

I learned how to build a resilient multi-stage fallback pipeline (Gemini Pro falling back to LLaMA via Groq) so that RCA script generation keeps working during rate limits or provider downtime.
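
A fallback chain of this kind can be sketched as an ordered list of providers tried one after another. The `Provider` type, the `generateWithFallback` function, and the two stub providers below are hypothetical stand-ins for illustration; real providers would wrap the Gemini and Groq client calls, which are not shown here.

```typescript
// A provider is any async function that turns a prompt into text.
type Provider = {
  name: string;
  generate: (prompt: string) => Promise<string>;
};

// Try each provider in order and return the first successful result.
// If every stage fails, throw one error listing each failure.
async function generateWithFallback(
  providers: Provider[],
  prompt: string,
): Promise<{ provider: string; text: string }> {
  const errors: string[] = [];
  for (const p of providers) {
    try {
      const text = await p.generate(prompt);
      return { provider: p.name, text };
    } catch (err) {
      // Rate limits and outages land here; record the failure and move on.
      const reason = err instanceof Error ? err.message : String(err);
      errors.push(`${p.name}: ${reason}`);
    }
  }
  throw new Error(`All providers failed: ${errors.join("; ")}`);
}

// Stub providers (assumptions, not real SDK calls): the primary simulates
// a rate limit, the secondary returns a canned draft.
const primary: Provider = {
  name: "gemini",
  generate: async () => {
    throw new Error("429 rate limited");
  },
};
const secondary: Provider = {
  name: "groq-llama",
  generate: async (prompt) => `RCA draft for: ${prompt}`,
};
```

Calling `generateWithFallback([primary, secondary], "db timeout")` resolves with the secondary provider's output because the primary throws. Returning the provider name alongside the text makes it easy to log which stage actually served a given RCA.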